in reply to Loop and Find & Replace
in thread replacing text in specific tags

Your formatting is pretty much screwed up. To be honest, I don't see what you are saying. Try to fix it and I'll do my best to help you. In the meantime I'd recommend the HTML::TokeParser Tutorial. In general, if you do extensive HTML or XML processing, consider using a module.
Cheers, CombatSquirrel.

Re: Efficiency issues in text parsing
by texuser74 (Monk) on Aug 27, 2003 at 00:44 UTC
    Hi, thanks for your comments. I have one more small question.

    Here is the input data and the output I want:

    <input> This is to test. this is to test <p>This is to test. This is to test</p> <p>This is to test. This is to test</p> This is to test. this is to test </input>
    <output> This is to test. this is to test <p>This is to test. This is to test</p> <p>This is to test. This is to test</p> This is to test. this is to test </output>
    That is, I want each <p>...</p> to end up on a single line: the carriage returns should be deleted only inside <p>...</p>. My code below does the job, but only for the last <p>...</p>. I don't know how to loop over all of them. Please suggest a fix.
    $infile = $ARGV[0];
    open(IN, '<', "temp.in") || die "\nCan't open temp.in \n";
    open(OUT, '>', "temp.out");
    $/ = "";
    while (<IN>) {
        if ($_ =~ s/(.*)<p>(.*)\<\/p\>(.*)//ms) {
            $pre = $1;
            $par = $2;
            $pos = $3;
            $par =~ s#\n# #ig;
            print OUT "$pre<p>$par\<\/p\>$pos";
        }
    }
    close(IN);
    close(OUT);
    Note: please also let me know how to include source code on this page. Are there special tags for that? The code formatting often gets messed up when I post.

    edited by ybiC: Reformatted to avoid lateral scrolling in browser window - balanced <code> tags around example input+output and code

      Have a look at Writeup Formatting Tips.
      To your problem: The following program did the trick for me:
      #!perl
      use strict;
      use warnings;

      {   # braces for localization of $/
          local $/ = '<p>';      # end of record is now <p>
          print scalar <DATA>;   # first chunk contains everything before first <p> tag, just print
          for (<DATA>) {
              s@([\d\D]*?</p>)@
                  my $var = $1;
                  $var =~ s!\n! !g;
                  $var
              @e;   # substitute newlines by spaces before the closing </p> tag
              print;
          }
      }
      __DATA__
      This is to test. this is to test <p>This is to test. This is to test</p> <p>This is to test. This is to test</p> This is to test. this is to test
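As an aside on the substitution in that program: the `@` characters are only the delimiters of `s@...@...@`, and the trailing `e` is the modifier that makes Perl evaluate the replacement part as code instead of treating it as a string. The last expression in that code (`$var`) becomes the replacement text. Below is a minimal sketch of the same idea written with brace delimiters, which may be easier to read. The helper name `join_paragraph_lines`, the extra `g` and `s` modifiers, and the sample input are assumptions made for illustration; they are not part of the program above.

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): join the lines inside
# every <p>...</p> pair, leaving text outside the tags untouched.
sub join_paragraph_lines {
    my ($text) = @_;
    # Modifiers: "e" evaluates the replacement as Perl code,
    # "g" repeats for every match, "s" lets "." match newlines.
    $text =~ s{(<p>.*?</p>)}{
        my $par = $1;        # copy of one complete paragraph element
        $par =~ s/\n/ /g;    # newlines inside it become spaces
        $par;                # last value is used as the replacement
    }gse;
    return $text;
}

# Assumed sample input; the positions of the line breaks are a guess.
my $input = "intro line\n<p>first\nparagraph</p>\n"
          . "<p>second\nparagraph</p>\noutro line\n";
print join_paragraph_lines($input);
```

Because the pattern is non-greedy (`.*?`) and the substitution carries the `g` modifier, every `<p>...</p>` pair is handled in one pass, so changing the record separator `$/` is not needed in this variant. Like any regex-based approach, it assumes the tags are written exactly as `<p>` and `</p>` and are not nested.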
      Hope this helped.
      CombatSquirrel.
      Entropy is the tendency of everything going to hell.
        Your code does the magic. Thank you very much.

        One small question, though: what does "$var @e;" mean, and in particular what does "@e" do here?

        Can you please suggest a good Perl book that covers this kind of thing?

        Once again, thanks a lot.