in reply to Re: Re: Efficiency issues in text parsing
in thread replacing text in specific tags

Hi, thanks for your comments. I have one more small question.

here is the input data

<input> This is to test. this is to test <p>This is to test. This is to test</p> <p>This is to test. This is to test</p> This is to test. this is to test </input>
<output> This is to test. this is to test <p>This is to test. This is to test</p> <p>This is to test. This is to test</p> This is to test. this is to test </output>
That is, I want each <p>...</p> to end up on a single line; I mean, delete the carriage returns only inside <p>...</p>. My following code does the job, but only for the last <p>...</p>. I don't know how to loop it here. Please suggest.

$infile = $ARGV[0];
open(IN, '<', "temp.in") || die "\nCan't open temp.in \n";
open(OUT, '>', "temp.out");
$/ = "";
while (<IN>) {
    if ($_ =~ s/(.*)<p>(.*)\<\/p\>(.*)//ms) {
        $pre = $1;
        $par = $2;
        $pos = $3;
        $par =~ s#\n# #ig;
        print OUT "$pre<p>$par\<\/p\>$pos";
    }
}
close(IN);
close(OUT);

Note: also, please let me know how to include source code on this page. Are there any special tags for that? The code formatting often gets messed up when I post.

edited by ybiC: Reformatted to avoid lateral scrolling in browser window - balanced <code>tags around example input+output and code

Replies are listed 'Best First'.
Re: Re: Efficiency issues in text parsing
by CombatSquirrel (Hermit) on Aug 27, 2003 at 01:04 UTC
    Have a look at Writeup Formatting Tips.
    To your problem: The following program did the trick for me:
    #!perl
    use strict;
    use warnings;

    {   # braces for localization of $/
        local $/ = '<p>';      # end of record is now <p>
        print scalar <DATA>;   # first chunk contains everything before first <p> tag, just print
        for (<DATA>) {
            s@([\d\D]*?</p>)@
                my $var = $1;
                $var =~ s!\n! !g;
                $var
            @e;   # substitute newlines by spaces before the closing </p> tag
            print;
        }
    }
    __DATA__
    This is to test. this is to test <p>This is to test. This is to test</p> <p>This is to test. This is to test</p> This is to test. this is to test
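    For comparison, here is a minimal sketch of another way to do the looping, assuming the whole input fits in memory and the <p> elements are not nested and carry no attributes. A single substitution with the /g modifier visits every <p>...</p> in turn, and the non-greedy .*? stops at the nearest </p> instead of running on to the last one. The name join_p_lines and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: join the lines inside every <p>...</p> element.
sub join_p_lines {
    my ($text) = @_;
    # /g repeats the substitution for every <p>...</p> found,
    # /s lets . match newlines, /e evaluates the replacement as code.
    # The non-greedy .*? stops at the nearest closing </p>.
    $text =~ s{(<p>.*?</p>)}{ (my $par = $1) =~ tr/\n/ /; $par }gse;
    return $text;
}

# Illustrative input only; text outside <p>...</p> keeps its newlines.
my $sample = "before\n<p>one\ntwo</p>\n<p>three\nfour</p>\nafter\n";
print join_p_lines($sample);
```

    Text outside the <p> elements is never captured by the pattern, so its line breaks are left alone.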
    Hope this helped.
    CombatSquirrel.
    Entropy is the tendency of everything going to hell.
      Your code does the magic. Thank you very much.

      But one small question: what does "$var @e;" mean, and in particular, what does the "@e" mean here?

      Can you please suggest a good Perl book that covers this kind of thing?

      Once again, thanks a lot.

        The "@" is just the separator (delimiter) for the substitution, which starts with "s@". The "e" is a modifier that specifies that the replacement part should be evaluated as Perl code and the result be taken as the real substitute. And since the value of the last statement evaluated is always the return value, I just put $var as the last statement, because it contains the substitute.
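        To make that concrete, here is a small sketch (the sample string and variable names are made up for illustration). The first two substitutions differ only in the delimiter, and the third shows a multi-statement replacement where the last expression supplies the result.

```perl
use strict;
use warnings;

my $text = "price: 4 apples, 10 pears";

# Same substitution with two different delimiters; /e evaluates the
# replacement as Perl code and uses its value, /g repeats it.
(my $slashes = $text) =~ s/(\d+)/$1 * 2/ge;
(my $ats     = $text) =~ s@(\d+)@$1 * 2@ge;

# A multi-statement replacement: the value of the last expression
# ($n here) becomes the replacement text.
(my $multi = $text) =~ s@(\d+)@
    my $n = $1;
    $n *= 2;
    $n
@ge;

print "$slashes\n$ats\n$multi\n";
```

        All three lines printed should be identical, with each number doubled.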
        Cheers,
        CombatSquirrel.
        Entropy is the tendency of everything going to hell.