Loop and Find & Replace

Hi, thanks for your comments. I have one more small question.

Here is the input data, followed by the output I want:

<input>
This is to test. this is to test
<p>This is to test. This is
to test</p>
<p>This is to test. This is
to test</p>
This is to test.
this is to test
</input>

<output>
This is to test. this is to test
<p>This is to test. This is to test</p>
<p>This is to test. This is to test</p>
This is to test.
this is to test
</output>

That is, I want to put each <p>...</p> on a single line, i.e. delete the carriage returns only inside <p>...</p>. My code below does the job, but only for the last <p>...</p>. I don't know how to loop over all of them here. Please suggest.

$infile = $ARGV[0];

open(IN, '<', "temp.in") || die "\nCan't open temp.in \n";
open(OUT, '>', "temp.out");
$/="";
while(<IN>)
{
    if($_=~s/(.*)<p>(.*)\<\/p\>(.*)//ms)
    {
        $pre =  $1;
        $par =  $2;
        $pos =  $3;

        $par=~s#\n# #ig;
        print OUT "$pre&lt;p&gt;$par\<\/p\>$pos";
    }
}
close(IN);
close(OUT);

Note: please also let me know how to include source code on this page. Are there any special tags for that? The code formatting often gets messed up when I post.

(edited by ybiC: Reformatted - balanced <code> tags around sample input and code)


Re: Re: Efficiency issues in text parsing by CombatSquirrel (Hermit) on Aug 25, 2003 at 08:51 UTC
Your formatting is pretty much screwed up. To be honest, I don't see what you are saying. Try to fix it and I'll do my best to help you. In the meantime I'd recommend the HTML::TokeParser Tutorial. In general, if you do extensive HTML or XML processing, consider using a module. Cheers, CombatSquirrel.
Re: Efficiency issues in text parsing by texuser74 (Monk) on Aug 27, 2003 at 00:44 UTC
Re: Re: Efficiency issues in text parsing by CombatSquirrel (Hermit) on Aug 27, 2003 at 01:04 UTC
Have a look at Writeup Formatting Tips. To your problem: the following program did the trick for me:

#!perl
use strict;
use warnings;

{ # braces for localization of $/
    local $/ = '<p>'; # end of record is now <p>
    print scalar <DATA>; # first chunk contains everything before first <p> tag, just print
    for (<DATA>) {
        s@([\d\D]*?</p>)@
            my $var = $1;
            $var =~ s!\n! !g;
            $var
        @e; # substitute newlines by spaces before the closing </p> tag
        print;
    }
}
__DATA__
This is to test. this is to test
<p>This is to test. This is
to test</p>
<p>This is to test. This is
to test</p>
This is to test.
this is to test

Hope this helped. CombatSquirrel. Entropy is the tendency of everything going to hell.
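
For what it's worth, here is another way to get the same result, as a minimal sketch and not the method from the reply above. The original attempt only touches the last <p>...</p> because the leading (.*) is greedy (it swallows everything up to the final <p>) and the substitution has no /g, so it runs once per record. Reading the whole text into one string and using a non-greedy, global substitution handles every block. The sub name join_paragraph_lines and the inline sample text are made up for this illustration, and the sketch assumes <p> tags carry no attributes and are not nested.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): joins the lines inside each
# <p>...</p> block and leaves all other text untouched.
sub join_paragraph_lines {
    my ($text) = @_;
    # .*? is non-greedy, so each match stops at the nearest </p>.
    # /s lets . match newlines, /g repeats the substitution for every
    # block, and /e evaluates the replacement as Perl code.
    $text =~ s{(<p>.*?</p>)}{ (my $par = $1) =~ tr/\n/ /; $par }gse;
    return $text;
}

# Sample text copied from the question (assumed to fit in memory).
my $input = <<'END';
This is to test. this is to test
<p>This is to test. This is
to test</p>
<p>This is to test. This is
to test</p>
This is to test.
this is to test
END

print join_paragraph_lines($input);
```

For a real file, the same sub could be applied to the slurped contents (for example after setting local $/ = undef before reading). For anything beyond simple, well-behaved markup, a parser module such as the HTML::TokeParser mentioned earlier in the thread is probably the safer choice.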
Re: Efficiency issues in text parsing by texuser74 (Monk) on Aug 27, 2003 at 06:25 UTC
Re: Re: Efficiency issues in text parsing by CombatSquirrel (Hermit) on Aug 27, 2003 at 09:50 UTC