Re: Re: Efficiency issues in text parsing

Re: Efficiency issues in text parsing by texuser74 (Monk) on Aug 27, 2003 at 00:44 UTC
Hi, thanks for your comments. I have one more small doubt. Here is the input data:

`<input> This is to test. this is to test <p>This is to test. This is to test</p> <p>This is to test. This is to test</p> This is to test. this is to test </input>`

and here is the output I want:

`<output> This is to test. this is to test <p>This is to test. This is to test</p> <p>This is to test. This is to test</p> This is to test. this is to test </output>`

I.e. I want to make each <p>...</p> a single line. I mean, delete the carriage returns only inside <p>...</p>. My following code does the job, but only for the last <p>...</p>. I don't know how to loop it here. Please suggest.

    $infile = $ARGV[0];
    open(IN, '<', "temp.in") || die "\nCan't open temp.in \n";
    open(OUT, '>', "temp.out");
    $/="";
    while(<IN>) {
        if($_=~s/(.*)<p>(.*)\<\/p\>(.*)//ms) {
            $pre = $1;
            $par = $2;
            $pos = $3;
            $par=~s#\n# #ig;
            print OUT "$pre<p>$par\<\/p\>$pos";
        }
    }
    close(IN);
    close(OUT);

Note: also, please let me know how to include source code in this page. Are there any special tags for that? I mean, the code formatting often gets messed up when I post.

(edited by ybiC: Reformatted to avoid lateral scrolling in browser window - balanced <code> tags around example input+output and code)
Re: Re: Efficiency issues in text parsing by CombatSquirrel (Hermit) on Aug 27, 2003 at 01:04 UTC
Have a look at Writeup Formatting Tips. As for your problem, the following program did the trick for me:

    #!perl
    use strict;
    use warnings;

    {   # braces for localization of $/
        local $/ = '<p>';      # end of record is now <p>
        print scalar <DATA>;   # first chunk contains everything before first <p> tag, just print
        for (<DATA>) {
            s@([\d\D]*?</p>)@
                my $var = $1;
                $var =~ s!\n! !g;
                $var
            @e;   # substitute newlines by spaces before the closing </p> tag
            print;
        }
    }
    __DATA__
    This is to test. this is to test <p>This is to test. This is to test</p> <p>This is to test. This is to test</p> This is to test. this is to test

Hope this helped.
CombatSquirrel.
Entropy is the tendency of everything going to hell.
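
The same idea can probably also be sketched without touching `$/`, by running one global substitution over the whole text. This is only a minimal sketch, not the code from the post above: the helper name `join_p_lines` and the sample string are made up for illustration, and it assumes the `<p>` elements are not nested and carry no attributes.

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper: replace newlines with spaces, but only inside <p>...</p>.
# Assumes <p> elements are not nested and have no attributes.
sub join_p_lines {
    my ($text) = @_;
    # /g repeats the substitution for every <p>...</p> pair,
    # /s lets . match newlines, and /e evaluates the replacement as Perl code.
    # The non-greedy .*? stops at the first closing </p>.
    $text =~ s{(<p>.*?</p>)}{
        my $par = $1;
        $par =~ tr/\n/ /;
        $par;
    }gse;
    return $text;
}

# Made-up sample input with newlines inside and outside the paragraphs.
my $input = "before\n<p>one\ntwo</p>\n<p>three\nfour</p>\nafter\n";
print join_p_lines($input);
```

Only the newlines between `<p>` and `</p>` are replaced; the ones outside the paragraphs are left alone. A greedy `(.*)<p>` in front of the paragraph, as in the original attempt, would run up to the last `<p>`, which would explain why only the last paragraph was handled there.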
Re: Efficiency issues in text parsing by texuser74 (Monk) on Aug 27, 2003 at 06:25 UTC
Your code does the magic. Thank you very much. But one small doubt: what does "$var @e;" mean, and particularly, what does "@e" mean here? Can you please suggest a good Perl book that covers this kind of stuff? Once again, thanks a lot.
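
On the `@e` question: in `s@...@...@e` the `@` characters are just the delimiters of the substitution (used in place of the usual `/`), and the trailing `e` is a modifier that makes Perl evaluate the replacement as code instead of treating it as a string. So `$var` is the last expression of that code, and its value becomes the replacement text. A small sketch with made-up data (the variable names are illustrative only):

```perl
use strict;
use warnings;

my $plain = "price: 21";
my $evald = "price: 21";

# Without /e the replacement is an ordinary interpolated string.
$plain =~ s/(\d+)/$1 * 2/;

# With /e the replacement is run as Perl code and its value is inserted.
# Here @ is used as the delimiter instead of /, as in the code above.
$evald =~ s@(\d+)@$1 * 2@e;

print "$plain\n";   # price: 21 * 2
print "$evald\n";   # price: 42
```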
Re: Re: Efficiency issues in text parsing by CombatSquirrel (Hermit) on Aug 27, 2003 at 09:50 UTC
Multiline Mode by Anonymous Monk on Sep 05, 2003 at 08:40 UTC