regex help

GaijinPunch has asked for the wisdom of the Perl Monks concerning the following question:

Hey kids. Sucking some stuff out of an HTML file. It's not enough to warrant any module, as it's pretty clean stuff. However, there's one thing I want to do.

( $result = $htmlline ) =~ s/<tag marker>(.+?)<\/td>/$1/;

This works, but every now and again $1 holds some br's (sorry, if I type it like a tag, it's interpreted as such). Sometimes zero, sometimes ten. I know I can add a line to it and easily take them out, but is there an easier 1-line solution? I've been doing Perl for over a year now... it's probably time I got a bit funkier w/ my regexes. I thought this would work, but it didn't.

( ( $result = $htmlline ) =~ s/<tag marker>(.+?)<\/td>/$1/ ) =~ s/<br>//g;

...but says I can't substitute in a substitute.


Replies are listed 'Best First'.
Re: regex help by JediWizard (Deacon) on Jun 27, 2005 at 01:44 UTC
Any reason you are opposed to two lines? `($result = $htmlline) =~ s/<tag marker>(.+?)<\/td>/$1/; $result =~ s/<br(?:\s\/)?>/\n/g;` Update: If you really want one line (untested): `( $result = $htmlline ) =~ s{<tag marker>(.+?)</td>}{join("\n", split(/<br(?:\s\/?)?>/, $1))}eg;` They say that time changes things, but you actually have to change them yourself. -Andy Warhol
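For anyone who wants to try the two-statement version, here is a minimal runnable sketch. The input line and the `<td class="item">` marker are made up for illustration (the thread never shows the real tag marker), and the `<br\s*/?>` pattern is just one reasonable way to catch `<br>`, `<br/>` and `<br />`, not the exact pattern from the reply above.

```perl
use strict;
use warnings;

# Hypothetical input; '<td class="item">' stands in for the
# unspecified "<tag marker>" from the question.
my $htmlline = '<tr><td class="item">one<br>two<br />three</td></tr>';

# Step 1: copy the line, then replace the marked cell with its contents.
(my $result = $htmlline) =~ s{<td class="item">(.+?)</td>}{$1};

# Step 2: turn every <br>, <br/> or <br /> into a newline.
$result =~ s{<br\s*/?>}{\n}g;

print "$result\n";
```

Using `{}` as delimiters avoids having to escape the slash in `</td>`.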
Re: regex help by Samy_rio (Vicar) on Jun 27, 2005 at 01:52 UTC
Hi, check whether it helps you. `( $result = $htmlline ) =~ s/<tag marker>(.+?)<\/td>/my $tmp = $1; $tmp =~ s!<br>!!g; $tmp/eg;` Regards, Samy
Re: regex help by GrandFather (Saint) on Jun 27, 2005 at 01:47 UTC
Show us what you want to achieve by giving some example text that gets mishandled. Note that you can include tag stuff by wrapping it in `<code> ... </code>` tags. Perl is Huffman encoded by design.
Re: regex help by davidrw (Prior) on Jun 27, 2005 at 01:56 UTC
Still sorta one line (though keeping it as two lines is probably best for clarity), but you can just change your: `( ( $result = $htmlline ) =~ s/<tag marker>(.+?)<\/td>/$1/ ) =~ s/<br>//g;` to this to ditch the "can't substitute in a substitute" error: `( $result = $htmlline ) =~ s#<tag marker>(.+?)</td>#$1# && $result =~ s/<br>//g;`
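A small sketch of the `&&` chaining idea, assuming a made-up input line and a hypothetical `<td class="item">` marker in place of the real one. Since `s///` returns true only when it replaced something, the `<br>` cleanup here runs only if the cell was actually found.

```perl
use strict;
use warnings;

# Hypothetical input; the marker is an assumption for illustration.
my $htmlline = '<tr><td class="item">one<br>two</td></tr>';
my $result;

# =~ binds tighter than &&, so this reads as:
#   (copy-and-substitute) && (strip <br> from the copy)
($result = $htmlline) =~ s#<td class="item">(.+?)</td>#$1#
    && $result =~ s/<br>//g;

print "$result\n";
```

If the first substitution does not match, `$result` is still a plain copy of `$htmlline` and any `<br>` in it is left alone, which may or may not be what you want.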
Re^2: regex help by GaijinPunch (Pilgrim) on Jun 27, 2005 at 02:11 UTC
I'm not really opposed to two lines... I just always see more complex code as (ultimately, though not necessarily in this case) better optimized. Just thought I'd try to push myself, that's all. Thanks for the tips... I will give them a whirl.
Re: regex help by injunjoel (Priest) on Jun 27, 2005 at 02:43 UTC
Greetings, Looks like someone beat me to it, but here goes anyway. `($result = $htmlline) =~ s/<tag marker>(.+?)<\/td>/my $t=$1; $t=~s!<br>!!g; $t;/eg` The /e switch is a beautiful thing. -InjunJoel "I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo
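The `/e` approach from this reply and Samy_rio's can be spelled out as a runnable sketch. The input and the `<td class="item">` marker are assumptions for illustration only. The copy into a lexical matters: `$1` is read-only, so the inner `s///` has to work on a copy.

```perl
use strict;
use warnings;

# Hypothetical input; the marker is an assumption for illustration.
my $htmlline = '<tr><td class="item">one<br>two<br>three</td></tr>';

# With /e the replacement part is Perl code. It copies the capture,
# strips the <br> tags from the copy, and the last expression in the
# block becomes the replacement text.
(my $result = $htmlline) =~ s{<td class="item">(.+?)</td>}{
    my $cell = $1;
    $cell =~ s/<br>//g;
    $cell;
}e;

print "$result\n";
```

Functionally this does the same work as the two-statement version, just packed into one statement.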
Re: regex help by GrandFather (Saint) on Jun 27, 2005 at 01:55 UTC
Most likely what you want is `([^<]+?)` rather than `(.+?)`, to avoid collecting any tags. Perl is Huffman encoded by design.
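To see how the two captures differ, here is a quick comparison on made-up data (the `<td class="item">` marker and both sample cells are assumptions, not the poster's real input). One thing worth noticing: when the cell contains a `<br>`, the negated class cannot cross it, so the whole match fails rather than returning tag-free text.

```perl
use strict;
use warnings;

# Hypothetical sample cells; the marker is an assumption.
my $clean   = '<td class="item">plain text</td>';
my $with_br = '<td class="item">one<br>two</td>';

# A list-context match returns the captures, or an empty list on failure.
my ($lazy_dot)  = $with_br =~ m{<td class="item">(.+?)</td>};
my ($no_tags)   = $with_br =~ m{<td class="item">([^<]+?)</td>};
my ($clean_cap) = $clean   =~ m{<td class="item">([^<]+?)</td>};

print "(.+?)    on cell with br: $lazy_dot\n";
print "([^<]+?) on cell with br: ",
    (defined $no_tags ? $no_tags : 'no match'), "\n";
print "([^<]+?) on clean cell:   $clean_cap\n";
```

So the negated class is a good fit when the cell is known to hold plain text, and the strip-afterwards approaches fit better when `<br>` tags can appear inside it.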