I'm trying to convert an HTML file to a Wiki format. The HTML is...not well formed, and so I have to do some "intelligent" processing.
Basically, there are some multi-line chunks from which I need to remove some tags. These chunks look like this:

<P ALIGN=LEFT STYLE="background: #b3b3b3; font-style: normal"><B>Figure 1.10</B></P>
<P ALIGN=LEFT STYLE="background: #b3b3b3; font-style: normal">rasto@zaphod ~ $ screen -x</P>
<P ALIGN=LEFT STYLE="background: #b3b3b3; font-style: normal">There are several suitable screens on:</P>
<P ALIGN=LEFT STYLE="background: #b3b3b3; font-style: normal"> 12207.pts-1.Evil-Fish (Attached)</P>
<P ALIGN=LEFT><SPAN STYLE="font-style: normal">

If I print out $match after my series of substitutions, it looks fine. The problem is folding these changes back into $text.

#!/usr/bin/perl
my $file = $ARGV[0];
open my $fh, $file;
my $text;
foreach my $line (readline $fh) {
    $text .= $line;
}
foreach my $match ( $text =~ /(<P ALIGN=LEFT STYLE="background: #b3b3b3; font-style: normal"><B>.*?<P ALIGN=LEFT><SPAN STYLE="font-style: normal">)/gsm) {
    my $temp = $match;
    $match =~ s/<P ALIGN=LEFT STYLE="background: #b3b3b3; font-style: normal"><B>/<pre>/;
    $match =~ s/<P ALIGN=LEFT STYLE="background: #b3b3b3; font-style: normal"><B>//g;
    $match =~ s/<P ALIGN=LEFT><SPAN STYLE="font-style: normal">/<\/pre>/;
    $match =~ s/<\/B>//g;
    $text =~ s/$temp/$match/gsm;
}
print $text;

What's happening is that the substitution in the third-to-last line is not matching, *except* on the last match of the foreach loop.
Clearly I'm missing something. I did a proof of concept with some small, simple, multiline text and the logic worked fine. It seems to me that since $match was, by definition, found in the $text, it ought to be able to find an exact copy of it reliably!
Any ideas on how exactly I'm being wrong about this would be greatly appreciated.
In reply to Regex bafflement by rastoboy
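One likely explanation, offered as a guess from the sample chunk rather than a confirmed diagnosis: when $temp is interpolated into s/$temp/.../, its contents are compiled as a regular expression, not searched for as a literal string. The sample contains characters such as ".", "$" and "(Attached)", which have special meaning in a pattern, so the "exact copy" no longer matches itself. A chunk with no such characters would still match, which might be why the last one works. The sketch below uses a made-up, shortened $text (not the real file) to show the difference that \Q...\E makes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, shortened stand-in for the real file contents.
# Note the regex metacharacters in it: '.', '$', '(' and ')'.
my $text = join "\n",
    'before',
    'rasto@zaphod ~ $ screen -x',
    '12207.pts-1.Evil-Fish (Attached)',
    'after';

# Grab a chunk out of $text, much as the foreach loop in the question does.
my ($chunk) = $text =~ /(rasto.*?\(Attached\))/s;
my $replacement = "<pre>$chunk</pre>";

# Unquoted: $chunk is compiled as a pattern. '$' becomes an end-of-string
# anchor and '(Attached)' becomes a capture group that matches "Attached"
# without the parentheses, so the pattern fails to match its own source.
(my $unquoted = $text) =~ s/$chunk/$replacement/;

# Quoted: \Q...\E escapes every metacharacter, so $chunk is matched literally.
(my $quoted = $text) =~ s/\Q$chunk\E/$replacement/;

print "unquoted: ", ($unquoted eq $text ? "unchanged" : "replaced"), "\n";
print "quoted:   ", ($quoted   eq $text ? "unchanged" : "replaced"), "\n";
```

If this is indeed the cause, changing the line in the question to $text =~ s/\Q$temp\E/$match/gsm; should be enough. An alternative worth considering is to do the cleanup inside a single s/.../.../ge over $text, which avoids searching for each chunk a second time.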