I don't think you'll need HTML::Parser here: your HTML isn't really clean enough for that, and you don't want to deal with all those nasty font tags.
A few patterns should get the job done much faster.
I only tested it on two reviews; here's what I have so far:
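```perl
#!/usr/bin/perl
use strict;
use warnings;

# The review file to read is taken from the command line
# (the original snippet assumed IN was already open).
my $file = shift @ARGV or die "usage: $0 review.html\n";
open(IN, '<', $file) or die "Cannot open $file: $!\n";

# find the place where the features start
# (stops at end of file instead of looping forever if there is no match)
while ($_ = <IN>) {
    last if m/face=arial color=white/;
}

# now read the DVD features: an image with a one-character name,
# then the feature text (skipping spaces and &nbsp; padding)
FEATURE: while (<IN>) {
    last FEATURE if m/blcorn\.jpg/ or m/td/;
    m/src="(.)\.jpg">(?:\s|&nbsp;)*([^<]+)/ and print "$2: $1\n";
}

# the title is the bold text on the next line containing <p>
while ($_ = <IN>) {
    last if /<p>/;
}
s/.*<b>//;
s/<.*//;
print "Title: $_";

$_ = <IN>; s/^\s*//; s/<.*//;
print $_;    # that's the studio

$_ = <IN>; s/^\s*//;
s/Reviewed by: //; s|</font>||; s|</p>||;
print $_;    # that's the reviewer

# now for the text of the review:
my $r = '';
REVIEW: while ($_ = <IN>) {
    last REVIEW if m/<table/ or m/center/;
    $r .= $_;
}
close IN;

# now some magic to clean up the font soup:
$r =~ s|<font[^>]*>||g;
$r =~ s|</font>||g;
# turn <br> <br> into <p>; the extra step first moves a <b> that sits
# just before a <br> behind it
$r =~ s|<b>\s*<br>|<br><b>|gs;
$r =~ s|<br>(\s*<br>)+|<p>|gs;
# now we can recognize the headlines and turn them into <h2>
$r =~ s|<p>\s*<b>\s*([^<]+)\s*</b>\s*<br>|<h2>$1</h2>|gs;
print $r;
```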
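To see what the font-soup substitutions at the end of the script actually do, here is a minimal sketch that wraps the same five regexes in a hypothetical `clean_review` sub and runs them on a made-up snippet in the style of the review pages. The sample markup is an assumption for illustration, not copied from a real review.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the same "font soup" substitutions as the
# script above to a string and returns the cleaned HTML.
sub clean_review {
    my ($r) = @_;
    $r =~ s|<font[^>]*>||g;            # drop opening font tags
    $r =~ s|</font>||g;                # drop closing font tags
    $r =~ s|<b>\s*<br>|<br><b>|g;      # move a <b> behind the <br> it precedes
    $r =~ s|<br>(\s*<br>)+|<p>|g;      # a run of two or more <br> becomes <p>
    # a bold line right after <p> and ending in <br> is a headline
    $r =~ s|<p>\s*<b>\s*([^<]+)\s*</b>\s*<br>|<h2>$1</h2>|g;
    return $r;
}

# Made-up sample in the style of the review pages (not real data)
my $sample = qq{<font face=arial size=2>Intro text.<br>\n<b><br>\nVideo</b><br>\nThe picture is sharp.</font>};
print clean_review($sample), "\n";
```

For that sample it prints `Intro text.<h2>Video</h2>` followed by `The picture is sharp.` on the next line. The order matters for markup like this: the `<b><br>` swap has to run before the `<br>` runs are collapsed, otherwise the headline pattern never sees `<p>` directly followed by `<b>`.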
Aargh, data munging at its dirtiest. You should probably read davorg's book, Data Munging with Perl, and learn how to do this in a more organized, controllable way.
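As one small step toward a more organized, controllable approach, the feature loop can return its results instead of printing them as it goes. This is a minimal sketch, not the method from the book: the hypothetical `parse_features` sub reuses the script's patterns, and the input lines (including the `y.jpg`/`n.jpg` image names) are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical, more organized take on the FEATURE loop: collect the
# features into a hash that the caller can test, sort, or print.
sub parse_features {
    my @lines = @_;
    my %features;
    for my $line (@lines) {
        # the same end-of-block markers the script uses
        last if $line =~ m/blcorn\.jpg/ or $line =~ m/td/;
        if ($line =~ m/src="(.)\.jpg">(?:\s|&nbsp;)*([^<]+)/) {
            $features{$2} = $1;    # feature name => one-character image name
        }
    }
    return \%features;
}

# Made-up input lines (not taken from a real review page)
my @lines = (
    qq{<img src="y.jpg">&nbsp;Widescreen<br>\n},
    qq{<img src="n.jpg"> Commentary<br>\n},
    qq{<img src="blcorn.jpg">\n},
);
my $features = parse_features(@lines);
print "$_: $features->{$_}\n" for sort keys %$features;
```

With these sample lines it prints `Commentary: n` and `Widescreen: y`. Keeping the parsing separate from the printing means each piece can be checked on its own before you point it at the next batch of reviews.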
P.S. If you were serious about giving away DVDs:
Brigitte Jellinek
Horus IT GmbH
Jakob Haringer Str. 8
5020 Salzburg
Austria
EUROPE
In reply to Re: Find and replace by Anonymous Monk, in thread Find and replace by dvdauthority