I don't think you'll need HTML::Parser: your HTML isn't really clean enough for that, and you don't want to deal with all the nasty font tags.

A few patterns should get the job done much faster.

I only tested with two reviews; here's what I've got so far:

    # find the place where the features start:
    $_ = <IN> until m/face=arial color=white/;

    # now read the dvd features
    FEATURE: while(<IN>) {
        last FEATURE if m/blcorn.jpg/ or m/td/;
        m/src="(.).jpg">(&nbsp;)*([^<]+)/ and print "$3: $1\n";
    }

    while($_ = <IN>) {
        last if /<p>/;
    }
    $_ =~ s/.*<b>//;
    $_ =~ s/<.*//;
    print "Title: $_";

    $_ = <IN>;
    s/^\s*//;
    s/<.*//;
    print $_;    # that's the studio

    $_ = <IN>;
    s/^\s*//;
    $_ =~ s/Reviewed by: //;
    $_ =~ s|</font>||;
    $_ =~ s|</p>||;
    print $_;    # that's the reviewer

    # now for the text of the review:
    REVIEW: while ( $_ = <IN> ) {
        last REVIEW if m/<table/ or m/center/;
        $r .= $_;
    }

    # now some magic to clean up the font-soup:
    $r =~ s|<font[^>]*>||g;
    $r =~ s|</font>||g;

    # now some magic to turn <br> <br> into <p>,
    $r =~ s|<b>\s*<br>|<br><b>|gs;    # extra magic
    $r =~ s|<br>(\s*<br>)+|<p>|gs;

    # now we can recognize the headlines, and turn them into <h2>
    $r =~ s|<p>\s*<b>\s*([^<]+)\s*</b>\s*<br>|<h2>$1</h2>|gs;

    print $r;
    close IN;
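To see what the clean-up substitutions at the end actually do, here is a small standalone sketch that runs them on a made-up snippet. The sub name clean_review and the sample string are my own inventions for illustration; the real review pages may well look different.

```perl
use strict;
use warnings;

# Hypothetical helper that wraps the clean-up substitutions from the post.
sub clean_review {
    my ($r) = @_;

    # strip the font-soup
    $r =~ s|<font[^>]*>||g;
    $r =~ s|</font>||g;

    # move a <b> that sits in front of a <br> behind it,
    # so that runs of <br> end up next to each other
    $r =~ s|<b>\s*<br>|<br><b>|gs;

    # collapse two or more <br> into a single <p>
    $r =~ s|<br>(\s*<br>)+|<p>|gs;

    # a bold line right after <p> and followed by <br> is a headline
    $r =~ s|<p>\s*<b>\s*([^<]+)\s*</b>\s*<br>|<h2>$1</h2>|gs;

    return $r;
}

# Invented sample input, not taken from the real pages.
my $sample = qq{<font face="arial" size="2">Intro text.<br>\n<br>\n}
           . qq{<b>Video</b><br>\nLooks sharp.</font>\n};

print clean_review($sample);
```

On this sample the double <br> becomes a <p>, and the bold "Video" line right after it is turned into an <h2> headline.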

Aargh, data munging at its dirtiest. You should probably read davorg's book, Data Munging with Perl, and learn how to do this in a more organized, controllable way.
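As a rough idea of what "more organized" could mean here (my own sketch, not something taken from the book): move each pattern into a small named sub that returns data instead of printing it, so the pieces can be tested one at a time. The name parse_feature and the sample lines are hypothetical. Unlike the quick version above, this one escapes the dot in ".jpg" and uses a non-capturing group for the &nbsp; run.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (feature name, icon name),
# or an empty list if the line holds no feature.
sub parse_feature {
    my ($line) = @_;
    return unless $line =~ m/src="(.)\.jpg">(?:&nbsp;)*([^<]+)/;
    return ($2, $1);
}

# Invented sample lines, the real markup may differ.
my @lines = (
    qq{<img src="y.jpg">&nbsp;Widescreen<br>\n},
    qq{<td><img src="n.jpg">&nbsp;&nbsp;Commentary</td>\n},
    qq{<hr>\n},
);

# Collect the features first, decide how to print them later.
my %feature;
for my $line (@lines) {
    my ($name, $icon) = parse_feature($line) or next;
    $feature{$name} = $icon;
}

print "$_: $feature{$_}\n" for sort keys %feature;
```

Keeping the extraction separate from the output means the same sub could feed a database or a template later without touching the pattern.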

P.S. If you were serious about giving away DVDs:

Brigitte Jellinek
Horus IT GmbH
Jakob Haringer Str. 8
5020 Salzburg
Austria
EUROPE

In reply to Re: Find and replace by Anonymous Monk
in thread Find and replace by dvdauthority
