I don't think you'll need HTML::Parser: your HTML isn't really clean enough for that, and you don't want to deal with all the nasty font tags.

A few patterns should get the job done much faster.

I only tested with two reviews; here's what I've got so far:

        # find the place where the features start:

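        # (the while loop stops at end of file, so a page without
        # the marker can't loop forever)
        while (<IN>) {
                last if m/face=arial color=white/;
        }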

        # now read the dvd features

        FEATURE:
        while(<IN>) {
                last FEATURE if m/blcorn\.jpg/ or m/td/;

                # $1 = one-character image name, $3 = feature text
                m/src="(.)\.jpg">(&nbsp;)*([^<]+)/
                     and print "$3: $1\n";
        }

        while($_ = <IN>)  {
                last if /<p>/;
        }
        $_ =~ s/.*<b>//;
        $_ =~ s/<.*//;
        print "Title: $_";

        $_ = <IN>;
        s/^\s*//;
        s/<.*//;
        print $_;  # that's the studio

        $_ = <IN>;
        s/^\s*//;
        $_ =~ s/Reviewed by: //;
        $_ =~ s|</font>||;
        $_ =~ s|</p>||;
        print $_;  # that's the reviewer

        # now for the text of the review:
        REVIEW:
        while ( $_ = <IN> ) {
                last REVIEW if m/<table/ or m/center/;
                $r .= $_;
        }

        # now some magic to clean up the font-soup:

        $r =~ s|<font[^>]*>||g;
        $r =~ s|</font>||g;

        # now some magic to turn <br> <br> into <p>,

        $r =~ s|<b>\s*<br>|<br><b>|gs;   # extra magic
        $r =~ s|<br>(\s*<br>)+|<p>|gs;

        # now we can recognize the headlines, and turn them into <h2>

        $r =~ s|<p>\s*<b>\s*([^<]+)\s*</b>\s*<br>|<h2>$1</h2>|gs;

        print $r;
        close IN;
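
To see what the cleanup substitutions at the end actually do, here is a small self-contained sketch that runs the same five patterns over a made-up snippet. The sub name `clean_review` and the sample HTML are my own inventions for illustration, not taken from the real review pages, so treat it as a sanity check rather than proof that it copes with every review.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the same five substitutions as the code above.
sub clean_review {
    my ($r) = @_;

    # strip the font soup
    $r =~ s|<font[^>]*>||g;
    $r =~ s|</font>||g;

    # move a <b> that sits in front of a <br> behind it,
    # then collapse runs of <br> into a single <p>
    $r =~ s|<b>\s*<br>|<br><b>|gs;
    $r =~ s|<br>(\s*<br>)+|<p>|gs;

    # a bold line directly after a <p> is taken to be a headline
    $r =~ s|<p>\s*<b>\s*([^<]+)\s*</b>\s*<br>|<h2>$1</h2>|gs;

    return $r;
}

# Invented sample input; the second headline has its <b> before the <br>.
my $sample = "<font face=arial>Intro.</font><br>\n<br>\n"
           . "<b>Video</b><br>\nSharp picture.<br><b>\n<br>Audio</b><br>\n"
           . "Clear sound.\n";

print clean_review($sample);
# prints:
# Intro.<h2>Video</h2>
# Sharp picture.<h2>Audio</h2>
# Clear sound.
```

Both headlines come out as `<h2>`, including the second one, which only works because of the "extra magic" line that swaps `<b>` and `<br>` before the `<br>` runs are collapsed.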

Aargh, data munging at its dirtiest. You should probably read davorg's book, Data Munging with Perl, and learn how to do this in a more organized, controllable way.

P.S. If you were serious about giving away DVDs:

Brigitte Jellinek
Horus IT GmbH
Jakob Haringer Str. 8
5020 Salzburg
Austria
EUROPE

In reply to Re: Find and replace by Anonymous Monk
in thread Find and replace by dvdauthority
