Re: Newb wrestles with join

<rant>
It's better not to parse HTML (or XML) with regular expressions. I know it's tempting, and it often works well enough for quick and dirty scripts. But once the program has to scale, you will soon find yourself in a big mess.
</rant>

So it's better to use an appropriate module like HTML::Parser or HTML::TokeParser.


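use strict;
use warnings;
use LWP::Simple;
use HTML::TokeParser;
use Data::Dumper;

# $circleID comes from your original script; it must be declared and set before this point
my $url = "http://amazon.com/o/tg/cm/browse-communities/-/" .
          $circleID . "/t/";

# Request the URL
my $content = get($url);
die "Could not retrieve $url" unless defined $content;

# Parse the fetched document from memory (note the scalar reference)
my $p = HTML::TokeParser->new(\$content)
    or die "Could not create parser for $url";

# get_tag returns an array ref for each <title> start tag, or undef at end of document
while (my $token = $p->get_tag('title'))
{
    print Dumper($token);
    #...
}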
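To go from the dumped token to the actual title text, something like the following might work. It is only a minimal sketch: `extract_title` is a hypothetical helper name, and the sample HTML string is made up so the snippet runs without a network request. It relies on `get_trimmed_text`, which returns the text up to the given end tag with surrounding whitespace removed.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: returns the text of the first <title> element,
# or undef if the document has none.
sub extract_title {
    my ($html_ref) = @_;

    # Passing a scalar reference makes the parser read from memory
    my $p = HTML::TokeParser->new($html_ref)
        or die "Could not create parser";

    # Skip ahead to the first <title> start tag, then read its text
    if ( $p->get_tag('title') ) {
        return $p->get_trimmed_text('/title');
    }
    return;
}

# Made-up sample document, standing in for the content fetched with get()
my $html = '<html><head><title>  Sample   Page </title></head>'
         . '<body><p>Hi</p></body></html>';

my $title = extract_title(\$html);
print defined $title ? "Title: $title\n" : "No title found\n";
```

In the script above you would pass `\$content` instead of `\$html`.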

holli, /regexed monk/

Re^2: Newb wrestles with join by Dizzley (Novice) on Oct 01, 2005 at 06:58 UTC
Perfect. I'm heading down the HTML::TokeParser road right now. Perl Monks do it again. Thanks very much, Diz.