Re^4: proof of concept how to run this code
by ikegami (Patriarch) on Aug 25, 2006 at 09:26 UTC
    The minimal change consists of changing

        my $url = "http://www.nukeforums.com/forums/viewforum.php?f=17";
        my $ua = LWP::RobotUA->new;
        my $lp = HTML::LinkExtor->new(\&wanted_links);
        my @links;
        get_threads($url);
        foreach my $page (@links) {
            ...
        }

    to

        my $ua = LWP::RobotUA->new;
        my $lp = HTML::LinkExtor->new(\&wanted_links);
        my @links;
        foreach my $forum_id (17, 3) {
            my $url = "http://www.nukeforums.com/forums/viewforum.php?f=$forum_id";
            @links = ();  # yuck!
            my $links = get_threads($url);
            foreach my $page (@$links) {
                ...
            }
        }

    As you can see, I don't like your use of the global variable @links. We're forced to provide and initialize a variable that should be local to get_threads. Here's the fix:

        #!/usr/bin/perl
        use strict;
        use warnings;

        use LWP::RobotUA;
        use HTML::LinkExtor;
        use HTML::TokeParser;
        use URI::URL;
        use Data::Dumper;  # for show and troubleshooting

        my $ua = LWP::RobotUA->new();

        foreach my $forum_id (17, 3) {
            my $url = "http://www.nukeforums.com/forums/viewforum.php?f=$forum_id";
            my $links = get_threads($url);
            foreach my $page (@$links) {
                ...
            }
        }

        sub get_thread {
            ...
        }

        sub get_threads {
            my $url = shift;

            # @links is private to each call; the callback below closes over it.
            my @links;
            my $lp = HTML::LinkExtor->new(sub {
                my ($tag, %attr) = @_;
                return unless exists $attr{'href'};
                return if $attr{'href'} !~ /^viewtopic\.php\?t=/;
                # Keep only the href of links that point at a thread.
                push @links, $attr{'href'};
            });

            my $request  = HTTP::Request->new(GET => $url);
            my $response = $ua->request($request, sub { $lp->parse($_[0]) });

            # Expand URLs to absolute ones
            my $base = $response->base;
            return [ map { url($_, $base)->abs } @links ];
        }
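
    If you want to convince yourself that the per-call @links really is isolated, here is a rough offline sketch of the same closure pattern. It parses a string instead of fetching anything, so no robot UA is involved. The helper name extract_thread_links, the sample HTML and the example.com base URL are all made up for illustration, and I use URI->new_abs here rather than URI::URL's url() just to keep the sketch short.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::LinkExtor;
use URI;

# Hypothetical helper (not from the original script): returns a reference
# to an array of absolute thread URLs found in $html.
sub extract_thread_links {
    my ($html, $base) = @_;

    # Fresh array on every call; the callback closes over it.
    my @links;
    my $lp = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        return unless exists $attr{href};
        return if $attr{href} !~ /^viewtopic\.php\?t=/;
        push @links, $attr{href};
    });

    $lp->parse($html);
    $lp->eof;

    # Expand relative URLs against the assumed base.
    return [ map { URI->new_abs($_, $base)->as_string } @links ];
}

# Made-up sample markup and base URL, for illustration only.
my $base = 'http://example.com/forums/';
my $html = '<a href="viewtopic.php?t=1">one</a>'
         . '<a href="index.php">home</a>'
         . '<a href="viewtopic.php?t=2">two</a>';

my $first  = extract_thread_links($html, $base);
my $second = extract_thread_links($html, $base);

print "$_\n" for @$first;
print scalar(@$second), " links on the second call\n";
```

    With a global @links and no reset, the second call would see the first call's links as well. Here each call starts from an empty array, so nothing has to be cleared by the caller.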

    Update: Added the minimal change.

    Edited by planetscape - Reparented from Reaped: LWP & HTML::LinkExtor running recursively against a bulletin board to Re^3: proof of concept how to run this code
