in reply to Page Scraping

As far as I understand, you don't actually need to log in to see the pages you need. In any case, this seems to do what you want (or something close to it):

use strict;
use warnings;
use Fatal qw/ open close /;
use WWW::Mechanize;
use Carp;

my $mech = WWW::Mechanize->new( autocheck => 1 );
open my $pro_list, '>>', 'profiles.out';

for my $curr_page (1..3) {
    $mech->get("http://indiecharts.com/indie_Music_Artists.asp?Keyword=&Page=$curr_page&butname=");
    my @artist_links = $mech->find_all_links(url_regex => qr/\d{9}/);
    print scalar @artist_links, " matching links on page $curr_page\n";
    for my $artist_link (@artist_links) {
        print $pro_list $artist_link->text(), "\n";
    }
}

Replies are listed 'Best First'.
Re^2: Page Scraping
by 80degreez (Initiate) on May 01, 2007 at 21:14 UTC
    Akho, that kinda worked, but it retrieved the artist names instead of the butname= id #
      You could work this out yourself, but

      $artist_link->url() =~ /(\d{9})/; print $pro_list $1, "\n";

      should help.

      Ask if you don't understand some parts of my script.

        That doesn't handle the case where the regular expression fails to match: $1 then keeps the value from the last successful capturing match, so you'd silently print a stale ID.
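        A minimal demonstration of the stale-$1 pitfall (the strings here are invented for illustration):

```perl
use strict;
use warnings;

"id 123456789" =~ /(\d{9})/;     # match succeeds; $1 is "123456789"
print "$1\n";                    # prints 123456789

"no digits here" =~ /(\d{9})/;   # match FAILS, but $1 is left untouched
print "$1\n";                    # still prints 123456789 -- a stale capture
```

        Guarding the print with `if $url =~ /(\d{9})/` avoids ever reading $1 after a failed match.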

        I humbly suggest either:

        print {$pro_list} $1, "\n" if $artist_link->url() =~ /(\d{9})/;
        or
        if ($artist_link->url() =~ /(\d{9})/) { print {$pro_list} $1, "\n"; }

        The curly braces around {$pro_list} disambiguate its use as the filehandle being printed to.
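        The braces matter even more when the filehandle comes from an expression rather than a plain scalar; a small sketch (the %fh hash and in-memory file are invented for the demo):

```perl
use strict;
use warnings;

my %fh;
open $fh{out}, '>', \my $buffer or die $!;   # in-memory file, just for the demo

# print $fh{out} "hello\n";    # syntax error: Perl can't parse this form
print {$fh{out}} "hello\n";    # braces make the filehandle explicit

close $fh{out};
print $buffer;                 # prints "hello"
```

        With a simple scalar like $pro_list both forms work, so the braces are purely a readability and safety habit there.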