in reply to WWW:Mechanize bug?

The regex I've used only finds the first 2 links? Anybody have any ideas on what's going on?

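You're confused about how a browser works. follow_link can only see the links on the page the browser is currently showing:

You get first/url.

You get the list of links from first/url.

Following the first link takes you to second/url.

second/url has no more links, and certainly not the links from first/url, so there is nothing left to follow.

Either rewind the browser (call back after each follow_link), or fetch each link with get instead of follow_link.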
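
To make the two options concrete, here is a minimal sketch of both. It assumes nothing about the real site: a throwaway local "site" (an index page plus three leaf pages, written to a temp directory and loaded through file: URLs) stands in for first/url and second/url. The file names, the write_page helper and the variable names are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use URI::file;
use WWW::Mechanize;

# Hypothetical stand-in site: index.html links to three leaf pages without links.
my $dir = tempdir( CLEANUP => 1 );

sub write_page {
    my ( $name, $body ) = @_;
    my $file = File::Spec->catfile( $dir, $name );
    open my $fh, '>', $file or die "Cannot write $file: $!";
    print {$fh} "<html><body>$body</body></html>";
    close $fh;
}

my $index_body = '';
$index_body .= qq{<a href="page$_.html">page $_</a> } for 1 .. 3;
write_page( 'index.html', $index_body );
write_page( "page$_.html", "leaf page $_ with no links" ) for 1 .. 3;

my $start = URI::file->new( File::Spec->catfile( $dir, 'index.html' ) )->as_string;
my $mech  = WWW::Mechanize->new( autocheck => 1 );

# Option 1: follow_link, then rewind with back so the index is current again.
$mech->get($start);
my @links = $mech->find_all_links( url_regex => qr/page\d+\.html/ );
my @visited_with_back;
for my $link (@links) {
    $mech->follow_link( url => $link->url );    # browser now shows the leaf page
    push @visited_with_back, $mech->uri->as_string;
    $mech->back;                                # browser shows the index again
}

# Option 2: resolve the links to absolute URLs once, then get each directly.
$mech->get($start);
my @absolute = map { $_->url_abs->as_string }
    $mech->find_all_links( url_regex => qr/page\d+\.html/ );
my @visited_with_get;
for my $url (@absolute) {
    $mech->get($url);    # no dependence on which page is current
    push @visited_with_get, $mech->uri->as_string;
}

print "with back: $_\n" for @visited_with_back;
print "with get:  $_\n" for @visited_with_get;
```

Option 2 is usually the simpler of the two, since absolute URLs stay valid no matter which page the browser happens to be on.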

Re^2: WWW:Mechanize bug?
by fraizerangus (Sexton) on Oct 13, 2011 at 19:45 UTC
    Monks

    Thanks so much for the help! I did get it working, but it only seems to fetch the first 7 links before this error message appears:

    Internal Server Error at newp line 14

    Using the following code:

        #!/usr/bin/perl
        use strict;
        use WWW::Mechanize;
        use Storable;

        my $mech_cgi = WWW::Mechanize->new;
        $mech_cgi->get( 'http://www.molmovdb.org/cgi-bin/browse.cgi' );
        my @cgi_links = $mech_cgi->find_all_links( url_regex => qr/motion.cgi/ );

        for (my $i = 0; $i < @cgi_links; $i++) {
            print "following link: ", $cgi_links[$i]->url, "\n";
            $mech_cgi->follow_link( url => $cgi_links[$i]->url )
                or die "Error following link ", $cgi_links[$i]->url;
            $mech_cgi->back;
        }

    is this a fault with their server or my script?

    many thanks and best wishes

    Dan

      is this a fault with their server or my script?

      Can't say; that error message isn't very informative.

      Try

          #!/usr/bin/perl --
          use strict;
          use warnings;
          use WWW::Mechanize;

          # autocheck => 1 makes a failed request die with a descriptive message
          my $mech_cgi = WWW::Mechanize->new( autocheck => 1 );
          $mech_cgi->show_progress(1);    # log each request and its status
          $mech_cgi->get( 'http://www.molmovdb.org/cgi-bin/browse.cgi' );

          my @Motion = $mech_cgi->find_all_links( url_regex => qr/motion.cgi/ );
          @Motion = map { $_->url_abs() } @Motion;    # absolute URLs, usable with get

          for my $link ( @Motion ){
              # trap the die so one bad link doesn't stop the loop, then dump the response
              eval {
                  $mech_cgi->get( $link );
                  1;
              } or warn $@, "\n", $mech_cgi->res->as_string, "\n", '#'x33, "\n\n";
              $mech_cgi->back;
          }
          __END__
      And you'll get something more informative:
          ** GET http://www.molmovdb.org/cgi-bin/browse.cgi ==> 202 OK
          ...
          ** GET http://..../4040404 ==> 404 Not Found
          Error GETing http://..../4040404: Not Found at somefile.pl line 12
          HTTP/1.1 404 Not Found
          Connection: close
          Date: Thu, 13 Oct 2011 23:01:51 GMT
          ...
          Content-Length: 3942
          Content-Type: text/html
          Client-Date: Thu, 13 Oct 2011 23:05:18 GMT
          ...
          Title: blah blah blah

          <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
          <html>
          ....
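
      If you would rather collect the failures than trap a die for each one, a variation along these lines may help. It is only a sketch: it turns autocheck off and inspects the response after every get. The two file: URLs (one temp file that exists, one that does not) are stand-ins so it runs without touching the real server, and the variable names are made up for the illustration.

      ```perl
      #!/usr/bin/perl
      use strict;
      use warnings;
      use File::Temp qw(tempfile);
      use URI::file;
      use WWW::Mechanize;

      # Stand-in URLs (assumption): one local file that exists, one that does not.
      my ( $fh, $path ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
      print {$fh} "<html><body>ok</body></html>";
      close $fh;
      my @urls = (
          URI::file->new($path)->as_string,
          URI::file->new("$path.missing")->as_string,
      );

      # autocheck => 0: a failed request returns normally instead of dying,
      # so the loop can record the failure and carry on.
      my $mech = WWW::Mechanize->new( autocheck => 0 );

      my ( @ok, @failed );
      for my $url (@urls) {
          $mech->get($url);
          if ( $mech->success ) {
              push @ok, $url;
          }
          else {
              # status_line is the numeric code plus the server's message
              push @failed, [ $url, $mech->res->status_line ];
          }
      }

      printf "%d fetched, %d failed\n", scalar @ok, scalar @failed;
      printf "  %s => %s\n", @$_ for @failed;
      ```

      The list of failed URLs and status lines at the end shows whether the errors cluster on particular links, which helps with the "their server or my script" question.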