New Novice has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am trying to automatically retrieve information from an online database. So far I fill in the forms and submit them using WWW::Mechanize. The results are spread over several pages, and that's where the problems start: I cannot navigate between them to view (and download) the other pages. As a result, I only get the first of up to twenty pages of results.

The database search engine can be found at http://europa.eu.int/prelex/rech_avancee.cfm?CL=en. After I fill in the series (COM) and the year (e.g., 1999) and submit the form, the first of twenty pages of results is displayed. I save this page and would like to go on to the next one; however, I do not know how to handle this kind of navigation bar (the fields numbered 1 to 20). Their HTML looks like this:

<A HREF="liste_resultats.cfm?PCP=1&CL=en" ONMOUSEOUT="isimgact( 'btn_nav_pin52', 'btn_nav_pinoff')" ONMOUSEOVER="isimgact( 'btn_nav_pin52', 'btn_nav_pinon')"><IMG src="images/btn_pin.gif" BORDER="0" HEIGHT="17" WIDTH="18" NAME="btn_nav_pin52" ALT="COM (1976) 728 - COM (1976) 697-3"></A></td>
I tried requesting the href link directly, but then only an empty form is displayed.
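The navigation links in the snippet above differ only in their PCP query parameter (1, 100, 199, ...). As a minimal sketch, assuming the links keep exactly this liste_resultats.cfm?PCP=<n> form, the offsets can be read off a results page instead of being hard-coded. The helper name pcp_offsets_from_html is hypothetical, and the regular expression is only a stand-in for proper HTML parsing (such as Mechanize's own link extraction).

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct PCP offsets found in links to
# liste_resultats.cfm, in the order they first appear in $html.
sub pcp_offsets_from_html {
    my ($html) = @_;
    my ( %seen, @offsets );
    # Only the digits after "PCP=" are captured, so "&CL=en" and
    # "&amp;CL=en" variants of the same link are treated alike.
    while ( $html =~ /liste_resultats\.cfm\?PCP=(\d+)/gi ) {
        push @offsets, $1 unless $seen{$1}++;
    }
    return @offsets;
}
```

Called as pcp_offsets_from_html($agent->content) after the form has been submitted, it would list the offsets of the result pages linked from the current page.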

Here is my code:

#!/usr/bin/perl -w
use strict;
use WWW::Mechanize;
use LWP::Simple;

my $agent = WWW::Mechanize->new();
$agent->get("http://europa.eu.int/prelex/rech_avancee.cfm?CL=en");
$agent->form(2);
$agent->field("clef2", "1999");
$agent->field("clef1", 'COM');
$agent->field("nbr_element", '99');
$agent->click();

my @pcp = (1, 100, 199, 298, 397, 496, 595, 694, 793, 892, 991, 1090,
           1189, 1288, 1387, 1486, 1585, 1684, 1783, 1882);
my $pcp;
foreach $pcp (@pcp) {
    my @input;
    @input = get("http://europa.eu.int/prelex/liste_resultats.cfm?PCP=$pcp\&CL=en");
    my $input;
    foreach $input (@input) {
        open RESULTS, ">>C:/programme/perl/test/result.txt";
        print RESULTS "$input\n";
        close(RESULTS);
    }
}
Help would be greatly appreciated. I looked at the WWW::Mechanize documentation, but it only mentions "regular" buttons.

Re: WWW::Mechanize and Navigation
by Corion (Patriarch) on Nov 25, 2004 at 16:49 UTC

    Why are you mixing LWP::Simple calls and WWW::Mechanize calls? While both retrieve pages from the web, they don't share state such as cookies, so they cannot easily be mixed.
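A likely reason the direct request returns an empty form is that LWP::Simple's get() uses its own user agent without a cookie jar, while a WWW::Mechanize object keeps cookies in memory by default. If, as seems probable but is not confirmed here, the site ties the search results to a session cookie, the follow-up requests must go through the same $agent that submitted the form. The sketch below keeps the question's fixed-offset approach; pcp_offsets and fetch_result_pages are hypothetical helper names, and the offsets are computed as 1 + 99k because nbr_element was set to 99 results per page.

```perl
use strict;
use warnings;

# Hypothetical helper: the hard-coded @pcp list in the question is
# 1, 100, 199, ..., 1882, i.e. 1 + $per_page * k for k = 0 .. $pages - 1.
sub pcp_offsets {
    my ( $per_page, $pages ) = @_;
    return map { 1 + $per_page * $_ } 0 .. $pages - 1;
}

# Hypothetical helper: fetch every results page with the SAME agent that
# submitted the search form, so any cookies it received are sent along.
# $agent is expected to be a WWW::Mechanize object (anything with
# get() and content() will do).
sub fetch_result_pages {
    my ( $agent, $base, @offsets ) = @_;
    my @pages;
    for my $pcp (@offsets) {
        $agent->get("${base}liste_resultats.cfm?PCP=$pcp&CL=en");
        push @pages, $agent->content;
    }
    return @pages;
}
```

After $agent->click(), a call such as fetch_result_pages($agent, 'http://europa.eu.int/prelex/', pcp_offsets(99, 20)) would return the HTML of all twenty pages. With a recent WWW::Mechanize, get() dies on HTTP errors by default (autocheck), so a failing page stops the loop.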

    Here is a way to achieve what you want with WWW::Mechanize alone, using the agent's links() method. It returns the list of links on the page, from which you can then select the ones you want:

    # After you've submitted your query:
    my @links = grep { $_->url =~ m!liste_resultats! } $agent->links;
    foreach my $link (@links) {
        print "Retrieving ", $link->url, "\n";
        $agent->get( $link->url_abs );    # fetch the page the link points to
        print $agent->content;
        $agent->back;                     # return to the list of results
    }
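The links() approach can also be written with find_all_links(), which current WWW::Mechanize documentation describes alongside links() and which accepts a url_regex filter. The sketch below (collect_linked_pages is a hypothetical name) takes a snapshot of the matching links, visits each absolute URL once with get(), and uses back() to return to the listing page. Skipping duplicate URLs is an assumption, in case the navigation bar links the same page more than once.

```perl
use strict;
use warnings;

# Hypothetical helper: visit every link on the current page whose URL
# matches $url_re and return a hash ref of absolute URL => page content.
# Uses only WWW::Mechanize methods: find_all_links, get, content, back.
sub collect_linked_pages {
    my ( $agent, $url_re ) = @_;
    my %content_for;
    # Snapshot the links first: get() replaces the current page.
    my @links = $agent->find_all_links( url_regex => $url_re );
    for my $link (@links) {
        my $url = $link->url_abs->as_string;
        next if exists $content_for{$url};    # skip duplicate links
        $agent->get($url);
        $content_for{$url} = $agent->content;
        $agent->back;    # return to the page that listed the links
    }
    return \%content_for;
}
```

For example, collect_linked_pages($agent, qr/liste_resultats/) returns a hash reference mapping each results-page URL to its HTML.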
      Thanks for this! Looks like an elegant solution!

      Unfortunately, I can't quite get it to work. Do you by any chance know where I could find more information about the links() method? CPAN does not list the find_links method.

      Here is the code I tried, unsuccessfully, in case you are interested.

      #!/usr/bin/perl -w
      use strict;
      use WWW::Mechanize;

      our $count = 1;
      our $year  = 1976;
      while ($year < 1977) {
          my $input;
          my $agent = WWW::Mechanize->new();
          $agent->get("http://europa.eu.int/prelex/rech_avancee.cfm?CL=en");
          $agent->form(2);
          $agent->field("clef2", "$year");
          $agent->field("clef1", 'COM');
          $agent->field("nbr_element", '99');
          $agent->click();
          $input = $agent->content();
          my @pcplinks = grep { $_->url =~ m!liste_resultats.cfm! } $agent->links;
          print @pcplinks;
          my $filecount;
          $filecount = 0;
          foreach my $pcplink (@pcplinks) {
              my $input2;
              print "Retrieving $pcplink\n";
              $agent->follow( $pcplink );
              $filecount++;
              $input2 = $agent->content();
              $agent->back;
          }
          $year++;    # without this the while loop never terminates
      }
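The loop above counts pages in $filecount and reads each page into $input2, but never stores them. A minimal sketch of that missing step, assuming one file per page named after the year and page number (save_page and the naming scheme are hypothetical): it uses a three-argument open with a lexical filehandle and checks for errors, rather than reopening a bareword handle in append mode for every line as in the first script.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: write one results page to "<dir>/<year>_<nn>.html"
# and return the path that was written.
sub save_page {
    my ( $dir, $year, $n, $html ) = @_;
    my $path = File::Spec->catfile( $dir, sprintf( '%d_%02d.html', $year, $n ) );
    open my $fh, '>', $path or die "Cannot write $path: $!";
    # content() may return decoded (wide-character) text, so encode it.
    binmode $fh, ':encoding(UTF-8)';
    print {$fh} $html;
    close $fh or die "Cannot close $path: $!";
    return $path;
}
```

Inside the inner loop, save_page('C:/programme/perl/test', $year, $filecount, $input2) would write 1976_01.html, 1976_02.html and so on.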

        You can read the WWW::Mechanize documentation either from your console window by typing perldoc WWW::Mechanize, or on http://search.cpan.org. That page documents version 1.04; you should upgrade to it if you have a much older version.

Re: WWW::Mechanize and Navigation
by Anonymous Monk on Nov 26, 2004 at 20:26 UTC
    You must execute the javascript:isimgact method first. This method should modify the target of the link on mouseOver.