Aldebaran has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I've been trying to improve my web-automation skills and found the following resource: http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize/Examples.pod. I modeled my efforts on its CPAN search example, but I haven't been able to get it to work yet and would like to, for the sake of thoroughness. Here is the script in question before any alteration:

#!/usr/bin/perl

# turn on perl's safety features
use strict;
use warnings;

# work out the name of the module we're looking for
my $module_name = $ARGV[0]
    or die "Must specify module name on command line";

# create a new browser
use WWW::Mechanize;
my $browser = WWW::Mechanize->new();

# tell it to get the main page
$browser->get("http://search.cpan.org/");

# okay, fill in the box with the name of the
# module we want to look up
$browser->form_number(1);
$browser->field("query", $module_name);
$browser->click();

# click on the link that matches the module name
$browser->follow_link( text_regex => $module_name );

my $url = $browser->uri;

# launch a browser...
system('galeon', $url);

exit(0);

I started by rebuilding it from scratch, dumping forms and checking return values, and arrived at an intermediate version that gives the expected results. The system call reflects my own Windows-based platform:

C:\cygwin64\home\Fred\pages2\hunt>perl cpan4.pl WWW::Mechanize
module name is WWW::Mechanize
url is http://search.cpan.org/search?query=WWW%3A%3AMechanize&mode=all

C:\cygwin64\home\Fred\pages2\hunt>type cpan4.pl
#! /usr/bin/perl
use warnings;
use strict;
use 5.010;

# work out the name of the module we're looking for
my $module_name = $ARGV[0]
    or die "Must specify module name on command line";
say "module name is $module_name";

# create a new browser
use WWW::Mechanize;
my $browser = WWW::Mechanize->new();

# tell it to get the main page
$browser->get("http://search.cpan.org/");

# okay, fill in the box with the name of the
# module we want to look up
$browser->form_number(1);
$browser->field( "query", $module_name );
$browser->click();

my $url = $browser->uri;
say "url is $url";

# launch a browser...
system( 'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe', $url );
exit(0);

The module name is well-formed, the search is well-formed, and the browser correctly displays a list of matching modules with WWW::Mechanize ranked first. There is only one line left to add, and that is where it careens off the rails. This is the terminal output:

C:\cygwin64\home\Fred\pages2\hunt>perl cpan1.pl WWW::Mechanize
WWW::Mechanize passed as text_regex is not a regex at cpan1.pl line 24.
url is http://st.pimg.net/tucs/style.css?3
C:\cygwin64\home\Fred\pages2\hunt>perl cpan1.pl WWW::Mechan
WWW::Mechan passed as text_regex is not a regex at cpan1.pl line 24.
url is http://st.pimg.net/tucs/style.css?3
C:\cygwin64\home\Fred\pages2\hunt>perl cpan1.pl 'WWW::Mechanize'
'WWW::Mechanize' passed as text_regex is not a regex at cpan1.pl line 24.
url is http://st.pimg.net/tucs/style.css?3
C:\cygwin64\home\Fred\pages2\hunt>perl cpan1.pl "WWW::Mechanize"
WWW::Mechanize passed as text_regex is not a regex at cpan1.pl line 24.
url is http://st.pimg.net/tucs/style.css?3
C:\cygwin64\home\Fred\pages2\hunt>

No matter what I type for the module name, I get the same message, and the browser opens the .css page, which I find puzzling indeed. Line 24 is the one with the follow_link call:

#! /usr/bin/perl
use warnings;
use strict;
use 5.010;

# work out the name of the module we're looking for
my $module_name = $ARGV[0]
    or die "Must specify module name on command line";

# create a new browser
use WWW::Mechanize;
my $browser = WWW::Mechanize->new();

# tell it to get the main page
$browser->get("http://search.cpan.org/");

# okay, fill in the box with the name of the
# module we want to look up
$browser->form_number(1);
$browser->field( "query", $module_name );
$browser->click();

# click on the link that matches the module name
$browser->follow_link( text_regex => $module_name );
my $url = $browser->uri;
say "url is $url";

# launch a browser...
system( 'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe', $url );
exit(0);
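The warning and the .css page may have a single cause. A compiled pattern from qr// is a Regexp object (ref returns 'Regexp'), while $module_name is a plain string. The self-contained sketch below assumes, from my reading of how Mech behaves rather than from its documentation, that a *_regex value which is not a Regexp is warned about and then discarded, leaving only the documented default n => 1, so the first link on the page wins. That would explain why the stylesheet link gets followed. The link records, URLs, and the functions clean_params and find_first_link are hypothetical; this is not Mech's source.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical link records (not real search results). The first mimics a
# stylesheet <link> tag, which has no link text.
my @links = (
    { text => undef,            url => 'http://example.com/style.css' },
    { text => 'WWW::Mechanize', url => 'http://example.com/mech' },
);

# Hypothetical sketch of parameter cleaning: any *_regex key must hold a
# compiled Regexp (from qr//); anything else is warned about and dropped.
sub clean_params {
    my (%parms) = @_;
    for my $key ( keys %parms ) {
        next unless $key =~ /_regex$/;
        next if ref( $parms{$key} ) eq 'Regexp';
        warn "$parms{$key} passed as $key is not a regex\n";
        delete $parms{$key};
    }
    return %parms;
}

# Hypothetical sketch of link finding: return the n-th link (default 1)
# that satisfies the remaining criteria, or undef if none does.
sub find_first_link {
    my %parms = ( n => 1, clean_params(@_) );
    my $seen  = 0;
    for my $link (@links) {
        if ( my $rx = $parms{text_regex} ) {
            next unless defined $link->{text} && $link->{text} =~ $rx;
        }
        return $link if ++$seen == $parms{n};
    }
    return;
}

my $name = 'WWW::Mechanize';

# Plain string: warns, the criterion is dropped, and the first link wins.
print find_first_link( text_regex => $name )->{url}, "\n";

# Compiled regex: the criterion is kept and the text actually matches.
print find_first_link( text_regex => qr{$name} )->{url}, "\n";
```

The first call prints the style.css URL (after the warning on STDERR) and the second prints the mech URL.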

So there it is. Thanks for any comments.

Re: Using example script correctly for opening cpan module
by 1nickt (Canon) on Sep 08, 2015 at 02:47 UTC

    Edit: I did some testing, and indeed quoting the regexp with qr{} seems to make Mech happy. This seems to work:

    my $link = $browser->find_link( text_regex => qr{$module_name} );
    $browser->follow_link( url => $link->url );
    or, more concisely:
    $browser->follow_link( url => $browser->find_link( text_regex => qr{$module_name} )->url );
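    One refinement worth considering: qr{$module_name} interpolates the name as a pattern, so any regex metacharacters in it are live, and it also matches longer names that merely contain it. Wrapping the name in \Q...\E (quotemeta) makes it literal, and anchoring it demands an exact match. A minimal, self-contained comparison; the link texts below are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name  = 'HTML::Display';
my @texts = ( 'HTML::Display', 'HTML::DisplayFoo', 'HTML-Display' );

my $loose   = qr{$name};              # name interpolated as a pattern
my $literal = qr{\Q$name\E};          # metacharacters escaped
my $exact   = qr{^\Q$name\E\z};       # literal and anchored at both ends

for my $text (@texts) {
    printf "%-18s loose=%d literal=%d exact=%d\n", $text,
        ( $text =~ $loose   ? 1 : 0 ),
        ( $text =~ $literal ? 1 : 0 ),
        ( $text =~ $exact   ? 1 : 0 );
}

# Metacharacters matter: an unescaped '.' matches any character.
my $dotted = 'Foo.Bar';
print "Foo-Bar vs loose:   ", ( 'Foo-Bar' =~ qr{$dotted}     ? 'match' : 'no match' ), "\n";
print "Foo-Bar vs literal: ", ( 'Foo-Bar' =~ qr{\Q$dotted\E} ? 'match' : 'no match' ), "\n";
```

    In a Mech call this would read, for example, text_regex => qr{^\Q$module_name\E\z}, if an exact match on the link text is what you want.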
    Cheers!

    Earlier: Hmm, the string 'WWW::Mechanize' works as a pattern in a hand-rolled regexp...

    The docs for WWW::Mechanize only mention text_regex as an attribute of find_link, not follow_link, and also that a regexp passed to follow_link is matched against the URL, not the display text.

    Also, maybe it will help to quote your regexp? Try:

    $browser->follow_link( url => $browser->find_link( text_regex => $module_name )->url );
    or maybe
    $browser->follow_link( text_regex => qr{$module_name} );
    Hope this helps!

    The way forward always starts with a minimal test.

      Thank you 1nickt, I'm relieved that this did indeed need a tweak. Your suggestion works perfectly when $ARGV[0] actually matches some link. I made a more verbose example to show what happens when one letter is off. What seemed counter-intuitive was what counted as success, and the output gave me half an idea why the original script was sending me down the rabbit hole of the .css link: it is the first link on the page, and I can well imagine that the default behavior falls back to it when the regex fails. This is a bit verbose, so I'll use readmore tags:

      It seems to me that the last place you want to be is stuck at the command line when you can't remember the exact spelling of a module name. I submit that testing whether $link is defined is worthwhile, and that you should get a browser window in either case. What's more, I found that even when $link comes back undefined, the search still turns up good, ranked alternatives. Furthermore, 'mode' should be set to 'module' so that the search does what the example advertises. Now I get no warnings or errors on the terminal, and I end up with a browser window showing either the page whose link matched the regex or a ranked list of close matches:

      C:\cygwin64\home\Fred\pages2\hunt>perl cpan10.pl HTML::Display
      C:\cygwin64\home\Fred\pages2\hunt>perl cpan10.pl HTML::Displayz
      C:\cygwin64\home\Fred\pages2\hunt>type cpan10.pl
      #! /usr/bin/perl
      use warnings;
      use strict;

      # work out the name of the module we're looking for
      my $module_name = $ARGV[0]
          or die "Must specify module name on command line";

      # create a new browser
      use WWW::Mechanize;
      my $browser = WWW::Mechanize->new();

      # tell it to get the main page
      $browser->get("http://search.cpan.org/");

      # okay, fill in the box with the name of the
      # module we want to look up
      $browser->form_number(1);
      $browser->field( "query", $module_name );
      $browser->click();

      my $link = $browser->find_link( text_regex => qr{$module_name} );

      # make sure $link is defined
      if ( defined $link ) {
          # a link's text matched: follow it and open that page
          $browser->follow_link( url => $link->url );
          my $url = $browser->uri;
          system( 'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe', $url );
      }
      else {
          # no match: go back and resubmit the search restricted to modules
          $browser->back;
          $browser->submit_form(
              form_number => 1,
              fields      => {
                  query => $module_name,
                  mode  => 'module',
              },
          );
          my $url = $browser->uri;
          system( 'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe', $url );
      }
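      The found-or-fallback decision in cpan10.pl can be factored into a small function that is testable without a network. This is a sketch under stated assumptions: the link records are hypothetical stand-ins for what find_link searches, escape_query is a hand-rolled stand-in for URI::Escape's uri_escape, and instead of calling back and resubmitting the form it builds the fallback search URL directly, using the search?query=...&mode=... format seen in the earlier terminal output. That last choice is a swapped-in alternative, not what the script above does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical link records standing in for what find_link() searches;
# the first mimics a stylesheet <link> tag that has no text.
my @links = (
    { text => undef,           url => 'http://example.com/style.css' },
    { text => 'HTML::Display', url => 'http://example.com/html-display' },
);

# Hand-rolled stand-in for URI::Escape's uri_escape(): percent-encode
# everything except the unreserved characters A-Z a-z 0-9 - _ . ~
sub escape_query {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Return the URL to open: the first link whose text contains the module
# name literally, or else a module-mode search URL built directly
# (an alternative to going back and resubmitting the form).
sub choose_target {
    my ( $module_name, $links ) = @_;
    my $rx = qr{\Q$module_name\E};
    for my $link (@$links) {
        next unless defined $link->{text};
        return $link->{url} if $link->{text} =~ $rx;
    }
    return 'http://search.cpan.org/search?query='
        . escape_query($module_name)
        . '&mode=module';
}

# prints http://example.com/html-display
print choose_target( 'HTML::Display', \@links ), "\n";

# prints http://search.cpan.org/search?query=HTML%3A%3ADisplayz&mode=module
print choose_target( 'HTML::Displayz', \@links ), "\n";
```

      Keeping the choice separate from the browser launch also means the Chrome system call only needs to appear once.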

      Anyway, that's as far as I could get tonight. Web automation is truly fantastic.