I have made a number of changes to your code. You were declaring @urls twice. The find_all_links method returns a list of WWW::Mechanize::Link objects, not URL strings, so you need to iterate over the list and call url() on each object.
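use strict;
use warnings;
use WWW::Mechanize;
my $agent = WWW::Mechanize->new();
# Set the user-agent alias before get(); it only affects requests made after it is set.
$agent->agent_alias( 'Windows IE 6' );
my $regex = 'results';
my $url = 'http://www.google.co.uk/search?hl=en&safe=off&q=intitle%3A%22football+scores%22+inurl%3Aresults&meta=cr%3DcountryUK%7CcountryGB';
$agent->get($url);
# Stop early if the request did not succeed.
die "Could not fetch $url: ", $agent->status(), "\n" unless $agent->success();
# find_all_links() returns WWW::Mechanize::Link objects; url() gives each link's href.
my @urls = $agent->find_all_links();
foreach my $link (@urls) {
    print "Link: ", $link->url(), "\n";
}
# print out the content of the current page for debugging
print $agent->response()->as_string();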
After reviewing the WWW::Mechanize documentation on CPAN, find_all_links() returns a list when called in list context (and an array reference in scalar context), so you should assign its result to an array. However, the documentation also notes:
The method for specifying link criteria is the same as in find_link().
And the documented usage of find_link() looks like this:
my $agent = WWW::Mechanize->new();
my $link = $agent->find_link( text => "some text", url_regex => qr/somelink\.com/ );
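For comparison, find_link() is the single-result form: with its default of n => 1 it returns one WWW::Mechanize::Link object, or undef when nothing matches, so the result has to be checked before calling url() on it. A minimal sketch, assuming $mech is a WWW::Mechanize object that has already fetched a page; first_link_url is a hypothetical helper name, not part of the module:

```perl
use strict;
use warnings;

# Hypothetical helper. $mech is assumed to be a WWW::Mechanize object that
# has already fetched a page with get(). Returns the href of the first link
# whose URL matches $re, or undef when there is no such link.
sub first_link_url {
    my ( $mech, $re ) = @_;
    # With the default n => 1, find_link() returns a single
    # WWW::Mechanize::Link object, or undef if nothing matches,
    # so test the result before calling url() on it.
    my $link = $mech->find_link( url_regex => $re );
    return defined $link ? $link->url() : undef;
}
```

Called as first_link_url( $agent, qr/results/ ) after $agent->get($url), it would return the first matching href, or undef if the page has no such link.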
Using the find_link() example as a model, I would try refactoring your code to read:
#!/usr/bin/perl
use strict;
use www::mechanize;
my $agent = WWW::Mechanize->new();
my @urls;
my $regex = 'results';
my $url ='http://www.google.co.uk/search?hl=en&safe=off&q=intitle%3A%22football+scores%22+inurl%3Aresults&meta=cr%3DcountryUK%7CcountryGB';
$agent->get($url);
my @urls=$agent->find_all_links(text => "some text", url_regex => qr/somelink\.com/);
print stdout "@urls\n";
For the parameters of the method, take a look at the Perldoc for the class you are using. There are a number of parameters you can supply, but they must be passed exactly as documented, as named pairs such as text => ... or url_regex => .... In your example, it would probably only search for links that look like 'http://www.google.co.uk/search?hl=en&safe=off&q=intitle%3A%22football+scores%22+inurl%3Aresults&meta=cr%3DcountryUK%7CcountryGB', which would probably match none of the links in the results.
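As a sketch of passing criteria to find_all_links(), the helpers below (matching_urls and print_urls are hypothetical names) turn the $regex string from the code above into a compiled url_regex and print each link's URL, rather than interpolating the Link objects into a string, which would only print their stringified references. They assume $mech has already fetched the search page:

```perl
use strict;
use warnings;

# Hypothetical helper. $mech is assumed to be a WWW::Mechanize object that
# has already called get(). Returns the hrefs of all links on the current
# page whose URL matches $pattern.
sub matching_urls {
    my ( $mech, $pattern ) = @_;
    # Link criteria are named pairs; url_regex expects a compiled qr//
    # regex, so the plain string is wrapped in qr// here.
    my @links = $mech->find_all_links( url_regex => qr/$pattern/ );
    # Each element is a WWW::Mechanize::Link object; url() gives its href.
    return map { $_->url() } @links;
}

# Print one URL per line to the currently selected output handle
# (STDOUT unless changed with select), so no filehandle is needed.
sub print_urls {
    print "$_\n" for @_;
}
```

For example, print_urls( matching_urls( $agent, $regex ) ) would list every link whose href contains 'results'.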
One last (I swear) thing you might want to look at is the Google API. Although it is a SOAP-based API, it is fairly easy to use, and there are already Perl modules on CPAN that act as wrappers for it.
Also, maybe take a look at WWW::Scraper::Google and DBD::Google. I have not used them or looked at their source to see what they do. However, if they are stable, they most likely do what you are trying to accomplish.
Good Luck!
---hA||ta----
print map{$_.' '}grep{/\w+/}@{[reverse(qw{Perl Code})]} or die while ( 'trying' );
Some odd things about your program: you have use www::mechanize instead of use WWW::Mechanize. You print to stdout, but that's an unopened filehandle. Perhaps you mean STDOUT, but then you can leave it off, since print writes to STDOUT by default. Your code isn't using any hashes or hash references.
Also, you aren't checking the result of your get. Perhaps it failed.
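A minimal sketch of checking the result of get(), assuming $mech is a WWW::Mechanize object; fetch_ok is a hypothetical helper name. It checks success() and status() after the request, and also wraps get() in eval, because newer WWW::Mechanize releases turn autocheck on by default, in which case get() dies on failure instead of returning:

```perl
use strict;
use warnings;

# Hypothetical helper. $mech is assumed to be a WWW::Mechanize object.
# Returns 1 if fetching $url succeeded, 0 otherwise, warning with the reason.
sub fetch_ok {
    my ( $mech, $url ) = @_;
    # With autocheck enabled, a failed request makes get() die, so trap it.
    my $ok = eval { $mech->get($url); 1 };
    if ( !$ok ) {
        warn "get($url) died: $@";
        return 0;
    }
    # With autocheck disabled, get() returns normally; inspect the response.
    if ( !$mech->success() ) {
        warn "get($url) failed with HTTP status ", $mech->status(), "\n";
        return 0;
    }
    return 1;
}
```

The caller can then bail out with die "giving up\n" unless fetch_ok( $agent, $url ); before looking for links.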
It may help if you give us some real code that has the unwanted behaviour, and give us the real error messages.
Is it possible that Perl is seeing
my $url ='http://www.google.co.uk/search?hl=en&safe=off&q=intitle%3A%22football+scores%22+inurl%3Aresults&meta=cr%3DcountryUK%7CcountryGB';
and treating the % characters as hashes? I am fairly new and was just wondering.
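For what it's worth, this should not be the problem. The string is in single quotes, so nothing in it is interpolated, and even in double quotes Perl interpolates only $scalars and @arrays, never %hashes. The %XX sequences are ordinary URL escapes. The short script below checks that, and decodes the q= parameter to show what the query actually asks for; decode_query is a hypothetical helper that uses a simple regex rather than a module such as URI::Escape:

```perl
use strict;
use warnings;

# The search URL from this thread, in single quotes: nothing is interpolated.
my $url = 'http://www.google.co.uk/search?hl=en&safe=off&q=intitle%3A%22football+scores%22+inurl%3Aresults&meta=cr%3DcountryUK%7CcountryGB';

# Even double-quoted strings interpolate only $scalars and @arrays, never
# %hashes, so "%ENV" is just the four characters % E N V.
print "%ENV" eq '%ENV' ? "no hash interpolation\n" : "interpolated\n";

# Decode a query-string value: '+' means space, %XX is a hex-encoded byte.
# '+' is translated first so that an escaped %2B still decodes to '+'.
sub decode_query {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

# Pull out the q= parameter and show what the search actually contains.
my ($q) = $url =~ /[?&]q=([^&]*)/;
print decode_query($q), "\n";    # prints: intitle:"football scores" inurl:results
```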