Google does not permit you to screen-scrape Google in this manner.
Please use the Google API, conveniently wrapped in Net::Google.
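For reference, here is a minimal sketch of the Net::Google route, assuming you have registered for a (SOAP-era) Google API key; the method names follow the module's SYNOPSIS, and the result accessors mirror the SOAP result fields:

    use strict;
    use warnings;
    use Net::Google;

    # Assumes you already have a Google API key registered.
    my $google = Net::Google->new( key => 'your-google-api-key' );

    my $search = $google->search();
    $search->query('perlmonks');
    $search->max_results(10);

    # Each result object exposes accessors such as title() and URL().
    print $_->URL(), "\n" for @{ $search->results() };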
Oddly enough, the Google AJAX API FAQ lists 15 questions, but only contains 7 answers.
From a previous reading of the terms of use, you were specifically NOT to use it on anything other than a website, and you were not allowed to do anything other than present the information exactly as returned by Google. ... unfortunately, answers #9 and #11 aren't listed right now. (Of course ... would it then be ethical to scrape the website that you created?)
Update: 9 and 11, not 8 and 11.
Is it truly legal for a site to tell you how to use content they provide on the internet? If he were planning to redistribute the info on his own website I could understand, but for a personal command-line tool? Isn't that kind of like saying you have to READ the whole page of HTML they send you, and can't just skim it to see if your site shows up?
I'm not saying it is ethical; I'm just curious how far Google's reach extends over the content it provides. If it were a site where I had to register and agree to its terms of use, I could understand that, but this is a case of limiting the use of information that Google makes public on purpose. What if I made a GreaseMonkey script that does the same thing and displays it in my browser? Where does the line get drawn? Am I required to view their entire page of HTML based on terms of use that I might not know exist, let alone agree to? Could their terms of use then ban me from using information from a search in any other context? Could they state that I must fully read at least one ad before looking to see if my site was among the other sites listed?
Like I said, I can understand limits on the use of information that you have to register to see, or that you plan on reusing for your own profit, but this doesn't seem to fit either of those cases, so I'm curious. Just some food for thought; maybe there is an obvious answer out there that I'm not aware of.
The while loop can be written simply as:
my @results = m@<a class=l href="(.*?)">@gi;
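For context, here is a minimal sketch of that list-context match applied to the fetched page; the $mech object and the class=l markup are assumptions carried over from the original script and Google's result HTML at the time:

    use strict;
    use warnings;
    use WWW::Mechanize;

    # Assumed setup: the query URL comes from the original script.
    my $mech = WWW::Mechanize->new();
    $mech->get('http://www.google.com/search?q=perlmonks');

    # In list context, a /g match returns every capture at once,
    # so the explicit while loop isn't needed.
    my @results = $mech->content =~ m@<a class=l href="(.*?)">@gi;
    print "$_\n" for @results;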
Also, you may want to look at WWW::Mechanize's find_all_links method (it returns WWW::Mechanize::Link objects) so that you're not parsing HTML yourself ...
my @results
    = map  { $_->url }
      grep { $_->attrs->{class} eq 'l' }    # lowercase 'l', matching class=l above, not the digit 1
      $mech->find_all_links( tag => 'a' );
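If you want fully qualified URLs rather than whatever appears in the raw href, the Link objects also provide url_abs; a small follow-up sketch (the guard against a missing class attribute is just defensive):

    # Print each matching result link as an absolute URL; url_abs
    # resolves relative hrefs against the page's base URL.
    print $_->url_abs, "\n"
        for grep { ( $_->attrs->{class} || '' ) eq 'l' }
            $mech->find_all_links( tag => 'a' );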