NCSA Uses Perl to Compare Google/Yahoo

I was happy to see our favorite language mentioned in a recent article disputing claims that Yahoo!'s index is now double the size of Google's. Better yet, the authors provided the code used to run the test. I didn't expect rocket science: they were simply running random queries against the two engines (basically what many scripts do to find a googlewhack).

I've got to say that although the code apparently worked properly, I was not all that impressed by it. Perhaps the author was not a native perl coder? I noticed a lot more duplication than I expected, what I assume are leftover idioms from earlier perl days (srand calls), and some evil if statement logic that I can't explain away!

Either way, since this article is making the rounds, I thought some of my fellow monasterians might like to comment on the code.

ps - sorry if this is posted in the wrong place. Seemed a toss-up to me between here and Perl News

pps - Just noticed that there is another thread about this in CUFP here. Sorry for the dupe.

Moved from Meditations to Perl News by Arunbear.

Replies are listed 'Best First'.
Re: NCSA Uses Perl to Compare Google/Yahoo
by creamygoodness (Curate) on Aug 16, 2005 at 15:52 UTC

Interesting stuff!

The project directly violates the prominently placed "No Automated Querying" directive in Google's terms of service (http://www.google.com/intl/en/terms_of_service.html), and it seems unlikely that they made an exception:

    Please do not write to Google to request permission to "meta-search" Google for a research project, as such requests will not be granted.

It also looks like the spider doesn't sleep between requests. Presumably the author made a considered choice that because Google and Yahoo are so robust, it was acceptable to fire off a huge number of requests all at once, in contravention of standard spidering netiquette (Google and Yahoo don't do that to your server). If you ever write a spider, please don't do that.

    Perhaps the author was not a native perl coder?

`sub main` ? No hashes? Lots of subscripted array elements? Could it be... C?

--
Marvin Humphrey
Rectangular Research ― http://www.rectangular.com
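For anyone wondering what "sleeping between requests" might look like, here is a minimal sketch. It is not the article's code: polite_fetch_all, the fetch callback, and the delay value are all made up for illustration, and the "fetch" is a stand-in that never touches the network.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);    # sleep() that accepts fractional seconds

# Hypothetical helper: call $fetch->($query) once per query, pausing
# $delay seconds between consecutive requests (no pause before the first).
sub polite_fetch_all {
    my ($fetch, $delay, @queries) = @_;
    my @results;
    for my $i (0 .. $#queries) {
        sleep $delay if $i > 0;    # throttle every request after the first
        push @results, $fetch->($queries[$i]);
    }
    return @results;
}

# Stand-in for a real HTTP call (for example via LWP::UserAgent);
# it only builds a string, so nothing is sent anywhere.
my $fake_fetch = sub {
    my ($query) = @_;
    return "results for $query";
};

my @out = polite_fetch_all($fake_fetch, 0.1, qw(alpha beta gamma));
print "$_\n" for @out;
```

A fixed pause is the simplest form of throttling. It only addresses the netiquette point; it does nothing about the terms-of-service issue.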
