PerlMonks  

Re: perl regex or module that identifies bots/crawlers

by shigetsu (Hermit)
on Mar 20, 2007 at 19:18 UTC ([id://605732])


in reply to perl regex or module that identifies bots/crawlers

Perhaps HTTP::BrowserDetect's robot() method?
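A minimal sketch of how that suggestion might be used, assuming HTTP::BrowserDetect is installed from CPAN. The helper name is_robot is hypothetical. The module's robot() method returns a true value when it classifies the User-Agent string as a robot or crawler. In a CGI script the string usually arrives in $ENV{HTTP_USER_AGENT}.

```perl
use strict;
use warnings;
use HTTP::BrowserDetect;

# Hypothetical helper: returns 1 if HTTP::BrowserDetect classifies the
# given User-Agent string as a robot, 0 otherwise.
sub is_robot {
    my ($ua_string) = @_;
    my $browser = HTTP::BrowserDetect->new($ua_string);
    return $browser->robot ? 1 : 0;
}

# In a CGI context the User-Agent is normally in the environment.
my $ua = $ENV{HTTP_USER_AGENT} // '';
print is_robot($ua) ? "robot\n" : "probably human\n";
```

Robots that send a browser-like User-Agent will not be caught this way; the check only sees what the client chooses to report.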

Re^2: perl regex or module that identifies bots/crawlers
by argv (Pilgrim) on Mar 20, 2007 at 21:56 UTC
    Perhaps HTTP::BrowserDetect's robot() method?
    While I remain enthusiastic about this module, and while it does precisely what I wanted (a simplified, generic set of regexes that can determine whether a user agent is a robot), it suffers from a problem that plagues everyone who ventures into this area: it's impossible to keep up with the robots. I've found numerous databases of known robot names, and all of them state that they are incomplete. It is an unsolvable problem, and the primary reason for the distorted-text challenges (CAPTCHAs) you see on some pages, the ones that make you type something to prove you're human. That said, the robot() method does a good enough job for now, and it was certainly worth it not to have to spend more time on this problem. Great bang for the buck. PerlMonks rescued me once again...
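To illustrate why any such list lags behind, here is a hypothetical hand-rolled check built from a few generic substrings. The pattern and the looks_like_bot name are invented for illustration and are not what HTTP::BrowserDetect uses internally. A check like this misses robots that do not identify themselves and can also flag legitimate clients whose User-Agent happens to contain one of the substrings.

```perl
use strict;
use warnings;

# Hypothetical, deliberately incomplete pattern: substrings that many
# self-identifying crawlers put in their User-Agent. Real lists are longer
# and still never complete.
my $BOT_RE = qr/bot|crawl|spider|slurp/i;

# Returns 1 if the User-Agent matches the pattern, 0 otherwise
# (including when the string is missing or empty).
sub looks_like_bot {
    my ($ua) = @_;
    return 0 unless defined $ua && length $ua;
    return $ua =~ $BOT_RE ? 1 : 0;
}
```

Every new crawler name means another edit to the pattern, which is exactly the maintenance burden a shared module spreads across its users.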
