in reply to Re^2: blocking site scrapers
in thread blocking site scrapers

Well, that certainly makes more sense than, say, dynamically altering firewall rules (yes, I've seen that). :)

A well-behaved search engine bot SHOULD be discernible by its UA (I doubt the script kiddies bother to change theirs), and you may want to note whether a client requests or has requested /robots.txt...

Granted, none of this is a sure thing, but a combination of "tests" may get you close enough to what you want without restricting others...
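
For illustration, here's a rough Perl sketch of combining those two tests (the crawler patterns and the in-memory hash are placeholders of mine, not a vetted list; a real setup would persist the robots.txt state to a DBM file or database):

#!/usr/bin/perl
use strict;
use warnings;

# Remember which client addresses have ever fetched /robots.txt.
# (In-memory only here; persist it in practice.)
my %fetched_robots;

sub looks_like_good_bot {
    my ($ip, $ua, $uri) = @_;
    $fetched_robots{$ip} = 1 if $uri eq '/robots.txt';
    # Placeholder patterns for a few well-known crawlers:
    return $fetched_robots{$ip} && $ua =~ /Googlebot|Slurp|msnbot/i;
}

# e.g. from a CGI script:
# looks_like_good_bot($ENV{REMOTE_ADDR}, $ENV{HTTP_USER_AGENT}, $ENV{REQUEST_URI});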



--chargrill
$/ = q#(\w)# ; sub sig { print scalar reverse join ' ', @_ } sig map { s$\$/\$/$\$2\$1$g && $_ } split( ' ', ",erckha rlPe erthnoa stJu" );

Re^4: blocking site scrapers
by tirwhan (Abbot) on Feb 07, 2006 at 10:50 UTC

    What's wrong with dynamically altering firewall rules? Before answering, you should perhaps consider that firewalls can be used for tarpitting (i.e., slowing connections down to the point of unusability) or for rate-limiting individual addresses or address ranges, as well as for simple blocking. In fact, if you have to resort to an IP-based policy (generally a bad idea), a well-implemented firewall solution usually beats server-side request mangling.

    To answer the OP's question, if you're on Linux you may want to look at the "recent" iptables extension. This article provides an introduction to using it. If you're on a different OS, have a look at that OS's firewall documentation.
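
    As a rough, untested sketch (chain placement and thresholds picked arbitrarily), the "recent" match lets you do something like:

    # Note every new connection to port 80 ...
    iptables -A INPUT -p tcp --dport 80 -m state --state NEW -m recent --set --name HTTP
    # ... and drop any source that has opened more than 20 of them in the last 60 seconds:
    iptables -A INPUT -p tcp --dport 80 -m state --state NEW -m recent --update --seconds 60 --hitcount 20 --name HTTP -j DROP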


    All dogma is stupid.

      /me humbly searches through his own httpd.conf and finds

      # Flag requests that look like a worm probe ...
      SetEnvIf Request_URI "winnt/system32/cmd\.exe" worm
      # etc ...
      # ... and blackhole the offending client's address via a piped "log":
      CustomLog "|exec sh" "/sbin/route -nq add -host %a 127.0.0.1 -blackhole" env=worm

      ... so I guess the answer to your question is that nothing is wrong with it per se. This was a somewhat popular method of keeping nimda, Code Red, sadmind, etc. from doing too much damage to web servers a few years ago. More can be read here: log monitors, and here: securityfocus. Those links even suggest that local or upstream firewalling would be more efficient.



      --chargrill
      $/ = q#(\w)# ; sub sig { print scalar reverse join ' ', @_ } sig map { s$\$/\$/$\$2\$1$g && $_ } split( ' ', ",erckha rlPe erthnoa stJu" );