Perl can't save the world (well, maybe...) but it can help stop spammers. I've written a generalized web page and email faker: Spider Catcher. Put simply, the tool generates fake web pages, based on a user-defined template, peppered with bogus email addresses. The trick is to get the script to link back to itself under different URLs, so a harvesting spider gets trapped crawling the same script over and over. To be fair to legitimate search engine spiders, the catcher could later be modified to respond differently to various user-agents.
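
Roughly, the trap looks like this (a minimal CGI sketch only, not the downloadable Spider Catcher code; the trap.cgi name and the word lists are just placeholders):

    #!/usr/bin/perl
    # trap.cgi -- a bare-bones spider trap: a page full of bogus addresses
    # plus links back to this same script under different query strings,
    # so a link-following harvester just keeps crawling copies of it.
    use strict;
    use warnings;

    my @words = qw(alpha bravo charlie delta echo foxtrot golf hotel india);
    my @tlds  = qw(com net org);

    sub fake_email {
        my $user   = $words[ rand @words ] . int(rand 1000);
        my $domain = $words[ rand @words ] . '.' . $tlds[ rand @tlds ];
        return "$user\@$domain";
    }

    print "Content-type: text/html\n\n";
    print "<html><head><title>Contact list</title></head><body>\n";

    # Pepper the page with bogus addresses for the harvester to swallow.
    print qq{<p><a href="mailto:$_">$_</a></p>\n} for map { fake_email() } 1 .. 20;

    # Link back to this script under different-looking URLs.
    my $self = $ENV{SCRIPT_NAME} || '/cgi-bin/trap.cgi';
    print qq{<p><a href="$self?page=}, int(rand 100000), qq{">more contacts</a></p>\n}
        for 1 .. 5;

    print "</body></html>\n";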

It uses Markov chains and a babelizer to generate semi-coherent content from arbitrary input text. The code can be downloaded from the page itself.
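
The Markov-chain half, stripped to its core, is roughly this (a rough sketch, not the script's actual code; the babelizer step is left out entirely):

    #!/usr/bin/perl
    # Order-1 word chain: learn which word follows which in the seed text,
    # then take a random walk through the table to emit filler prose.
    use strict;
    use warnings;

    my $text  = do { local $/; <STDIN> };   # slurp arbitrary seed text
    my @words = split ' ', $text;
    die "need some seed text on STDIN\n" unless @words > 1;

    # Map each word to the list of words seen immediately after it.
    my %follows;
    push @{ $follows{ $words[$_] } }, $words[ $_ + 1 ] for 0 .. $#words - 1;

    # Walk the chain from a random starting word.
    my $word = $words[ rand @words ];
    my @out;
    for (1 .. 200) {
        push @out, $word;
        my $next = $follows{$word} or last;   # dead end: stop early
        $word = $next->[ rand @$next ];
    }
    print "@out\n";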

I imagine this would make a real dent in the spammers' databases if a large number of webmasters implemented it.

I got the idea for it from the IMDB's harvest pages.

Edit: chipmunk 2001-06-18

Replies are listed 'Best First'.
Re: Spider-catcher: anti-spam measures for email harvesters
by petdance (Parson) on Jun 13, 2001 at 16:22 UTC
    Here's another implementation that's been around a while: Wpoison.

    xoxo,
    Andy

    %_=split/;/,".;;n;u;e;ot;t;her;c; ".   #   Andy Lester
    'Perl ;@; a;a;j;m;er;y;t;p;n;d;s;o;'.  #   http://petdance.com
    "hack";print map delete$_{$_},split//,q<   andy@petdance.com   >
    
Re: Spider-catcher: anti-spam measures for email harvesters
by Anonymous Monk on Jun 13, 2001 at 11:21 UTC
Re: Spider-catcher: anti-spam measures for email harvesters
by John M. Dlugosz (Monsignor) on Jun 14, 2001 at 01:33 UTC
    Another anti-spam idea I'm thinking about using: use more than one email address for newsgroup posts. Identical mail arriving at both addresses is assumed to be "harvested" and gets tossed as spam.
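
    Something along these lines would do the duplicate check (just a sketch; keying on the raw message body is the crudest possible fingerprint):

        use strict;
        use warnings;

        my %addresses_seen;    # message body => { address => 1, ... }

        # Record a message and report whether the identical body has now
        # arrived at more than one of the posting addresses.
        sub harvested {
            my ($address, $body) = @_;
            $addresses_seen{$body}{$address} = 1;
            return keys %{ $addresses_seen{$body} } > 1;
        }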

    Combining the two-address trick with a spider-catcher idea, you could have a dummy page with a special email address on it; anything arriving there is assumed to be harvested spam, and it can serve as a template for what to filter out of other mailboxes.
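
    As a filter that might look roughly like this (a sketch with made-up file names, assuming the decoy address's mail ends up somewhere you can fingerprint it):

        use strict;
        use warnings;
        use Digest::MD5 qw(md5_hex);

        # Whenever mail arrives at the decoy address, remember its fingerprint.
        sub record_trap_message {
            my ($body) = @_;
            open my $fh, '>>', 'trap-digests.txt' or die "trap-digests.txt: $!";
            print $fh md5_hex($body), "\n";
        }

        # In the filter for a real mailbox, anything whose body matches a
        # fingerprint the trap has already seen is treated as harvested spam.
        sub is_harvested_spam {
            my ($body) = @_;
            open my $fh, '<', 'trap-digests.txt' or return 0;
            chomp( my @digests = <$fh> );
            my $digest = md5_hex($body);
            return scalar grep { $_ eq $digest } @digests;
        }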

    —John