Email anti-harvester code

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Email anti-harvester code by Zaxo (Archbishop) on Oct 04, 2003 at 01:41 UTC
Those tricks are OK for obscuring text-only addresses, but they are useless for real links. GD, Image::Magick, Gimp, and others can handle text-to-image conversion. HTML::Entities will handle the entity encoding, but the opposition can use that module, too.

After Compline, Zaxo
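To make the last point concrete, here is a minimal sketch of how little work the decoding side takes. It deliberately avoids HTML::Entities and uses two core-Perl substitutions instead; the sub name `decode_numeric_entities` and the example address are made up for illustration.

```perl
use strict;
use warnings;

# "nobody@example.org" written as decimal character references.
my $encoded = join '', map { "&#$_;" } unpack 'C*', 'nobody@example.org';

# Hypothetical helper: undo decimal and hexadecimal character references.
sub decode_numeric_entities {
    my ($html) = @_;
    $html =~ s/&#(\d+);/chr $1/ge;                    # decimal, e.g. &#110;
    $html =~ s/&#[xX]([0-9a-fA-F]+);/chr hex $1/ge;   # hex, e.g. &#x6E;
    return $html;
}

print decode_numeric_entities($encoded), "\n";
```

A harvester that runs something like this before applying its address pattern sees the plain address again, so entity encoding only helps against harvesters that skip this step.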
Re: Re: Email anti-harvester code by cpc (Hermit) on Oct 04, 2003 at 05:02 UTC
Quick and dirty way to do the trick using GD:

```perl
#!/usr/bin/perl -w
use strict;
use GD;
use GD::Text;

my $address = "nobody\@perlmonks.org";

# Measure the text first so the image is exactly as large as it needs to be.
my $gd_text = GD::Text->new() or die GD::Text::error();
$gd_text->set_font(gdSmallFont);
$gd_text->set_text($address);
my ($w, $h) = $gd_text->get('width', 'height');

my $gd    = GD::Image->new($w, $h);
my $white = $gd->colorAllocate(255, 255, 255);   # first colour allocated is the background
my $black = $gd->colorAllocate(0, 0, 0);
$gd->transparent($white);
$gd->string(gdSmallFont, 0, 0, $address, $black);

binmode STDOUT;
print $gd->png;
```

perl -l -e "eval pack('h*','072796e647028222d2d202a0e49636f6c61637022292b3');"
Re: Re: Re: Email anti-harvester code by chanio (Priest) on Oct 05, 2003 at 02:49 UTC
You need to add two things to make this graphic work. In the HTML page, an image tag to call the script: `<img border="0" src="http://localhost/cgi-bin/grafem@il.cgi" alt="this is my email" width="144" height="13">` And at the end of the script, a Content-type header printed before the image data: `print "Content-type: image/png\n\n"; print $gd->png;` That way the browser receives the image type it expects (see the content-type).
Re: Re: Email anti-harvester code by bart (Canon) on Oct 04, 2003 at 09:18 UTC
No, that's not right. You can safely convert the characters in your links to HTML entities; your browser has to recognize them. In fact, inside attributes the ampersand actually has to be encoded, though in many cases you can get away without it. But hopefully, email harvesters are too stupid to recognize entities. For now.

For example, that's what I did in the contact info field of this node. If there weren't a limit on the contact info field on this site, it would have been an actual "mailto" link. Here's a little snippet to convert the text into numerical entities, ready for pasting into your HTML pages: `print map "&#$_;", unpack "C*", $text;`
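As a rough sketch of how that one-liner could be wrapped up to produce a complete link: the sub names `entity_encode` and `mailto_link` and the example address are assumptions for illustration, and the code assumes a plain ASCII address.

```perl
use strict;
use warnings;

# Turn every byte of the string into a decimal character reference.
sub entity_encode {
    my ($text) = @_;
    return join '', map { "&#$_;" } unpack 'C*', $text;
}

# Build a mailto link in which both the href and the visible text are encoded.
sub mailto_link {
    my ($address) = @_;
    my $href  = entity_encode("mailto:$address");
    my $label = entity_encode($address);
    return qq{<a href="$href">$label</a>};
}

print mailto_link('nobody@example.org'), "\n";
```

Encoding the "mailto:" prefix as well means the markup contains neither a literal at-sign nor the word "mailto", while a browser still renders and follows the link normally.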
Re: Email anti-harvester code by davido (Cardinal) on Oct 04, 2003 at 06:06 UTC
You've got to ask yourself "Why am I considering this, and what's the potential cost?"

I think that merlyn would assert that the practice of placing text in graphic images for the purpose of thwarting harvesters is a deprecated one, even though he himself was one of the pioneers in the process that makes it possible. The issue is that such a practice makes the site inaccessible to a portion of its potential user base. First, obviously so to those with text-only browsers, but even more importantly, to those with visual impairments.

I think that a more robust solution for hiding an email address is to bury it in the CGI script or a (possibly encrypted) configuration file. There is no reason that the "Click here to send a message to the site administrator" needs to display the actual email address to which such a message is sent. Output to the browser only what the browser needs. If it's sufficient to have a "send me a message" button, don't bother placing the email address in public view at all. The script that processes the form can know the address through other means.

It's not safe to have the script send an email message based on retrieving the address through GET or POST anyway, even if the address is stored in a hidden field, because a potential spammer could use his own pseudo-browser to send his own list of email addresses in place of yours to your script, thus using you as a spam gateway.

Just a few tidbits of food for thought...

Dave

"If I had my life to do over again, I'd be a plumber." -- Albert Einstein
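A minimal sketch of that idea, under stated assumptions: the recipient lives only in the script (the address `webmaster@example.com`, the variable `$ADMIN_ADDRESS`, the sub `build_message`, and the form field names are all hypothetical), and anything the client submits is used only for the subject and body. Handing the resulting text to a mailer is left out.

```perl
use strict;
use warnings;

# Assumed recipient; in practice this might come from a config file
# outside the web root. It never appears in the HTML sent to the browser.
my $ADMIN_ADDRESS = 'webmaster@example.com';

# Build the message text from submitted form fields.
# Any recipient field supplied by the client is simply ignored.
sub build_message {
    my (%form) = @_;

    my $subject = defined $form{subject} ? $form{subject} : '(no subject)';
    $subject =~ s/[\r\n]+/ /g;    # keep submitted text from adding extra headers

    my $body = defined $form{body} ? $form{body} : '';

    return "To: $ADMIN_ADDRESS\n"
         . "Subject: $subject\n"
         . "\n"
         . "$body\n";
}

print build_message(subject => 'Hello', body => 'Nice site.');
```

Because the To: line is fixed in the script, a forged request cannot redirect the mail to someone else's address, and collapsing line breaks in the subject keeps a submitted value from smuggling in a Bcc: header.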
Re: Email anti-harvester code by jonadab (Parson) on Oct 04, 2003 at 01:44 UTC
> And another tip I just recently learned was to 'encode' the html email link by using HTML Ascii encodings (like `&#65;` or some such).. Does anybody have some handy code to perform some translations in perl?

This part you can do with HTML::Entities. As for the auto-image-generation stuff, I've never done that, so some other monk will have to answer that part. Though, if it were me, I'd just do an image for each major TLD and use the entity encoding for the rest of the address.

```perl
use strict;
use warnings;
use HTML::Entities;

# Assign the pixel dimensions of your own TLD images here.
my ($tldheight, $comwidth, $orgwidth, $netwidth, $uswidth, $ukwidth, $auwidth);

my %tldgif = (
    '.com' => ['commercetld.gif',     $comwidth],
    '.org' => ['organisationtld.gif', $orgwidth],
    '.net' => ['telcotld.gif',        $netwidth],
    '.us'  => ['usatld.gif',          $uswidth],
    '.uk'  => ['gbtld.gif',           $ukwidth],
    '.au'  => ['aussietld.gif',       $auwidth],
);

sub encode_email_address {
    my ($addy) = @_;
    my $tld = "";
    for (keys %tldgif) {
        # \Q...\E makes the dot in the TLD match literally.
        if ($addy =~ s/\Q$_\E$//) {
            $tld = "<img src=\"$tldgif{$_}[0]\" alt=\"[$tldgif{$_}[0]]\" "
                 . "height=\"$tldheight\" width=\"$tldgif{$_}[1]\">";
            last;
        }
    }
    # Letters, digits and the usual special characters all become entities.
    my $charstoencode = 'A-Za-z0-9<>"&@';
    return encode_entities($addy, $charstoencode) . $tld;
}
```

That's untested, and I get tired in the evening, so it may need a slight fixup or two, but it should give you the idea. Once again, if you want to generate the images on the fly for each address, you'll need to wait for another monk to answer, who has more experience with such things.

As Zaxo says, spammers can easily use HTML::Entities if they decide it's worth their while, which they will if all the email addresses on the web are encoded with entities. The image helps more, because OCR is imperfect and takes more time than entity decoding; there are no known documented cases yet of spammers using OCR to harvest email addresses from images. This doesn't mean they can't or won't, or even that they haven't, but it's not currently a common practice.

There are other tricks, like putting swirly colors in the background of the image and stretching and distorting the text, to make it harder for OCR software to read, hopefully without making it too hard for a seeing person. If you want to pursue that sort of shenanigans, do a web search for CAPTCHA.

`$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/`
Re: Email anti-harvester code by iltzu (Initiate) on Oct 05, 2003 at 16:39 UTC
As davido pointed out, there are better ways to obfuscate e-mail addresses than using images. My personal favorite is to add some normal anti-spam noise ("NOSPAM" etc.) to the address and then use CSS to hide it from normal browsers. Something like this will do the trick: `somebody@<span class=hidden>#NOSPAM#</span>example.com` where the class "hidden" is defined as: `.hidden { display: none; visibility: hidden; }`

Any CSS-capable browser (and most are, these days) will hide the noise. If the user is viewing the page in a non-CSS browser, or is cutting and pasting the address, it should be obvious how to de-munge it. But a spambot probably won't even recognize it as an e-mail address. (They usually seem to look for patterns like `/[-.\w]+@[-.\w]+\.$tld/`, where `$tld` is a list of top-level domains.)

One important thing to remember is that you should not copy the example exactly. Change the name of the class, change the bogus string, add other <span> tags so the spambots can't filter out those, etc. Remember, diversity is the best defence against parasites of all kinds.
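Here is a small sketch of why the munged markup slips past a pattern of that kind. The harvester regex is only modelled on the one quoted above, with an assumed three-entry TLD list, and the subs `munge_address` and `visible_text` are hypothetical helpers; `visible_text` crudely imitates what a CSS-capable browser displays.

```perl
use strict;
use warnings;

# Assumed harvester pattern, modelled on the one quoted in the post.
my $tld       = '(?:com|org|net)';
my $harvester = qr/[-.\w]+\@[-.\w]+\.$tld/;

# Insert a hidden noise span right after the at-sign.
sub munge_address {
    my ($address, $class, $noise) = @_;
    my ($local, $domain) = split /\@/, $address, 2;
    return qq{$local\@<span class="$class">$noise</span>$domain};
}

# Rough stand-in for a browser: drop the spans that CSS would hide.
sub visible_text {
    my ($html, $class) = @_;
    $html =~ s{<span class="\Q$class\E">.*?</span>}{}g;
    return $html;
}

my $munged = munge_address('somebody@example.com', 'hidden', '#NOSPAM#');
print "markup:  $munged\n";
print "visible: ", visible_text($munged, 'hidden'), "\n";
print "harvested\n" if $munged =~ $harvester;
```

The pattern needs at least one address character directly after the at-sign, and the munged markup has a `<` there, so nothing is harvested. Passing the class name and noise string as parameters makes it easy to vary both per site, in line with the advice not to copy the example exactly.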
Re: Email anti-harvester code by tilly (Archbishop) on Oct 05, 2003 at 18:50 UTC
I like hairy addresses as a concept. The idea is to nest parens in an embedded comment. Good emailers (i.e. anything that is standards-compliant) will recognize that the addresses are valid and accept them. Regular expression engines will not.

However, I have heard that Exchange is not a good emailer. Whether this is a feature or a bug for the technique is a matter of personal perspective...
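A minimal sketch of the idea, assuming the RFC 822/2822 rule that comments in parentheses may nest and may sit next to the local part. The example address, the harvester pattern, and the sub `strip_comments` are made up for illustration, and the stripping is simplified: it ignores quoted strings and backslash-escaped parentheses.

```perl
use strict;
use warnings;

# Assumed harvester pattern of the usual kind.
my $harvester = qr/[-.\w]+\@[-.\w]+\.(?:com|org|net)/;

# Hypothetical hairy address: a nested comment sits just before the at-sign.
my $hairy = 'somebody(hairy (nested (deeply)) comment)@example.com';

# Remove comments from the inside out until none are left.
sub strip_comments {
    my ($address) = @_;
    1 while $address =~ s/\([^()]*\)//;
    return $address;
}

print "harvested\n" if $hairy =~ $harvester;
print "after stripping comments: ", strip_comments($hairy), "\n";
```

The pattern wants an address character immediately before the at-sign and finds a closing parenthesis instead, so the hairy form is skipped, while a program that actually handles nested comments recovers the deliverable address.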