Those tricks are ok for obscuring text-only addresses, but they are useless for real links.
GD, Image::Magick, Gimp, and others can handle text-to-image conversion. HTML::Entities will do the entity encoding. The opposition can use that module, too.
| [reply] |
Quick and dirty way to do the trick using GD:
#!/usr/bin/perl
use strict;
use warnings;
use GD;
use GD::Text;
# Measure the rendered text so the image is exactly as large as needed.
my $address = 'nobody@perlmonks.org';
my $gd_text = GD::Text->new() or die GD::Text::error();
$gd_text->set_font(gdSmallFont);
$gd_text->set_text($address);
my ($w, $h) = $gd_text->get('width', 'height');
my $gd = GD::Image->new($w, $h);
# The first colour allocated becomes the image background.
my $white = $gd->colorAllocate(255,255,255);
my $black = $gd->colorAllocate(0,0,0);
$gd->transparent($white);
$gd->string(gdSmallFont, 0, 0, $address, $black);
binmode STDOUT;
print $gd->png;
perl -l -e "eval pack('h*','072796e647028222d2d202a0e49636f6c61637022292b3');"
| [reply] [d/l] |
You need to add two things to make this graphic work as a CGI script.
In the HTML page, call the script from an img tag:
<img border="0" src="http://localhost/cgi-bin/grafem@il.cgi" alt="this is my email" width="144" height="13">
And at the end of the script, print the HTTP header before the image data:
print "Content-type: image/png\n\n";
print $gd->png;
That way the browser receives the image it was told to expect (see the content-type). | [reply] [d/l] [select] |
No, that's not right. You can safely convert the characters in your links to HTML entities; your browser has to recognize them. In fact, inside attributes the ampersand actually has to be encoded, though in many cases you can get away without it. But hopefully, email harvesters are too stupid to decode entities. For now.
For example, that's what I did in the contact info field of this node. If there weren't a limit on the contact info field on this site, it would have been an actual "mailto" link.
Here's a little snippet to convert the text into numerical entities, ready for pasting into your HTML pages:
print map "&#$_;", unpack "C*", $text;
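As a rough sketch of how that one-liner could be wrapped up for a real "mailto" link, the following encodes both the href and the visible text. The sub names to_numeric_entities and mailto_link are made up for illustration, and the code assumes a plain ASCII address.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Encode every byte of a string as a decimal numeric character reference.
sub to_numeric_entities {
    my ($text) = @_;
    return join '', map { "&#$_;" } unpack 'C*', $text;
}

# Build a mailto link whose href and link text are both entity-encoded.
# The "mailto:" scheme is encoded too, so that literal string never
# appears in the page source.
sub mailto_link {
    my ($address) = @_;
    my $href  = to_numeric_entities("mailto:$address");
    my $label = to_numeric_entities($address);
    return qq{<a href="$href">$label</a>};
}

# Hypothetical address, for illustration only.
print mailto_link('nobody@example.com'), "\n";
```

A browser decodes the references before following the link, so the visitor sees and clicks a normal address, while the page source contains no literal "@" or "mailto:".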
| [reply] [d/l] |
You've got to ask yourself "Why am I considering this, and what's the potential cost?"
I think that merlyn would assert that the practice of placing text in graphic images for the purpose of thwarting harvesters is a deprecated one, even though he himself was one of the pioneers in the process that makes it possible.
The issue is that such a practice makes the site inaccessible to a portion of its potential user base: obviously to those with text-only browsers, but even more importantly to those with visual impairments.
I think that a more robust solution for hiding an email address is to bury it in the CGI script or a (possibly encrypted) configuration file. There is no reason that the "Click here to send a message to the site administrator" needs to display the actual email address to which such a message is sent. Output to the browser only what the browser needs. If it's sufficient to have a "send me a message" button, don't bother placing the email address in public view at all. The script that processes the form can know the address through other means. It's not safe to have the script send an email message based on retrieving the address through GET or POST anyway, even if the address is stored in a hidden field, because a potential spammer could use his own pseudo-browser to send his own list of email addresses in place of yours to your script, thus using you as a spam gateway.
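A minimal sketch of that idea, assuming the form submits "from", "subject" and "body" fields: the recipient lives only in the script, and any "to" field the client sends is ignored. The address, the field names and the sub names here are hypothetical, and actually handing the message to a mailer is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical recipient; in practice this might be read from a
# configuration file outside the web root.
my $RECIPIENT = 'webmaster@example.com';

# Keep a submitted value on one line so it cannot add extra mail headers.
sub one_line {
    my ($value) = @_;
    $value = '' unless defined $value;
    $value =~ s/[\r\n]+/ /g;
    return $value;
}

# Build the message text from submitted form fields. Any "to" field the
# client sends is ignored: the recipient comes only from the script.
sub build_message {
    my (%field) = @_;
    my $from    = one_line($field{from});
    my $subject = one_line($field{subject});
    my $body    = defined $field{body} ? $field{body} : '';
    return "To: $RECIPIENT\n"
         . "Reply-To: $from\n"
         . "Subject: $subject\n"
         . "\n"
         . "$body\n";
}

print build_message(
    to      => 'victim@example.net',    # forged field, never used
    from    => 'visitor@example.org',
    subject => 'Hello',
    body    => 'Just saying hi.',
);
```

Because the script never reads a recipient from the request, a pseudo-browser that posts its own address list has nothing to substitute.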
Just a few tidbits of food for thought...
Dave
"If I had my life to do over again, I'd be a plumber." -- Albert Einstein
| [reply] |
And another tip I just recently learned was to 'encode' the html email link by using HTML Ascii encodings (like &#65; or some such). Does anybody have some handy code to perform some translations in perl?
This part you can do with HTML::Entities.
As for the auto-image-generation stuff, I've never
done that, so some other monk will have to answer
that part. Though, if it were me, I'd just do an
image for each major TLD and use the entity encoding
for the rest of the address.
# $tldheight and the per-TLD widths ($comwidth etc.) are assumed to be
# defined elsewhere to match your image files.
use HTML::Entities;
my %tldgif =
(
    '.com' => ['commercetld.gif', $comwidth],
    '.org' => ['organisationtld.gif', $orgwidth],
    '.net' => ['telcotld.gif', $netwidth],
    '.us' => ['usatld.gif', $uswidth],
    '.uk' => ['gbtld.gif', $ukwidth],
    '.au' => ['aussietld.gif', $auwidth],
);
sub encode_email_address {
    my ($addy) = @_; my $tld = "";
    for (keys %tldgif) {
        # Strip a known TLD from the end; \Q..\E makes the dot literal.
        if ($addy =~ s/\Q$_\E$//) {
            $tld = "<img src=\"$tldgif{$_}[0]\"
                alt=\"[$tldgif{$_}[0]]\"
                height=\"$tldheight\"
                width=\"$tldgif{$_}[1]\">";
            last;
        }
    }
    # Encode letters, digits and the HTML-special characters as entities.
    my $charstoencode = join '',
        ('A'..'Z','a'..'z','<>"&@',0..9);
    return (encode_entities($addy, $charstoencode).$tld);
}
That's untested, and I get tired in the evening, so it
may need a slight fixup or two, but it should give
you the idea. Once again, if you want to generate
the images on the fly for each address, you'll need
to wait for another monk to answer, who has more
experience with such things.
As Zaxo says, spammers can easily use HTML::Entities
if they decide it's worth their while, which they
will if all the email addresses on the web are
encoded with entities. The image helps more, because
OCR is imperfect and takes more time than entity
decoding; there are no known documented cases yet
of spammers using OCR to harvest email addresses
from images. This doesn't mean they can't or won't,
or even that they haven't, but it's not currently a
common practice. There are other tricks, like putting
swirly colors in the background of the image and
stretching and distorting the text, to make it harder
for OCR software to read, hopefully without making it
too hard for a seeing person. If you want to pursue
that sort of shenanigans, do a web search for
CAPTCHA.
$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}}
split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/
| [reply] [d/l] [select] |
As davido pointed out, there are better ways to obfuscate e-mail addresses than using images. My personal favorite is to add some normal anti-spam noise ("NOSPAM" etc.) to the address and then use CSS to hide it from normal browsers.
Something like this will do the trick:
somebody@<span class=hidden>#NOSPAM#</span>example.com
where the class "hidden" is defined as:
.hidden { display: none; visibility: hidden; }
Any CSS-capable browser (and most are, these days) will hide the noise. If the user is viewing the page in a non-CSS browser, or is cutting and pasting the address, it should be obvious how to de-munge it. But a spambot probably won't even recognize it as an e-mail address. (They usually seem to look for patterns like /[-.\w]+@[-.\w]+\.$tld/, where $tld is a list of top-level domains.)
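To see why a pattern like that misses the munged form, here is a small sketch. The sub name munge_address, the class name "quiet" and the three-entry TLD list are placeholders chosen for illustration, and the pattern is only a stand-in for whatever a real harvester uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Insert a hidden noise span right after the "@" of an address.
# $class and $noise are chosen by the caller, so every site can differ.
sub munge_address {
    my ($address, $class, $noise) = @_;
    my ($local, $domain) = split /\@/, $address, 2;
    return qq{$local\@<span class="$class">$noise</span>$domain};
}

# A stand-in for the harvester pattern, with a short example TLD list.
my $tld       = '(?:com|org|net)';
my $harvester = qr/[-.\w]+\@[-.\w]+\.$tld/;

my $plain  = 'somebody@example.com';
my $munged = munge_address($plain, 'quiet', '#NOSPAM#');

print "plain address is harvested\n"  if $plain  =~ $harvester;
print "munged address is harvested\n" if $munged =~ $harvester;
print "$munged\n";
```

The character right after the "@" in the munged form is "<", which the pattern's character class does not allow, so the match fails even though the domain is still present further along.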
One important thing to remember is that you should not copy the example exactly. Change the name of the class, change the bogus string, add other <span> tags so the spambots can't filter out those, etc. Remember, diversity is the best defence against parasites of all kinds. | [reply] [d/l] [select] |
I like hairy addresses as a concept.
The idea is to nest parens in an embedded comment. Good emailers (i.e. anything that is standards-compliant) will recognize that the addresses are valid and accept them. Regular expression engines will not.
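A quick sketch of the effect, using a made-up address and a simplified harvester pattern. The strip_comments sub is only an approximation of what a standards-aware parser does: it ignores quoted strings and backslash-escaped parens.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical address with a nested comment in the local part.
my $hairy = 'somebody(not (really) spam)@example.com';

# Simplified stand-in for a harvester's pattern.
my $naive = qr/[-.\w]+\@[-.\w]+\.(?:com|org|net)/;

# Remove comments from the inside out until none are left.
sub strip_comments {
    my ($address) = @_;
    1 while $address =~ s/\([^()]*\)//g;
    return $address;
}

print $hairy =~ $naive ? "harvested\n" : "missed\n";
print strip_comments($hairy), "\n";
```

The naive pattern finds nothing usable because a ")" sits directly before the "@", while peeling the comments off one nesting level at a time recovers the deliverable address.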
However, I have heard that Exchange is not a good emailer. Whether this is a feature or a bug for the technique is a matter of personal perspective... | [reply] |