As most of us know, the best SPAM prevention measure is to not publish email addresses on public web pages. While this works, there are times when you want (or a client wants you) to publish one or more email addresses anyway.

This article discusses one way you can do so without increasing your SPAM burden. The following script provides a very simple implementation of that technique.

Update: Added Ovid's suggestion and clarified the Two-Face function. Also...put into practice.

#!/usr/bin/perl -wT
# ---------------------------------------------------------
# Converts email addresses to a format suitable for mailto:
# links, one that is reportedly more difficult for spammers
# to harvest.
#
# Note: mailto: tags have other problems. Feedback scripts
# may be more effective defenses; however, this can be used
# by your HTML "designers" while you're working on a better
# solution (Hint: combine CGI.pm with MIME::Lite).
#
# Revision log and contact info provided in __DATA__ Share
# and Enjoy...so long as I get credit, too.
# ---------------------------------------------------------

use strict;
use CGI qw( :standard );
$CGI::HEADERS_ONCE    = 1;
$CGI::POST_MAX        = 1_024;
$CGI::DISABLE_UPLOADS = 1;
#use CGI::Carp qw( fatalsToBrowser );
#use CGI::Pretty;
use HTML::Entities;

my $field = "emailaddr";
my $error = cgi_error;

if ( $error )
{
   printForm( "There was a problem with your " .
              "request.\n\nDetails: $error." );
}
elsif ( param( $field ) )
{
   encodeEmail( param( $field ) );
}
else
{
   printForm( "This encodes a simple email address " .
              '(user@example.com) to a MAILTO: tag ' .
              "that is more difficult for spammers " .
              "to harvest.\n\n" .
              "Please submit the email address you " .
              "want to encode:\n" );
}

exit 1;

sub coinFlip
# -----------------------------------------------------
# Note that this is weighted heavily in favor of the
# codes.
# -----------------------------------------------------
{
   return ( rand() < 0.75 ) ? 1 : 0;
}

sub encodeEmail
# -----------------------------------------------------
# Encodes an email address by randomly converting most
# of its characters to ASCII codes. Note weighting in
# coinFlip(); according to rumor, this works best when
# most, but not all characters are encoded.
# -----------------------------------------------------
{
   my $input = shift;
   my @chars = split //, $input;
   my @codes = map { encode_entities( $_, '\x00-\xff' ) } @chars;

   # See Camel book discussion of srand
   srand( time() ^ ( $$ + ( $$ << 15 ) ) );

   my $result = $input;
   while ( $result eq $input )   # just in case.
   {
      $result = "";
      foreach my $index ( 0..$#chars )
      {
         $result .= coinFlip() ? $codes[ $index ]
                               : $chars[ $index ] ;
      }
   }

   printForm( "Your encoded email address " .
              "is shown below:", $input, $result );
}

sub printForm
# -----------------------------------------------------
# Prints the HTML. The first parameter is explanatory
# text displayed to the user. The second and third
# parameters are optional and (respectively) contain
# the email address entered by the user and its encoded
# version.
# -----------------------------------------------------
{
   my $text   = shift;
   my $email  = shift;
   my $mailto = shift;
   my $title  = "Simple Email Address Encoder";
   my $type   = "application/x-www-form-urlencoded";
   my $source = a( { href=>"http://webdeveloper.com/" .
                     "drweb/19990329-drweb.html" },
                   "article" );

   print header(),
         start_html( -title => $title ),
         h1( $title ),
         p( $text ),
         start_form( "post", url(), $type );

   if ( $mailto )
   {
      my $link = '<a href="mailto:' . "$mailto" . '">' .
                 "$mailto</a>";
      print p( "Your link appears like this: $link" ),
            p( "The value for the mailto link tag is:",
               br,
               textarea( -name    => "encoded",
                         -default => $mailto,
                         -rows    => 3,
                         -columns => 58,
                         -wrap    => "virtual" ) );
   }

   print hr,
         p( 'Enter an e-mail address to encode:',
            br,
            textfield( $field, $email, 60, 50 ) ),
         p( submit(), reset() ),
         end_form,
         p( "For more information, please see this ",
            $source, "." ),
         end_html;
}

__DATA__
Contact Info on my home node:
   http://perlmonks.com/index.pl?node_id=33117

To do list:
   -- Detect the slight risk of returning all codes and
      re-run should that happen. I've been told a mixed
      string is the most effective.
   -- Fix problems with multiple submissions.
      There are two, though I think they're from the same bug.
   -- Podify
   -- Look for further (reasonable) streamlining.

Updates:
   v0.0.0, 19 Jun 01 - First posted to perlmonks.com
   v0.0.1, 19 Jun 01:
   -- Added in HTML::Entities, per [Ovid]. (Yay!)
   -- Revised coinFlip() and calling code to clarify
      intent (codes returned 75% of the time).
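
For readers who only want the core idea without the CGI wrapper, here is a minimal sketch using core Perl only. The helper name encode_address and its $ratio parameter are illustrative assumptions, not part of the script above, and sprintf stands in for the HTML::Entities call; for plain ASCII addresses both should produce decimal character references, but treat that as an assumption rather than a guarantee.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): replace roughly
# $ratio of the characters with decimal character references, and
# retry if, by chance, nothing was encoded at all.
sub encode_address {
    my ( $address, $ratio ) = @_;
    $ratio = 0.75 unless defined $ratio;    # same weighting as coinFlip()

    # Nothing to encode, or encoding switched off: avoid looping forever.
    return $address unless length($address) && $ratio > 0;

    my $result = $address;
    while ( $result eq $address ) {
        $result = '';
        foreach my $char ( split //, $address ) {
            $result .= rand() < $ratio
                ? sprintf( '&#%d;', ord($char) )    # e.g. '@' becomes '&#64;'
                : $char;
        }
    }
    return $result;
}

my $sample = encode_address('user@example.com');
print qq{<a href="mailto:$sample">$sample</a>\n};
```

The output differs on every run because the choice of characters is random; a browser still renders and follows the link as user@example.com.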

As always, feedback, meme-prevention, and idiom checks would be appreciated.

--f

(Ovid) Re: Simple Anti-Spam Device
by Ovid (Cardinal) on Jun 20, 2001 at 02:33 UTC

    Hey, that's a nice trick and I like it. Just one thing I'm wondering about: your &unpackEmail sub does HTML encoding of the characters. Why not have the line

    my @codes = unpackEmail( $input );

    Changed to:

    use HTML::Entities;
    my @codes = map { encode_entities( $_, '\x00-\xff' ) } split //, $input;

    If I understood your code correctly, you could then drop your unpackEmail sub.

    Cheers,
    Ovid

    Update: I forgot to mention, if you want to get rid of all HTML, you can change this:

    my $link = '<a href="mailto:' . "$mailto" . '">' . "$mailto</a>";
    # to this:
    my $link = a( { -href => $mailto }, $mailto );

    I can't wait to email the link to your code to a bunch of my friends (yes, even Ovid has friends). It's a really, really nice trick.

    Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

Re: Simple Anti-Spam Device
by miyagawa (Chaplain) on Jun 20, 2001 at 06:18 UTC
    See my Apache::AntiSpam on CPAN, if you're friendly w/ mod_perl. And wow, HTML-encoding e-mail addresses is a nice idea; I'll implement this in my module soon. Thanks.

    Update: implemented and gone to CPAN. Check out version 0.04 of Apache::AntiSpam if you're interested. Thanks again for your idea.

      As your module is doing a good job, you might consider adding a filter that converts user@domain.com to user-78c1ed6da0322b3a@domain.com, where the added hex string contains the encrypted timestamp and remote ip-address of the address harvester.

      If one receives spam to such an address, he knows the IP address of the harvester and can try to prosecute the spammer.

      See Anti-Spam Mail Address Encoding (with encrypted IP-Address) for two subs doing the encoding/encrypting
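
A rough sketch of what such a tagging filter might look like follows. It is not the code from the node mentioned above: the sub names, the payload layout (a 32-bit timestamp plus four IP octets, 8 bytes, the same length as the 16 hex digits in the example) and the fixed key are all assumptions, and the XOR step only obfuscates the payload. A real deployment would want a proper cipher.

```perl
use strict;
use warnings;

# Assumption: a fixed 8-byte secret known only to the site owner.
my $KEY = pack 'H16', '0123456789abcdef';

# Hypothetical helper: embed the visitor's IP and the request time
# in the local part of the address as 16 hex digits.
sub tag_address {
    my ( $address, $ip, $time ) = @_;
    my ( $user, $domain ) = split /@/, $address, 2;
    my $payload = pack 'N C4', $time, split( /\./, $ip );    # 4 + 4 bytes
    my $tag     = unpack 'H*', $payload ^ $KEY;              # string XOR
    return "$user-$tag\@$domain";
}

# Hypothetical helper: recover (address, ip, time) from a tagged
# address, or return an empty list if it does not look tagged.
sub untag_address {
    my $tagged = shift;
    my ( $user, $tag, $domain ) =
        $tagged =~ /^(.+)-([0-9a-f]{16})\@(.+)$/
        or return;
    my ( $time, @octets ) = unpack 'N C4', pack( 'H*', $tag ) ^ $KEY;
    return ( "$user\@$domain", join( '.', @octets ), $time );
}

my $tagged = tag_address( 'user@domain.com', '192.0.2.17', time() );
print "$tagged\n";
my ( $plain, $ip, $when ) = untag_address($tagged);
print "$plain was harvested by $ip at ", scalar localtime($when), "\n";
```

The mail server would also need to accept mail for the tagged local part, which is outside the scope of this sketch.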

      alex pleiner <alex@zeitform.de>
      zeitform Internet Dienste

Re: Simple Anti-Spam Encoder
by ishmael (Novice) on Jun 26, 2001 at 12:41 UTC

    This is a good idea. The problem is that the encoding cannot be a one-way function (browsers must be able to reverse it), so if many sites adopt it, most spammers will simply add a decoder.
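
To make the concern concrete, here is a sketch of how little work a harvester would need. The sub names and the sample HTML are made up for illustration; the decoder handles only decimal and hexadecimal numeric character references, not named entities such as &amp;.

```perl
use strict;
use warnings;

# Hypothetical decoder: turn numeric character references
# (decimal or hex) back into plain characters.
sub decode_numeric_entities {
    my $text = shift;
    $text =~ s/&#(?:[xX]([0-9a-fA-F]+)|(\d+));/chr( defined $1 ? hex($1) : $2 )/ge;
    return $text;
}

# Hypothetical harvester: pull every mailto: target out of a page
# and decode it.
sub harvest {
    my $html = shift;
    my @found;
    while ( $html =~ /href="mailto:([^"]+)"/gi ) {
        push @found, decode_numeric_entities($1);
    }
    return @found;
}

my $page = '<a href="mailto:&#117;s&#101;r&#64;example.com">mail me</a>';
print "$_\n" for harvest($page);
```

A single substitution undoes the encoding, so the trick works only as long as harvesters do not bother to decode.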

    I have tried to address the SPAM issue with my Spider Catcher which feeds bogus emails through a faked self-referring webpage.

    Can we fight spam with spam? How about some method to swamp the spam server with request-remove attempts?