
Simple Anti-Spam Encoder

by footpad (Abbot)
on Jun 20, 2001 at 00:52 UTC ( [id://89810] : CUFP )

As most of us know, the best SPAM prevention measure is to not publish email addresses on public web pages. While this works, there are times when you (or a client) want to publish one or more email addresses anyway.

This article discusses one way you can do so without increasing your SPAM burden. The following script provides a very simple implementation of that technique.

Update: Added Ovid's suggestion and clarified the Two-Face function. Also...put into practice.

#!/usr/bin/perl -wT
# ---------------------------------------------------------
# Converts email addresses to a format suitable for mailto:
# links, one that is reportedly more difficult for spammers
# to harvest.
#
# Note: mailto: tags have other problems. Feedback scripts
# may be more effective defenses; however, this can be used
# by your HTML "designers" while you're working on a better
# solution (Hint: combine with MIME::Lite).
#
# Revision log and contact info provided in __DATA__ Share
# and long as I get credit, too.
# ---------------------------------------------------------

use strict;
use CGI qw( :standard );

$CGI::HEADERS_ONCE    = 1;
$CGI::POST_MAX        = 1_024;
$CGI::DISABLE_UPLOADS = 1;

#use CGI::Carp qw( fatalsToBrowser );
#use CGI::Pretty;
use HTML::Entities;

my $field = "emailaddr";
my $error = cgi_error;

if ( $error )
{
   printForm( "There was a problem with your " .
              "request.\n\nDetails: $error." );
}
elsif ( param( $field ) )
{
   encodeEmail( param( $field ) );
}
else
{
   printForm( "This encodes a simple email address " .
              '( to a MAILTO: tag ' .
              "that is more difficult for spammers " .
              "to harvest.\n\n" .
              "Please submit the email address you " .
              "want to encode:\n" );
}

exit 1;

sub coinFlip
# -----------------------------------------------------
# Note that this is weighted heavily in favor of the
# codes.
# -----------------------------------------------------
{
   return ( rand() < 0.75 ) ? 1 : 0;
}

sub encodeEmail
# -----------------------------------------------------
# Encodes an email address by randomly converting most
# of its characters to ASCII codes. Note weighting in
# coinFlip(); according to rumor, this works best when
# most, but not all characters are encoded.
# -----------------------------------------------------
{
   my $input = shift;
   my @chars = split //, $input;
   my @codes = map { encode_entities( $_, '\x00-\xff' ) } @chars;

   # See Camel book discussion of srand
   srand( time() ^ ( $$ + ( $$ << 15 ) ) );

   my $result = $input;
   while ( $result eq $input )   # just in case.
   {
      $result = "";
      foreach my $index ( 0..$#chars )
      {
         $result .= coinFlip() ? $codes[ $index ]
                               : $chars[ $index ] ;
      }
   }

   printForm( "Your encoded email address " .
              "is shown below:", $input, $result );
}

sub printForm
# -----------------------------------------------------
# Prints the HTML. The first parameter is explanatory
# text displayed to the user. The second and third
# parameters are optional and (respectively) contain
# the email address entered by the user and its encoded
# version.
# -----------------------------------------------------
{
   my $text   = shift;
   my $email  = shift;
   my $mailto = shift;

   my $title  = "Simple Email Address Encoder";
   my $type   = "application/x-www-form-urlencoded";
   my $source = a( { href=>"" .
                     "drweb/19990329-drweb.html" },
                   "article" );

   print header(),
         start_html( -title => $title ),
         h1( $title ),
         p( $text ),
         start_form( "post", url(), $type );

   if ( $mailto )
   {
      my $link = '<a href="mailto:' . "$mailto" . '">' .
                 "$mailto</a>";

      print p( "Your link appears like this: $link" ),
            p( "The value for the mailto link tag is:", br,
               textarea( -name    => "encoded",
                         -default => $mailto,
                         -rows    => 3,
                         -columns => 58,
                         -wrap    => "virtual" ) );
   }

   print hr,
         p( 'Enter an e-mail address to encode:', br,
            textfield( $field, $email, 60, 50 ) ),
         p( submit(), reset() ),
         end_form,
         p( "For more information, please see this ",
            $source, "." ),
         end_html;
}

__DATA__

Contact Info on my home node:

To do list:
-- Detect the slight risk of returning all codes and re-run
   should that happen. I've been told a mixed string is the
   most effective.
-- Fix problems with multiple submissions. There are two,
   though I think they're from the same bug.
-- Podify
-- Look for further (reasonable) streamlining.

Updates:
v0.0.0, 19 Jun 01 - First posted to
v0.0.1, 19 Jun 01:
-- Added in HTML::Entities, per [Ovid]. (Yay!)
-- Revised coinFlip() and calling code to clarify intent
   (codes returned 75% of the time).
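For anyone who wants to try the encoding step without the CGI wrapper, here is a minimal standalone sketch. It is not the script above: encode_address() is a hypothetical name, sprintf with decimal character references stands in for HTML::Entities so that only core Perl is needed, and the always-encode rule for HTML-significant characters is my assumption. It also covers the first to-do item by retrying until the result is mixed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns a copy
# of $address in which roughly $ratio of the characters are replaced by
# decimal character references such as &#64;.
sub encode_address {
    my ( $address, $ratio ) = @_;
    $ratio = 0.75 unless defined $ratio;
    die "ratio must be strictly between 0 and 1\n"
        unless $ratio > 0 && $ratio < 1;

    my @chars = split //, $address;
    return '' unless @chars;

    # Count the characters that are allowed to stay as plain text.
    my $safe = grep { !/[<>&"']/ } @chars;

    my ( $result, $encoded, $plain );
    do {
        ( $result, $encoded, $plain ) = ( '', 0, 0 );
        for my $char (@chars) {
            # Assumption: HTML-significant characters are always encoded
            # so the result can be dropped into an attribute value.
            if ( $char =~ /[<>&"']/ || rand() < $ratio ) {
                $result .= sprintf '&#%d;', ord($char);
                $encoded++;
            }
            else {
                $result .= $char;
                $plain++;
            }
        }
        # Retry until the string is mixed: at least one code and, when
        # the input allows it, at least one plain character.
    } until ( $encoded && ( $plain || $safe < 2 ) );

    return $result;
}

my $address = 'user@example.com';
my $mixed   = encode_address($address);
print qq{<a href="mailto:$mixed">$mixed</a>\n};
```

The output differs on every run because of rand(); a browser renders each variant as the same address.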

As always, feedback, meme-prevention, and idiom checks would be appreciated.


Replies are listed 'Best First'.
(Ovid) Re: Simple Anti-Spam Device
by Ovid (Cardinal) on Jun 20, 2001 at 02:33 UTC

    Hey, that's a nice trick and I like it. Just one thing I'm wondering about: your &unpackEmail sub does HTML encoding of the characters. Why not have the line

    my @codes = unpackEmail( $input );

    Changed to:

    use HTML::Entities;
    my @codes = map { encode_entities( $_, '\x00-\xff' ) } split //, $input;

    If I understood your code correctly, you could then drop your unpackEmail sub.


    Update: I forgot to mention, if you want to get rid of all HTML, you can change this:

    my $link = '<a href="mailto:' . "$mailto" . '">' . "$mailto</a>";
    # to this:
    my $link = a( { -href => "mailto:$mailto" }, $mailto );

    I can't wait to email the link to your code to a bunch of my friends (yes, even Ovid has friends). It's a really, really nice trick.

    Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

Re: Simple Anti-Spam Device
by miyagawa (Chaplain) on Jun 20, 2001 at 06:18 UTC
    See my Apache::AntiSpam on CPAN, if you're friendly w/ mod_perl. And wow, HTML-encoding E-mail is a nice idea, I'll implement this on my module soon. Thanks.

    Update: implemented and uploaded to CPAN. Check out version 0.04 of Apache::AntiSpam if you're interested. Thanks again for your idea.

      As your module is doing a good job, you might consider adding a filter that converts to, where the added hex string contains the encrypted timestamp and remote ip-address of the address harvester.

      If one receives spam to such an address, he knows the IP address of the harvester and can try to prosecute the spammer.

      See Anti-Spam Mail Address Encoding (with encrypted IP-Address) for two subs doing the encoding/encrypting

      alex pleiner
      zeitform Internet Dienste
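A rough sketch of the kind of hex string described in this reply, in core Perl only. These are not the two subs from the linked node: make_token(), read_token() and $KEY are hypothetical names, and the XOR with a fixed key is only obfuscation standing in for whatever cipher a real filter would use. How the token is attached to the published address is left open here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical 8-byte site secret. XOR with a fixed key is NOT real
# encryption; it only stands in for a proper cipher in this sketch.
my $KEY = pack 'H*', 'a55a3cc3f00f9669';

# Pack a Unix timestamp (32 bits) and a dotted-quad IPv4 address
# (4 bytes) into 8 bytes, XOR with the key, return 16 hex digits.
sub make_token {
    my ( $time, $ip ) = @_;
    my $plain = pack( 'N C4', $time, split( /\./, $ip ) );
    return unpack( 'H*', $plain ^ $KEY );
}

# Reverse of make_token(): returns ( timestamp, dotted-quad address ).
sub read_token {
    my ($token) = @_;
    my $plain = pack( 'H*', $token ) ^ $KEY;
    my ( $time, @octets ) = unpack( 'N C4', $plain );
    return ( $time, join( '.', @octets ) );
}

# 192.0.2.10 is a documentation address used as a stand-in visitor IP.
my $token = make_token( time(), '192.0.2.10' );
my ( $when, $who ) = read_token($token);
print "token $token was issued to $who at ", scalar localtime($when), "\n";
```

If spam later arrives at an address carrying such a token, read_token() recovers when the page was served and to which IP address.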

Re: Simple Anti-Spam Encoder
by ishmael (Novice) on Jun 26, 2001 at 12:41 UTC

    This is a good idea. The problem is that this encoding cannot be a one-way function, so if many sites adopt it, most spammers will simply adopt a decoder.
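To illustrate how little work the decoding side needs, here is a minimal sketch in core Perl. decode_references() is a hypothetical name, and it only handles numeric character references (decimal and hexadecimal), not named entities such as &amp;amp;.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turns decimal (&#64;) and hexadecimal (&#x40;)
# character references back into the characters they stand for.
sub decode_references {
    my ($text) = @_;
    $text =~ s/&#[xX]([0-9a-fA-F]+);/chr( hex($1) )/ge;
    $text =~ s/&#(\d+);/chr($1)/ge;
    return $text;
}

my $sample = '&#117;s&#101;r&#64;ex&#x61;mple&#46;com';
print decode_references($sample), "\n";    # prints user@example.com
```

Two substitutions undo the whole scheme, so the encoding only helps against harvesters that do not bother to decode.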

    I have tried to address the SPAM issue with my Spider Catcher which feeds bogus emails through a faked self-referring webpage.

    Can we fight spam with spam? How about some method to swamp the spam server with request-remove attempts?