cascade_fx has asked for the wisdom of the Perl Monks concerning the following question:

Hello and thanks in advance for any help or resources that you may offer.

My wife is getting tons of annoying porn spam that uses img includes to display some pretty "clinical" pictures for her. It is hard to tell by the subject lines alone that she is going to be greeted by a gynecological exam photo when she opens any particular piece of mail. The subject lines say things like, "I just want to double check the time for our meeting" or something similar. Given that she is a college advisor and works with thousands of students, the subject lines are plausible enough to require her to open them, and the email addresses are too numerous for her to remember whether each email is from a real client or not.

She tried the Outlook spam tools, but they only block based on the exact same email address. I had her try SpamNet from Cloudmark (which I have had tons of success with), but it caused resource and lockup issues with her PC.

She uses Windows 98 (per mandate... though I would want the final version to be easily updated to work with 2000/XP/Future release as well) and Outlook. While Outlook will allow you to set the default format of sent mail, it won't (in version 2000) let you set the default read format... so any HTML is rendered (and any images are included inline) when opened.

I have had some success in having her redirect server names garnered from the HTML source of the message using her hosts file. She just opens the source of the message (the first time she receives one that isn't blocked), takes the server name that the image is being pulled from (example: www.pr0n.com), and redirects it to the loopback IP address (127.0.0.1).

It works great (besides proving that I am blessed with a wife who is willing to learn how to maintain a hosts file) if the img src is referenced by name. Well, the pr0n spammers are onto this technique and now reference all the images by IP only. So, there is no way to build a defense through the hosts file.

So. I looked all over and found a number of personal firewalls that allow you to block by IP address... but they are fixing different problems and seem to be attacking the problem with a sledgehammer. I haven't found any personal proxies that allow you to create a blacklist by name and IP that will work for all Internet traffic. Most proxies that I have found just allow you to point the browser to a proxy and work from there. Unfortunately, Outlook ignores them.

So, I figured that a little perl proxy shouldn't be impossible to whip up. I would want it to pass all traffic through except that which has been explicitly blacklisted by name (www.pr0n.com) or by IP (69.69.69.69). For those, it should just send back an appropriate reset, timeout, or error. Everything else should remain untouched.
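
A minimal sketch of the blacklist lookup such a proxy would need, assuming the proxy already knows which host or IP a connection is headed for. The two entries are just the examples from above, and the sub names (is_ipv4, is_blocked) are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical blacklist entries, taken from the examples in this post.
my %blocked_name = map { lc($_) => 1 } qw(www.pr0n.com);
my %blocked_ip   = map { $_ => 1 } qw(69.69.69.69);

# True if the target looks like a dotted-quad IPv4 address.
sub is_ipv4 {
    my ($target) = @_;
    my @octets = $target =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/
        or return 0;
    my $out_of_range = grep { $_ > 255 } @octets;
    return $out_of_range ? 0 : 1;
}

# True if the target (host name or IP) is explicitly blacklisted.
# Anything not listed is meant to pass through untouched.
sub is_blocked {
    my ($target) = @_;
    $target = lc $target;
    $target =~ s/\.\z//;    # tolerate a trailing dot on host names
    my $hit = is_ipv4($target)
        ? exists $blocked_ip{$target}
        : exists $blocked_name{$target};
    return $hit ? 1 : 0;
}

print is_blocked('www.pr0n.com') ? "blocked\n" : "pass\n";
```

Keeping names and IPs in separate hashes means a name like www.pr0n.com is never mistaken for an address, and each lookup stays a single hash access.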

So, I don't want a solution (unless one exists already), but pointers to snippets, resources, docs, and howtos would be wonderful.

Thanks for anything that you have to give.

Replies are listed 'Best First'.
Re: Perl Internet Proxy for Windows
by Jaap (Curate) on Jan 29, 2003 at 15:20 UTC
    non-perl related answer:
    I know of at least one personal windows firewall (zonealarm) that allows you to let outlook use only ports 110 and 25 and not 80. This would make it impossible for the pics to be retrieved. /non-perl

    Perl related:
    If the proxying doesn't work for you, you could also cron a script that deletes the spam mails from the pop box every 5 minutes.
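
    A rough sketch of what such a scheduled script could look like with Net::POP3. The host, login, and the single "spam" rule are placeholders, and looks_like_spam and purge_mailbox are names invented for this example; a real rule set would need to be much smarter.

```perl
use strict;
use warnings;

# Hypothetical rule: a message counts as spam when its From: header
# matches one of these patterns.
my @spam_from = (qr/\@spammer\.example\b/i);

# Takes an array ref of header lines, returns 1 for spam, 0 otherwise.
sub looks_like_spam {
    my ($header_lines) = @_;
    for my $line (@$header_lines) {
        next unless $line =~ /^From:/i;
        return 1 if grep { $line =~ $_ } @spam_from;
    }
    return 0;
}

# Connect, inspect only the headers of each message, delete the matches.
sub purge_mailbox {
    my ($host, $user, $pass) = @_;
    require Net::POP3;    # loaded here so the rule above can be tried offline
    my $pop = Net::POP3->new($host, Timeout => 30)
        or die "Cannot connect to $host\n";
    defined $pop->login($user, $pass)
        or die "Login failed for $user\n";

    my $sizes = $pop->list || {};    # message number => size
    for my $msgnum (sort { $a <=> $b } keys %$sizes) {
        my $headers = $pop->top($msgnum, 0) or next;    # headers only
        $pop->delete($msgnum) if looks_like_spam($headers);
    }
    $pop->quit;    # deletions only take effect on a clean QUIT
}

# purge_mailbox('pop.example.edu', 'someuser', 'secret');    # placeholders
```

    Fetching only the headers with top keeps each run cheap, which matters if the script runs every few minutes.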

      non-perl related response:
      I have thought about using zone alarm, but have had issues with it and our authentication system at the university. For some reason, zone alarm doesn't get along with our domain authentication very well, even if it is given full and unfettered access. What happens is that email only shows up if you explicitly refresh the panes of the application. Otherwise, nothing.

      somewhat perl related response:
      I have done some looking and maybe a netcat perl hybrid will do the trick. Still looking.

      Perl related response:
      That is definitely one route that I hadn't considered. Windows 98 doesn't have cron (or does it have something similar?), but I think I might be able to do that. I would still prefer something local.

      Thanks

Re: Perl Internet Proxy for Windows
by Mr. Muskrat (Canon) on Jan 29, 2003 at 16:02 UTC
      I hadn't even thought of a RBL. I guess I could create a script to IMAP in and check addresses and then move stuff around to SPAM folders if they meet an RBL criteria.

      Downsides:

      1. RBLs could get legitimate mail as well and she'd have to check them again once they'd been sorted. The administration would be somewhat out of her hands then. Though, I guess I could build in some sort of override mechanism if I went this route.
      2. We use Exchange as the primary delivery method. Though we have POP and IMAP access, most people leave email up all day (you have to with Outlook's notify mechanism). I'd have to hit the server an awful lot to beat the "always on" nature of our Outlook setup. I don't think that would go over too well.
      The Proxy idea appealed (in a completely theoretical way) because it would be "always on". You could also build more tools into it in the future (worm blocking and stuff like that... if I got ambitious).

      Something to think about.
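
      For reference, a small sketch of how an RBL lookup works for one address, assuming the usual DNS blocklist convention: reverse the octets, append the list's zone, and see whether the name resolves. The zone dnsbl.example.org is a placeholder, not a real list, and both sub names are invented here.

```perl
use strict;
use warnings;

# Build the DNS name to query for an IPv4 address: octets reversed,
# followed by the blocklist zone. Returns undef for anything that is
# not a dotted quad.
sub dnsbl_query_name {
    my ($ip, $zone) = @_;
    my @octets = $ip =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/
        or return;
    return join '.', reverse(@octets), $zone;
}

# True if the blocklist zone has an entry for this address.
# Needs working DNS, so it is not called in this sketch.
sub is_listed {
    my ($ip, $zone) = @_;
    my $name = dnsbl_query_name($ip, $zone) or return 0;
    my $answer = gethostbyname($name);    # scalar context: packed address or undef
    return defined $answer ? 1 : 0;
}

print dnsbl_query_name('69.69.69.70', 'dnsbl.example.org'), "\n";
```

      The override mechanism mentioned above could be as simple as checking a personal whitelist before calling is_listed.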

Re: Perl Internet Proxy for Windows
by meetraz (Hermit) on Jan 29, 2003 at 18:09 UTC
    I use popfile... Currently 97% accurate for me, and it's written in perl!

      I think this brings up a good point... you seem to be interested in filtering outgoing web requests, so that your spouse doesn't have to look at the nasty pics outlook insists on showing her. But wouldn't it be better if she never had to look at the email at all?

      The solution I use to get rid of spam is SpamAssassin, which has its faults but has worked well for me after some tweaking. The popfile program that meetraz mentioned looks like it would be closer to what you really need in this situation. If it had SpamAssassin in the middle of it instead of its training method (or had both) I'd call it inspired. (IMHO, it would be better if it acted as an IMAP server, so that you can have your SPAM in a different box right off the bat, but whatever.)

      In any case, a simple filtering proxy server might have other handy applications, but the maintenance overhead seems like it might make it a pain. (Adding the offending IP addresses, which quite likely change often.) With SpamAssassin, I haven't had to tweak anything in months. However, with solutions like popfile and SpamAssassin you need to check what it's marking as SPAM periodically to ensure that it hasn't canned something you really needed to see.

      Good luck...

Re: Perl Internet Proxy for Windows
by jonadab (Parson) on Jan 30, 2003 at 03:43 UTC

    As others have pointed out, a Perl solution is complicated, because you have to stop Outlook from doing something. Perl can do plenty of stuff, but stopping another app (that you want to leave open) from doing something is hard. You could put the PC behind an IP-Masquerading firewall and prevent outgoing access to port 80, then run your browser through a proxy, which you could write in Perl (or just use Squid, whatever), but for that you need to run a second system all the time (though it could be an old cheap one).

    The other possibility is to run the mail itself through a filter. This means changing Outlook's idea of where to get mail so that it looks on localhost and finds your proxy, which must retrieve the mail from the real server and wash it, removing objectionable content, before providing it to the real client; as a start, you could do something like s/<img\b[^>]*>//gi; on all HTML parts (img tags have no closing tag, so match to the end of the tag itself). (That probably means your proxy has to grok MIME. I think there's a module for that, though. I think there's also a POP3 module that you could use for retrieving the mail from the real server. Not sure about the other side of POP3 where you have to respond to Outlook.) While you're at it you might want to s/<script\b.*?<\/script>//gis; also (the /s flag lets the match span line breaks), but that still leaves onfoo="somescript()" stuff, and you want to consider images loaded by stylesheets, plus background image attributes, and I wouldn't put money on that being enough either. You're going to end up doing a lot of processing and never knowing quite when you're going to be surprised with a new sort of thing that slips through.
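
    A minimal sketch of that washing step, applied to a single HTML part that has already been extracted from the message (MIME handling is left out). wash_html is a made-up name, and the patterns are deliberately crude: they do not touch stylesheets, and the attribute patterns could also eat look-alike text outside of tags.

```perl
use strict;
use warnings;

# Strip the most obvious remote-content and scripting hooks from an
# HTML string. This is a regex-based sketch, not a real HTML parser.
sub wash_html {
    my ($html) = @_;
    # Drop <script>...</script> blocks, including multi-line ones.
    $html =~ s{<script\b.*?</script\s*>}{}gis;
    # Drop <img ...> tags (no closing tag to look for).
    $html =~ s{<img\b[^>]*>}{}gi;
    # Drop inline event handlers such as onload="..." or onclick='...'.
    $html =~ s{\s+on\w+\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)}{}gi;
    # Drop background="..." attributes that pull in remote images.
    $html =~ s{\s+background\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)}{}gi;
    return $html;
}

my $dirty = qq{<body background="http://1.2.3.4/bg.gif" onload="go()">}
          . qq{<p>Hi</p><img src="http://1.2.3.4/x.jpg"></body>};
print wash_html($dirty), "\n";
```

    Removing the HTML parts altogether and keeping only text/plain would be the more conservative variant of the same idea.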

    So, how to stop Outlook from doing stuff it oughtn't? That's the rub, because Outlook is exacerbating your problem. (Some might even say Outlook is your problem, but that might be hyperbole; spam is annoying regardless of what software you use.) Not loading remote images in mail messages is such a basic privacy feature that Outlook's getting this wrong should be a red flag, even if you haven't been paying attention the last couple of years to the various other security and privacy flaws that have been uncovered in Outlook at a rate of about one a month. If there is any way you can get her to use a different mail reader... you should. There are a number of choices; "better than Outlook" is not a tough criterion to meet.

    If you want a specific suggestion, I usually recommend Pegasus Mail to people who don't mind if it only runs on Windows. It has a very low learning curve, advanced filtering with both simple substring matching and also with regular expression capabilities (though not with the power of Perl regexps) with flow control and a lot of possible actions, and all the basic features, and as an added bonus trying to launch any executable attachment results in a scary warning with the word "virus" in the dialog title and "Cancel" as the default button -- and it doesn't even think about retrieving images from the web. But if you don't like Pegasus, there are lots of other options.

    If you do go forward with the proxy thing, it's probably best to proxy the mail (rather than the web). You can do dual action, then: besides dropping entire messages if you are certain they're spam, you also wash possibly-partly-okay messages to remove dangerous items (scripts, image tags, object tags, applet tags, and so on -- I would be tempted to remove HTML parts altogether. Of course, you can also have a whitelist of From: fields that cause the whole message to be passed through unaltered).

     --jonadab

Re: Perl Internet Proxy for Windows
by pg (Canon) on Jan 29, 2003 at 18:32 UTC
    I posted two different pieces of proxy code before, for different purposes, you can take a look at them: piece1 and piece2.

    However, neither of them filters requests; they just pass everything through.

    What you can do is, as you said, keep a blacklist: when a request comes into your proxy from your browser, have your proxy look at both the Host field and the URL following GET to determine whether to let it through. If not, have your proxy reply with HTTP 404 (Not Found) or, even better, 403 (Forbidden).

    This way, your browser would still work properly. 403 is better because it fits the purpose perfectly, and HTTP 1.1 also says that the client should not repeat the request after receiving a 403.
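
    A sketch of just that decision step, assuming the proxy has already read the full request header into a string. The blacklist entries and the sub names (request_hosts, forbidden_response, check_request) are made up for illustration; the socket handling from the two proxies is not shown.

```perl
use strict;
use warnings;

# Hypothetical blacklist entries (host names and IPs, lower case).
my %blacklist = map { lc($_) => 1 } qw(www.pr0n.com 69.69.69.69);

# Collect the target host from the request line (browsers send an
# absolute URL to a proxy) and from the Host: header.
sub request_hosts {
    my ($request) = @_;
    my @hosts;
    push @hosts, lc $1 if $request =~ m{\A[A-Z]+\s+http://([^/:\s]+)}i;
    push @hosts, lc $1 if $request =~ m{^Host:\s*([^:\s]+)}mi;
    return @hosts;
}

# A complete, minimal 403 reply.
sub forbidden_response {
    my $body = "Blocked by local blacklist.\n";
    return join "\r\n",
        'HTTP/1.1 403 Forbidden',
        'Content-Type: text/plain',
        'Content-Length: ' . length($body),
        'Connection: close',
        '',
        $body;
}

# Returns the 403 reply for blacklisted targets, or undef to signal
# that the request should be forwarded untouched.
sub check_request {
    my ($request) = @_;
    return forbidden_response() if grep { $blacklist{$_} } request_hosts($request);
    return;
}

my $req = "GET http://69.69.69.69/pic.jpg HTTP/1.1\r\nHost: 69.69.69.69\r\n\r\n";
print defined check_request($req) ? "403\n" : "forward\n";
```

    Checking both the request line and the Host header covers clients that send only one of them in a usable form.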
Re: Perl Internet Proxy for Windows
by logan (Curate) on Jan 29, 2003 at 18:42 UTC
    Another non-perl solution: www.mailfrontier.com

    They have a downloadable app which automatically challenges any mail sent to you. If the sender doesn't respond to the challenge, the mail is deleted, and you aren't bothered. If they do respond (correctly), their address is placed on a "white list", and all future mail gets right through. It works really well.

    In all fairness, I must say that one of Mailfrontier's founders is a friend of mine, but no kidding, their software works great.

    -Logan
    "What do I want? I'm an American. I want more."

Re: Perl Internet Proxy for Windows
by Mr. Muskrat (Canon) on Jan 29, 2003 at 21:14 UTC

    SysApe9000 mentioned SpamAssassin but you are on a Win32 system so...
    I'd wholeheartedly support mcd's Pop3proxy!

    It uses SpamAssassin but is designed for use on Win32 systems and it's written in Perl.

Re: Perl Internet Proxy for Windows
by Azhrarn (Friar) on Jan 30, 2003 at 07:17 UTC
    Non-perl answer to filtering http connections under windows in general:

    The Proxomitron

    Just tell Outlook and/or your web browser to use Proxomitron as a local proxy, and you can blacklist/filter to your heart's content. The only reason I mention it is that I saw the other replies mentioning filtering the spam; this may help catch nasty HTML that makes it through, besides being able to blacklist servers with a regexp-like syntax.

    Of course, you mentioned that Outlook seems to ignore proxies, which is a bit unusual. It may actually be using the Internet Explorer proxy settings (Windows likes to do that). But if you can't get it to point at the proxy in the first place, then you're kinda screwed unless you run an external router/gateway/whatever.

    If you do figure a way around that problem, and do want to do a perl solution, you may want to look into that wonderful little package known as POE. Or more specifically this sample on TCP Redirection for an idea on how to get started. Unfortunately, you still run into how exactly you are going to get Outlook to use whatever daemon you do write.


    ----------------
    BP - DG - WB
Re: Perl Internet Proxy for Windows
by line_noise (Sexton) on Jan 29, 2003 at 23:09 UTC
    If you're planning on filtering by ip address you may want to use Net::Netmask to block the whole ip range of the nasties. No reason to leave them any wiggle room.
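
    To show what blocking a whole range amounts to, here is a sketch of the underlying check in plain Perl, without Net::Netmask (which does this and more). The ranges are made-up examples, and ip_to_int, in_cidr, and ip_is_blocked are names invented for this sketch.

```perl
use strict;
use warnings;

# Hypothetical blocked ranges in CIDR notation.
my @blocked_ranges = qw(69.69.69.0/24 10.20.0.0/16);

# Convert a dotted quad to a 32-bit integer; undef if it does not parse.
sub ip_to_int {
    my ($ip) = @_;
    my @octets = $ip =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/
        or return;
    return if grep { $_ > 255 } @octets;
    my $n = ($octets[0] << 24) | ($octets[1] << 16) | ($octets[2] << 8) | $octets[3];
    return $n;
}

# True if $ip falls inside the network given as "a.b.c.d/bits".
sub in_cidr {
    my ($ip, $cidr) = @_;
    my ($net, $bits) = split m{/}, $cidr;
    return 0 unless defined $net && defined $bits;
    return 0 unless $bits =~ /\A\d+\z/ && $bits <= 32;
    my $ip_int  = ip_to_int($ip);
    my $net_int = ip_to_int($net);
    return 0 unless defined $ip_int && defined $net_int;
    my $mask = $bits == 0 ? 0 : (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF;
    my $same = ($ip_int & $mask) == ($net_int & $mask);
    return $same ? 1 : 0;
}

# True if $ip is inside any blocked range.
sub ip_is_blocked {
    my ($ip) = @_;
    my $hits = grep { in_cidr($ip, $_) } @blocked_ranges;
    return $hits ? 1 : 0;
}

print ip_is_blocked('69.69.69.200') ? "blocked\n" : "pass\n";
```

    Blocking by range does raise the odds of catching an innocent neighbour on the same network, so a short list of exceptions may be worth keeping alongside it.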

    These comments in no way reflect light outside the visible spectrum
Re: Perl Internet Proxy for Windows
by Notromda (Pilgrim) on Jan 29, 2003 at 23:41 UTC
    non-perl related answer: http://www.deersoft.com/ sells a spamassassin type product. Has anyone tried it? At any rate, try looking around on http://spamassassin.org/ as there should be several ways to accomplish what you need.