There are all kinds of methods for blocking banner ads.
However, I feel guilty about blocking ads from the smaller
sites that depend on the advertising income to stay afloat. I'm
getting the benefit of the site, but the sponsors don't
see any hits on their ads, so the site operator gets short-changed
by my filter.
I've used Perl to rectify this. I have set up Apache on my
local machine, and enabled mod_rewrite with a rule like this
in my httpd.conf:
RewriteEngine on
RewriteRule ^/.*$ /blank.cgi
Now, in /etc/hosts, I have a list of servers who only
serve ad content, similar to this:
127.0.0.1 adfu.blockstackers.com
127.0.0.1 ad.doubleclick.net
127.0.0.1 m.doubleclick.net
Now, when a request for any URL at one of the above servers
is made by my browser, it is diverted to my local http
server, and my custom CGI script. The CGI script makes a
connection to the ad server, figures out whether the content
is supposed to be text or an image, and then generates a
blank page or transparent gif for my browser to use in place of the ad.
It closes the connection before actually downloading the
ad.
The net result: I don't have to waste bandwidth on banner
ads, and the site operator still has a "GET" request for
the ad in their logs, so they still get paid (assuming they
get paid for page views instead of click-throughs)
I am posting the code as a reply to this message.
RE: Benevolent Ad Filter
by httptech (Chaplain) on Jun 08, 2000 at 01:30 UTC
|
#!/usr/bin/perl
use CGI::Carp qw(fatalsToBrowser);
use Net::DNS;
use Socket;
my $url = my $path = $ENV{'REQUEST_URI'};
my $host = $ENV{'HTTP_HOST'};
my $referrer = $ENV{'HTTP_REFERER'};
my $useragent = $ENV{'USER_AGENT'};
my $content_type;
$path =~ s#/([^/])*$##;
GETCONTENT: $content_type = &get_content_type;
&print_gif if $content_type =~ /image/;
&print_text;
exit;
sub print_text { print "Content-type: $content_type\n\n<!-- -->\n"; ex
+it }
sub print_gif {
# Prints a transparent GIF. Adapted from code created by Turnstep:
# http://www.perlmonks.org/index.pl?node_id=7974
print "Content-Length: 43\nContent-type: image/gif\n\n";
printf "GIF89a\1\0\1\0%c\0\0%c%c%c\0\0\0%s,\0\0\0\0\1\0\1\0\0%c%c%
+c\1\0;",
144,0,0,0,1?pack("c8",33,249,4,5,16,0,0,0):"",2,2,4;
exit;
}
sub nslookup {
my $host = shift;
my $dns = new Net::DNS::Resolver;
my $query = $dns->search($host);
if ($query) {
for $rr ($query->answer) {
next unless $rr->type eq "A";
return $rr->address;
}
}
else {
die "lookup of $host failed: ", $dns->errorstring, "\n";
}
}
sub get_content_type {
my ($type, $location);
my $addr = &nslookup($host);
my $proto = getprotobyname('tcp');
my $inet = inet_aton($addr);
socket(S,PF_INET,SOCK_STREAM,$proto) || die "Couldn't open socket:
+ $!";
if (connect(S,pack "SnA4x8",2,80,$inet)) {
select (S);
$| = 1;
print "GET $url HTTP/1.0\r\n";
print "Referer: $referrer\r\n";
print "User-Agent: $useragent\r\n\r\n";
while (<S>) {
if (/^Content-type:\s*(.*)(?:\;|\n)/i) { $type = $1 }
if (/^Location:\s*(.*)$/i) { $location = $1 }
last if /^\s*$/; # Close the connection after getting the
+headers
}
select (STDOUT);
close S;
} else { die "Couldn't connect to $host : $!" }
if ($location) { # We've been redirected
if ($location =~ m#^http://([^/]*)/(.*)$#) {
$host = $1;
$url = "/$2";
}
else {
$url = ($location =~ /^$path/o) ? $location : "$path/$loca
+tion";
}
goto GETCONTENT;
}
return $type;
}
| [reply] [Watch: Dir/Any] [d/l] |
|
This is somewhat related:
I recently installed FilterProxy, a Proxy http server written in Perl of course. What's quite nice about it is that one can point one's web browser at the proxy port and configure all the regexes used, etc. *excellent* program if you don't want to install apache to use this one.
The only real advantage to punk music is that nobody can whistle it.
| [reply] [Watch: Dir/Any] |
|
Actually you really don't need to install Apache to use this.
You could use Miniweb,
the standalone single-CGI script http server that runs from
inetd.
| [reply] [Watch: Dir/Any] |
RE: Benevolent Ad Filter
by httptech (Chaplain) on Jun 09, 2000 at 02:31 UTC
|
By the way, here's the complete list of blocked sites from my
/etc/hosts:
127.0.0.1 adfu.blockstackers.com
127.0.0.1 ad.doubleclick.net
127.0.0.1 m.doubleclick.net
127.0.0.1 ad.webprovider.com
127.0.0.1 image.linkexchange.com
127.0.0.1 jeeves.flycast.com
127.0.0.1 www.flycast.com
127.0.0.1 www.burstmedia.com
127.0.0.1 www.247media.com
127.0.0.1 www.ad-venture.com
127.0.0.1 www.adauction.com
127.0.0.1 www.adsdaq.com
127.0.0.1 a32.g.a.yimg.com
127.0.0.1 www.pagecount.com
127.0.0.1 www1.pagecount.com
127.0.0.1 www2.pagecount.com
127.0.0.1 www3.pagecount.com
127.0.0.1 www4.pagecount.com
127.0.0.1 ad.linkexchange.com.com
127.0.0.1 www.smartclicks.com
127.0.0.1 mojofarm.mediaplex.com
127.0.0.1 www.etour.com
127.0.0.1 netadsrv.iworld.com
If you come across any other "filter-worthy" servers, let me know!
| [reply] [Watch: Dir/Any] [d/l] |
|
At www.waldherr.org, there is a blocklist for the Internet Junkbuster, another blocking proxy. I guess that if you just take the top part, where servers are blocked completely, you will get a good list of servers-to-be-blocked :). Of course, the IJB does a bit more, it does regex-based blocking of URLs, but we can always add this here as well ;).
| [reply] [Watch: Dir/Any] |
RE: Benevolent Ad Filter
by merlyn (Sage) on Jun 08, 2000 at 05:03 UTC
|
| [reply] [Watch: Dir/Any] |
|
Interesting. But it looks like there are some important differences:
The mod_perl version proxies everything, not just ad servers.
However it only blocks images; sometimes ads come in the form
of javascript or even java. But they usually get sent from
the same server for tracking purposes, so my script will block
all ad content from a given server. (You could probably
alter the mod_perl version to do this though)
The mod_perl version actually retrieves the entire file it blocks,
which I think is a waste of bandwidth, but you're forced into
that if you use LWP (as far as I know). That's why I use the
Socket module, and close the connection as soon as I have the
headers. The trade-off for this is my version will not work
through another proxy server.
| [reply] [Watch: Dir/Any] |
|
| [reply] [Watch: Dir/Any] |
|
|
and close the connection as soon as I have the headers
#!/usr/bin/perl
use LWP::Simple;
if (head('http://www.foo.com/')) {
print "Page exists and would download fine!\n";
}
In list context, head returns all kinds of interesting values such as
response code, last modified, content-length etc. | [reply] [Watch: Dir/Any] [d/l] |
|
The reason for not using HEAD is simple; it shows up in the
logs as a HEAD request and not a GET. If I were a advertiser
I would be not count HEAD requests as legitimate page views,
since its clear that the ad was never actually viewed.
| [reply] [Watch: Dir/Any] |
|
RE: Benevolent Ad Filter
by swiftone (Curate) on Jun 08, 2000 at 21:57 UTC
|
My network admin noticed that my machine was sucking up huge amounts of bandwidth (due to leaving one or two netscape windows on PerlMonks all night/over the weekend). Thus I have installed Squid and am using it as a proxy to block all ad banners, but now I feel guilty over costing PerlMonks and other sites their ad revenue. This script should work fine as a Squid Redirect script (No apache modifications necessary).
On another note, someone who didn't want to screw up their Apache server could set up a Virtual host to do this. | [reply] [Watch: Dir/Any] |
|
|