PerlMonks  

RE: RE: Benevolent Ad Filter

by httptech (Chaplain)
on Jun 08, 2000 at 06:18 UTC ( [id://17013]=note )


in reply to RE: Benevolent Ad Filter
in thread Benevolent Ad Filter

Interesting. But it looks like there are some important differences:

The mod_perl version proxies everything, not just ad servers. However, it only blocks images, and ads sometimes come in the form of JavaScript or even Java. Those usually get sent from the same server for tracking purposes, so my script blocks all ad content from a given server. (You could probably alter the mod_perl version to do this, though.)

The mod_perl version actually retrieves the entire file it blocks, which I think is a waste of bandwidth, but you're forced into that if you use LWP (as far as I know). That's why I use the Socket module, and close the connection as soon as I have the headers. The trade-off for this is my version will not work through another proxy server.
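
A minimal sketch of the "read the headers, then hang up" idea described above. This is not the original script: it assumes the core IO::Socket::INET wrapper instead of raw Socket calls, and the names read_headers and fetch_headers are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;   # assumption: core wrapper; the post uses Socket directly

# Read the status line and headers from $fh, stopping at the blank line
# that ends the header block. The body is never read.
sub read_headers {
    my ($fh) = @_;
    my $status = <$fh>;
    return unless defined $status;
    $status =~ s/\r?\n\z//;
    my %headers;
    while (defined(my $line = <$fh>)) {
        $line =~ s/\r?\n\z//;
        last if $line eq '';    # blank line: end of headers
        # Skip anything that is not "Name: value" (e.g. continuation lines).
        my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*)$/ or next;
        $headers{lc $name} = $value;
    }
    return ($status, \%headers);
}

# Hypothetical helper: send a real GET (so the server logs a GET),
# then drop the connection once the headers have arrived.
sub fetch_headers {
    my ($host, $path, $port) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port || 80,
        Proto    => 'tcp',
        Timeout  => 10,
    ) or die "connect to $host failed: $!";
    print {$sock} "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
    my @result = read_headers($sock);
    close $sock;    # hang up without reading the body
    return @result;
}
```

Something like my ($status, $headers) = fetch_headers('www.foo.com', '/'); would then give you the status line and a hash of lower-cased header names. Because it connects straight to the origin host, it has the same limitation mentioned above: it will not work through another proxy server.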

Replies are listed 'Best First'.
RE: RE: RE: Benevolent Ad Filter
by merlyn (Sage) on Jun 09, 2000 at 02:40 UTC
    You can, in fact, use LWP to load just the first part of the response to a GET request, by using a content-callback handler that throws an exception, cutting off any further action. Quoting from perldoc LWP::UserAgent:
    The request can be aborted by calling die() in the callback routine. The die message will be available as the "X-Died" special response header field.

    -- Randal L. Schwartz, Perl hacker
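
A rough sketch of the callback approach quoted above, assuming LWP::UserAgent's ':content_cb' option to get(). The function name get_headers_only and the die message are made up for illustration, and this does not settle whether redirects are still followed after the die().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: issue a normal GET, but abort as soon as the
# first chunk of body data is handed to the content callback.
sub get_headers_only {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new(timeout => 10);
    my $response = $ua->get(
        $url,
        # Dying here stops the transfer; LWP records the message
        # in the "X-Died" response header.
        ':content_cb' => sub { die "headers only\n" },
    );
    return $response;
}
```

The returned response object still carries the status line and headers, so $response->header('Content-Type') and friends work as usual, and $response->header('X-Died') holds the die message when the callback fired.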

      That would be great. I really would like to use LWP if possible. I liked the fact that it would handle redirects for me. However, I don't know if the built-in redirect handling would still work if I call die() during the callback.
RE: RE: RE: Benevolent Ad Filter
by Anonymous Monk on Jun 08, 2000 at 16:31 UTC
    and close the connection as soon as I have the headers

    #!/usr/bin/perl
    use LWP::Simple;
    if (head('http://www.foo.com/')) {
        print "Page exists and would download fine!\n";
    }
    In list context, head returns several useful values: content type, document length, last-modified time, expiry time, and server.
RE: RE: RE: Benevolent Ad Filter
by httptech (Chaplain) on Jun 08, 2000 at 18:38 UTC
    The reason for not using HEAD is simple: it shows up in the logs as a HEAD request and not a GET. If I were an advertiser I would not count HEAD requests as legitimate page views, since it's clear that the ad was never actually viewed.
      If I were an advertiser I would not count HEAD requests as legitimate page views, since it's clear that the ad was never actually viewed.

      Good point, I stand corrected!
