pg has asked for the wisdom of the Perl Monks concerning the following question:

I wrote this for myself, to stop those garbages on internet and speed up, by not forwarding certain requests.

It works and satisfied my needs, however core dumps once a while. Can anyone spot anything, or just pure guess? Your help is greatly apprecaited.

use IO::Socket::INET; use threads; use strict; use warnings; my $banned_type = { "cab" => 1, "class" => 1, "dat" => 1, "exe" => 1, "gif" => 1, "js" => 1, "swf" => 1 }; my $banned_site = { "ad.doubleclick.net" => 1, "search-itnow.com" => 1, "www.ftstock.com" => 1, "www.newshub.com" => 1 }; use constant RES_400 => "HTTP/1.1 400 Bad Request\r\n\r\n"; use constant LISTEN_SIZE => 20; use constant LISTEN_PORT => 8080; use constant MAX_REQ_SIZE => 5000; my $browser_listener = new IO::Socket::INET(Proto => "tcp", LocalAddr => "localhost", LocalPort => LISTEN_PORT, Resue => 1, Listen => LISTEN_SIZE) || die "Failed to initialize browser listener socket +\n"; print "Internet filter started\n"; while (1) { my $browser = $browser_listener->accept(); if ($browser) { my $req; sysread($browser, $req, MAX_REQ_SIZE); my $host = ($req =~ m/Host:\s*(.*?)\r/)[0]; my ($method, $page) = ($req =~ m/^(.*?)\s+(.*?)\s/); print "Received request for [$host, $page]\n"; if ($method eq "GET") { if ($host && $page) { if (is_banned_site($host) || is_banned_type($page)) { print "[$host, $page] is banned\n"; print $browser RES_400; close($browser); } else { threads->create(\&process_one_req, $browser, $req, + $host); } } else { open(BUG, ">bug.txt"); print BUG $req; close(BUG); close($browser); } } else { threads->create(\&process_one_req, $browser, $req, $host); } } else { print "Bad accept, skip\n"; } } sub process_one_req { my ($browser, $req, $host) = @_; my $remote = new IO::Socket::INET(Proto => "tcp", PeerAddr => $hos +t, PeerPort => 80); if ($remote) { print $remote $req; my $chunk; print $browser $chunk while (sysread($remote, $chunk, 10000)); close($remote); undef($remote); } else { print $browser RES_400; } close($browser); undef($browser); } sub is_banned_site { my $site = shift; return 1 if (exists($banned_site->{$site})); if ($site =~ m/offeroptimizer/) { return 1; } if ($site =~ m/revenue.net$/) { return 1; } if ($site =~ m/popupsponsor.com$/) { return 1; } return 0; } sub is_banned_type { return (exists($banned_type->{(split(/\./, shift))[-1]})) ? 1 : 0; }

Replies are listed 'Best First'.
Re: another core dump
by BrowserUk (Patriarch) on Oct 25, 2003 at 05:09 UTC

    The most likely reason is that you are never joining your threads. Just loading the front page at CNN.com started 31 threads. These threads then terminate, but are never cleaned up. Each thread is somewhat over 1 MB in size, and the result is that having loaded cnn.com, the memory use has grown to close to 50 MB in total. A couple of refreshes and this will force swapping and resource exhausion.

    Your also passing $req and $host, lexical scalars, to each thread. Whilst the scalar is being shared automatically for you, and you are using it read-only, so the lack of locking is probably ok. Each time you share a variable, it is shared with every thread. That means that every thread created (including those that are dormant but unjoined is getting a copy of every request object added to it memory space.

    As a first pass at fixing this, you should detach your threads once you've spawned them so that the die a natural death and undef $req & $host before the thread terminates.

    ... } else { threads->create( \&process_one_req, $browser, $req, $host )->detach; } ... sub process_one_req { my ($browser, $req, $host) = @_; my $remote = new IO::Socket::INET( Proto => "tcp", PeerAddr => $host, PeerPort => 80 ); if ($remote) { print $remote $req; my $chunk; print $browser $chunk while (sysread($remote, $chunk, 10000)); close($remote); undef($remote); } else { print $browser RES_400; } close($browser); undef($req); undef($host); undef($browser); }

    Making these changes, I can load and reload the cnn frontpage and whilst the memory use grows to around 8 MB at the peak, it rapidly falls back to aroud 5 MB as the requests complete and the threads die. This seems to cure the continuous memory growth completely and may effect a cure for your transient core dumps.

    I also noticed that if the request is a POST rather than a GET, then your regex to extract the page name fails and results in

    Use of uninitialized value in concatenation (.) or string at P:\test\p +roxy.pl8 line 42. Received request for [perlmonks.com, ]

    HTH.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    Hooray!

      Good point, I am now adding the detach and undef.

      I noticed from time to time, that when Perl fail to allocate memory, it dies pretty ugly ;-), and it should be fixed.

      In case those changes you suggested, improves the situation, but cannot resolve the problem, what I will do then is to get rid of IO::Socket::INET, and go with low level socket. Remember that bless is not thread-safe at this point. I will come back with the result and share with you and everyone.

      No, it does not support POST, that's a known caveat, but I don't have time to worry about it now ;-)

        FWIW, I changed the regex to

        my $page = ($req =~ m/^(?:GET|POST|HEAD)\s*(.*?)\s/)[0];

        And (assuming this works:), I am posting this reply via your proxy.

        As far as the IO::Socket::INET code is concerned, as far as I can tell, it should be ok. The module is being replicated into each thread, so each will have its own copy of the code and as the underlying resource is a GLOB, essentially a filehandle, it is a process global resource that should work okay so long as you don't try and use it simultaneously.

        I *think* that by locking $browser inside the if statement in process_one_req(), before you print to it, you should be okay to stick with it.... I'll try to verify this.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
        Hooray!

Re: another core dump
by pg (Canon) on Oct 25, 2003 at 07:26 UTC

    With either BrowserUK's fixes alone, or my fixes alone does not work, but put them together it works now. It has been working for half hour already, with I using intensively all the time, and there is no core.

    I am now using low level socket instead of IO::Socket::INET, without this fix, it still cores, but I now tend to believe that's also because of memory usage, as IO::Socket::INET creates more overhead.

    If you monitor what some of those sites are downloading without your notice, you will be surprised...Wow!

    Thanks BrowserUK and his long posts as he does always ;-P

    use threads; use strict; use warnings; my $banned_type = { "cab" => 1, "class" => 1, "dat" => 1, "exe" => 1, "gif" => 1, "js" => 1, "swf" => 1 }; my $banned_site = { "ad.doubleclick.net" => 1, "search-itnow.com" => 1, "www.ftstock.com" => 1, "www.newshub.com" => 1 }; use constant RES_400 => "HTTP/1.1 400 Bad Request\r\n\r\n"; my $proto = getprotobyname('tcp'); socket(BROWSER_LISTENER, PF_INET, SOCK_STREAM, $proto) || die "Failed +to create socket: $!"; setsockopt(BROWSER_LISTENER, SOL_SOCKET, SO_REUSEADDR, pack("l", 1)) | +| die "Failed to setsockopt: $!"; bind(BROWSER_LISTENER, sockaddr_in(8080, INADDR_ANY)) || die "Failed t +o bind: $!"; listen(BROWSER_LISTENER, SOMAXCONN) || die "Failed to listen: $!"; print "Internet filter started\n"; while (1) { my $browser; accept($browser, BROWSER_LISTENER); my $req; sysread($browser, $req, 10000); my $host = ($req =~ m/Host:\s*(.*?)\r/)[0]; my $page = ($req =~ m/^(?:GET|POST|HEAD)\s*(.*?)\s/)[0]; print "Received request for [$host, $page]\n"; if ($host && $page) { if (is_banned_site($host) || is_banned_type($page)) { print "[$host, $page] is banned\n"; print $browser RES_400; close($browser); } else { my $tid = threads->create(\&process_one_req, $browser, $re +q, $host)->detach(); } } else { close($browser); } } sub process_one_req { my ($browser, $req, $host) = @_; my $iaddr = inet_aton($host) || die "no host: $host"; my $paddr = sockaddr_in(80, $iaddr); $proto = getprotobyname('tcp'); my $remote; socket($remote, PF_INET, SOCK_STREAM, $proto) || die "socket: $!"; connect($remote, $paddr) || die "connect: $!"; print $remote $req; my $chunk; while (sysread($remote, $chunk, 10000)) { print $browser $chunk; debug($chunk) if ($req =~ m/^POST/); } close($remote); undef($remote); close($browser); undef($browser); undef($req); undef($host); } sub is_banned_site { my $site = shift; return 1 if (exists($banned_site->{$site})); if ($site =~ m/offeroptimizer/) { return 1; } if ($site =~ m/revenue.net$/) { return 1; } if ($site =~ m/popupsponsor.com$/) { return 1; } if ($site =~ m/hitbox.com$/) { return 1; } return 0; } sub is_banned_type { return (exists($banned_type->{(split(/\./, shift))[-1]})) ? 1 : 0; }