in reply to another core dump

The most likely reason is that you are never joining your threads. Just loading the front page at CNN.com started 31 threads. These threads then terminate, but are never cleaned up. Each thread is somewhat over 1 MB in size, so by the time cnn.com has loaded, memory use has grown to close to 50 MB in total. A couple of refreshes and this will force swapping and resource exhaustion.
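To illustrate the point (this is a stand-alone sketch, not your proxy code): threads that have finished but were never joined or detached remain on Perl's thread list, still holding their per-thread resources, until something joins them.

```perl
use strict;
use warnings;
use threads;

# Spawn some trivial threads and never join them.
threads->create( sub { 1 } ) for 1 .. 5;

# Wait until they have all finished running.
sleep 1 while threads->list( threads::running );

# They are done, but still joinable - i.e. their per-thread
# interpreters (1 MB or more each) have not been released.
my @leftover = threads->list( threads::joinable );
print scalar( @leftover ), " finished but unjoined threads\n";   # 5

# Joining (or detaching at creation time) releases them.
$_->join for @leftover;
print scalar( threads->list ), " threads remaining\n";           # 0
```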

You're also passing $req and $host, lexical scalars, to each thread. Whilst those scalars are being copied into the thread automatically for you, and you are using them read-only, the lack of locking is probably OK. But each time you share a variable, it is shared with every thread. That means that every thread created (including those that are dormant but unjoined) is getting a copy of every request object added to its memory space.
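A small stand-alone demonstration of the copy semantics (the variable names here are made up): each thread receives its own copy of the values passed to threads->create, so modifying the copy inside the thread never touches the parent's variable.

```perl
use strict;
use warnings;
use threads;

my $req = "GET / HTTP/1.0";

my $thr = threads->create( sub {
    my( $copy ) = @_;
    $copy .= " [seen by child]";   # modifies only this thread's copy
    return $copy;
}, $req );

my $childs_view = $thr->join;
print "child : $childs_view\n";   # child : GET / HTTP/1.0 [seen by child]
print "parent: $req\n";           # parent: GET / HTTP/1.0 (unchanged)
```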

As a first pass at fixing this, you should detach your threads once you've spawned them, so that they die a natural death, and undef $req and $host before the thread terminates.

    ...
    } else {
        threads->create( \&process_one_req, $browser, $req, $host )->detach;
    }
    ...

    sub process_one_req {
        my( $browser, $req, $host ) = @_;

        my $remote = IO::Socket::INET->new(
            Proto    => "tcp",
            PeerAddr => $host,
            PeerPort => 80,
        );

        if( $remote ) {
            print $remote $req;
            my $chunk;
            print $browser $chunk while sysread( $remote, $chunk, 10000 );
            close( $remote );
            undef( $remote );
        }
        else {
            print $browser RES_400;
        }

        close( $browser );
        undef( $req );
        undef( $host );
        undef( $browser );
    }

Making these changes, I can load and reload the cnn frontpage, and whilst the memory use grows to around 8 MB at the peak, it rapidly falls back to around 5 MB as the requests complete and the threads die. This seems to cure the continuous memory growth completely, and may effect a cure for your transient core dumps.

I also noticed that if the request is a POST rather than a GET, then your regex to extract the page name fails, resulting in

    Use of uninitialized value in concatenation (.) or string at P:\test\proxy.pl8 line 42.
    Received request for [perlmonks.com, ]

HTH.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Hooray!

Re: Re: another core dump
by pg (Canon) on Oct 25, 2003 at 05:33 UTC

    Good point, I am now adding the detach and undef.

    I noticed from time to time that when Perl fails to allocate memory, it dies pretty ugly ;-), and that should be fixed.

    In case those changes you suggested improve the situation but cannot resolve the problem, what I will do then is get rid of IO::Socket::INET and go with low-level sockets. Remember that bless is not thread-safe at this point. I will come back with the results and share them with you and everyone.

    No, it does not support POST; that's a known caveat, but I don't have time to worry about it now ;-)

      FWIW, I changed the regex to

      my $page = ($req =~ m/^(?:GET|POST|HEAD)\s*(.*?)\s/)[0];

      And (assuming this works:), I am posting this reply via your proxy.
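      A quick sanity check of that pattern (the request lines below are made-up examples) shows it extracting the page name for all three common methods:

      ```perl
      use strict;
      use warnings;

      # Sample HTTP request lines (illustrative only).
      my @reqs = (
          "GET /index.html HTTP/1.0",
          "POST /cgi-bin/form.pl HTTP/1.0",
          "HEAD / HTTP/1.0",
      );

      for my $req ( @reqs ) {
          my $page = ( $req =~ m/^(?:GET|POST|HEAD)\s*(.*?)\s/ )[0];
          print "$page\n";   # /index.html, /cgi-bin/form.pl, /
      }
      ```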

      As far as the IO::Socket::INET code is concerned, as far as I can tell it should be OK. The module is being replicated into each thread, so each will have its own copy of the code; and as the underlying resource is a GLOB, essentially a filehandle, it is a process-global resource that should work okay so long as you don't try to use it simultaneously from more than one thread.

      I *think* that by locking $browser inside the if statement in process_one_req(), before you print to it, you should be okay to stick with it.... I'll try to verify this.

