in reply to Memory Leak Caused by Forking?

There are (at least) three problems with your program:

  1. You are using fork and pipe for no good reason.

    This: my $content = $response->content(); fetches the entire web page as a single lump and returns it to you in $content.

    Which you then write to your pipe as a single lump: print WRITER $content;.

    In your fork, you then read that back line by line from the pipe and write it out to your file. That's nonsensical.

    Why not skip the fork and write directly to the file? This works:

    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Request;
    use HTTP::Response;
    use HTTP::Message;

    my $ua = LWP::UserAgent->new();
    my $iteration = 1;
    my $some_directory = 'C:/test/junk';

    #open(URLS, $some_directory.'urls.txt');
    while (<DATA>) {
        my $url = $_;
        chomp $url;
        &get_url($url, $iteration);
        print "($iteration) $url\n";
        $iteration++;
    }
    close URLS;

    sub get_url {
        my ($url, $iteration) = @_;
        open(FH, '>'.$some_directory.$iteration);
        my $request  = HTTP::Request->new("GET", $url);
        my $response = $ua->request($request);
        my $content  = $response->content();
        print FH $content;
        close FH;
        return;
    }

    __DATA__
    ...

    And it allowed me to remove that dumbly arbitrary 15-second delay, which wastes 13 seconds when the pages take only 2 seconds to download.

  2. You're working far too hard for what you are doing.

    (Or you are doing a disservice to those of whom you are asking questions, by concealing the real requirements of your code. Simplifying the problem too much gets answers to the wrong questions.)

    This also works (without leaks), and is far simpler:

    use strict;
    use warnings;
    use LWP::Simple;

    my $iteration = 1;
    my $some_directory = 'C:/test/junk';

    #open(URLS, $some_directory.'urls.txt');
    while (<DATA>) {
        my $url = $_;
        chomp $url;
        getstore( $url, $some_directory . $iteration );
        print "($iteration) $url\n";
        $iteration++;
    }

    __DATA__
    ...
  3. You are using fork on Windows.

    This is rarely used and barely tested. It is quite likely that it leaks over time, and that those leaks are internal and would require a bug report and the release of a new version to fix.

    If there is a need to do the downloads asynchronously (and on the basis of what you've presented here, there isn't), then threads are a far simpler and better-tested option.
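To make the comparison concrete, a threaded version might take roughly this shape. This is only a sketch: the `fetch` sub here is a stand-in for the real download call (e.g. LWP::Simple's `getstore`), so the example runs without touching the network.

```perl
use strict;
use warnings;
use threads;

# Hypothetical worker: in a real script this would call
# getstore( $url, $path ); here it just returns a string so the
# sketch runs without a network connection.
sub fetch {
    my ($url) = @_;
    return "fetched $url";
}

my @urls = qw( http://example.com/1 http://example.com/2 );

# One thread per URL; join() collects each thread's return value.
my @threads = map { threads->create( \&fetch, $_ ) } @urls;
my @results = map { $_->join } @threads;

print "$_\n" for @results;
```

In practice you would cap the number of simultaneous threads (e.g. with a worker pool and Thread::Queue) rather than spawning one per URL.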


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

Re^2: Memory Leak Caused by Forking?
by morgon (Priest) on Oct 21, 2010 at 03:49 UTC
    One more point:

    When you fork a process that then exits, the parent process should do a wait or waitpid to remove the entry from the process table.

    I am not really sure about Windows but I assume the layer that emulates fork would also emulate this, so even without a memory leak your code would waste resources.
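The pattern being described is just the standard fork-and-reap skeleton; a minimal sketch (illustration only, not the OP's code) looks like this:

```perl
use strict;
use warnings;

my $pid = fork;
die "fork failed: $!" unless defined $pid;

if ( $pid == 0 ) {
    # Child: do the work here, then exit with a status code.
    exit 42;
}

# Parent: waitpid reaps the child, removing its entry from the
# process table and making its exit status available in $?.
waitpid( $pid, 0 );
my $status = $? >> 8;    # high byte of $? is the child's exit code
print "child $pid exited with status $status\n";
```

Without the waitpid, each exited child lingers as a zombie until the parent itself exits, which is exactly the process-table leak described above.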

      I am not really sure about Windows but I assume the layer that emulates fork would also emulate this

      Indeed it does. This version takes care of that, but it doesn't stop the memory leak. It might slow it a little (it is hard to tell), but there are also other problems with the fork emulation that manifest themselves.

      For example, for some reason, using the DATA handle starts returning blank lines interspersed with the actual lines when forking is in use, which doesn't happen when the fork is skipped. Just another indication that no one is using (or even properly testing) the fork emulation on Windows.

      He should also be checking the file opens, which probably aren't doing what he thinks they are. He probably shouldn't be using bareword filehandles for this. Etc., etc.
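For reference, the checked, lexical form of the open looks like this (the file name here is a hypothetical stand-in for illustration):

```perl
use strict;
use warnings;

my $path = 'junk.txt';    # hypothetical file name for illustration

# Three-argument open with a lexical handle, result checked: if the
# open fails you find out immediately via die, instead of silently
# printing to a handle that was never opened.
open( my $fh, '>', $path ) or die "Can't open '$path': $!";
print $fh "some content\n";
close $fh or die "Can't close '$path': $!";
```

The lexical handle also closes automatically when it goes out of scope, which avoids the global-bareword collisions that bite when the same handle name is reused across forks or subs.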

      But until he explains why he feels the need to use fork here at all, there doesn't seem to be any logic in starting a new "process" to fetch a page, then piping the content back to the parent just for it to write it to disk.

      #! perl -slw
      use strict;
      use LWP::UserAgent;
      use HTTP::Request;
      use HTTP::Response;
      use HTTP::Message;

      my $iteration = 1;
      my $some_directory = 'C:/test/junk';

      #open(URLS, $some_directory.'urls.txt');
      while (<DATA>) {
          my $url = $_;
          chomp $url;
          get_url($url, $iteration);
          print "($iteration) $url\n";
          #sleep 3;
          $iteration++;
      }
      #close URLS;

      sub get_url {
          my ($url, $iteration) = @_;
          pipe(my $READER, my $WRITER);

          if (my $pid = fork) {
              close $WRITER;
              open(my $FH, '>', "$some_directory/$iteration") or die $!;
              while (<$READER>) {
                  print $FH $_;
              }
              close $READER;
              close $FH;
              waitpid $pid, 0;
              print "$pid returned: ", $? >> 8;
          }
          elsif (defined $pid) {
              close $READER;
              my $ua = LWP::UserAgent->new();
              my $request  = HTTP::Request->new("GET", $url);
              my $response = $ua->request($request);
              my $content  = $response->content();
              print $WRITER $content;
              close $WRITER;
              undef $request;
              undef $response;
              exit 123;
          }
      }

Re^2: Memory Leak Caused by Forking?
by nwboy74 (Novice) on Oct 21, 2010 at 15:25 UTC

    1. I'm just trying to show the parts that I've narrowed down as causing the problem. I'm dealing with legacy code in a complicated process and for all I know, the originators had good reason to do it the way they're doing it.

    I added the delay just so I could watch what happened to the memory between calls.

    2. Yes. I know. I have to "dumb" down the requirements of the problem. Regardless of how thorough I think I'm being in describing the issue, it's never going to be enough.

    3. I've tried using threads, but apparently I installed perl wrong on my system and it doesn't support threads currently.

      3. I've tried using threads, but apparently I installed perl wrong on my system and it doesn't support threads currently.

      Really, how fascinating. Because on Windows, fork is emulated using threads. So, threads must be working.

