marto9 has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone,

I tried to make a while loop with multiple threads using Parallel::ForkManager.
It's a very easy module, and it seems to work fine.
I only have one problem: when I run the script my CPU jumps to 100% and the process keeps consuming more and more memory.
I hope you perlmonks can help me fix this.
BTW, I'm running ActivePerl 5.10.

Thx in advance

use LWP::UserAgent;
use HTTP::Request;
use Parallel::ForkManager;

$ua = new LWP::UserAgent;
$ua->timeout(2);
$ua->agent("Mozilla/6.0");
$pm = new Parallel::ForkManager(10);
open(LIST,"lan.txt");
while (<LIST>) {
    $pm->start and next;
    chomp($_);
    $url = "http://".$_."/index.php";
    $req = HTTP::Request->new('GET',$url);
    $res = $ua->request($req);
    $content = $res->content;
    if ($content =~ /ok/) {
        print $_."\n";
    }
    $pm->finish;
}
$pm->wait_all_children;
close(LIST);
exit;

Replies are listed 'Best First'.
Re: Parallel::ForkManager (high cpu and a lot of memory)
by dallen16 (Sexton) on Oct 08, 2008 at 01:46 UTC

    ActiveState Perl 5.10 on Win32 now bundles threads, threads::shared, and Thread::Queue -- which will do what you want... Here's sample code using threads and Thread::Queue that does what yours does... in a few more lines though.

    Dewey

    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Request;
    use threads;
    use Thread::Queue;

    my $MAXTHREADS = 2; #10;

    sub getURL ();
    sub endThreads();

    my $q = Thread::Queue->new();
    my @threadlist = ();

    foreach my $y (1..$MAXTHREADS) {
        my $thr = threads->create('getURL');
        push(@threadlist,$thr);
    }

    open( my $LIST, '<', 'lan.txt' );
    while (my $line = <$LIST>) {
        chomp($line);
        $q->enqueue($line);
    }
    close($LIST);

    endThreads();
    exit(0);

    sub getURL() {
        my $ua = new LWP::UserAgent;
        $ua->timeout(2);
        $ua->agent("Mozilla/6.0");
        my $tid = threads->tid();
        print "started thread $tid\n";
        while (my $line = $q->dequeue()) {
            last if lc(substr($line,0,4)) eq 'exit';
            my $url = "http://$line"; #/index.php";
            my $req = HTTP::Request->new('GET',$url);
            my $res = $ua->request($req);
            my $content = $res->content;
            if ($content =~ /ok/) {
                print "<$tid>retrieved content from $line\n";
            } else {
                print "<$tid>could not retrieve content from $line\n";
            }
        }
    }

    sub endThreads() {
        foreach my $y (1..$MAXTHREADS) {
            $q->enqueue('EXIT');
        }
        while (scalar(@threadlist)) {
            my @newthreadlist = ();
            foreach my $thr (@threadlist) {
                if ($thr->is_joinable()) {
                    my $tid = $thr->tid();
                    $thr->join();
                    print "joined thread $tid\n";
                } else {
                    push(@newthreadlist,$thr);
                }
            }
            @threadlist = @newthreadlist;
            sleep(1) if scalar(@threadlist);
        }
    }
      This code is longer, but it works. Thx!
      Apparently Parallel::ForkManager doesn't work well with Win32.
      The CPU isn't at 100% anymore, but the memory usage is still quite high. When I use, for example, 20 threads, the process uses 120MB of RAM and stays at 120MB.

      My question is: is it normal that it uses that much memory?

        Sounds right. Perl variables are copied into every thread, so it sounds like a reasonable amount for 20 threads.
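        As an illustrative aside (not from the reply above): the exception to "copied into every thread" is data marked `:shared` via threads::shared, which is kept in one place and referenced by all threads; everything *else* is still cloned at thread-creation time, which is where the per-thread memory goes. A minimal sketch:

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# One shared hash, visible to all threads, instead of a private
# clone per thread. Unshared variables would each be copied.
my %seen :shared;

my @workers = map {
    threads->create(sub {
        my $id = shift;
        lock %seen;        # serialize writes to the shared hash
        $seen{$id} = 1;    # this update is visible to every thread
    }, $_);
} 1 .. 4;
$_->join for @workers;

print scalar(keys %seen), " entries\n";   # prints "4 entries"
```

        Sharing only the data that must be shared keeps it out of the per-thread clone, but it does not shrink the interpreter state each thread copies, so a baseline per-thread cost remains.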
Re: Parallel::ForkManager (high cpu and a lot of memory)
by Illuminatus (Curate) on Oct 07, 2008 at 20:15 UTC
    When you say the module 'seems to work fine', do you mean the module itself, or this script specifically? How many lines are in lan.txt? Given the name 'lan.txt', are all the sites you are hitting accessible via fairly high-bandwidth links? If the sites you are hitting generally respond quickly, and you have lots (more than, say, 300) of sites in your file, it is not surprising that the CPU pegs. Could you be more specific about draining memory, i.e., how much, how fast?
      I'm checking around 200 IPs. When I look at my task manager, after about 15 seconds the process is already at 400MB and it keeps growing. The CPU jumps directly to 100% when I run the script.
        The fact that your CPU usage jumps up is good. You generally want to use your CPU as much as possible, as that means you're not waiting on network traffic.

        On the other hand:

        1. Unless you're on Windows, you do not get 1 process; you get a new process for each fork(), meaning 10 processes in this case.

        2. Each child process will take a finite amount of memory. ForkManager should keep the number of processes limited, and memory should be reclaimed when a forked child exits. Memory usage shouldn't increase indefinitely.

        My guess: you're running on Windows, and Perl's fork() emulation is giving you trouble. It's possible that using threads may work a little better in that case.

        The CPU jumps directly to 100% when I run the script.

        Maximizing resource utilisation is the point of running parallel tasks, so that's a good thing.
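        The fork-then-reap lifecycle described above (each fork() creating a child whose memory the OS reclaims when it exits, with the parent waiting on all of them) can be sketched with core Perl alone; this is an illustrative stand-in for what Parallel::ForkManager manages internally, not the OP's script:

```perl
use strict;
use warnings;

my @pids;
for my $n (1 .. 3) {
    my $pid = fork();       # on Win32 this is emulated with a thread
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: allocate something sizeable, then exit.
        # On exit, the OS reclaims all of the child's memory.
        my @big = (1) x 100_000;
        exit 0;
    }
    push @pids, $pid;       # parent remembers each child's pid
}

# Reap every child so none linger as zombies -- the same job
# wait_all_children does in Parallel::ForkManager.
waitpid $_, 0 for @pids;
print scalar(@pids), " children reaped\n";   # prints "3 children reaped"
```

        On unix each child is a real process whose memory the kernel frees at exit; under the Win32 fork emulation the "children" are threads inside one process, which is why memory behaves differently there.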

Re: Parallel::ForkManager (high cpu and a lot of memory)
by BrowserUk (Patriarch) on Oct 08, 2008 at 10:57 UTC

    Which version of threads are you using? Upgrading to 0.71 seems to avoid memory leaks that the combination of 5.10 and some earlier versions (eg. 0.67) exhibited.

    There is no way that you should be using 100% CPU with 10 threads performing IO. This seems to be a problem with Parallel::ForkManager on 5.10. You can do pretty much exactly the same thing as above, but using threads, like this:

    #! perl -slw
    use threads;
    use threads::shared;
    use LWP::UserAgent;
    use HTTP::Request;

    my $semStdout :shared;
    my $running :shared = 0;

    open(LIST,"urls.txt");
    while ( my $tld = <LIST> ) {
        chomp $tld;
        Win32::Sleep( 100 ) while do{ lock $running; $running >= 10 };
        async{
            { lock $running; ++$running; }
            my $url = "http://$tld/";
            my $ua = new LWP::UserAgent;
            $ua->timeout(5);
            $ua->agent("Mozilla/6.0");
            my $req = HTTP::Request->new('GET',$url);
            my $res = $ua->request($req);
            my $content = $res->content;
            my $status = $content =~ /OK/i ? 'ack' : 'nak';
            {
                lock $semStdout;
                printf "(%3d)$tld: %s\n", threads->self->tid, $status;
            }
            { lock $running; --$running; }
        }->detach;
    }
    close(LIST);

    Memory usage seems to be stable and cpu usage < 10% for 10 threads.

    There are better, less resource-intensive ways of using threads, but this one has the virtue of being very close to the P::FM way of operating, which you might consider a bonus.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      There is no way that you should be using 100% CPU with 10 threads performing IO
      But it's not just IO: LWP::UserAgent loads and parses several Perl modules on demand.

      For instance, running the OP script (with a fixed set of URLs) under strace on my machine shows that every process reads...

      bytes Compress::Raw::Zlib Compress::Zlib Fcntl File::Glob File::GlobMapper File::Spec File::Spec::Unix HTML::Entities HTML::HeadParser HTML::Parser IO IO::Compress::Adapter::Deflate IO::Compress::Base IO::Compress::Base::Common IO::Compress::Gzip IO::Compress::Gzip::Constants IO::Compress::RawDeflate IO::Compress::Zlib::Extra IO::File IO::Handle IO::Seekable IO::Select IO::Socket IO::Socket::INET IO::Socket::UNIX IO::Uncompress::Adapter::Inflate IO::Uncompress::Base IO::Uncompress::Gunzip IO::Uncompress::RawInflate List::Util LWP::Protocol::http Net::HTTP Net::HTTP::Methods Scalar::Util SelectSaver Socket Symbol URI::_generic URI::http URI::_query URI::_server utf8

        Given that forks are threads on win32, and I see those same file accesses on my machine with my threaded code, there has to be something else going on that is consuming CPU, because the threaded version uses far less.


Re: Parallel::ForkManager (high cpu and a lot of memory)
by perrin (Chancellor) on Oct 07, 2008 at 21:44 UTC
    Are you on Windows? If so, you're really getting threads, and they're not sharing memory the way forked processes would. Try running it on unix if you can.
Re: Parallel::ForkManager (high cpu and a lot of memory)
by BrowserUk (Patriarch) on Oct 08, 2008 at 01:11 UTC