Re: Parallel::ForkManager (high cpu and a lot of memory)
by dallen16 (Sexton) on Oct 08, 2008 at 01:46 UTC
ActiveState Perl 5.10 on Win32 now bundles threads, threads::shared, and Thread::Queue, which will do what you want. Here's sample code using threads and Thread::Queue that does what yours does, though in a few more lines.
Dewey
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use threads;
use Thread::Queue;

my $MAXTHREADS = 2; #10;
my $q = Thread::Queue->new();

# Start the worker pool; each worker blocks in dequeue() until work arrives.
my @threadlist = ();
foreach my $y (1..$MAXTHREADS) {
    my $thr = threads->create('getURL');
    push(@threadlist,$thr);
}

open( my $LIST, '<', 'lan.txt' ) or die "Cannot open lan.txt: $!";
while (my $line = <$LIST>) {
    chomp($line);
    next unless length($line);    # skip blank lines
    $q->enqueue($line);
}
close($LIST);
endThreads();
exit(0);

sub getURL {
    my $ua = LWP::UserAgent->new;
    $ua->timeout(2);
    $ua->agent("Mozilla/6.0");
    my $tid = threads->tid();
    print "started thread $tid\n";
    # An undef item is the shutdown signal, so a host named "0" or one
    # starting with "exit" is still processed.
    while (defined(my $line = $q->dequeue())) {
        my $url = "http://$line"; #/index.php";
        my $req = HTTP::Request->new('GET',$url);
        my $res = $ua->request($req);
        my $content = $res->content;
        if ($content =~ /ok/) {
            print "<$tid>retrieved content from $line\n";
        }
        else {
            print "<$tid>could not retrieve content from $line\n";
        }
    }
}

sub endThreads {
    # One undef per worker tells each of them to leave its loop.
    $q->enqueue(undef) foreach (1..$MAXTHREADS);
    # Poll until every worker has finished and been joined.
    while (scalar(@threadlist)) {
        my @newthreadlist = ();
        foreach my $thr (@threadlist) {
            if ($thr->is_joinable()) {
                my $tid = $thr->tid();
                $thr->join();
                print "joined thread $tid\n";
            }
            else {
                push(@newthreadlist,$thr);
            }
        }
        @threadlist = @newthreadlist;
        sleep(1) if scalar(@threadlist);
    }
}
Sounds right. Perl variables are copied into every thread, so it sounds like a reasonable amount for 20 threads.
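A minimal sketch of what that copying means, assuming a perl built with ithreads. The variable names are made up for illustration. Every thread gets its own clone of `@private`, so data loaded before the threads start is duplicated once per thread, while a variable marked `:shared` exists only once.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Illustrative names only; nothing here comes from the original script.
my @private = (1 .. 5);       # cloned into every new thread
my $counter :shared = 0;      # a single copy, visible to all threads

my @workers;
for (1 .. 3) {
    push @workers, threads->create(sub {
        push @private, 'extra';              # changes this thread's clone only
        { lock($counter); $counter++; }      # changes the one shared copy
        return scalar @private;              # size of the clone after the push
    });
}

my @sizes = map { $_->join } @workers;

print "parent array has ", scalar(@private), " elements\n";
print "thread copies had: @sizes\n";
print "shared counter: $counter\n";
```

The parent's array is unchanged after the threads finish, which is why memory use scales with the number of threads times whatever was loaded before they were created.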
Re: Parallel::ForkManager (high cpu and a lot of memory)
by Illuminatus (Curate) on Oct 07, 2008 at 20:15 UTC
When you say the module 'seems to work fine', do you mean the module itself, or this script specifically? How many lines are in lan.txt? Given the name 'lan.txt', are all the sites you are hitting accessible via fairly high-bandwidth links? If the sites are generally responding pretty quickly, and you have lots of them (more than, say, 300) in your file, it is not surprising that the CPU pegs. Could you be more specific about the memory drain, i.e., how much and how fast?
I'm checking around 200 IPs. When I look at Task Manager after about 15 seconds, the process is already using 400 MB, and it keeps growing. The CPU jumps straight to 100% as soon as I run the script.
The fact that your CPU usage jumps up is good. You generally want to use your CPU as much as possible, as that means you're not waiting on network traffic.
On the other hand:
1. Unless you're on Windows, you do not get one process; you get a process for each fork(), meaning 10 processes in this case.
2. Each child process takes a finite amount of memory. ForkManager should keep the number of processes limited, and memory should be reclaimed when a forked child exits. Memory usage shouldn't increase indefinitely.
My guess: you're running on Windows, and perl's fork() emulation is giving you trouble. It's possible that using threads may work a little better in that case.
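The two points above can be sketched with plain fork() and waitpid(). This is not how Parallel::ForkManager is implemented, only an illustration of the behaviour it should give you; the job list, the limit of 3, and the names `reap_one`, `%running`, and `$peak` are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: a job list and a cap on concurrent children.
my @jobs         = map { "host$_" } 1 .. 6;
my $max_children = 3;

my %running;     # pid => job, for children not yet reaped
my $peak = 0;    # highest number of simultaneous children seen
my @done;        # jobs whose child exited with status 0

# Block until one child exits, then forget it. Once the child is gone,
# the operating system takes back all of its memory.
sub reap_one {
    my $pid = waitpid(-1, 0);
    if ($pid <= 0) {          # no children left to wait for
        %running = ();
        return;
    }
    my $job = delete $running{$pid};
    push @done, $job if defined $job && $? == 0;
}

for my $job (@jobs) {
    # Never let more than $max_children exist at the same time.
    reap_one() while keys(%running) >= $max_children;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: on Unix this is a separate process with its own pid.
        # Real work, such as an HTTP request, would go here.
        exit(0);
    }
    $running{$pid} = $job;
    $peak = keys(%running) if keys(%running) > $peak;
}
reap_one() while %running;

printf "finished %d jobs, at most %d children at once\n", scalar(@done), $peak;
```

On Unix the parent's memory stays flat however long the job list is, because each child's memory goes away when it exits. Under the Windows fork() emulation the "children" are threads inside one process, so that guarantee does not carry over.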
Re: Parallel::ForkManager (high cpu and a lot of memory)
by BrowserUk (Patriarch) on Oct 08, 2008 at 10:57 UTC
Which version of threads are you using? Upgrading to 1.71 seems to avoid memory leaks that the combination of 5.10 and some earlier versions (e.g. 1.67) exhibited.
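A quick way to answer that question on your own machine is a small script along these lines. The helper name `module_version` is made up for this sketch; it reports "not available" when a module cannot be loaded, for example on a perl built without ithreads.

```perl
use strict;
use warnings;
use Config;

# Return the installed version of a module, or undef if it cannot be loaded.
sub module_version {
    my ($module) = @_;
    return eval "require $module; $module->VERSION";
}

print "perl $]\n";
print "useithreads: ", ($Config{useithreads} ? 'yes' : 'no'), "\n";
for my $module (qw(threads threads::shared Thread::Queue)) {
    my $version = module_version($module);
    print "$module: ", (defined $version ? $version : 'not available'), "\n";
}
```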
There is no way that you should be using 100% CPU with 10 threads performing IO. This seems to be a problem with Parallel::ForkManager on 5.10. You can do pretty much exactly the same thing as above, but using threads, like this:
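#! perl -slw
use strict;
use threads;
use threads::shared;
use LWP::UserAgent;
use HTTP::Request;

my $semStdout :shared;
my $running   :shared = 0;

open( my $LIST, '<', 'urls.txt' ) or die "Cannot open urls.txt: $!";
while ( my $tld = <$LIST> ) {
    chomp $tld;
    ## Wait for a free slot, then claim it here in the parent, so the loop
    ## cannot overshoot the limit before a new thread gets to run.
    Win32::Sleep( 100 ) while do{ lock $running; $running >= 10 };
    { lock $running; ++$running; }
    async{
        my $url = "http://$tld/";
        my $ua = LWP::UserAgent->new;
        $ua->timeout(5);
        $ua->agent("Mozilla/6.0");
        my $req = HTTP::Request->new('GET',$url);
        my $res = $ua->request($req);
        my $content = $res->content;
        my $status = $content =~ /OK/i ? 'ack' : 'nak';
        {
            ## Serialise output so lines from different threads don't mix.
            lock $semStdout;
            printf "(%3d)%s: %s\n", threads->self->tid, $tld, $status;
        }
        { lock $running; --$running; }
    }->detach;
}
close($LIST);
## Let the last detached threads finish before the main thread exits.
Win32::Sleep( 100 ) while do{ lock $running; $running > 0 };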
Memory usage seems stable, and CPU usage stays below 10% for 10 threads.
There are better, less resource-intensive ways of using threads, but this one has the virtue of staying very close to the P::FM way of operating, which you might consider a bonus.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
bytes
Compress::Raw::Zlib
Compress::Zlib
Fcntl
File::Glob
File::GlobMapper
File::Spec
File::Spec::Unix
HTML::Entities
HTML::HeadParser
HTML::Parser
IO
IO::Compress::Adapter::Deflate
IO::Compress::Base
IO::Compress::Base::Common
IO::Compress::Gzip
IO::Compress::Gzip::Constants
IO::Compress::RawDeflate
IO::Compress::Zlib::Extra
IO::File
IO::Handle
IO::Seekable
IO::Select
IO::Socket
IO::Socket::INET
IO::Socket::UNIX
IO::Uncompress::Adapter::Inflate
IO::Uncompress::Base
IO::Uncompress::Gunzip
IO::Uncompress::RawInflate
List::Util
LWP::Protocol::http
Net::HTTP
Net::HTTP::Methods
Scalar::Util
SelectSaver
Socket
Symbol
URI::_generic
URI::http
URI::_query
URI::_server
utf8
Given that forks are threads on Win32, and I see those same file accesses on my machine with my threaded code, there has to be something else going on that is consuming CPU, because the threaded version uses far less.
Re: Parallel::ForkManager (high cpu and a lot of memory)
by perrin (Chancellor) on Oct 07, 2008 at 21:44 UTC
Are you on Windows? If so, you're really getting threads, and they're not sharing memory the way forked processes would. Try running it on Unix if you can.
Re: Parallel::ForkManager (high cpu and a lot of memory)
by BrowserUk (Patriarch) on Oct 08, 2008 at 01:11 UTC