in reply to Parallel downloading under Win32?

  • under Win32 AND Linux
  • without having it crash due to too many leaked scalars
  • without using a GB of ram
  • without being incredibly slow when downloading
  1. Should run anywhere Perl+threads do.

    (I don't have Linux!)

  2. No scalars leaked on my system.
  3. Uses 50MB for 4 concurrent threads.
  4. 16 files - 46,617,229 bytes - 181 seconds - 257 KB/s.

    (The maximum throughput of my connection: 2496 kbps.)
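Point 4's throughput figure can be checked against the line speed with some quick arithmetic. This assumes decimal prefixes (1 KB = 1,000 bytes, 1 kbps = 1,000 bit/s). If the connection's "kbps" meant 1,024 bit/s, the ratio would come out nearer 81%.

```latex
\[
\frac{46\,617\,229\ \text{bytes}}{181\ \text{s}} \approx 257\,554\ \text{bytes/s} \approx 257.6\ \text{KB/s}
\]
\[
2496\ \text{kbit/s} = \frac{2\,496\,000}{8}\ \text{bytes/s} = 312\,000\ \text{bytes/s},
\qquad
\frac{257\,554}{312\,000} \approx 0.825
\]
```

In other words, four threads were using a little over 82% of the connection's maximum throughput.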

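#! perl -sw
use 5.010;
use strict;
use threads ( stack_size => 0 );
use Thread::Queue;

sub thread {
    my $tid = threads->tid;
    require LWP::Simple;    # loaded at run time by each thread
    my( $Q, $dir ) = @_;
    # Take URLs off the queue until an undef (the stop marker) arrives
    while( my $url = $Q->dequeue ) {
        my( $file ) = $url =~ m[/([^/]+)$];    # last path component
        my $status = LWP::Simple::getstore( $url, "$dir/$file" );
        # Pass the URL as an argument, not in the format: URLs may contain '%'
        printf STDERR "[%d] %s => %s/%s: %s\n", $tid, $url, $dir, $file, $status;
    }
}

# With -s, e.g. -T=8 -DIR=downloads on the command line sets $T and $DIR
our $T ||= 4;
our $DIR ||= '.';

say scalar localtime;
my $Q = new Thread::Queue;
my @threads = map threads->create( \&thread, $Q, $DIR ), 1 .. $T;    # start $T workers
chomp, $Q->enqueue( $_ ) while <>;    # queue one URL per input line
$Q->enqueue( (undef) x $T );          # one stop marker per worker
$_->join for @threads;                # wait for all workers to finish
say scalar localtime;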

Console log from test session:

__END__
C:\test\tmp>dir
 Volume in drive C has no label.
 Volume Serial Number is 8C78-4B42

 Directory of C:\test\tmp

29/04/2009  20:13    <DIR>          .
29/04/2009  20:13    <DIR>          ..
29/04/2009  20:13               904 urls.txt
               1 File(s)            904 bytes
               2 Dir(s)  436,394,737,664 bytes free

C:\test\tmp>..\pget urls.txt
Wed Apr 29 20:13:51 2009
[1] http://extensions.services.openoffice.org/files/2318/4/as_IN.oxt => ./as_IN.oxt: 200
[1] http://extensions.services.openoffice.org/files/2318/4/as_IN.oxt => ./as_IN.oxt: 200
[3] http://wordpress.org/latest.zip => ./latest.zip: 200
[3] http://extensions.services.openoffice.org/files/2318/4/as_IN.oxt => ./as_IN.oxt: 200
[1] http://wordpress.org/latest.zip => ./latest.zip: 200
[1] http://uk2.php.net/distributions/php-debug-pack-5.2.9-2-Win32.zip => ./php-debug-pack-5.2.9-2-Win32.zip: 200
[3] http://wordpress.org/latest.zip => ./latest.zip: 200
[1] http://extensions.services.openoffice.org/files/2318/4/as_IN.oxt => ./as_IN.oxt: 200
[2] http://uk2.php.net/distributions/php-debug-pack-5.2.9-2-Win32.zip => ./php-debug-pack-5.2.9-2-Win32.zip: 200
[2] http://extensions.services.openoffice.org/files/2318/4/as_IN.oxt => ./as_IN.oxt: 200
[1] http://uk2.php.net/distributions/php-debug-pack-5.2.9-2-Win32.zip => ./php-debug-pack-5.2.9-2-Win32.zip: 200
[3] http://wordpress.org/latest.zip => ./latest.zip: 200
[2] http://uk2.php.net/distributions/php-debug-pack-5.2.9-2-Win32.zip => ./php-debug-pack-5.2.9-2-Win32.zip: 200
[1] http://wordpress.org/latest.zip => ./latest.zip: 200
[4] http://search.cpan.org/CPAN/authors/id/N/NW/NWCLARK/perl-5.8.9.tar.bz2 => ./perl-5.8.9.tar.bz2: 200
[3] http://uk2.php.net/distributions/php-debug-pack-5.2.9-2-Win32.zip => ./php-debug-pack-5.2.9-2-Win32.zip: 200
Wed Apr 29 20:17:01 2009

C:\test\tmp>dir
 Volume in drive C has no label.
 Volume Serial Number is 8C78-4B42

 Directory of C:\test\tmp

29/04/2009  20:13    <DIR>          .
29/04/2009  20:13    <DIR>          ..
29/04/2009  20:15            96,501 as_IN.oxt
29/04/2009  20:16         1,853,086 latest.zip
29/04/2009  20:16        11,121,414 perl-5.8.9.tar.bz2
29/04/2009  20:17         5,149,576 php-debug-pack-5.2.9-2-Win32.zip
29/04/2009  20:13               904 urls.txt
               5 File(s)     18,221,481 bytes
               2 Dir(s)  436,377,415,680 bytes free

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^2: Parallel downloading under Win32?
by Xenofur (Monk) on Apr 29, 2009 at 20:05 UTC
    I would like to test it, but for that I'd need to insert it into my module. This endeavour in turn is hampered by the fact that I just plain cannot tell what's going on after: my $Q = new Thread::Queue;

    Seriously, it looks like you wrote that with the intent to make it as unreadable as possible.

      It's not actually hard. The system has threads that are fed off a Thread::Queue. Each thread takes a job from the queue, performs it, then takes the next one from the queue. The map just creates $T threads, and to tell each thread that it is finished, it sticks $T undef elements at the end of the queue: when a thread dequeues an undef, its while( my $url = $Q->dequeue ) loop ends and the thread exits. Then the main thread waits for all threads to finish their work. That's all there is to it.
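The pattern described above (a fixed pool of threads fed from one Thread::Queue and stopped by one undef per thread) can be sketched without any network access. This is a minimal sketch, not the original script: `process_job` and the `job-N` strings are hypothetical stand-ins for `LWP::Simple::getstore` and the URLs, it assumes a Perl built with thread support, and it tests `defined` on each dequeued value rather than plain truth, so that a job such as `0` or an empty string would not end a worker early.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;         # requires a Perl built with ithreads support
use Thread::Queue;

# Hypothetical stand-in for LWP::Simple::getstore(): no network access,
# it just "processes" a job by returning the length of its name.
sub process_job {
    my ($job) = @_;
    return length $job;
}

# Worker: take jobs off the shared queue until an undef stop marker arrives.
sub worker {
    my ($Q)   = @_;
    my $tid   = threads->tid;
    my $count = 0;
    while ( defined( my $job = $Q->dequeue ) ) {
        my $result = process_job($job);
        print STDERR "[$tid] $job => $result\n";
        $count++;
    }
    return $count;    # how many jobs this worker handled
}

my $T    = 4;                          # size of the thread pool
my @jobs = map { "job-$_" } 1 .. 16;   # hypothetical stand-ins for URLs

my $Q = Thread::Queue->new;

# Start $T workers in scalar context, so join() returns the single count.
my @workers;
for ( 1 .. $T ) {
    push @workers, threads->create( { context => 'scalar' }, \&worker, $Q );
}

$Q->enqueue($_) for @jobs;      # the real work, in order
$Q->enqueue( (undef) x $T );    # one stop marker per worker, after the jobs

# Wait for every worker to finish and add up what they did.
my $total = 0;
$total += $_->join for @workers;

print "processed $total jobs with $T threads\n";
```

Because the stop markers are queued after all the jobs and the queue is first-in, first-out, every job is taken before any worker sees an undef; each worker returns how many jobs it handled, so the sum from `join` doubles as a check that all of them were processed.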

        Thanks for the explanations; they helped me a lot in understanding it. :)
      Seriously, it looks like you wrote that with the intent to make it as unreadable as possible.

      Is that a request for clarification?

      Suggestion: Run it standalone as posted first, to convince yourself that it actually works on your system.


        Oh, I had no doubt that it worked. I had trouble understanding *how* it worked. I write my Perl in a very declarative and verbose manner, have never had reason to use map before, didn't know you could string commands together with commas to act on $_ without wrapping them in braces, and didn't know why you were pushing undefs onto the queue.

        In short: The syntax and lack of any explanation completely stumped me.
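The three idioms mentioned above (map without braces, `chomp, ... while <>`, and the trailing undefs) can each be set beside a more verbose equivalent. This sketch is self-contained: it reads from an in-memory string instead of `urls.txt` and pushes onto plain arrays instead of a Thread::Queue. `$input`, `@terse_queue` and `@verbose_queue` are illustrative names, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $T = 4;    # number of workers, as in the original script

# Illustrative input: an in-memory string standing in for urls.txt.
my $input = "http://wordpress.org/latest.zip\n"
          . "http://extensions.services.openoffice.org/files/2318/4/as_IN.oxt\n";

# Idiom 1: map EXPR, LIST (an expression instead of a block).
my @terse_ids = map "worker-$_", 1 .. $T;

# Verbose equivalent: an explicit loop pushing one item per number.
my @verbose_ids;
for my $n ( 1 .. $T ) {
    push @verbose_ids, "worker-$n";
}

# Idiom 2: comma operator plus a "while <FH>" statement modifier.
# Each line is read into $_, chomped, then queued (an array stands in
# for the Thread::Queue here).
my @terse_queue;
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
chomp, push @terse_queue, $_ while <$fh>;
close $fh;

# Verbose equivalent with a named loop variable.
my @verbose_queue;
open $fh, '<', \$input or die "Cannot open in-memory file: $!";
while ( defined( my $line = <$fh> ) ) {
    chomp $line;
    push @verbose_queue, $line;
}
close $fh;

# Idiom 3: (undef) x $T builds a list of $T undefs, one stop marker per
# worker, so each worker's dequeue loop ends after the real work.
push @terse_queue, (undef) x $T;

# Verbose equivalent.
for ( 1 .. $T ) {
    push @verbose_queue, undef;
}

# prints "6 items queued, 4 of them stop markers"
printf "%d items queued, %d of them stop markers\n",
    scalar @terse_queue, scalar grep { !defined } @terse_queue;
```

In the statement-modifier form, `while <$fh>` implicitly assigns each line to `$_` and tests it with `defined`, which is why the terse and verbose versions build identical lists.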

        Either way, I have to admit that it is a superior solution to the wget method, as long as enough RAM is available: running enough threads to match the speed of the wget method took 300 MB. However, because RAM use can actually be kept under control and it runs entirely on Perl modules, it is the better solution.

        As such, thanks a lot. :)

        FWIW, this is how I'm using it now: