perl5ever has asked for the wisdom of the Perl Monks concerning the following question:

Suppose I am creating multiple threads to download URLs with LWP. Is there any difference between loading LWP with use before the worker threads are created versus requiring LWP in each worker thread, i.e.:
use LWP; use threads; sub worker_thread { ... }
versus
use threads; sub worker_thread { require LWP; ... }
If the OS is important, I am mainly interested in what will happen under Linux.

It seems that the first approach could be more efficient if the threads are created with copy-on-write. On the other hand, I've also seen comments like this:

require LWP::Simple; ## Requiring prevent CLONE leaks.
for instance in 759033. Is this a concern under Linux?

Replies are listed 'Best First'.
Re: common modules and threads?
by BrowserUk (Patriarch) on Feb 22, 2010 at 16:53 UTC

    Unfortunately, it is still a moving target. When I gave the require advice in the cited thread, I was using perl v5.10.0 (64-bit under Vista if memory serves), and it was necessary to require LWP::Simple to avoid leaks.

    I just tried the code from that thread on my current setup (perl v5.10.1 64-bit under Vista) and it dies the first time a thread terminates. After moving back to use-ing LWP::Simple, I get no traps and no memory leaks.

    I have limited knowledge of *nix, and I'm not sure whether threads actually use COW there. I do know that COW is rarely very effective with Perl in the big picture, because even read-only references to Perl scalars can modify the internal representation of the SVs, forcing wholesale but piecemeal copying of the memory.

    Touch one scalar in a 4096-byte page of memory (say, if( $scalar == 0 ) {) and, if the scalar contains a PV (string) representation of a number, it must be converted to an NV or IV for the comparison. Then the whole page has to be copied. Sum a large array of numbers (loaded as strings from a file), and the array gets copied in a series of 4k chunks rather than as a single large entity in one pass. The result is slower, and can end up using more memory through fragmentation.
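
    A minimal sketch of the effect described above, using the core B module to look at a scalar's flags before and after a numeric comparison. The helper name slots is made up for this illustration. It only shows that a "read" in numeric context writes to the SV; it does not measure page copying.

```perl
use strict;
use warnings;
use B ();

# Report which value slots of a scalar are currently flagged as valid.
# (Hypothetical helper, named for this sketch only.)
sub slots {
    my ($ref) = @_;
    my $flags = B::svref_2object($ref)->FLAGS;
    return join ',',
        ( $flags & B::SVf_IOK() ? 'IOK' : () ),
        ( $flags & B::SVf_NOK() ? 'NOK' : () ),
        ( $flags & B::SVf_POK() ? 'POK' : () );
}

my $scalar = "42";                 # a number held as a string, as if read from a file
my $before = slots(\$scalar);

my $is_zero = ( $scalar == 0 );    # looks read-only, but caches the integer value

my $after = slots(\$scalar);
print "before: $before\nafter:  $after\n";
```

    On the perls this was written against, the IOK flag should appear only in the second line of output, which means the SV itself was modified by the comparison.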

    Ultimately, with different platforms and different versions having different caveats, the best way is to simply try it for yourself.
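
    One way to try it, sketched under some assumptions: the process is running on Linux (memory is read from /proc/self/status), perl was built with ithreads, and Text::Wrap stands in for LWP so the script runs without CPAN modules. The names rss_kb and churn are made up for this illustration. Swap in LWP::Simple for a real trial.

```perl
use strict;
use warnings;
use threads;

# Resident set size in kB, read from /proc (Linux only); undef elsewhere.
sub rss_kb {
    open my $fh, '<', '/proc/self/status' or return undef;
    while ( my $line = <$fh> ) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return undef;
}

# Start and join $count short-lived threads one after another and
# return the RSS reading taken after each join.
sub churn {
    my ( $count, $load_in_thread ) = @_;
    my @readings;
    for ( 1 .. $count ) {
        threads->create( sub {
            require Text::Wrap if $load_in_thread;    # stand-in for LWP
            return 1;
        } )->join;
        push @readings, scalar rss_kb();
    }
    return @readings;
}

# Pass 0 on the command line to load the module before any thread exists.
my $load_in_thread = @ARGV ? $ARGV[0] : 1;
require Text::Wrap unless $load_in_thread;

my @rss = churn( 20, $load_in_thread );
printf "RSS after first thread: %s kB, after last: %s kB\n",
    map { defined $_ ? $_ : 'n/a' } @rss[ 0, -1 ];
```

    Run it once with 1 and once with 0, with a larger thread count if needed, and compare how the readings grow. Steady growth across joins in one mode but not the other would point to a leak on that particular perl and platform.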


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: common modules and threads?
by zentara (Cardinal) on Feb 22, 2010 at 16:56 UTC
    Never rely on copy-on-write for sharing objects!

    The general rule, as I see it, is to keep as much of your object code in the individual threads as possible. So I think putting the require LWP in the thread code block is the obvious way to go. How would you share an LWP object across threads anyway? Only if the threads never used the main object at the same time, or went through some sort of queue.

    Below is some code someone posted a while back in a discussion of the same thing. What you are really asking about is the thread safety of the LWP module, which I have not seen questioned; it works well here. Never rely on copy-on-write. You actually want to avoid it, because it causes thread-safety problems such as "free to wrong pool" errors.

    You can "create" all your LWP objects in main, and pass them as parameters to the thread creation sub, but why have all that LWP code in main, if it's being used in the threads anyways? So do it as follows.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP;
    use LWP::Simple;
    use threads;
    use Time::HiRes qw/gettimeofday/;

    # Each thread builds its own request and user agent, times one POST,
    # and appends one CSV row to test.csv.
    sub HTTP_Req {
        my ($tid, $host, $port, $uri) = @_;
        open my $FH, '>>', "test.csv" or die "$!";

        my $url = 'http://' . $host . ':' . $port . $uri;
        my $ua  = LWP::UserAgent->new;
        my $req = HTTP::Request->new('POST');
        $req->uri($url);
        $req->header(Host => $host);
        $req->user_agent($ua->agent);
        $req->content_type('text/html');

        my ($st_secs, $st_mins, $st_hours) = localtime(time);
        # gettimeofday returns (seconds, microseconds); convert both to ms
        my ($seconds, $micros) = gettimeofday();
        my $st_ms = $seconds * 1000 + $micros / 1000;

        my $res = $ua->request($req);

        ($seconds, $micros) = gettimeofday();
        my $end_ms    = $seconds * 1000 + $micros / 1000;
        my $resp_time = $end_ms - $st_ms;
        my ($end_secs, $end_mins, $end_hours) = localtime(time);
        my $resp_code = $res->code;
        my $result    = $res->is_success ? 'SUCCESS' : 'FAIL';

        print $FH "$tid,$st_hours:$st_mins:$st_secs,$end_hours:$end_mins:$end_secs,"
                . "$st_ms,$end_ms,$resp_time,$resp_code,$result\n";
        close $FH;
    }

    my @threads;
    #my $host='domain.xyz.com';
    my $host    = 'google.com';   # bare host name; HTTP_Req adds the scheme
    my $port    = '80';
    my $uri     = '/';
    my $threads = 100;

    # Write the CSV header before any thread starts appending rows.
    open my $FH, '>', "test.csv" or die "$!";
    print $FH "tid,start_time,end_time,start Ms,End Ms,Response Time,ResponseCode,Result\n";
    close $FH;

    for (1 .. $threads) {
        my $thr = threads->create(\&HTTP_Req, $_, $host, $port, $uri);
        push(@threads, $thr);
    }
    $_->join foreach (@threads);

    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku