Perl threads to open 200 http connections

by robrt (Novice)
on Aug 02, 2010 at 06:48 UTC

robrt has asked for the wisdom of the Perl Monks concerning the following question:

Hey, I am using Perl 5.8.8 on Win XP. I'm fairly new to the concept of Perl threads. As part of some network bandwidth testing, I am trying to open 200 HTTP connections to a server and download 200 files (10 MB each) from the server to the client over HTTP. Here is the code:

use threads;

for (my $i = 0; $i < 200; $i++) {
    my $filename = "File-10M-" . $i . ".txt";
    $thread = threads->new(\&sub1, $filename);
}

sub sub1 {
    my $filename = shift;
    system("wget --output-document D:\\kshare\\payload\\$filename http://10.2.1.23/http-path/payload/$filename >nul");
}

__END__

The problem is that it's unable to open more than 120 connections. I have tried $thread->join and $thread->detach as well, but the problem isn't solved. Could anyone please suggest where I'm going wrong, or a better way of doing this in Perl? I have also tried forking, but forking breaks at the 64th connection. Thanks in advance!

 

 


Replies are listed 'Best First'.
Re: Perl threads to open 200 http connections
by BrowserUk (Patriarch) on Aug 02, 2010 at 07:24 UTC

    Change use threads; to use threads ( stack_size => 4096 ); see "Use more threads." for the explanation.

    You might also consider using LWP::Simple::getstore( $url, $file ).
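    To make that concrete, here is a minimal sketch combining the two suggestions (untested; the URL, file names and target directory are simply those from the original post):

    use strict;
    use warnings;
    use threads ( stack_size => 4096 );   # small per-thread stack, as above
    use LWP::Simple;

    my @threads;
    for my $i ( 0 .. 199 ) {
        my $filename = "File-10M-$i.txt";
        push @threads, threads->create( sub {
            # getstore() writes the response body straight to the file
            my $rc = getstore(
                "http://10.2.1.23/http-path/payload/$filename",
                "D:\\kshare\\payload\\$filename",
            );
            warn "$filename: HTTP status $rc\n" unless is_success($rc);
        } );
    }
    $_->join for @threads;   # wait for every download to finish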

      Thanks! I changed the stack size to 4096, and now around 185 connections are established. I tried lowering the stack size to 1024, hoping to see the connection count rise, but it didn't happen.

      I tried LWP::Simple, but I noticed that it copies only the website content (what is visible on the site), not the actual 10 MB files.

      So, when I tried to copy the files using the code below, the script fails after only about 30 threads.
      use strict;
      use LWP::Simple;

      my $path = "http://10.2.1.23/http-path/payload/10MB/";

      for (my $i = 0; $i < 200; $i++) {
          my $filename = "File-10M-" . $i . ".txt";
          my $url  = $path . $filename;
          my $file = "D:\\kshare\\payload\\$filename";
          $thread = threads->new(\&httpcon, $url, $file);
      }
      $thread->join;

      sub httpcon {
          my $url  = shift;
          my $file = shift;
          is_success(getstore($url, $file)) or die "$!\n";
      }
      My aim is to fill the network pipe to almost 400 MB, but opening around 180 files fills only around 70 MB. Can anyone suggest a better way to fill the network bandwidth, please?

        The first thing I notice is that there are things wrong with the code you've posted.

        You have use strict; but $thread is never declared?

        Also, you are attempting to start 200 threads, but only waiting for one to complete.

        Try this:

        #! perl -slw
        use strict;
        use threads ( stack_size => 4096 );
        use threads::shared;
        use LWP::Simple;
        use Time::HiRes qw[ time sleep ];

        our $T ||= 200;

        my $url = ### your url (of the actual file!) here ###;

        my $running :shared = 0;
        my $start = time;

        for ( 1 .. $T ) {
            async( sub {
                { lock $running; ++$running };
                sleep 0.001 while $running < $T;
                my $id = shift;
                getstore( $url, qq[c:/test/dl.t.$id] );
                --$running;
            }, $_ )->detach;
        }

        sleep 1 while $running;

        printf "Took %.3f seconds\n", time() - $start;
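        (Because of the -s switch on the #! line, the thread count can, as far as I can tell, be overridden from the command line, e.g. perl script.pl -T=150; the our $T ||= 200 only supplies a default.)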

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Perl threads to open 200 http connections
by sundialsvc4 (Abbot) on Aug 02, 2010 at 16:19 UTC

    I would consider this “stress testing” scenario to be ill-advised and misleading.   If you attempt to launch hundreds of parallel threads, and unless each of them is really doing exactly what you are doing in production, the only thing that you are going to be “measuring” is the poor design of the test.

    First of all, you already know how big The Pipe is.   You know how many megabits or gigabits per second it can take.   Ballpark overhead to be in the area of 25% and figure that you can probably move the rest through the pipe as data ... assuming zero “hops.”

    Next, you can determine how many simultaneous transfers the computer can handle, by working systematically upward in small increments until you see the times begin to degrade exponentially.   This is the “elbow-shaped curve” that always exists; the so-called “thrash point.”   Again as a rule of thumb, step back 25% from that and call it good.

    The next part of your exploration should involve stochastic (statistical) modeling, which may or may not involve Perl.   (There are packages for the open-source analytics system, “R,” which are specifically designed for this.   See http://cran.r-project.org/ and search for “stochastic” or “simulation.”)

    (Heh... if you thought Perl was “engaging, addicting and fun” ...)     :-D

    You know that the request volume may at times exceed the number of worker threads that are processing requests.   (An inbound request queue is, or should be, a basic part of the design.)   Therefore, you are interested in the completion times of the requests, given that this time will include processing time, I/O time, and time spent in the queue(s).

    It is most useful to approach this by establishing goals, then measuring the system’s sustained ability to meet those goals.   For instance, you might stipulate that “95% of all requests must be serviced and returned to the client within 1.0 seconds.”   And you might stipulate that “the standard deviation of request times, which exceed the 1.0 second rule, must not exceed 2.00.”   Then you model the system and see if it passes or fails.   If it consistently fails, then you start looking for bottlenecks.

    Finally, always remember that what you are seeking to do here is “a thing that has already been done, countless times before.”   Take the time to thoroughly study prior art, and documented methods, before you start writing Perl (or any other) code.   I would predict with some certainty that you can, in fact, model the behavior of this system without writing any Perl code at all!

Re: Perl threads to open 200 http connections
by bluescreen (Friar) on Aug 02, 2010 at 13:10 UTC

    You're missing the $thread->join right after the for statement. That's mandatory; otherwise the program exits without waiting for the threads to finish.

    In this case you're creating one thread plus one process for each wget session. One approach I'd try is Coro::LWP or AnyEvent::HTTP for non-blocking I/O; that should scale better than the threaded approach.

      One approach I'd try is to use Coro::LWP

      I'd like to try that. Perhaps you could post a sample?

      That's what I thought. I can't get the examples to work either.
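      For what it's worth, a minimal non-blocking sketch along the AnyEvent::HTTP lines might look like this (untested; the URL, file names and target directory are just the ones from the original post, and $AnyEvent::HTTP::MAX_PER_HOST is raised because the module otherwise keeps only a handful of connections per host):

      use strict;
      use warnings;
      use AnyEvent;
      use AnyEvent::HTTP;

      # allow many parallel connections to the same host (the default is small)
      $AnyEvent::HTTP::MAX_PER_HOST = 200;

      my $cv = AnyEvent->condvar;

      for my $i ( 0 .. 199 ) {
          my $filename = "File-10M-$i.txt";
          my $url      = "http://10.2.1.23/http-path/payload/10MB/$filename";

          open my $fh, '>:raw', "D:/kshare/payload/$filename"
              or die "open $filename: $!";

          $cv->begin;
          http_get $url,
              # stream each chunk straight to disk instead of holding 10 MB in RAM
              on_body => sub { my ($data) = @_; print {$fh} $data; 1 },
              sub {
                  my ( undef, $hdr ) = @_;
                  close $fh;
                  warn "$url: $hdr->{Status} $hdr->{Reason}\n"
                      unless $hdr->{Status} =~ /^2/;
                  $cv->end;
              };
      }

      $cv->recv;   # wait until all 200 downloads have completed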

Re: Perl threads to open 200 http connections
by afoken (Chancellor) on Aug 02, 2010 at 15:49 UTC

    You are creating 200 new instances of cmd.exe, and they in turn create 200 instances of wget. I think you run out of memory or other resources. Also, hammering 200 connections against a single server is not very friendly. Unless you have a very good network connection, this will most likely saturate it.

    forking breaks

    No, forking is not implemented. Perl on Windows has a pseudo-fork that gives you a new interpreter thread for each fork. I think your process runs out of (interpreter) threads. I don't know exactly how exec() is implemented on Windows, but since the Windows API doesn't have exec(), it must be emulated using CreateProcess() and some code that waits for the spawned process to exit.

    I would like to see how "forking breaks". Show the code and the error from $!.

    The funny thing here is that you don't need more than a single process with a single thread to start 200 instances of wget. Just create all of those processes in a loop, making sure not to wait for them until you need to. On Unix, you would fork them and remember the PIDs, then handle SIGCHLD until you have seen all child processes exit. On Windows, you would use system(1, ...) if you don't care about exiting before your children do, or Win32::Process::Create() (from Win32::Process) instead of fork, and poll Win32::Process::Wait() instead of handling SIGCHLD.
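    A rough sketch of that single-process approach on Windows (untested; the wget options, URL and target directory are copied from the original post, and the assumption is that the process ID returned by system(1, ...) can later be blocked on with waitpid, as described in perlport):

    use strict;
    use warnings;

    my @pids;
    for my $i ( 0 .. 199 ) {
        my $filename = "File-10M-$i.txt";
        # system(1, ...) on Win32 starts the command asynchronously and
        # returns its process ID without waiting for it to finish
        my $pid = system( 1, 'wget',
            '--output-document', "D:\\kshare\\payload\\$filename",
            "http://10.2.1.23/http-path/payload/10MB/$filename" );
        push @pids, $pid;
    }

    # block on each child in turn until all downloads have finished
    waitpid( $_, 0 ) for @pids;
    print "all 200 wget processes have exited\n";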

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Hi Alex, after 63 connections forking stops, and the following is the error: "Can't fork: Resource temporarily unavailable."

      Below is the code -
      for (my $i = 0; $i < 200; $i++) {
          my $filename = "File-10M-" . $i . ".txt";
          FORK: {
              if ($pid = fork) {
                  next;
              }
              elsif (defined $pid) {
                  # system("wget --output-document D:\\kshare\\payload\\$filename http://10.2.1.23/http-path/payload/10MB/$filename >nul");
                  print "$filename\n";
                  exit;
              }
              elsif ($! eq "EAGAIN") {
                  redo FORK;
              }
              else {
                  print "Cant Fork: $!";
                  exit 2;
              }
          }
      }

        The problem is the way wait is emulated. It uses a Windows API that has a limit of 64 semaphores. Hence you cannot start another process until one of the existing 63 has completed.
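        A hypothetical workaround sketch (untested): keep fewer than 64 children alive at any time, reaping the oldest one before forking the next, so the emulated wait never hits that limit. The file names and URL are those from the original post.

        use strict;
        use warnings;

        my $MAX_KIDS = 60;          # stay safely below the 64 limit
        my @kids;

        for my $i ( 0 .. 199 ) {
            my $filename = "File-10M-$i.txt";

            # at the limit? block on the oldest child before forking again
            waitpid( shift @kids, 0 ) if @kids >= $MAX_KIDS;

            my $pid = fork;
            die "Can't fork: $!" unless defined $pid;
            if ( $pid == 0 ) {      # child: fetch one file, then exit
                system("wget --output-document D:\\kshare\\payload\\$filename "
                     . "http://10.2.1.23/http-path/payload/10MB/$filename >nul");
                exit 0;
            }
            push @kids, $pid;       # parent: remember the child
        }

        # reap whatever is still running
        waitpid( $_, 0 ) for @kids;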


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Perl threads to open 200 http connections
by merlyn (Sage) on Aug 03, 2010 at 18:34 UTC
    Why are you trying to reinvent ab?
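    (For reference: ab is ApacheBench, the load-testing tool shipped with Apache httpd. Something along the lines of ab -n 200 -c 200 http://10.2.1.23/http-path/payload/10MB/File-10M-0.txt would fire 200 concurrent requests, though against a single URL rather than 200 different files; the exact invocation here is only a hypothetical illustration.)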

    -- Randal L. Schwartz, Perl hacker

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
