2NetFly has asked for the wisdom of the Perl Monks concerning the following question:

Hi.
I have FreeBSD 4.9 and Perl 5.8.6 with useithreads=define installed.

I run my script (the source is at the end of this message) and it works OK for 15-20 minutes. After that it slows down and soon freezes. I tried executing the script about 10 times, and each time, after 15-20 minutes, all threads except 2 or 3 froze.

Tell me please how to solve this problem.
P.S. Sorry for my English.

Log example:

    1;200; 4;200; 5;200; 10;200; 4;500; 13;200; 14;200; 10;200; 16;200; 18;200;
    13;200; 26;200; 3;200; 28;200; 2;200; 7;200; 1;200;
    ... after 10 minutes ...
    65;200; 65;200; Dump Dump 65;200; 42;200; Dump 42;200; Dump
    ^C
    Time:320
    Done
    A thread exited while 101 threads were running.

My code:

    #!/usr/bin/perl
    use strict;
    use threads;
    use threads::shared;
    use LWP::UserAgent;
    use HTTP::Request::Common;
    use Thread::Queue;

    $| = 1;

    my $thread_num : shared = 0;
    my $max_thread = 100;
    my $exit       = 0;
    my $dump       = 0;
    my $start_time = 0;
    my %tid : shared = ();
    my $result_q = Thread::Queue->new();

    $SIG{INT} = sub { $exit++ };

    $start_time = time();
    threads->new(\&thread_do) for (1..$max_thread);

    while (!$exit) {
        for (my $i = 0; $i < $result_q->pending(); $i++) {
            print $result_q->dequeue(), "\n";
        }
        if ($dump++ > 100000) {
            print "Dump\n";
            dump_tid(\%tid);
            $dump = 0;
        }
    }

    print "\n\nTime:" . (time() - $start_time) . "\n";
    print "Done\n";

    sub thread_do {
        threads->self->detach();
        my $tid = threads->self->tid();
        while (1) {
            my $ua  = LWP::UserAgent->new(timeout => 3);
            my $res = $ua->request(HEAD 'http://smth.com/');
            $result_q->enqueue("$tid;" . $res->code() . ";");
            lock %tid;
            $tid{$tid}++;
        }
    }

    sub dump_tid {
        my $tid = shift;
        open (DUMP, "> dump.txt");
        print DUMP "$_ = $tid->{$_}\n" foreach keys %$tid;
        close DUMP;
    }

2004-12-29 Janitored by Arunbear - added readmore tags, as per Monastery guidelines

Re: Problem with ithreads
by BrowserUk (Patriarch) on Dec 30, 2004 at 02:23 UTC
    Tell me please how to solve this problem.

    Solve what problem?

    Your code does nothing useful, so it is impossible to say how to improve it.

    As for how long it takes to run, your figures seem about right to me.

    Running over 1000 threads all thrashing a single domain with HEAD requests looks like a DoS attack to me. Even if the domain you're using for testing is your own, who are you practicing for?

    And if the point of your code is to see how many simultaneous threads you can run before you bog down your system, then: Congratulations! You succeeded.

    If you have a real problem to solve, one that doesn't sound like a DoS or some other form of blatant misuse, then by all means describe that problem and you might get some further help.


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
Re: Problem with ithreads
by 2NetFly (Initiate) on Dec 30, 2004 at 07:48 UTC
    Thank you for answering my question.

    I have a lot of URLs and I need to check each of them to see if it was modified (with a HEAD request) and GET its content if so. First I wrote the whole working code, but it didn't work correctly. Then I simplified my script to make the code smaller and tried to localize the problem (the code I've posted above is the simplified version). But even this simple script didn't work as I expected. As I said in my first message, I run it, and after 10-15 minutes something happens and the majority of the threads freeze: they do nothing and put nothing into $result_q. I tried decreasing the number of threads to $max_thread = 50, but the result was the same. I tried putting the body of the while loop into an eval block: same result. The problem is that I don't know why the threads stop after some time, or how to prevent this. If you help me solve this problem and fix the code I've posted, it will be easy for me to fix the main program.

    I have read all the docs about threads in perldoc, the chapters about threads in the Camel book and in Lincoln Stein's book, the messages in perl.ithreads, and the articles by Elizabeth Mattijsen, and didn't find anything about my problem. My native language is Russian, so I asked my question on the most popular Perl boards, but no one answered me. So, The Perl Monks is my last hope.

    P.S. Maybe it will also help me to improve my awful English =)

      Your English is perfectly understandable--and a lot better than my Russian :)

      There are three problems with trying to help you:

      1. The code you posted, and your description above, do not show or describe what you are actually trying to do. I would need a description something like:
        1. Read nnn urls from a file.
        2. Issue a HEAD request for each url.
        3. Extract the modified date/time (how?)
        4. Compare this against (what?).
        5. If the page has been modified then
          1. Issue a GET request for the url.
          2. Save the page? content? to a file named?
        6. While the worker threads are running, the main thread will wait? display status? process the retrieved content?
      2. The first key rule to making effective use of iThreads is: "Only use a few!".

        iThreads are relatively expensive to start and run, and using more than 10 in any application is self-defeating. The time spent swapping between many threads negates most, if not all, of the benefits of using them.

        Using iThreads effectively, requires a different way of approaching the problem from either:

        • the fork approach exemplified by many *nix programs,
        • Or from the techniques used by C programs using "native threads" or most other forms of threading.

        Most of the documentation available for threading is not applicable or relevant for iThreading, and even that documentation available directly relating to iThreading is sadly lacking in depth and practical "How to..." advice.

        I've been trying for a while to build up a body of practical examples that might form the basis of better documentation, but the main problem is that all my experiments and programs are only applicable to Win32. Even when I have supplied example code to people to try on non-Win32 platforms, I have never received any feedback as to whether it even works on their system. That makes drawing conclusions regarding the generality of the techniques I have developed almost impossible.

      3. IThreads are only beneficial if the data being accumulated or processed within the threads requires collating, merging or otherwise cross-referencing.

        Unless you are going to use the results of the thread processing in some way that makes it beneficial to share those results--ie. something more than just logging them--then you are almost certainly better off using forking to achieve concurrency--at least on a non-Win32 platform (on Win32, fork is emulated using threads).

      To summarise: Post your original code; and/or a full description of the problem you are trying to solve. I will then have a go at advising you on how best to tackle the problem with iThreads--or why iThreads are not applicable and advise what alternatives you might consider.


        The main algorithm may be described in a few words:
        • There is one boss thread and a number of worker threads.
        • The boss thread creates the worker threads, and they work in parallel.
        • The boss thread generates tasks for the worker threads (using Thread::Queue).
        • Each worker thread, in a while loop:
          • reads task parameters from the Thread::Queue object;
          • executes the task;
          • returns the result to the boss thread using another Thread::Queue object.
        • The boss thread receives the results and does something with them.
        So each worker thread has to execute the same task many times with various parameters received from the boss thread, and return the results to the boss thread (boss <=> workers).
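        A minimal, runnable sketch of this boss/worker shape might look like the following. The HTTP work is stubbed out (the "task" is just squaring a number) so only the queue plumbing is shown; the queue names ($task_q, $result_q) match the discussion above, and the worker count is an arbitrary choice:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Boss/worker sketch: the boss enqueues tasks on $task_q, a fixed
# pool of workers dequeues them, and results come back on $result_q.
# The "work" here (squaring a number) is a stand-in for the real
# HEAD request.
my $task_q   = Thread::Queue->new();
my $result_q = Thread::Queue->new();
my $workers  = 4;

my @pool = map { threads->create(\&worker) } 1 .. $workers;

$task_q->enqueue($_) for 1 .. 20;            # boss generates tasks
$task_q->enqueue(undef) for 1 .. $workers;   # one undef per worker: "no more work"

$_->join() for @pool;   # join (rather than detach) so the boss can wait cleanly

my @results;
while (defined(my $r = $result_q->dequeue_nb())) {
    push @results, $r;
}
print scalar(@results), " results collected\n";

sub worker {
    my $tid = threads->tid();
    while (defined(my $task = $task_q->dequeue())) {
        $result_q->enqueue("$tid;" . $task * $task . ";");
    }
}
```

        Feeding one explicit undef per worker is the usual way to make a blocking dequeue() loop exit cleanly; joining the pool then guarantees every result is on the queue before the boss drains it.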

        So the thread_do code might be:

            sub thread_do {
                threads->self->detach();
                my $tid = threads->self->tid();
                while (1) {
                    my $url = $task_q->dequeue();
                    my $ua  = LWP::UserAgent->new(timeout => 3);
                    my $res = $ua->request(HEAD $url);
                    $result_q->enqueue("$tid;$url;" . $res->code() . ";" . $res->message() . ";");
                }
            }
        As I said above, when I run the script it works as I expected for the first 10-15 minutes. Every worker thread does one request every 3-4 seconds and puts the result into $result_q. The boss thread gets the results from $result_q and prints them. So, if I run the script with 50 threads, I get about 10-15 results per second:
        23;200;
        31;200;
        32;200;
        30;200;
        34;200;
        38;200;
        35;500;
        21;200;
        37;200;
        22;500;
        27;200;
        50;200;
        24;200;
        
        etc..

        Then something happens to almost all the worker threads: they stop, and I see results like this:
        30;200;
        7;200;
        30;200;
        7;200;
        7;200;
        7;200;
        30;200;
        7;200;
        I don’t know what happens to the other 48 threads. Why do they stop? This is the problem I’m trying to solve, and if I solve it I’ll be able to create the script I need.

        The speed in the first 10-20 minutes, while all the threads are working, is perfect, so all I need is to prevent the worker threads from stopping after some time. This is the main problem.
Re: Problem with ithreads
by BrowserUk (Patriarch) on Dec 30, 2004 at 17:41 UTC

    Okay. Here are a few of the causes of your slowdown:

    1. Your main (boss) thread has a tight loop (while (!$exit) {), which means it spends most of its timeslices, and 90%+ of your CPU, thrashing around waiting for the threads to send it some results--which they can't do, because your main thread is consuming all the CPU!

      Inserting a brief pause (select undef, undef, undef, 0.1;) into your main thread's loop drops the overall CPU usage from 99% to 3-4%.

    2. You are creating a new user agent (my $ua = LWP::UserAgent->new(timeout => 3);) for every request you make.

      The simple expedient of moving that outside the loop and re-using the same user agent for each request made by the thread speeds the processing and reduces the memory consumption/thrashing enormously.

      I think this was the main cause of your slowdown.

    3. Your calculation of the number of threads required doesn't make sense.

      If the load imposed by running 20 threads uses all your cpu, adding another 80 into the mix will not help. Your code will just spend more time swapping and less time processing.

      The trick is always, start with a few threads, check that you aren't leaking memory or thrashing the cpu to death, and then increase the number of threads until adding more doesn't result in any greater throughput.
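      Fixes 1 and 2 together might look like the sketch below. The network call is stubbed out (the worker just enqueues a few dummy results) so that only the loop structure is shown; the expected result count of 5 is an arbitrary choice for the demo:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $result_q = Thread::Queue->new();

my $worker = threads->create(sub {
    # Fix 2: create expensive per-thread resources ONCE, outside the
    # work loop -- in the real script this is where the single, reused
    # LWP::UserAgent->new(timeout => 3) would be built.
    my $tid = threads->tid();
    $result_q->enqueue("$tid;$_;") for 1 .. 5;   # stand-in for HEAD requests
});

# Fix 1: don't let the boss spin flat out -- pause briefly each pass.
# A four-argument select with three undefs is the classic sub-second sleep.
my @got;
while (@got < 5) {
    select undef, undef, undef, 0.1;             # ~0.1s pause per pass
    while (defined(my $r = $result_q->dequeue_nb())) {
        push @got, $r;
    }
}
$worker->join();
print scalar(@got), " results\n";
```

      The dequeue_nb() inner loop drains whatever has arrived without blocking, and the select pause keeps the boss from burning a whole CPU while the queue is empty.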

    With those few changes, I managed to process 2270 HEAD requests in 97 seconds. And that is with my 40kb/s dial-up connection--using just 10* threads!

    * Update: Once I made the number of threads a command-line parameter, I found that I get no discernible increase in throughput once I move above 10 threads. Despite the fact that using 10 threads uses barely 10% of my CPU, the limitation on throughput seems to be solely the limited bandwidth of my connection. If you have a faster connection, you may be able to increase throughput by using more threads, but don't go mad. Start with 10 and increase in small jumps.

    [16:57:57.93] P:\test>418095 >nul
    Queued urls: 2270
    Time:97
    Done

    By my calculations, that means I should allow a throughput of 85,000 urls an hour--which I think well exceeds your requirements. With a little optimisation, this could probably be speeded up considerably.

    Note: This is done on Win32--I have no feel for what sort of results you will get under linux.

    I would be most grateful to hear what sort of throughput you get, and with what number of threads, on your system.

    Here is the version of your code I used to get the above results:


      Thank you very much!

      As you noticed, the main problem was in the while loop of the boss thread. When I added the 0.1 second delay and ran the script with 50 threads, CPU usage was about 6-8% and the load average only 0.3-0.4. The speed was more than 2000 urls per minute. The speed with $max_thread = 100 was 3900 per minute, more than I expected (CPU = 12-16%)!

      Many thanks for helping me!