in reply to Problem with ithreads

Thank you for answering my question.

I have a lot of URLs, and for each one I need to check whether it has been modified (with a HEAD request) and GET its content if so. First I wrote the whole program, but it didn’t work correctly. Then I simplified the script to make the code smaller and to localize the problem (the code I posted above is the simplified version). But even this simple script didn’t work as I expected. As I said in my first message, I run it and after 10-15 minutes something happens and most of the threads freeze: they do nothing and put nothing into $result_q. I tried decreasing the number of threads by setting $max_thread = 50, but the result was the same. I tried putting the body of the while loop into an eval block, with the same result. The problem is that I don’t know why the threads stop after some time or how to prevent it. If you help me solve this problem and fix the code I posted, it will be easy for me to fix the main program.
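A minimal sketch of the HEAD-then-GET check described above, with several assumptions: it uses HTTP::Tiny (a core module in newer Perls) instead of LWP, it treats a page as "modified" when its Last-Modified header differs from the value remembered from the previous run, and the names needs_fetch, check_url and %seen are hypothetical.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical store: url => Last-Modified value seen on the previous run.
my %seen;

# Decide from a HEAD response (an HTTP::Tiny hash ref) whether to GET the page.
# Assumption: if either value is unknown, fetch the page to be safe.
sub needs_fetch {
    my ($previous, $head) = @_;
    return 0 unless $head->{success};          # HEAD failed: nothing to fetch
    my $current = $head->{headers}{'last-modified'};
    return 1 unless defined $current && defined $previous;
    return $current ne $previous ? 1 : 0;
}

# HEAD the URL and GET its content only if it looks modified.
sub check_url {
    my ($http, $url) = @_;
    my $head = $http->head($url);
    return unless needs_fetch($seen{$url}, $head);
    my $page = $http->get($url);
    return unless $page->{success};
    $seen{$url} = $head->{headers}{'last-modified'};
    return $page->{content};
}

my $http = HTTP::Tiny->new(timeout => 3);
# my $content = check_url($http, 'http://example.com/');
```

Comparing the header as a string avoids date parsing; a real program might prefer to compare parsed timestamps or send If-Modified-Since instead.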

I read all the docs about threads in perldoc, the chapters about threads in the Camel book and Lincoln Stein’s book, the messages in perl.ithreads, and Elizabeth Mattijsen’s articles, and didn’t find anything about my problem. My native language is Russian, so I asked my question on the most popular Perl boards, but no one answered me. So Perl Monks is my last hope.

P.S. Maybe it will also help me improve my awful English =)

Re^2: Problem with ithreads
by BrowserUk (Patriarch) on Dec 30, 2004 at 08:38 UTC

    Your English is perfectly understandable--and a lot better than my Russian :)

    There are three problems with trying to help you:

    1. The code you posted, and your description above, do not show or describe what you are actually trying to do. I would need a description something like:
      1. Read nnn urls from a file.
      2. Issue a HEAD request for each url.
      3. Extract the modified date/time (how?)
      4. Compare this against (what?).
      5. If the page has been modified then
        1. Issue a GET request for the url.
        2. Save the page? content? to a file named?
      6. While the worker threads are running, the main thread will wait? display status? process the retrieved content?
    2. The first key rule to making effective use of iThreads is: "Only use a few!".

      iThreads are relatively expensive to start and run, and using more than 10 in any application is self-defeating. The time spent swapping between many threads negates most if not all of the benefits of using them.

      Using iThreads effectively requires a different way of approaching the problem from either:

      • the fork approach exemplified by many *nix programs,
      • or the techniques used by C programs using "native threads" or most other forms of threading.

      Most of the documentation available for threading is not applicable or relevant to iThreads, and even the documentation that relates directly to iThreads is sadly lacking in depth and practical "How to..." advice.

      I've been trying for a while to build up a body of practical examples that might form the basis of better documentation, but the main problem is that all my experiments and programs are only applicable to Win32. Even when I have supplied example code to people to try on non-Win32 platforms, I have never received any feedback as to whether it even works on their system. That makes drawing conclusions regarding the generality of the techniques I have developed almost impossible.

    3. iThreads are only beneficial if the data being accumulated or processed within the threads requires collating, merging or otherwise cross-referencing.

      Unless you are going to use the results of the thread processing in some way that makes it beneficial to share those results--i.e. something more than just logging them--then you are almost certainly better off using forking to achieve concurrency, at least on a non-Win32 platform (on Win32, fork is itself emulated using threads).
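A minimal sketch of the forking alternative mentioned above, for the case where results only need to be collected or logged. It assumes a Unix-like platform; process_item and run_forked are hypothetical names, and process_item is a stand-in for the real per-URL work. Each result line is assumed to be short enough that a single write to the shared pipe is not interleaved with writes from other children.

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX ();

# Hypothetical stand-in for the real per-URL work (e.g. a HEAD request).
sub process_item {
    my ($item) = @_;
    return length $item;
}

# Split @items across $workers child processes. Each child writes one
# "item<TAB>result" line per item to a pipe that the parent reads.
sub run_forked {
    my ($workers, @items) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my @pids;
    for my $n (0 .. $workers - 1) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                        # child process
            close $reader;
            $writer->autoflush(1);
            for my $i (grep { $_ % $workers == $n } 0 .. $#items) {
                my $line = "$items[$i]\t" . process_item($items[$i]) . "\n";
                print {$writer} $line;          # one short line per write
            }
            POSIX::_exit(0);                    # leave without running END blocks
        }
        push @pids, $pid;
    }
    close $writer;                              # the parent only reads
    my %result;
    while (my $line = <$reader>) {
        chomp $line;
        my ($item, $value) = split /\t/, $line, 2;
        $result{$item} = $value;
    }
    waitpid($_, 0) for @pids;
    return \%result;
}
```

A call such as run_forked(4, @urls) would return a hash reference mapping each item to its result; nothing is shared between the children, which is the point of the approach.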

    To summarise: Post your original code; and/or a full description of the problem you are trying to solve. I will then have a go at advising you on how best to tackle the problem with iThreads--or why iThreads are not applicable and advise what alternatives you might consider.


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
      The main algorithm can be described in a few words:
      • There is one boss thread and a number of worker threads.
      • The boss thread creates the worker threads, and they work in parallel.
      • The boss thread generates tasks for the worker threads (using Thread::Queue).
      • Each worker thread, in a while loop:
        • reads the task parameters from a Thread::Queue object;
        • executes the task;
        • returns the result to the boss thread using another Thread::Queue object.
      • The boss thread receives the results and does something with them.
      So each worker thread has to execute the same task many times, with different parameters received from the boss thread, and return the results to the boss thread (boss <=> workers).

      So the thread_do code may be:
      sub thread_do {
          threads->self->detach();
          my $tid = threads->self->tid();
          while (1) {
              my $url = $task_q->dequeue();
              my $ua  = LWP::UserAgent->new(timeout => 3);
              my $res = $ua->request(HEAD $url);
              $result_q->enqueue("$tid;$url;" . $res->code() . ";" . $res->message() . ";");
          }
      }
      As I said above, I run the script and it works as I expected for the first 10-15 minutes. Every worker thread does one request every 3-4 seconds and puts the result into $result_q. The boss thread gets the results from $result_q and prints them. So, if I run the script with 50 threads, I get about 10-15 results per second:
      23;200;
      31;200;
      32;200;
      30;200;
      34;200;
      38;200;
      35;500;
      21;200;
      37;200;
      22;500;
      27;200;
      50;200;
      24;200;
      
      etc..

      Then something happens to almost all the worker threads: they stop, and I see results like this:
      30;200;
      7;200;
      30;200;
      7;200;
      7;200;
      7;200;
      30;200;
      7;200;
      I don’t know what happens to the other 48 threads. Why do they stop? This is the problem I’m trying to solve, and if I solve it I’ll be able to create the script I need.

      The speed in the first 10-20 minutes, while all the threads are working, is perfect, so all I need is to prevent the worker threads from stopping after some time. This is the main problem.
        Why do they stop?

        Because you're running too many threads.

        And because your code is badly structured. This is an endless loop that does nothing useful.

        while (1) {
            my $url = $task_q->dequeue();
            my $ua  = LWP::UserAgent->new(timeout => 3);
            my $res = $ua->request(HEAD $url);
            $result_q->enqueue(
                "$tid;$url;" . $res->code() . ";" . $res->message() . ";"
            );
        }

        All your code does is prove that if you write badly structured code and run gazillions of threads, you can induce non-useful behaviour.

        This is the problem I’m trying to solve...

        The solution is simple: Don't do that! Don't run large numbers of threads. Don't run endless, pointless loops within threads.

        All the code you have provided this time is a snippet of the same code from your original post, which you already said was test code--which I cannot correct, because it doesn't do anything useful.
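A minimal sketch of what the advice above could look like in practice, under stated assumptions: a fixed list of tasks, a small pool of workers, a loop that ends when a worker dequeues undef, and join instead of detach. The names do_task and run_pool are hypothetical, and do_task is a stand-in for the real HEAD request so the example runs without a network.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-in for the real work (e.g. one HEAD request per URL).
sub do_task {
    my ($task) = @_;
    return uc $task;
}

# Run @tasks on a small, fixed pool of worker threads and collect the results.
sub run_pool {
    my ($workers, @tasks) = @_;
    my $task_q   = Thread::Queue->new;
    my $result_q = Thread::Queue->new;

    my @pool;
    for (1 .. $workers) {
        push @pool, threads->create(sub {
            my $tid = threads->tid;
            # Anything expensive (a user agent, say) would be built once here,
            # outside the loop. The loop ends when the worker dequeues undef.
            while (defined(my $task = $task_q->dequeue)) {
                $result_q->enqueue("$tid;$task;" . do_task($task));
            }
        });
    }

    $task_q->enqueue(@tasks);
    $task_q->enqueue((undef) x $workers);   # one end marker per worker
    $_->join for @pool;                     # wait for the workers to finish

    my @results;
    while (defined(my $line = $result_q->dequeue_nb)) {
        push @results, $line;
    }
    return @results;
}
```

Calling run_pool(4, @urls) returns one "tid;task;result" string per task. Because every worker is joined, the boss knows when all the work is done, which a detached endless loop never signals.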

