in reply to Re: Problem with ithreads
in thread Problem with ithreads

Your English is perfectly understandable--and a lot better than my Russian :)

There are three problems with trying to help you:

  1. The code you posted, and your description above, do not show or describe what you are actually trying to do. I would need a description something like:
    1. Read nnn urls from a file.
    2. Issue a HEAD request for each url.
    3. Extract the modified date/time (how?)
    4. Compare this against (what?).
    5. If the page has been modified then
      1. Issue a GET request for the url.
      2. Save the page? content? to a file named?
    6. While the worker threads are running, the main thread will wait? display status? process the retrieved content?
  2. The first key rule to making effective use of iThreads is: "Only use a few!".

    iThreads are relatively expensive to start and run, and using more than 10 in any application is self-defeating: the time spent swapping between many threads negates most, if not all, of the benefit of using them. (A minimal sketch of this small-pool approach follows this list.)

    Using iThreads effectively requires a different way of approaching the problem from either:

    • the fork approach exemplified by many *nix programs,
    • or the techniques used by C programs using "native threads", or most other forms of threading.

    Most of the documentation available for threading is not applicable or relevant to iThreading, and even the documentation that relates directly to iThreading is sadly lacking in depth and practical "How to..." advice.

    I've been trying for a while to build up a body of practical examples that might form the basis of better documentation, but the main problem is that all my experiments and programs are only applicable to Win32. Even when I have supplied example code for people to try on non-Win32 platforms, I have never received any feedback as to whether it even works on their system. That makes drawing conclusions about the generality of the techniques I have developed almost impossible.

  3. iThreads are only beneficial if the data being accumulated or processed within the threads requires collating, merging or otherwise cross-referencing.

    Unless you are going to use the results of the thread processing in some way that makes it beneficial to share those results--i.e. something more than just logging them--then you are almost certainly better off using forking to achieve concurrency--at least on a non-Win32 platform (on Win32, fork itself is emulated using threads). (A sketch of that fork approach also follows this list.)
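
To give a feel for what I mean by "only use a few": below is a minimal sketch of the small-pool approach--a handful of workers fed from one Thread::Queue and reporting back through another. The worker body, the queue names, the thread count (5) and the placeholder work items are all mine, not anything from your code.

    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my $task_q   = Thread::Queue->new();    # boss -> workers
    my $result_q = Thread::Queue->new();    # workers -> boss

    my @tasks = 1 .. 100;                   # placeholder work items

    # start a small, fixed pool of workers
    my @workers = map { threads->create( \&worker ) } 1 .. 5;

    # boss: queue the work, then one undef per worker as an end-of-work marker
    $task_q->enqueue( $_ ) for @tasks;
    $task_q->enqueue( undef ) for @workers;

    # boss: collect one result per task while the workers run
    for ( 1 .. @tasks ) {
        my $result = $result_q->dequeue();
        # ... collate / merge / cross-reference the results here ...
    }

    $_->join() for @workers;

    sub worker {
        while ( defined( my $task = $task_q->dequeue() ) ) {
            # ... do the real work for this task ...
            $result_q->enqueue( "done: $task" );
        }
    }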
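
And for point 3, a minimal sketch of the fork alternative on a non-Win32 platform: each child handles its own slice of the work and writes its own output file, so nothing has to be shared back with the parent. The child count and the file names are placeholders of mine.

    use strict;
    use warnings;

    my @work     = 1 .. 100;    # placeholder work items
    my $children = 5;

    for my $n ( 0 .. $children - 1 ) {
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        next if $pid;           # parent: go and start the next child

        # child: take every $children-th item, starting at offset $n,
        # and write the results to its own file -- nothing is shared back
        open my $out, '>', "results.$n" or die "open: $!";
        for my $i ( grep { $_ % $children == $n } 0 .. $#work ) {
            print {$out} "child $n processed $work[$i]\n";
        }
        close $out;
        exit 0;                 # a child must not fall back into the parent's loop
    }

    # parent: wait for every child to finish
    wait() for 1 .. $children;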

To summarise: post your original code, and/or a full description of the problem you are trying to solve. I will then have a go at advising you on how best to tackle the problem with iThreads--or explain why iThreads are not applicable and suggest what alternatives you might consider.


Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.

Replies are listed 'Best First'.
Re^3: Problem with ithreads
by 2NetFly (Initiate) on Dec 30, 2004 at 10:15 UTC
    The main algorithm may be described in a few words:
    • There is one boss thread and a number of worker threads.
    • The boss thread creates the worker threads, and they work in parallel.
    • The boss thread generates tasks for the worker threads (using Thread::Queue).
    • Each worker thread, in a while loop:
      • reads task parameters from a Thread::Queue object;
      • executes the task;
      • returns the result to the boss thread using another Thread::Queue object.
    • The boss thread receives the results and does something with them.
    So each worker thread has to execute the same task many times, with different parameters received from the boss thread, and return the results to the boss thread (boss <=> workers).

    So the thread_do code may be:
    sub thread_do {
        threads->self->detach();
        my $tid = threads->self->tid();
        while (1) {
            my $url = $task_q->dequeue();
            my $ua  = LWP::UserAgent->new( timeout => 3 );
            my $res = $ua->request( HEAD $url );
            $result_q->enqueue( "$tid;$url;" . $res->code() . ";" . $res->message() . ";" );
        }
    }
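    The boss side might look roughly like this--only a sketch: what is done with each result is a placeholder, @urls is assumed to have been read from a file beforehand, and the use lines are the ones the full script needs anyway:

    use strict;
    use warnings;
    use threads;
    use Thread::Queue;
    use LWP::UserAgent;
    use HTTP::Request::Common qw(HEAD);

    # declared before thread_do so the workers can see them
    my $task_q   = Thread::Queue->new();    # boss -> workers
    my $result_q = Thread::Queue->new();    # workers -> boss

    my @urls = ( 'http://www.example.com/' );   # placeholder: really read from a file

    # create the worker pool; each worker runs the thread_do above
    threads->create( \&thread_do ) for 1 .. 50;

    # generate one task per url
    $task_q->enqueue( $_ ) for @urls;

    # receive one result line per url and do something with it
    for ( 1 .. @urls ) {
        print $result_q->dequeue(), "\n";
    }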
    As I said above, I run the script and it works as I expect for the first 10-15 minutes. Every worker thread does one request per 3-4 seconds and puts the result into $result_q. The boss thread gets the results from $result_q and prints them. So, if I run the script with 50 threads I get about 10-15 results per second:
    23;200;
    31;200;
    32;200;
    30;200;
    34;200;
    38;200;
    35;500;
    21;200;
    37;200;
    22;500;
    27;200;
    50;200;
    24;200;
    
    etc..

    Then something happens to almost all of the worker threads – they stop, and I see results like this:
    30;200;
    7;200;
    30;200;
    7;200;
    7;200;
    7;200;
    30;200;
    7;200;
    I don't know what happens to the other 48 threads. Why do they stop? This is the problem I'm trying to solve, and if I solve it I'll be able to create the script I need.

    The speed in the first 10-20 minutes, while all the threads are working, is perfect, so all I need is to prevent the worker threads from stopping after some time. This is the main problem.
      Why do they stop?

      Because you're running too many threads.

      And because your code is badly structured. This is an endless loop that does nothing useful.

      while (1) {
          my $url = $task_q->dequeue();
          my $ua  = LWP::UserAgent->new( timeout => 3 );
          my $res = $ua->request( HEAD $url );
          $result_q->enqueue( "$tid;$url;" . $res->code() . ";" . $res->message() . ";" );
      }

      All your code does is prove that if you write badly structured code and run gazillions of threads, you can induce non-useful behaviour.

      This is the problem I'm trying to solve...

      The solution is simple: Don't do that! Don't run large numbers of threads. Don't run endless, pointless loops within threads.

      All the code you have provided this time is a snippet of the same code from your original post, which you already said was test code--which I cannot correct, because it doesn't do anything useful.
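
      For example, one way to give that loop a natural end is to have the boss enqueue one undef per worker once all the urls have been queued, and have the worker exit when it dequeues it. This is only a sketch: it borrows $task_q and $result_q from your code, and everything else is illustrative.

      use strict;
      use warnings;
      use threads;
      use Thread::Queue;
      use LWP::UserAgent;
      use HTTP::Request::Common qw(HEAD);

      our ( $task_q, $result_q );    # the shared queues created by the boss thread

      sub thread_do {
          my $tid = threads->tid();
          my $ua  = LWP::UserAgent->new( timeout => 3 );    # create the UA once and reuse it

          # exits when the boss enqueues undef to say there is no more work
          while ( defined( my $url = $task_q->dequeue() ) ) {
              my $res = $ua->request( HEAD $url );
              $result_q->enqueue( "$tid;$url;" . $res->code() . ";" . $res->message() . ";" );
          }
      }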


      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.
        I have 70 000 URLs (almost all on different servers) to check every day, and I have to check them as quickly as possible (max 60 min). Depending on the server on which a document is located, it takes from 1 to 4 seconds to make a HEAD request. So, if I want to make 70 000 / 60 / 60 ≈ 20 requests per second, I need at least 50 threads working in parallel. The CPU load isn't very high because almost all the time each thread is waiting for a response from the server.

        I'll rewrite the code a bit.
        sub thread_do {
            my $tid = threads->self->tid();
            # while not all urls are checked
            while (!$DONE) {
                # get a new url from the boss thread
                my $url = $task_q->dequeue();
                # check the url
                my $ua  = LWP::UserAgent->new( timeout => 3 );
                my $res = $ua->request( HEAD $url );
                # return the result to the boss thread
                $result_q->enqueue( "$tid;$url;" . $res->code() . ";" . $res->message() . ";" );
            }
        }
        $DONE is a shared variable, and the boss thread sets it to true when all the urls are checked.

        I used prethreading, which means that I create a number of threads and each thread processes the same task a number of times. In my example each thread makes HEAD requests until all the urls are checked.
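
        One thing I will have to be careful about: with a blocking dequeue() a worker can sit inside dequeue() forever after $DONE becomes true, so the loop will probably need the non-blocking dequeue_nb() instead--something like this (the one-second sleep is only a guess):

        sub thread_do {
            my $tid = threads->self->tid();
            my $ua  = LWP::UserAgent->new( timeout => 3 );

            # while not all urls are checked
            until ($DONE) {
                # non-blocking: returns undef immediately if the queue is empty
                my $url = $task_q->dequeue_nb();
                if ( !defined $url ) {
                    sleep 1;    # nothing queued right now; re-check $DONE
                    next;
                }
                my $res = $ua->request( HEAD $url );
                # return the result to the boss thread
                $result_q->enqueue( "$tid;$url;" . $res->code() . ";" . $res->message() . ";" );
            }
        }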