in reply to ithreads weren't the way.. still searching

...ithreads functionality doesn't even come close to what it would take for this to work.

Pardon me, but poppycock!

From the scant information supplied, you want to fetch a sequence of pages concurrently and then re-assemble them in the original order.

Off the top of my head, I'd do something like this:

  1. Create 2 Thread::Queues

    One to supply the "$seq_no:$url" to the threads,

    One to return the fetched page "$seq_no:$contents" to the main thread.

  2. Start a number of threads that:
    1. Create their own user agents.
    2. Loop over the inputQ, waiting for a "$seq_no:$url", terminating when they dequeue undef.
    3. Split the seq_no & url.
    4. fetch the url.
    5. Prepend the sequence number to the contents and enqueue to the outputQ.
    6. loop till undef.
  3. Main: enqueues the "$seq_no:$url" strings to the inputQ.
  4. Main: waits for inputQ to empty.
  5. Main: enqueues 1 undef per thread.
  6. Main: sorts the outputQ items by the prepended seq_no into the correct order, splits off the sequence numbers and joins the contents.
  7. Main: processes the output.
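
The steps above could be sketched roughly like this. It is untested against real pages: fetch_page and fetch_in_order are names made up for illustration, and fetch_page is a stub that fabricates content so the sketch runs without a network (a real worker would create its own user agent and fetch the url there). Joining the workers stands in for step 4, since the undefs sit behind the real work in the queue.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-in for a real fetch; it only fabricates
# content so the sketch runs without network access.
sub fetch_page {
    my ($url) = @_;
    return "<contents of $url>";
}

# Hypothetical helper: fetch @urls with $n_threads workers and
# return the contents joined in the original order.
sub fetch_in_order {
    my ($n_threads, @urls) = @_;
    my $inQ  = Thread::Queue->new;    # "$seq_no:$url" to the workers
    my $outQ = Thread::Queue->new;    # "$seq_no:$contents" back to main

    my @workers = map {
        threads->create(sub {
            # A real worker would create its own user agent here.
            while (defined(my $job = $inQ->dequeue)) {
                my ($seq_no, $url) = split /:/, $job, 2;
                $outQ->enqueue("$seq_no:" . fetch_page($url));
            }
        });
    } 1 .. $n_threads;

    $inQ->enqueue("$_:$urls[$_]") for 0 .. $#urls;
    $inQ->enqueue(undef) for @workers;    # one terminator per thread
    $_->join for @workers;                # all work is done after this

    # Drain the output queue, then sort by the prepended seq_no.
    my @items;
    while (defined(my $item = $outQ->dequeue_nb)) {
        push @items, [ split /:/, $item, 2 ];
    }
    return join '', map { $_->[1] } sort { $a->[0] <=> $b->[0] } @items;
}
```

Calling fetch_in_order(4, @urls) returns the pages concatenated in the order of @urls, whichever thread happened to finish first.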

There is plenty of scope in there for overlapping the appending and processing with the fetching. The main thread can dequeue the returns, process those that come out in the right order and store out-of-sequence returns in a hash for easy lookup. Each time it completes processing one set of content, it looks first in the hash to see if the next in sequence is available. If not, it goes back to dequeuing until it gets it.
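
A minimal sketch of that hash-buffered reordering, assuming the "$seq_no:$contents" format from above and sequence numbers starting at 0. process_in_order and its callback argument are invented names, not anything from a module. For brevity every dequeued item goes through the hash, rather than special-casing the ones that arrive in the right order.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical helper: pull "$seq_no:$contents" items off $outQ and
# hand the contents to $process strictly in sequence order.
sub process_in_order {
    my ($outQ, $total, $process) = @_;
    my %pending;     # out-of-sequence returns, keyed by seq_no
    my $next = 0;    # the sequence number we are waiting for

    while ($next < $total) {
        # Look in the hash first for the next in sequence.
        while (exists $pending{$next}) {
            $process->(delete $pending{$next});
            $next++;
        }
        last if $next >= $total;

        # Not there yet, so go back to dequeuing (blocks until an item arrives).
        my ($seq_no, $contents) = split /:/, $outQ->dequeue, 2;
        $pending{$seq_no} = $contents;
    }
}
```

The main thread can call this as soon as the workers are started, so processing overlaps the fetching instead of waiting for the last page.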

With a little more ingenuity, the main thread could start another thread to do the processing that waits on a third Q. The main thread then dequeues and either re-queues to the processing thread or buffers in a hash.

The processing thread then performs the final disposal of the processed accumulated data, whilst the main thread blocks waiting for it to finish.
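
The third-queue variant might look something like the following sketch. run_pipeline is a made-up name, and uppercasing the contents is only a placeholder for whatever the real processing is. The main thread reorders using the hash and forwards in sequence; the processing thread accumulates until it dequeues undef.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical helper: reorder items from $outQ in the main thread and
# pass them, in sequence, to a separate processing thread via a third queue.
sub run_pipeline {
    my ($outQ, $total) = @_;
    my $procQ = Thread::Queue->new;

    # Processing thread: consumes in-order contents until it sees undef.
    my $processor = threads->create(sub {
        my $accumulated = '';
        while (defined(my $contents = $procQ->dequeue)) {
            $accumulated .= uc $contents;    # placeholder processing
        }
        return $accumulated;
    });

    my %pending;     # out-of-sequence returns buffered here
    my $next = 0;
    while ($next < $total) {
        my ($seq_no, $contents) = split /:/, $outQ->dequeue, 2;
        $pending{$seq_no} = $contents;
        # Forward everything that is now next in sequence.
        while (exists $pending{$next}) {
            $procQ->enqueue(delete $pending{$next});
            $next++;
        }
    }

    $procQ->enqueue(undef);            # tell the processor we are done
    my $result = $processor->join;     # main blocks until it finishes
    return $result;
}
```

Only plain strings cross the queues here, so nothing blessed has to be shared between threads.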

It's actually a very good use of threads and very straightforward to code.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

Re^2: ithreads weren't the way.. still searching
by hlen (Beadle) on Oct 01, 2004 at 05:16 UTC
    Your post was clarifying, and I admit I do not have great expertise with ithreads (though you seem to have missed the fact that the pages need to be fetched sequentially). The limitations that killed the idea for me at first were not being able to use shared blessed objects, or classes for that matter, since unshared referents are trouble for ithreads. So how could a thread that does the processing call $root->push_content? Your post has given answers to those questions, which I'll study. I think a main thread holding $root, plus a queue of processed elements to be pushed, should work best. Thanks a lot.
Re^2: ithreads weren't the way.. still searching
by meredith (Friar) on Oct 01, 2004 at 04:25 UTC

    You're right! It is straightforward. I guess I should have waited for others to make their replies before typing all that. =)

    mhoward - at - hattmoward.org