...ithreads functionality doesn't even come close to what it would take for this to work.
Pardon me, but poppycock!
From the scant information supplied, you want to fetch a sequence of pages concurrently and then re-assemble them in the original order.
Off the top of my head, I'd do something like this:
- Create 2 Thread::Queues:
  - one to supply the "$seq_no:$url" strings to the threads,
  - one to return the fetched pages as "$seq_no:$contents" to the main thread.
- Start a number of threads that each:
  - create their own user agent;
  - loop over the inputQ, waiting for a "$seq_no:$url" and terminating when they dequeue undef;
  - split the seq_no and url;
  - fetch the url;
  - prepend the sequence number to the contents and enqueue the result to the outputQ.
- Main: enqueues the "$seq_no:$url" strings to the inputQ.
- Main: waits for the inputQ to empty.
- Main: enqueues 1 undef per thread.
- Main: sorts the contents of the outputQ by the prepended seq_no into the correct order, splits off the sequence numbers and joins the contents.
- Main: processes the output.
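A minimal sketch of the steps above, untested against any real site. The names fetch_in_order and fetch_page are made up for illustration, and fetch_page is a stub so the example runs without a network; a real worker would create its own LWP::UserAgent and call get on it at that point. The sketch also queues the undef terminators directly behind the URLs rather than waiting for the inputQ to drain first, which has the same effect because the queue is first-in, first-out.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-in for the real fetch. A real worker would create its
# own LWP::UserAgent once per thread and call $ua->get($url) here.
sub fetch_page {
    my ($url) = @_;
    return "<contents of $url>";
}

# Hypothetical helper: fetch @urls with $n_threads workers and return the
# page contents in the original order.
sub fetch_in_order {
    my ($n_threads, @urls) = @_;
    my $inQ  = Thread::Queue->new;    # "$seq_no:$url" to the workers
    my $outQ = Thread::Queue->new;    # "$seq_no:$contents" back to main

    my @workers = map {
        threads->create(sub {
            # Loop until undef is dequeued.
            while (defined(my $item = $inQ->dequeue())) {
                # Limit of 2 keeps the colons inside the URL intact.
                my ($seq_no, $url) = split /:/, $item, 2;
                $outQ->enqueue("$seq_no:" . fetch_page($url));
            }
        });
    } 1 .. $n_threads;

    my $seq_no = 0;
    $inQ->enqueue($seq_no++ . ":$_") for @urls;
    $inQ->enqueue(undef) for @workers;    # one terminator per thread
    $_->join for @workers;

    # Every worker has finished, so exactly one result per URL is waiting.
    my @results = map { [ split /:/, $_, 2 ] }
                  map { $outQ->dequeue() } 1 .. @urls;
    return map { $_->[1] } sort { $a->[0] <=> $b->[0] } @results;
}
```

Called as fetch_in_order(4, @urls), it returns the contents in the same order as @urls regardless of which thread finished first.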
There is plenty of scope in there for overlapping the appending and processing with the fetching. The main thread can dequeue the returns, process those that come out in the right order and store out-of-sequence returns in a hash for easy lookup. Each time it completes processing one set of content, it looks first in the hash to see if the next in sequence is available. If not, it goes back to dequeuing until it gets it.
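The hash-buffering idea might look something like this. It is only a sketch: process_in_order is a made-up name, sequence numbers are assumed to start at 0, and the caller is assumed to know how many results to expect.

```perl
use strict;
use warnings;
use Thread::Queue;

# Hypothetical helper: dequeue "$seq_no:$contents" items from $outQ and hand
# the contents to the $process callback strictly in sequence order.
sub process_in_order {
    my ($outQ, $total, $process) = @_;
    my %pending;      # out-of-sequence returns, keyed by seq_no
    my $next = 0;     # assumption: sequence numbers start at 0

    while ($next < $total) {
        # Look in the hash first for the next item in sequence...
        if (exists $pending{$next}) {
            $process->(delete $pending{$next});
            $next++;
            next;
        }
        # ...and if it is not there, go back to dequeuing (this blocks).
        my ($seq_no, $contents) = split /:/, $outQ->dequeue(), 2;
        $pending{$seq_no} = $contents;
    }
}
```

Because each item is processed as soon as it is next in line, the hash only ever holds the pages that arrived early, not the whole result set.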
With a little more ingenuity, the main thread could start another thread to do the processing that waits on a third Q. The main thread then dequeues and either re-queues to the processing thread or buffers in a hash.
The processing thread then performs the final disposal of the processed accumulated data, whilst the main thread blocks waiting for it to finish.
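A rough sketch of the three-queue variant, under the same assumptions as before (sequence numbers from 0, known item count). run_processor is a made-up name, and string concatenation stands in for whatever the real processing is.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical helper: main re-orders items from $outQ and feeds them to a
# separate processing thread via a third queue.
sub run_processor {
    my ($outQ, $total) = @_;
    my $procQ = Thread::Queue->new;    # the third Q

    # Processing thread: consumes contents in order until it dequeues undef,
    # then hands the accumulated data back through join.
    my $processor = threads->create(sub {
        my $accumulated = '';
        while (defined(my $contents = $procQ->dequeue())) {
            $accumulated .= $contents;    # stand-in for real processing
        }
        return $accumulated;
    });

    my %pending;
    my $next = 0;
    while ($next < $total) {
        my ($seq_no, $contents) = split /:/, $outQ->dequeue(), 2;
        $pending{$seq_no} = $contents;    # buffer in a hash
        # Re-queue everything that is now in sequence.
        while (exists $pending{$next}) {
            $procQ->enqueue(delete $pending{$next});
            $next++;
        }
    }

    $procQ->enqueue(undef);       # tell the processor there is no more
    return $processor->join;      # main blocks here until it finishes
}
```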
It's actually a very good use of threads and very straightforward to code.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon