>So you have to get a bunch of pages, in order, and timely.
Huh, no. They're not in order. I never know what the next
page is; it comes up as a link in the fetched page. As I said,
sequential is a must.
>Why not just get all the pages to memory or temporary files, then do the processing pass?
Well, that would turn `f p f p f p' into `f f f p p p'.
Not sure if that'd help much, although it could, but it's
not my main point, which is doing both things at once.
>fork() em, I say!
Not sure how... It seems like a perfect thread situation to me, although clearly not an ithread situation.
Thanks
What's up with your node? You can use HTML for the most part, except use <code> to wrap code, so it can be formatted and/or extracted correctly.
It shouldn't be hard to keep the order that you walk the pages in, but we'll go with the threads here. You have a few problems to deal with:
- How do you get the processed data back, in order, to the parent?
- How do you know, once all the processing threads have started (and some may already have finished), when they are all done and you are ready for the next step?
- How do you handle errors in a processing thread?
Among others.
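One possible shape for those three questions, as a rough sketch rather than a finished answer: tag each job with its position so the parent can put results back in order, use join() to learn when every thread is done, and wrap the work in eval so a failure comes back as data. The process_page sub, the @pages list, and the two-worker count are made up for illustration. It assumes a perl built with thread support and a Thread::Queue recent enough to have end() (3.01 or later).

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-in for the real per-page processing.
sub process_page {
    my ($text) = @_;
    die "empty page\n" unless length $text;
    return uc $text;
}

my @pages = ( 'alpha', '', 'gamma', 'delta' );    # illustrative input only

my $jobs    = Thread::Queue->new;
my $results = Thread::Queue->new;

# Tag each job with its position so the order can be restored later.
$jobs->enqueue( [ $_, $pages[$_] ] ) for 0 .. $#pages;
$jobs->end;    # no more jobs; dequeue returns undef once the queue is drained

my @workers = map {
    threads->create( sub {
        while ( defined( my $job = $jobs->dequeue ) ) {
            my ( $index, $text ) = @$job;
            my $out = eval { process_page($text) };
            # Report a failure as data instead of letting the thread die.
            $results->enqueue( defined $out
                ? [ $index, 'ok',    $out ]
                : [ $index, 'error', "$@" ] );
        }
    } );
} 1 .. 2;

# join() returns only after a thread has finished, so once this loop
# completes every worker is done.
$_->join for @workers;
$results->end;

my ( @ordered, @errors );
while ( defined( my $r = $results->dequeue ) ) {
    my ( $index, $status, $payload ) = @$r;
    if   ( $status eq 'ok' ) { $ordered[$index] = $payload }
    else                     { push @errors, "page $index: $payload" }
}
```

After the last loop, @ordered holds the results in the original page order (with a gap where a page failed) and @errors says what went wrong. This only helps if the list of pages is known up front, which is not quite your situation.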
You could try an assembly-line thread pattern. Imagine thus:
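use strict;
use warnings;
use threads;
use Thread::Queue;
my ( $start, $end ) = ( 1, 10 );    # placeholder range of work items
my $work_queue = Thread::Queue->new;
$work_queue->enqueue($_) for ( $start .. $end );    # Fill our work queue
my $fetched_queue = Thread::Queue->new;
my $fetch_thread = threads->new( \&fetch, $work_queue, $fetched_queue );
sub fetch {
    my $input_queue  = shift;
    my $output_queue = shift;
    # Non-blocking dequeue: this thread is the only one feeding its own
    # input queue, so an empty queue means the work is finished
    while ( defined( my $fetch_this = $input_queue->dequeue_nb ) ) {
        my $content = "page $fetch_this";    # Get content, put in scalar
        # Newly found links could be enqueued on $input_queue here
        $output_queue->enqueue($content);
    }
    $output_queue->end;    # Tell the next stage nothing more is coming
}
my $processed_queue = Thread::Queue->new;
my $process_thread = threads->new( \&process, $fetched_queue, $processed_queue );
sub process {
    my $input_queue  = shift;
    my $output_queue = shift;
    # dequeue blocks, and returns undef once the queue is ended and empty
    while ( defined( my $process_this = $input_queue->dequeue ) ) {
        my $content = $process_this;    # Process data, put in scalar
        $output_queue->enqueue($content);
    }
    $output_queue->end;
}
while ( defined( my $processed_data = $processed_queue->dequeue ) ) {
    # Assemble into final output
}
$_->join for ( $fetch_thread, $process_thread );
# Make final output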
I know this will need some adjustment to get exactly what you want, but you get the idea, right? (The code above is a sketch with much missing, and may not even be sane; read the sleep and gin disclaimer above.) I think a key part is passing the $work_queue into the fetch thread, so it can add to its own input queue.
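To make the "add to its own input queue" part concrete, here is a small sketch of link-following where the next page is only known after the current one is fetched. The %site hash is a made-up stand-in for the real web site, and uc stands in for the real processing; a real version would do an HTTP GET and parse out the link. Same assumptions as before: a threaded perl and a Thread::Queue with end().

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-in for the site: each page names the next one.
my %site = (
    '/start' => { body => 'first',  next => '/two' },
    '/two'   => { body => 'second', next => '/three' },
    '/three' => { body => 'third',  next => undef },
);

my $work_queue    = Thread::Queue->new('/start');
my $fetched_queue = Thread::Queue->new;
my $done_queue    = Thread::Queue->new;

my $fetcher = threads->create( sub {
    # Once started, only this thread adds to or removes from $work_queue,
    # so an empty queue here really means there are no pages left.
    while ( defined( my $url = $work_queue->dequeue_nb ) ) {
        my $page = $site{$url} or next;    # a real version would fetch here
        $work_queue->enqueue( $page->{next} ) if defined $page->{next};
        $fetched_queue->enqueue( $page->{body} );
    }
    $fetched_queue->end;    # nothing more is coming for the processor
} );

my $processor = threads->create( sub {
    # Runs while the fetcher is still fetching later pages.
    while ( defined( my $body = $fetched_queue->dequeue ) ) {
        $done_queue->enqueue( uc $body );
    }
    $done_queue->end;
} );

my @output;
while ( defined( my $item = $done_queue->dequeue ) ) {
    push @output, $item;
}
$_->join for $fetcher, $processor;
print join( ' ', @output ), "\n";
```

With one fetcher and one processor the queues are plain FIFOs, so the output comes back in the order the pages were walked, and the processing of one page overlaps the fetching of the next.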
Update: Also, "fork() em" is a play on "f__k em". That is to say, ignore the warnings, and continue on your quest, noble monk!
mhoward - at - hattmoward.org