in reply to ithreads weren't the way.. still searching

Forgive me if I'm way off here, because I don't think I'm getting the whole picture. Somewhere between sleepness and gin, my mind is going...

So you have to get a bunch of pages, in order, and timely. Why not just get all the pages to memory or temporary files, then do the processing pass? With threads it can be a real pain to find a good way to pass complex data around, and you'll probably be uncomfortable with whatever solution you end up with. (Some monks will come along and chastise you for using an unstable feature - fork() em, I say!) You might use Storable or YAML to serialize the data into a scalar. Once the data is in a scalar, you could use Thread::Queue to get it back to the parent.
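A minimal sketch of the Storable idea, assuming the data for one page is a plain hash (the field names and URLs here are made up for illustration): freeze turns the structure into a single scalar that is easy to hand around, and thaw rebuilds it on the other side.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical result of processing one page
my %page = (
    url   => 'http://example.com/1',
    links => [ 'http://example.com/2' ],
    size  => 1024,
);

my $frozen = freeze( \%page );   # one scalar (a byte string), easy to pass around
my $copy   = thaw($frozen);      # deep copy rebuilt on the receiving side

print "$copy->{url} links to $copy->{links}[0]\n";
```

The frozen scalar is what you would enqueue; the parent thaws it after dequeuing.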

Update: sleeplessness!

mhoward - at - hattmoward.org

Re^2: ithreads weren't the way.. still searching
by hlen (Beadle) on Oct 01, 2004 at 03:54 UTC
    >So you have to get a bunch of pages, in order, and timely.
    Huh, no. They're not in order. I never know what the next page is; it comes up as a link in the fetched page. As I said, sequential is a must.

    >Why not just get all the pages to memory or temporary files, then do the processing pass?
    Well, that would turn `f p f p f p' into `f f f p p p'. Not sure if that'd help much, although it could, but it's not my main point, which is doing both things at once.

    >fork() em, I say!
    Not sure how.. seems like a perfect thread situation to me, although clearly not an ithread situation.

    Thanks

      What's up with your node? You can use HTML for the most part, except use <code> to wrap code, so it can be formatted and/or extracted correctly.

      It shouldn't be hard to keep the order that you walk the pages in, but we'll go with the threads here. You have a few problems to deal with:

      • How do you get the processed data back, in order, to the parent?
      • How do you know, after all processing threads have started (and some may have already finished), when all are done and ready for the next step?
      • How do you handle errors in a processing thread?
      Among others.
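
      One possible way to approach the three questions above, sketched under assumptions: process_page and worker are hypothetical names, and the "processing" is just upper-casing. Each worker traps its own errors with eval and returns them as data, and the parent joins the threads in creation order, which keeps the results in order and only finishes once every worker is done.

```perl
use strict;
use warnings;
use threads;

# Hypothetical processing step: upper-cases a page, fails on 'bad'
sub process_page {
    my ($page) = @_;
    die "cannot process '$page'\n" if $page eq 'bad';
    return uc $page;
}

# Wrapper run in each thread: trap errors and report them as data
sub worker {
    my ($page) = @_;
    my $result;
    my $ok = eval { $result = process_page($page); 1 };
    return $ok ? ( 1, $result ) : ( 0, $@ );
}

my @pages   = qw(alpha bad gamma);
my @workers = map { threads->create( { context => 'list' }, \&worker, $_ ) } @pages;

# Joining in creation order returns results in page order, and the
# loop only ends once every worker has finished.
my ( @results, @errors );
for my $i ( 0 .. $#workers ) {
    my ( $ok, $value ) = $workers[$i]->join;
    if ($ok) { $results[$i] = $value }
    else     { push @errors, "$pages[$i]: $value" }
}

print "results: ", join( ', ', map { $_ // '(failed)' } @results ), "\n";
print "errors:  @errors";
```

      This starts one thread per page, which is fine for a handful of pages but not for thousands; it needs a perl built with ithreads.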

      You could try an assembly-line thread pattern. Imagine thus:

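      use strict;
      use warnings;
      use threads;
      use Thread::Queue;

      my ( $start, $end ) = ( 1, 5 );    # placeholder range of work items

      my $work_queue = Thread::Queue->new;
      $work_queue->enqueue($_) for ( $start .. $end );    # Fill our work queue

      my $fetched_queue = Thread::Queue->new;
      my $fetch_thread  = threads->new( \&fetch, $work_queue, $fetched_queue );

      sub fetch {
          my ( $input_queue, $output_queue ) = @_;
          while ( defined( my $fetch_this = $input_queue->dequeue ) ) {
              my $content = "page $fetch_this";    # Get content, put in scalar (placeholder)
              $output_queue->enqueue($content);
              # Not a race here: the queue is filled before this thread
              # starts, and only this thread would ever add more work.
              last if $input_queue->pending == 0;
          }
          $output_queue->enqueue(undef);    # undef tells the next stage we are done
      }

      my $processed_queue = Thread::Queue->new;
      my $process_thread  = threads->new( \&process, $fetched_queue, $processed_queue );

      sub process {
          my ( $input_queue, $output_queue ) = @_;
          while ( defined( my $process_this = $input_queue->dequeue ) ) {
              my $content = uc $process_this;    # process data, put in scalar (placeholder)
              $output_queue->enqueue($content);
          }
          $output_queue->enqueue(undef);    # pass the end marker along
      }

      my @output;
      while ( defined( my $processed_data = $processed_queue->dequeue ) ) {
          push @output, $processed_data;    # Assemble into final output
      }
      $_->join for $fetch_thread, $process_thread;

      print "$_\n" for @output;    # Make final output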
      I know this will need some adjustment to get exactly what you want, but you get the idea, right? (The code above is only a sketch and is missing much. It may not even be sane; read the sleep and gin disclaimer above.) I think a key part is passing the $work_queue into the fetch thread, so it can add to its own input queue.
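
      To illustrate that last point, here is a rough sketch of a fetch thread that feeds its own input queue. The %site hash is a made-up stand-in for real pages (each entry names the next page to visit), and an undef on either queue is used as an end marker; both are assumptions for the example, not anything from your code.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-in for the web: each page names the next one to visit
my %site = (
    page1 => { body => 'first',  next => 'page2' },
    page2 => { body => 'second', next => 'page3' },
    page3 => { body => 'third',  next => undef   },
);

my $work_queue    = Thread::Queue->new('page1');    # seed with the start page
my $fetched_queue = Thread::Queue->new;

sub fetch {
    my ( $input_queue, $output_queue ) = @_;
    while ( defined( my $url = $input_queue->dequeue ) ) {
        my $page = $site{$url};                     # a real fetch would go here
        $output_queue->enqueue( $page->{body} );
        # The link found in this page becomes the next work item;
        # when there is none, the undef ends this loop.
        $input_queue->enqueue( $page->{next} );
    }
    $output_queue->enqueue(undef);                  # tell the consumer we are done
}

my $fetch_thread = threads->create( \&fetch, $work_queue, $fetched_queue );

my @bodies;
while ( defined( my $body = $fetched_queue->dequeue ) ) {
    push @bodies, $body;                            # processing would happen here
}
$fetch_thread->join;

print join( ' ', @bodies ), "\n";
```

      The fetching stays sequential, since each link is only known after the previous page arrives, but the consumer loop can work on a page while the next one is being fetched.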

      Update: Also, "fork() em" is a play on "f__k em". That is to say, ignore the warnings, and continue on your quest, noble monk!

      mhoward - at - hattmoward.org