What's up with your node? You can use HTML for the most part, except use <code> to wrap code, so it can be formatted and/or extracted correctly.

It shouldn't be hard to keep the order that you walk the pages in, but we'll go with the threads here. You have a few problems to deal with:

Among others.

You could try an assembly-line thread pattern. Imagine thus:

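use strict;
use warnings;
use threads;
use Thread::Queue;

my ( $start, $end ) = ( 1, 10 );    # placeholder range; use your own page numbers

# Fill our work queue before any thread starts
my $work_queue = Thread::Queue->new;
$work_queue->enqueue($_) for ( $start .. $end );

my $fetched_queue   = Thread::Queue->new;
my $processed_queue = Thread::Queue->new;

my $fetch_thread   = threads->new( \&fetch,   $work_queue,    $fetched_queue );
my $process_thread = threads->new( \&process, $fetched_queue, $processed_queue );

sub fetch {
    my ( $input_queue, $output_queue ) = @_;
    # Only this thread adds to its input queue once it is running, so an
    # empty queue (dequeue_nb returns undef) means there is nothing left.
    while ( defined( my $fetch_this = $input_queue->dequeue_nb ) ) {
        my $content = "page $fetch_this";    # placeholder: get content, put in scalar
        $output_queue->enqueue($content);
    }
    $output_queue->enqueue(undef);           # sentinel: tell the next stage we're done
}

sub process {
    my ( $input_queue, $output_queue ) = @_;
    # Block until data arrives; stop at the undef sentinel, not when the queue
    # merely looks empty (the previous stage may still be working).
    while ( defined( my $process_this = $input_queue->dequeue ) ) {
        my $content = $process_this;         # placeholder: process data, put in scalar
        $output_queue->enqueue($content);
    }
    $output_queue->enqueue(undef);           # pass the sentinel down the line
}

my @assembled;
while ( defined( my $processed_data = $processed_queue->dequeue ) ) {
    push @assembled, $processed_data;        # assemble into final output
}
$_->join for ( $fetch_thread, $process_thread );

# Make final output from @assembled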
I know this will need some adjustment to get exactly what you want, but you get the idea, right? (The code above is only a sketch and leaves out a lot; it may not even be sane, so read the sleep and gin disclaimer above.) I think a key part is passing $work_queue into the fetch thread, so it can add to its own input queue.
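
To make that last point concrete, here's a minimal sketch of a fetch stage that feeds its own input queue. The links_on helper and its page numbers are made up for illustration (stand-ins for whatever really fetches a page and finds its links), and crawl is just a hypothetical wrapper so the whole thing can be run in one go. It assumes the fetch thread is the only thing adding to the work queue once it has started; with more than one fetcher you'd need a different way to decide when you're done.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-in for "fetch the page and find its links".
# Returns the page ids that the given page links to.
sub links_on {
    my ($page) = @_;
    my %links = ( 1 => [ 2, 3 ], 2 => [ 3, 4 ], 3 => [], 4 => [1] );
    return @{ $links{$page} || [] };
}

sub fetch {
    my ( $input_queue, $output_queue ) = @_;
    my %seen;

    # This thread is the only producer for $input_queue once it is running,
    # so dequeue_nb returning undef (queue empty) means the crawl is finished.
    while ( defined( my $page = $input_queue->dequeue_nb ) ) {
        next if $seen{$page}++;    # skip pages we have already fetched

        $output_queue->enqueue("content of page $page");

        # Feed newly discovered pages back into our own input queue.
        $input_queue->enqueue( links_on($page) );
    }

    $output_queue->enqueue(undef);    # sentinel: no more content is coming
}

sub crawl {
    my @start_pages = @_;

    my $work_queue    = Thread::Queue->new(@start_pages);
    my $fetched_queue = Thread::Queue->new;
    my $fetch_thread  = threads->new( \&fetch, $work_queue, $fetched_queue );

    # Collect fetched content until the sentinel arrives.
    my @fetched;
    while ( defined( my $content = $fetched_queue->dequeue ) ) {
        push @fetched, $content;
    }
    $fetch_thread->join;

    return @fetched;
}

print "$_\n" for crawl(1);
```

Starting from page 1, the fetch thread finds pages 2, 3 and 4 by itself, and the %seen hash keeps the link from 4 back to 1 from sending it around in circles.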

Update: Also, "fork() em" is a play on "f__k em". That is to say, ignore the warnings, and continue on your quest, noble monk!

mhoward - at - hattmoward.org

In reply to Re^3: ithreads weren't the way.. still searching by meredith
in thread ithreads weren't the way.. still searching by hlen
