While I see many style/design issues with the code
(shared globals, looping when you are only going to
do things once, the indentation issue I pointed out, etc.),
the only issues I see that could cause things to fail
horribly are that your call to wait on the children only
takes place within the child code, so it is never being
called, and that you are running on NT. (I should point out
that the second issue is not simple OS bigotry. Windows NT
does not support a native fork, and the emulation has
issues.) An alternative method for starting parallel
processes on NT which has worked for me is IPC::Open3. See
"Run commands in parallel" for a demonstration of how to do
that. This is less efficient than forking on Unix, but it is
portable. (NT is, by design, much less friendly than Unix
to having multiple active processes trying to do work at
the same time. NT would prefer one process with multiple
threads, which Perl does not support very well.)
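
To make the IPC::Open3 suggestion concrete, here is a minimal sketch of the idea, not the code from the node mentioned above. The three commands are hypothetical stand-ins (each one just runs the current perl to print a line); substitute your own. Note that the waitpid happens in the parent, once per child.

```perl
use strict;
use warnings;
use IPC::Open3;

# Hypothetical commands: each runs this same perl to print one line.
my @commands = map { [ $^X, '-e', "print qq{job $_ done\\n}" ] } 1 .. 3;

# Start every command before reading from any of them,
# so that they all run at the same time.
my @running;
for my $cmd (@commands) {
    my ($in, $out);
    # A false third argument sends the child's STDERR to $out as well.
    my $pid = open3($in, $out, undef, @$cmd);
    close $in;    # these commands read nothing from us
    push @running, { pid => $pid, out => $out };
}

# Collect output and exit status in the parent.
# Reading one child at a time is fine here because the output is tiny;
# with large output you would want select() or temporary files instead.
my @results;
for my $job (@running) {
    my $fh     = $job->{out};
    my $output = join '', <$fh>;
    close $fh;
    waitpid($job->{pid}, 0);    # reap the child
    push @results, { status => $? >> 8, output => $output };
}

print $_->{output} for @results;
```

The children overlap in time because all of them are started before the parent blocks on any filehandle.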
An incidental conceptual misunderstanding that I see is that
you are assuming that DOCUMENT_RETRIEVER will have a
useful return in the parent. It won't, but since you
don't use it, that shouldn't be causing any problems that
you see (yet). However, what this means is that children and
parents will need to figure out how to communicate, and
the odds are pretty good that it will be through external
files.
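
As a rough illustration of communicating through external files, here is a sketch that assumes a working fork (i.e. Unix, given the NT caveats above). The retrieve_document function and the document names are hypothetical stand-ins for whatever DOCUMENT_RETRIEVER really does. Each child writes its result to its own file, and the parent reads that file only after wait tells it the child has finished.

```perl
use strict;
use warnings;
use File::Temp 'tempdir';

my $dir = tempdir();    # removed by hand at the end

# Hypothetical stand-in for the real retrieval work.
sub retrieve_document {
    my ($name) = @_;
    return "contents of $name";
}

my %job_for;    # pid => { name, file }
for my $name (qw(a.html b.html)) {
    my $file = "$dir/$name.out";
    my $pid  = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: the return value goes into a file, not back to the parent.
        open my $fh, '>', $file or die "Cannot write $file: $!";
        print {$fh} retrieve_document($name);
        close $fh or die "Cannot close $file: $!";
        exit 0;
    }
    # Parent: remember which child is producing which file.
    $job_for{$pid} = { name => $name, file => $file };
}

# Parent: wait for each child, then read what it wrote.
my %result;
while ((my $pid = wait()) != -1) {
    my $job = delete $job_for{$pid} or next;
    open my $fh, '<', $job->{file} or die "Cannot read $job->{file}: $!";
    $result{ $job->{name} } = do { local $/; <$fh> };
    close $fh;
    unlink $job->{file};
}
rmdir $dir;

print "$_: $result{$_}\n" for sort keys %result;
```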
And an incidental note. Most people who like to be called
things like "Perl guru" aren't. In general I have found
that people who think of themselves as being really good
do so because they have never been in the larger pond of
good people. But without that experience they have had
to invent things themselves, which means that they may be
better than their friends, but they are not going to be
very good next to a random person who has absorbed
"standard good advice".
And a final note. Parallel processing like this with many
processes works best when you are doing things where the
bottleneck is I/O. If you are doing computationally
intensive work, then it is preferable to run only as
many processes as you have CPUs. Because of this I
would suggest that you rethink your design. It is
probably going to make sense to have one loop where you
download your files in parallel, and then have another
loop where you do the complex processing serially.
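
If you do end up wanting some parallelism for the computational part, one way to cap the number of simultaneous children is sketched below. It assumes a working fork (Unix), and run_limited is a hypothetical helper of my own naming, not something from your code or from a module. Each job is a code ref whose return value becomes the child's exit status.

```perl
use strict;
use warnings;

# Run each job (a code ref) in its own child process,
# with never more than $max children alive at once.
# Returns a hash of job index => exit status.
sub run_limited {
    my ($max, @jobs) = @_;
    my (%index_for, %status);
    my $next = 0;
    while ($next < @jobs or %index_for) {
        # Start children until we hit the limit or run out of jobs.
        while ($next < @jobs and keys(%index_for) < $max) {
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                my $code = $jobs[$next]->();
                exit $code;
            }
            $index_for{$pid} = $next++;
        }
        # Block in the parent until any one child finishes.
        my $pid = wait();
        last if $pid == -1;
        next unless exists $index_for{$pid};
        my $index = delete $index_for{$pid};
        $status{$index} = $? >> 8;
    }
    return %status;
}

# Five trivial jobs, at most two running at a time.
my @jobs   = map { my $n = $_; sub { return $n } } 0 .. 4;
my %status = run_limited(2, @jobs);
print "job $_ exited with $status{$_}\n" for sort { $a <=> $b } keys %status;
```

For the download loop you can set the limit high, since the children mostly sit waiting on the network. For the processing loop, set it to the number of CPUs, or just do the work serially in the parent.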