Re: Multithreading Parsers

by inq123 (Sexton)
on Mar 22, 2005 at 08:56 UTC (#441402)

in reply to Multithreading Parsers

I totally agree with the suggestion to parallelize the input/output, not the algorithm.

For example, if your parser is parsing XML files in GAME format, simply glob all the files that need to be parsed, split that array into 28 smaller arrays, then fork 27 children so that each process (the parent included) works on one of them. I do similar things all the time. Most of the time the outputs are also independent of each other, so the only thing to worry about is competition for shared resources like DB connections/transactions (though changing the DB settings would help there).
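Here is a rough sketch of that glob/split/fork idea, not a finished tool. The names partition, parse_file and run_parallel, and the '*.xml' pattern, are made up for illustration; parse_file is just a stub where your real parser call would go. For simplicity it forks one child per chunk and leaves the parent to wait, rather than having the parent take a chunk itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split @items round-robin into at most $n chunks (array refs).
sub partition {
    my ($n, @items) = @_;
    my @chunks;
    push @{ $chunks[ $_ % $n ] }, $items[$_] for 0 .. $#items;
    return @chunks;
}

# Hypothetical stand-in for the real per-file work.
sub parse_file {
    my ($file) = @_;
    # ... call your parser module on $file here ...
    return;
}

# Fork one child per chunk, wait for all of them,
# and return the number of children that exited non-zero.
sub run_parallel {
    my ($n, @files) = @_;
    my @pids;
    for my $chunk (partition($n, @files)) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                 # child: parse own files, then exit
            parse_file($_) for @$chunk;
            exit 0;
        }
        push @pids, $pid;                # parent: remember the child
    }
    my $failed = 0;
    for my $pid (@pids) {
        waitpid($pid, 0);
        $failed++ if $? != 0;
    }
    return $failed;
}

# Only run when invoked as a script.
unless (caller) {
    my @files  = glob('*.xml');          # assumed file pattern
    my $failed = run_parallel(28, @files);
    print "children that failed: $failed\n";
}
```

Each child should open its own DB handle (if it needs one) after the fork instead of sharing the parent's, which is where the connection competition mentioned above tends to show up.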

Or, say, you're parsing the whole GenBank release - even easier, just run 28 instances of your program, one per file! Very simple divide and conquer strategy :)

Trying to parallelize the algorithm, especially since you're using modules others developed, would not be a good investment of time when you could simply parallelize the input and achieve the same savings.
