Is it possible or practical to multi-thread these?
Possible: yes.
Practical: maybe.
Easy: no.
I'm assuming that the reason for the time taken to parse is the volume of data being processed.
- If the bioperl data is something like FASTA format--i.e. a flat format of single records (even though each may be spread across multiple lines)--then the problem is reasonable to tackle. A single thread of execution (TOE) reads the file and feeds a bunch of your object-stream parsers running in separate TOEs. The results are forwarded to a final TOE that gathers them, optionally reorders and collates them, and writes them to the output file(s).
The TOEs can be either processes communicating through sockets or threads using either sockets or queues.
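To make the reader / parsers / gatherer arrangement concrete, here is a minimal sketch using threads and queues. It is written in Python purely for illustration; the same shape should map onto Perl's threads and Thread::Queue. The names `read_fasta`, `parse_record` and `run_pipeline` are my own inventions, and `parse_record` is only a stand-in for whatever your real per-record parser does. Note that in CPython, threads will not speed up CPU-bound parsing, so treat this as a picture of the plumbing rather than a benchmark.

```python
import io
import queue
import threading

SENTINEL = None  # placed on a queue to signal "no more items"

def read_fasta(handle):
    """Yield (header, sequence) for each record in a FASTA-style text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def parse_record(header, sequence):
    """Hypothetical stand-in for the real (expensive) per-record parser."""
    return {"id": (header.split() or [""])[0], "length": len(sequence)}

def reader(handle, tasks, n_workers):
    """Reader TOE: number each record and feed it to the workers."""
    for index, (header, sequence) in enumerate(read_fasta(handle)):
        tasks.put((index, header, sequence))
    for _ in range(n_workers):
        tasks.put(SENTINEL)  # one stop marker per worker

def worker(tasks, results):
    """Parser TOE: parse records until the stop marker arrives."""
    while True:
        item = tasks.get()
        if item is SENTINEL:
            results.put(SENTINEL)  # tell the gatherer this worker is done
            break
        index, header, sequence = item
        results.put((index, parse_record(header, sequence)))

def run_pipeline(handle, n_workers=4):
    """Gatherer TOE (the calling thread): collect results, restore input order."""
    tasks = queue.Queue(maxsize=100)  # bounded, so the reader cannot race ahead
    results = queue.Queue()
    threads = [threading.Thread(target=reader, args=(handle, tasks, n_workers))]
    threads += [threading.Thread(target=worker, args=(tasks, results))
                for _ in range(n_workers)]
    for t in threads:
        t.start()
    gathered, finished = [], 0
    while finished < n_workers:
        item = results.get()
        if item is SENTINEL:
            finished += 1
        else:
            gathered.append(item)
    for t in threads:
        t.join()
    gathered.sort(key=lambda pair: pair[0])  # reorder by original record index
    return [parsed for _, parsed in gathered]

if __name__ == "__main__":
    sample = io.StringIO(">seq1 demo\nACGT\nAC\n>seq2\nGG\n")
    print(run_pipeline(sample, n_workers=2))
```

Tagging each record with its index on the way in is what lets the gatherer reorder the results, since the workers finish in no particular order.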
- For the XML, life is more awkward due to the hierarchical nature of XML. However, within the outer level of most XML documents there are usually many repetitions of smaller, self-contained subtrees. It should be possible to hand these off to separate TOEs running whichever XML parser you favour and have them processed as discrete documents.
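The subtree idea can be sketched along these lines, again in Python for illustration only. It assumes the repeated records are direct children of the document's root element, which may not hold for your data. `split_subtrees`, `parse_subtree` and `process_document` are hypothetical names, and `parse_subtree` just counts descendants in place of real work.

```python
import io
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

def split_subtrees(handle):
    """Yield each direct child of the root as a standalone XML byte string."""
    depth = 0
    root = None
    for event, elem in ET.iterparse(handle, events=("start", "end")):
        if event == "start":
            depth += 1
            if depth == 1:
                root = elem
        else:
            depth -= 1
            if depth == 1:          # a direct child of the root just closed
                elem.tail = None    # drop any text trailing the subtree
                yield ET.tostring(elem)
                root.remove(elem)   # let the finished subtree be freed

def parse_subtree(xml_bytes):
    """Hypothetical stand-in: parse one subtree as its own small document."""
    elem = ET.fromstring(xml_bytes)
    descendants = sum(1 for _ in elem.iter()) - 1  # iter() includes elem itself
    return elem.get("id"), descendants

def process_document(handle, n_workers=4):
    """Split in the calling thread, parse the pieces in a pool of workers."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() hands back results in the same order the subtrees were read
        return list(pool.map(parse_subtree, split_subtrees(handle)))

if __name__ == "__main__":
    doc = io.BytesIO(b"<records><rec id='a'><x/></rec><rec id='b'/></records>")
    print(process_document(doc, n_workers=2))
```

The splitter still has to make one streaming pass over the whole file, so this only pays off if the per-subtree processing costs noticeably more than the splitting does.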
Going beyond that vague description requires more information on the scale and nature of the problem.