Is it possible or practical to multi-thread these?

Possible: yes.

Practical: maybe.

Easy: no.

I'm assuming that the parsing takes so long because of the volume of data being processed.

  1. If the bioperl data is something like FASTA format, i.e. a flat format of single records (even though each is spread across multiple lines), then the problem is reasonable to tackle. A single thread of execution (TOE) reads the file and feeds a bunch of your object-stream parsers running in separate TOEs. The results are forwarded to a final TOE that gathers them, optionally reorders and collates them, and writes them to the output file(s).

    The TOEs can be either processes communicating through sockets or threads using either sockets or queues.

  2. For the XML, life is more awkward due to its hierarchical nature. However, within the outer level of most XML documents there are usually many repetitions of smaller, self-contained subtrees. It should be possible to hand these off to separate TOEs running whichever XML parser you favour and have them processed as discrete documents.
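
To make the layout in point 1 a little more concrete, here is a minimal sketch using threads and queues. It assumes a perl built with ithreads and Thread::Queue 3.01 or later (for end). The names run_pipeline and parse_record are hypothetical, and parse_record is only a stand-in for your real object-stream parser: it returns the record ID and the sequence length.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue 3.01;    # 3.01+ provides ->end

# Hypothetical stand-in for the real per-record parser.
sub parse_record {
    my ($record) = @_;
    my ($header, @seq) = split /\n/, $record;
    my ($id) = $header =~ /^>?(\S+)/;
    return ($id, length(join '', @seq));
}

sub run_pipeline {
    my ($fh, $n_workers) = @_;
    my $work    = Thread::Queue->new;    # reader  -> parsers
    my $results = Thread::Queue->new;    # parsers -> gatherer

    # Parser TOEs: dequeue returns undef once the queue is ended and empty.
    my @workers = map {
        threads->create(sub {
            while (defined(my $job = $work->dequeue)) {
                my ($seq_no, $record) = @$job;
                $results->enqueue([ $seq_no, parse_record($record) ]);
            }
        });
    } 1 .. $n_workers;

    # Gatherer TOE: uses the sequence number to restore input order.
    my $gatherer = threads->create({ context => 'list' }, sub {
        my @out;
        while (defined(my $r = $results->dequeue)) {
            my ($seq_no, $id, $len) = @$r;
            $out[$seq_no] = "$id\t$len";
        }
        return @out;
    });

    # Reader (the calling thread): "\n>" as the record separator yields
    # one multi-line FASTA record per read.
    {
        local $/ = "\n>";
        my $seq_no = 0;
        while (defined(my $record = <$fh>)) {
            chomp $record;               # strips the trailing "\n>"
            $work->enqueue([ $seq_no++, $record ]);
        }
    }

    $work->end;                          # no more input: parsers drain and exit
    $_->join for @workers;
    $results->end;                       # no more results: gatherer exits
    return $gatherer->join;
}

# Demo on an in-memory file.
my $fasta = ">seq1 first\nACGT\nAC\n>seq2\nGGG\n";
open my $fh, '<', \$fasta or die "open: $!";
print "$_\n" for run_pipeline($fh, 3);
```

Whether this buys anything depends on how the cost of the parse compares with the cost of copying each record through the queues, so treat it as a starting point for measurement rather than a recommendation.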

Going beyond that vague description requires more information on the scale and nature of the problem.
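
In the meantime, a rough sketch of the splitting step from point 2, with loud assumptions: the repeated elements never nest inside one another, are never written self-closing, their closing tag is written exactly as "</tag>", and the tag name does not appear inside comments or CDATA. It also assumes each subtree needs no namespace or entity declarations from the outer document to stand alone. The name each_subtree and the element name record are hypothetical.

```perl
use strict;
use warnings;

# Calls $callback with each complete <$tag>...</$tag> subtree as a string.
# This is a text-level splitter, not an XML parser; see the assumptions above.
sub each_subtree {
    my ($fh, $tag, $callback) = @_;
    my $close = "</$tag>";
    local $/ = $close;                   # read up to each closing tag
    while (defined(my $chunk = <$fh>)) {
        # Skip the tail of the file, which has no closing tag.
        next unless substr($chunk, -length $close) eq $close;
        # Drop whatever precedes the opening tag (outer wrapper, whitespace).
        next unless $chunk =~ s{\A.*?(?=<\Q$tag\E[\s>])}{}s;
        $callback->($chunk);
    }
    return;
}

# Demo on an in-memory file.
my $xml = qq{<?xml version="1.0"?>\n<records>\n}
        . qq{  <record id="1"><name>a</name></record>\n}
        . qq{  <record id="2"><name>b</name></record>\n}
        . qq{</records>\n};
open my $fh, '<', \$xml or die "open: $!";
each_subtree($fh, 'record', sub { print "chunk: $_[0]\n" });
```

In real use the callback would hand each chunk to a parser TOE rather than print it, and each TOE would parse its chunk as a small document in its own right.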


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco.
Rule 1 has a caveat! -- Who broke the cabal?

In reply to Re: Multithreading Parsers by BrowserUk
in thread Multithreading Parsers by Anonymous Monk
