There are at least two questions here.

  1. Will the DB make effective use of its multiple processor box?

    You'll need to consult the docs for your version of MySQL.

  2. Can you multi-thread the 'parse XML and feed the DB' end?

    As archfool and the documentation point out, it's not safe to use DBI from multiple threads for most flavours of DBI driver. However, it is safe to use DBI from within a multi-threaded Perl application, provided that you only interact with the DB from a single thread.

    So you could have (for example) several threads running fetching the XML data from the various sources and parsing it and then passing the processed data to a single DB thread for insertion. Whether that is an effective strategy will depend upon where the bottlenecks occur in your existing code.

    • If most of your runtime is spent in IO waits downloading, overlapping the downloads on different threads may be effective if you are downloading from multiple sources.
    • If a large proportion of the time is spent cpu-bound, parsing the XML, then parsing different documents on different threads should spread the cpu-bound parts of the processing across the multiple cpus. I say "should" because apparently some threaded OSs do not do this by default.
    • But if the bottleneck is uploading the data to the DB, making the above parts of the processing more efficient and then funnelling the data through a single thread for DB interaction could simply exacerbate the problem. In this case (multiple documents from multiple sources) there would be little advantage in using threads. Better to use multiple processes, each operating on document(s) from different sources.
    • If your data source is a single huge XML document that has to be downloaded and parsed as a single entity, it might be a bit harder to use multiprocessing effectively on that document. See below.
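
To make the "several parser threads feeding a single DB thread" layout concrete, here is a minimal sketch using threads and Thread::Queue. It assumes a Perl built with ithreads. parse_document and run_pipeline are hypothetical names, the parsing is a stand-in, and the DBI calls are left as comments because the real schema and connection details are unknown.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-in for the real "download and parse one XML document" step.
sub parse_document {
    my ($doc) = @_;
    return map { "$doc:record$_" } 1 .. 3;
}

sub run_pipeline {
    my ($n_workers, @docs) = @_;
    my $work_q = Thread::Queue->new;    # documents waiting to be parsed
    my $row_q  = Thread::Queue->new;    # parsed rows waiting for the DB thread

    # The only thread that would ever touch DBI.
    my $db_thread = threads->create(sub {
        # A real version would call DBI->connect here, inside this thread.
        my $count = 0;
        while (defined(my $row = $row_q->dequeue)) {
            # A real version would execute an INSERT for $row here.
            $count++;
        }
        return $count;
    });

    my @workers;
    for (1 .. $n_workers) {
        push @workers, threads->create(sub {
            while (defined(my $doc = $work_q->dequeue)) {
                $row_q->enqueue($_) for parse_document($doc);
            }
        });
    }

    $work_q->enqueue(@docs);
    $work_q->enqueue(undef) for 1 .. $n_workers;    # one stop marker per worker
    $_->join for @workers;
    $row_q->enqueue(undef);                         # stop marker for the DB thread
    return $db_thread->join;                        # number of rows "inserted"
}
```

Calling run_pipeline(4, @sources) would parse with four threads while keeping all DB interaction on one. Whether that helps still depends on where the bottleneck actually is.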

Basically, you'll need to profile your application to work out where the bottlenecks are. Armed with that information and a little more detail (for example, does the GB of data come from a single source as a single document, from multiple documents from a single source, or from multiple documents from multiple sources?), it may be possible to see ways of using threads and/or processes to speed up your processing.
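
As a rough first cut at finding the bottleneck, something like the following would accumulate wall-clock time per stage. It is only a sketch: timed and %elapsed are hypothetical names, and a real profiler such as Devel::NYTProf will give far finer detail.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

my %elapsed;    # stage name => accumulated wall-clock seconds

# Run $code, charge its wall-clock time to $stage, and pass its result through.
sub timed {
    my ($stage, $code) = @_;
    my $start  = time;
    my @result = $code->();
    $elapsed{$stage} += time - $start;
    return wantarray ? @result : $result[0];
}
```

Wrapping the download, parse and DB upload steps in calls like timed('parse', sub { ... }) and printing %elapsed at the end shows which of the three dominates the runtime.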

Even if this is one large XML document, it may contain repetitive sub-documents that could be easily recognised without recourse to a proper XML parser, or (maybe) using mirod's brilliant and very well supported XML::Twig, and split out from the document as it is being downloaded. You could then pass these on to other threads for further parsing before finally sending to a DB thread for upload to the DB.
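
A minimal sketch of the "recognise sub-documents without a proper XML parser" idea, fed chunk by chunk as the download arrives. The element name passed in is an assumption about your data, and make_splitter is a hypothetical helper. It assumes the repeated elements are never nested inside each other, are never self-closing, and that their tags never appear inside comments or CDATA. If any of that is untrue, XML::Twig is the safer route.

```perl
use strict;
use warnings;

# Returns a closure that accepts chunks of the incoming document and returns
# every complete <$tag ...>...</$tag> sub-document seen so far.
sub make_splitter {
    my ($tag) = @_;
    my $open  = quotemeta "<$tag";
    my $close = quotemeta "</$tag>";
    # Skip anything before the next opening tag, then capture up to its close.
    my $re = qr/\A.*?($open(?=[\s>]).*?$close)/s;
    my $buffer = '';
    return sub {
        my ($chunk) = @_;
        $buffer .= $chunk;
        my @found;
        # Each successful substitution removes one complete sub-document
        # (and any text before it) from the front of the buffer.
        while ($buffer =~ s/$re//) {
            push @found, $1;
        }
        return @found;
    };
}
```

Each string returned by the closure is a small, self-contained fragment that can be handed to a parsing thread while the rest of the download continues.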

Bottom line: more info required :)


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

In reply to Re: Threading - getting better use of my MP box by BrowserUk
in thread Threading - getting better use of my MP box by ethrbunny
