in reply to Re^6: Multithreaded xml writing using XML::Writer
in thread Multithreaded xml writing using XML::Writer
Then the first thing you need to determine is whether performing your 3 queries in parallel actually improves the time taken to complete the overall task. And that is something that can only be determined empirically. I.e. test it.
In theory, with the queries operating upon different databases, there should be no need for any internal locking or synchronisation. (Between these three queries, that is--other clients' queries are a different matter.) So, if the server is multi-cored, then it is possible that you might save some time by overlapping them.
There is also the question of whether all the elements of the chain of software between you and the DB server are thread-safe--DBI, DBD::mysql, the C libraries used by the DBD, etc. At one time the answer was definitively "no". Things have moved on--certainly there have been some moves to make the Perl parts thread-safe--but I don't know what the current state of play is for MySQL.
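For example, a minimal timing harness along these lines would let you compare serial against threaded execution. The DSNs, credentials, and queries are placeholders, and it assumes the DBI/DBD stack proves thread-safe; note that each thread must open its own connection, as DBI handles cannot be shared across threads:

```perl
use strict; use warnings;
use threads;
use DBI;
use Time::HiRes qw( time );

# Placeholder connection details and queries -- substitute your own.
my @queries = (
    [ 'dbi:mysql:database=db1', 'SELECT * FROM big_table1' ],
    [ 'dbi:mysql:database=db2', 'SELECT * FROM big_table2' ],
    [ 'dbi:mysql:database=db3', 'SELECT * FROM big_table3' ],
);

sub runQuery {
    my( $dsn, $sql ) = @_;
    # Each thread gets its own connection; handles must not cross threads.
    my $dbh  = DBI->connect( $dsn, 'user', 'secret', { RaiseError => 1 } );
    my $rows = $dbh->selectall_arrayref( $sql );
    $dbh->disconnect;
    return scalar @$rows;
}

my $start = time;
runQuery( @$_ ) for @queries;                   # serial baseline
printf "Serial:   %.3f seconds\n", time - $start;

$start = time;
my @threads = map { threads->create( \&runQuery, @$_ ) } @queries;
$_->join for @threads;                          # overlapped execution
printf "Threaded: %.3f seconds\n", time - $start;
```

If the threaded run isn't usefully faster than the serial one, the rest of this is moot.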
But, assuming that you can demonstrate (to yourself) that there is something to be gained from running the queries in parallel, then comes the question of how to safely and efficiently combine those results into a single file using XML::Writer.
The problems here are:
Even if you used the threads::shared version of bless to create a shared object, the class has to be written specifically to anticipate and cater for sharing; and XML::Writer, in common with most CPAN modules, was not written with that in mind.
There is also a persuasive argument that says that shared objects do not work well in any language, even when specially constructed for the purpose. And that goes doubly so for objects that need to serialise access to a common resource--like a file.
So, assuming that you can successfully achieve gains by threading your queries, the question becomes how you can serialise the processing of the returns by a single XML::Writer object efficiently. And the answer to that will depend upon the nature of the data in the result set.
By which I mean that bulk data queries to DBI are usually returned as arrays of arrays, or arrays of hashes. And sharing nested data structures is non-trivial and involves a lot of copying. Inevitably, the bigger the data structures, the more costly that becomes, and as you're considering threading, one assumes yours are pretty big.
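To put a shape on that cost: threads::shared's shared_clone() has to walk the whole structure, copying every element into shared memory, so the price grows with the size of the result set. A minimal sketch with a made-up array-of-hashes result:

```perl
use strict; use warnings;
use threads;
use threads::shared;

# Stand-in for a bulk DBI result: an array of 100,000 hashrefs.
my $rows = [ map { { id => $_, name => "row$_" } } 1 .. 100_000 ];

# shared_clone() deep-copies every array, hash and scalar into
# shared memory -- the bigger the result set, the more this costs.
my $shared = shared_clone( $rows );

print scalar @$shared, " rows now shared\n";
```

For a result set big enough to make threading attractive in the first place, that copy can easily eat whatever time the parallel queries saved.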
Two methods for dealing with this present themselves: share the result structures between the threads, or let each thread write through its own (cloned) copy of the writer under an external lock. Each carries a caveat:

Sharing structured data involves copying and is therefore costly, potentially negating any gains from parallelising your queries--assuming there are any.

Is external locking of the cloned object sufficient to ensure safety? Your original problem perhaps suggests not.

(A third approach that avoids both caveats is sketched below.)
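That third approach is to keep the one-and-only XML::Writer instance in a dedicated writer thread and have the query threads feed it their rows through a Thread::Queue, which handles the sharing for you. A minimal sketch--the element names, row format, and dummy data are invented for illustration:

```perl
use strict; use warnings;
use threads;
use Thread::Queue;
use XML::Writer;

my $Q = Thread::Queue->new;

# The writer thread owns the one-and-only XML::Writer instance.
my $writer = threads->create( sub {
    open my $fh, '>', 'combined.xml' or die "open: $!";
    my $xml = XML::Writer->new(
        OUTPUT => $fh, DATA_MODE => 1, DATA_INDENT => 2
    );
    $xml->startTag( 'results' );
    while( defined( my $row = $Q->dequeue ) ) {
        my( $table, @fields ) = @$row;
        $xml->startTag( 'row', table => $table );
        $xml->dataElement( field => $_ ) for @fields;
        $xml->endTag( 'row' );
    }
    $xml->endTag( 'results' );
    $xml->end;
    close $fh;
} );

# Each query thread flattens its rows to simple arrays and enqueues
# them; Thread::Queue shares the copies, so no locking is needed.
my @workers = map {
    my $table = "table$_";
    threads->create( sub {
        # ... run the real query for $table here ...
        $Q->enqueue( [ $table, "value${_}a", "value${_}b" ] ) for 1 .. 3;
    } );
} 1 .. 3;

$_->join for @workers;
$Q->enqueue( undef );   # tell the writer there is no more data
$writer->join;
```

Because only the writer thread ever touches the file or the writer object, no external locking is required; the queue serialises access for free. The cost is that each row is still copied once into shared memory on its way through the queue.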
The upshot of all the above 'thought experiments' is that you first need to test whether parallelising the queries buys you time.
And if it buys you enough to consider the additional complexity of threads, then you need to answer my earlier question about the ordering of data.
And then, assuming you're still considering this, explain the nature of the data returned and how it will be XMLised.
One final thought is that both the mysql & mysqldump command line tools have --xml options, and they usually work much more quickly than Perl scripts. It might be both simpler and quicker to use them to produce separate XML files (in parallel), and then combine the files by stripping the redundant duplicate headers and top-level tags.
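A sketch of that approach, assuming Unix-style fork/exec and whole-database dumps; the database names and credentials are placeholders, and the line-based stripping of the duplicate <?xml?> declarations and <mysqldump> wrappers is deliberately naive:

```perl
use strict; use warnings;

my @dbs = qw( db1 db2 db3 );        # placeholder database names

# Start one mysqldump per database, each writing its own XML file.
my @pids;
for my $db ( @dbs ) {
    defined( my $pid = fork() ) or die "fork: $!";
    if( !$pid ) {   # child: redirect stdout to a file, then exec
        open STDOUT, '>', "$db.xml" or die "open: $!";
        exec 'mysqldump', '--xml', '-u', 'user', '-psecret', $db;
        die "exec: $!";
    }
    push @pids, $pid;
}
waitpid $_, 0 for @pids;            # wait for all the dumps to finish

# Combine: keep the XML declaration and <mysqldump> wrapper from the
# first file only; splice the contents of the rest inside it.
open my $out, '>', 'combined.xml' or die "open: $!";
my $first = 1;
for my $db ( @dbs ) {
    open my $in, '<', "$db.xml" or die "open $db.xml: $!";
    while( my $line = <$in> ) {
        next if !$first && $line =~ /^<\?xml|^<mysqldump/;
        next if $line =~ m{^</mysqldump>};      # re-added once, below
        print $out $line;
    }
    close $in;
    $first = 0;
}
print $out "</mysqldump>\n";
close $out;
```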