in reply to Using threads to process multiple files
While the multithreaded code works, it's about 50% slower than the first one.
The problem comes in two parts:
When two threads read two files concurrently from the same physical drive, the heads have to keep seeking back and forth between the two files. As track-to-track seek time is the slowest part of reading a disk, that seeking dominates, with the net result that you actually slow things down.

If the files are on different physical devices, or can be arranged to be without having to move one of them, much of that additional overhead is negated. If those drives are connected via separate interfaces, so much the better.
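For what it's worth, the arrangement that benefits is simply one reader thread per file, with each file on its own drive. A minimal sketch, assuming the two files can live on separate physical drives (the D:/E: paths and the line-counting body are placeholders, not your code):

use strict;
use warnings;
use threads;

# Hypothetical paths: the point is only that each file sits on its
# own physical drive, so the two readers don't share one set of heads.
my @files = ( 'D:/data/file1.txt', 'E:/data/file2.txt' );

my @workers = map {
    threads->create( sub {
        my( $path ) = @_;
        open my $fh, '<', $path or die "Can't open $path: $!";
        my $lines = 0;
        $lines++ while <$fh>;   # stand-in for the real per-file processing
        close $fh;
        return $lines;          # something small and cheap to copy back
    }, $_ );
} @files;

printf "%s: %d lines\n", $files[$_], $workers[$_]->join for 0 .. $#workers;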
The second part of the problem is this line:

return \%hash;
Normally, without threads, that is a very efficient operation necessitating just the copying of a reference.
But under the threads memory model, only data that is explicitly shared can be transferred between threads. Normally this is a good thing, preventing unintended shared accesses and all the problems that can arise from them; but in this case it means that the entire hash is effectively duplicated, which is a relatively slow process for large hashes.

This is especially annoying when the transfer is the last act of the originating thread: once the original hash has been duplicated, it is simply discarded, so there would have been no risk in just transferring the reference.
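To make concrete where that duplication bites, here is a minimal sketch of the pattern (the file name and the tab-splitting are placeholders, not your code): the hash is built entirely inside the worker, and the whole thing is cloned back into the parent when join() collects the return value.

use strict;
use warnings;
use threads;

my $thr = threads->create( sub {
    my %hash;
    open my $fh, '<', 'big_file.dat' or die $!;   # placeholder file name
    while( <$fh> ) {
        chomp;
        my( $key, $value ) = split /\t/;          # placeholder parsing
        $hash{ $key } = $value;
    }
    close $fh;
    return \%hash;    # looks cheap, but the entire hash is duplicated
                      # when the parent join()s this thread
} );

my $href = $thr->join;   # the copy happens here; the worker's
                         # original hash is then simply discarded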
I did once look into whether threads could be patched to avoid the duplication for this special case, but the entire memory management of perl, especially under threading, is so complex and opaque that I quickly gave up.
There is a nascent, unproven and unpublished possible solution to the latter part of the problem. I've recently written several hash-like data structures using Inline::C that bypass Perl's memory management entirely and allocate their memory from the CRT heap. As all that perl sees of these structures is an opaque RV pointing to a UV, it should be possible to pass one of these references between threads without Perl interfering and needing to duplicate them.
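Purely to illustrate that idea (this is not those modules, which are unpublished), a toy Inline::C handle might look something like the following: the storage is malloc()ed from the C runtime heap, and all Perl ever holds is a reference to a UV containing the raw pointer, so there is nothing inside it for Perl's cloning machinery to walk.

use strict;
use warnings;
use Inline C => <<'END_C';
#include <stdlib.h>

/* Allocate a block on the CRT heap (a stand-in for a real hash-like
   structure) and hand Perl back only an RV pointing to a UV that
   holds the raw address. Perl treats it as an opaque number.      */
SV* make_store() {
    void *p = malloc( 4096 );
    return newRV_noinc( newSVuv( PTR2UV( p ) ) );
}

void free_store( SV *ref ) {
    free( INT2PTR( void*, SvUV( SvRV( ref ) ) ) );
}
END_C

my $store = make_store();   # to Perl this is just a reference to a number
# ... the real work on the structure happens on the C side ...
free_store( $store );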
But the ones that would be applicable to your use were only developed far enough to prove that they weren't useful for my purposes, and were then abandoned whilst I developed my own solution; that one, whilst hash-like, is very specialised for my requirements and not useful as a general-purpose hash (no iterators or deletions). And I don't have the time to finish any of the more general ones.
If you have C/XS skills and your need is pressing enough, I could give you what I have for you to finish.
Of course, that would only help if you can arrange for your two files to be on different disks in order to solve or mitigate part 1 of the problem.