The difference is "shared," not "threading."
With the (slight; and possibly implied) modification of s/"shared"/threads::shared/, we agree.
Though I disagree with "any sort of locking requirement would knock that advantage almost completely out".
Mutual exclusion to prevent concurrent access can be achieved far more cheaply than it is currently implemented in Perl; sufficiently cheaply, in the no-contention case, to make locking almost costless for low-but-non-zero contention algorithms.
Indeed, in my experiments with lock-free and wait-free algorithms, I've found that "locking" algorithms can be more efficient than either for many applications, where: a) the no-contention branch avoids dipping into the kernel; b) the contended branch spins a little before deciding it needs to wait (and therefore dip into the kernel).
Of course, that doesn't work so well for long-term locks, but that is an architectural problem rather than a fundamental algorithmic one.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
Thank you both for your feedback.
My program really is a re-vamp of something that was heavily string-processing based and is well entrenched in Perl. Moving to something like C++ will definitely take some time and is not likely to happen for a while. I have thought about using a relational database as a potentially viable option, so that may be revisited. I initially was thinking MySQL, but after some consulting, PostgreSQL was the choice. I'm glad to see that's your recommendation as well.
The use of the Bit::Vector library was primarily to reduce the memory requirement of holding these strings in memory and as a faster way to process these comparisons. Using some data-juggling methods and heavy use of threads (w/o inter-thread communication), the first incarnation of my program was able to run to completion on a 1TB, 40-core machine in about 10 days, essentially using a whole bunch of strings, string ops, and hashes. Without the data-juggling methods, I would exceed the 1TB of RAM. Using bit-vector encoding, I can reduce my memory consumption by ~32-fold on a 64-bit machine. And the bit-vector operations are fast and do much of what I need them to do.
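The same packing idea can be sketched with core Perl's vec() (standing in here for Bit::Vector, which offers the same one-bit-per-item storage plus a richer set-op API; the item indices below are made up for illustration): one bit per item instead of a multi-byte hash entry, and whole-set comparisons as single bitwise string operations.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative sketch: pack presence flags one bit per item with vec(),
# so a million yes/no flags cost ~128KB instead of the tens-of-bytes-
# per-key overhead of a hash.
my ( $set_a, $set_b ) = ( '', '' );
vec( $set_a, $_, 1 ) = 1 for 0, 2, 4 .. 10;   # items present in set A
vec( $set_b, $_, 1 ) = 1 for 4 .. 10, 500;    # items present in set B

# Whole-set comparisons are single bitwise string ops, done at C speed:
my $both   = $set_a & $set_b;                 # intersection
my $either = $set_a | $set_b;                 # union

# Population count via unpack's bit-checksum feature.
printf "items in both: %d\n", unpack '%32b*', $both;   # bits 4..10 -> 7
```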
It is unfortunate that the bit vectors cannot be shared; however, I do have work-arounds for this, although they are not as efficient as updating a shared variable. If updating a globally-shared variable is too expensive (or, as in this case, not possible), I basically build up a thread-local variable and then dump it to disk with store() before the thread joins the main thread. Kind of like the Parallel::ForkManager library does.
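That pattern can be sketched like this (a minimal illustration, assuming a threaded perl; the worker count, file names, and tempdir are invented): each worker fills a private, unshared hash, store()s it to disk before it is joined, and the main thread merges the per-thread dumps afterwards.

```perl
use strict;
use warnings;
use threads;
use Storable qw( store retrieve );
use File::Temp qw( tempdir );

my $dir = tempdir( CLEANUP => 1 );

# Each worker builds a thread-local hash -- no sharing, no locking --
# and dumps it with store() before it is joined.
my @workers = map {
    my $id = $_;
    threads->create( sub {
        my %local;                                 # private to this thread
        $local{"key_${id}_$_"} = $_ for 1 .. 3;
        store \%local, "$dir/worker$id.sto";       # dump before joining
        return;
    } );
} 1 .. 4;

$_->join for @workers;

# The main thread merges the dumps after all workers have finished.
my %merged;
for my $id ( 1 .. 4 ) {
    my $part = retrieve "$dir/worker$id.sto";
    @merged{ keys %$part } = values %$part;
}
printf "merged %d keys\n", scalar keys %merged;    # 4 workers x 3 keys
```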
It is true that I have had to work through some (what I think were) contention issues when updating globally-shared variables; for the most part, however, I've solved those. Most of my repeated ops on globally-shared variables were reads anyway. Hopefully the repeated read attempts don't cause some sort of passive contention as Perl attempts to prevent concurrent access...
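If those repeated reads do turn out to cost something (every access to a threads::shared aggregate goes through its internal lock), one cheap defence, assuming the shared data is read-mostly and stable while the loop runs, is to copy it into a thread-local snapshot once per thread. The hash and arithmetic below are invented purely to illustrate the pattern:

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Read-mostly shared data (contents here are made up).
my %config : shared = ( scale => 10, offset => 3 );

my @t = map {
    threads->create( sub {
        my %snapshot = %config;          # one locked copy per thread ...
        my $sum = 0;
        $sum += $snapshot{scale} * $_ + $snapshot{offset} for 1 .. 100;
        return $sum;                     # ... then lock-free local reads
    } );
} 1 .. 2;

my @sums = map { $_->join } @t;
print "@sums\n";
```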
Do you need to do this once or many times?
Either way, set this going with your current solution & data NOW! Once it is running, then explore further possibilities.
If: a) you are going to do this many times in the future; b) you can tolerate a delay (>1 month) before getting a good solution; and c) you have sufficient C knowledge to help me port and extend my XS bit-vector code to whatever platform you are running on; then email me. (See my home node.)