Avoid Locking Entire Hashes

jagan_1234 has asked for the wisdom of the Perl Monks concerning the following question:

Re: Avoid Locking Entire Hashes by davido (Cardinal) on Jun 13, 2011 at 20:37 UTC
Even some databases, such as SQLite, don't do row locking (as opposed to full-table locks). In MySQL, MyISAM tables don't row-lock, whereas InnoDB tables do. If it's becoming an issue, you may need to group all of your shared-hash writes together, in as few places in the script as possible, to minimize the time other threads spend blocked on a lock. It's a similar philosophy to the old adage, "Print seldom, print late." If that's not cutting it, you could migrate to a database with row-locking support. Dave
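A minimal sketch of the "group your writes" idea, assuming a shared hash named %h and a hypothetical worker() sub (neither name comes from the original question). Each thread builds its results in a private hash and takes the whole-hash lock only once, for the short time it takes to copy them over:

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %h : shared;

# Hypothetical worker: do the slow work without holding any lock,
# then publish every result under one short lock on the shared hash.
sub worker {
    my ($id) = @_;

    my %pending;    # thread-private, so no locking is needed here
    $pending{"t$id-$_"} = $_ * $id for 1 .. 100;

    {
        lock(%h);   # one lock for the whole batch
        $h{$_} = $pending{$_} for keys %pending;
    }               # lock is released when this block ends
    return;
}

my @threads = map { threads->create(\&worker, $_) } 1 .. 4;
$_->join for @threads;

print scalar(keys %h), " keys written\n";
```

The lock is still on the entire hash, but each thread holds it once per batch rather than once per key, so the window in which other threads can block is much smaller.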
Re^2: Avoid Locking Entire Hashes by jagan_1234 (Sexton) on Jun 13, 2011 at 20:52 UTC
Thanks for your response. What I don't understand is why it is so hard to provide more fine-grained locking support. In some sense, "lock" in my example takes an address (in some shared memory space) and puts a semaphore around it. Why should it be so hard to put a semaphore around the address space corresponding to the value of $h{$key}? Intuitively speaking, lock($h) and lock($h{$key}) both seem so similar in spirit.
Re: Avoid Locking Entire Hashes by ikegami (Patriarch) on Jun 13, 2011 at 21:06 UTC
The lock doesn't have to be on the variable you are changing, so create a hash of mutexes. `my %h : shared; my %mutexes : shared; sub get_mutex { my ($k) = @_; my $mutex_ref = $mutexes{$k}; return $mutex_ref if $mutex_ref; lock(%mutexes); my $new_mutex : shared; return $mutexes{$k} ||= \$new_mutex; } sub safe_set { my ($k, $v) = @_; lock ${ get_mutex($k) }; $h{$k} = $v; }` Update: Fixed error mentioned in reply.
Re^2: Avoid Locking Entire Hashes by jagan_1234 (Sexton) on Jun 13, 2011 at 21:25 UTC
Thank you so much! That is a pretty nice solution, except of course it consumes a mutex variable per row. This drawback can easily be reduced by using a smaller pool of mutexes rather than one per row. Thanks again. BTW, I think the following line in your code is not thread safe: `my $mutex = $mutexes{$k} ||= do { my $mutex : shared; \$mutex };` We may need something like this: `my $mutex = $mutexes{$k}; if (!defined $mutex) { lock(%mutexes); if (!defined $mutexes{$k}) { my $new_mutex : shared; $mutexes{$k} = \$new_mutex; } $mutex = $mutexes{$k}; }` This means that you have to lock the entire %mutexes hash the first time you see a key.
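The "smaller pool of mutexes" idea could look roughly like the sketch below. It is only an illustration under stated assumptions: the pool size of 16, the byte-checksum used to pick a mutex, and the names @stripes, stripe_for and safe_update are all made up here, not taken from the thread. Keys that map to the same pool slot share a mutex, so they serialize with each other but not with keys in other slots.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %h : shared;

# Assumption: a fixed pool of 16 mutexes, created once up front.
my $N_STRIPES = 16;
my @stripes : shared;
for my $i (0 .. $N_STRIPES - 1) {
    my $m : shared;
    $stripes[$i] = \$m;
}

# Pick a mutex from the pool using a simple byte checksum of the key
# (assumes byte-string keys; any stable hash function would do).
sub stripe_for {
    my ($k) = @_;
    return $stripes[ unpack('%32C*', $k) % $N_STRIPES ];
}

# Read-modify-write of one key while holding only that key's pool mutex.
sub safe_update {
    my ($k, $code) = @_;
    lock ${ stripe_for($k) };
    $h{$k} = $code->($h{$k});
}

my @workers = map {
    threads->create(sub {
        for my $k (qw(a b c)) {
            safe_update($k, sub { ($_[0] // 0) + 1 }) for 1 .. 500;
        }
    })
} 1 .. 4;
$_->join for @workers;

print "$_ => $h{$_}\n" for sort keys %h;
```

Because the pool is created before any thread starts, nothing ever needs to lock a hash of mutexes, which also sidesteps the first-time-creation race discussed above.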
Re^3: Avoid Locking Entire Hashes by BrowserUk (Patriarch) on Jun 14, 2011 at 06:13 UTC
"That is a pretty nice solution, except of course it consumes ..." ... an entire mirror data structure, completely unnecessarily. Why not store the values in your existing hash using references to shared scalars, and lock the individual scalars directly rather than via a proxy? `sub safe_set { my $k = shift; my $v :shared = shift; lock ${ $h{$k} }; $h{$k} = \$v; }` The reasons why you can't lock individual hash elements are quite involved, but they boil down to two facts. First, the values in a hash are not indexed directly by the keys themselves, and the keys aren't stored internally in scalars. Second, the values in shared hashes aren't themselves shared scalars by default. A full explanation would probably have to come from the original author, but it probably comes down to the path of least resistance.
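Here is one way the "references to shared scalars" idea might be fleshed out. It is a variation rather than the poster's exact code: it assumes each key's scalar is created once (under a brief whole-hash lock) and that later writes go through the reference instead of replacing it, so every thread always locks the same scalar for a given key. The names slot_for and safe_get are hypothetical.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %h : shared;

# Return the shared scalar holding the value for $k, creating it under
# a brief whole-hash lock the first time the key is seen.
sub slot_for {
    my ($k) = @_;
    my $slot = $h{$k};
    return $slot if $slot;

    lock(%h);
    my $new : shared;
    return $h{$k} ||= \$new;
}

# Lock only this key's scalar and write through the reference.
sub safe_set {
    my ($k, $v) = @_;
    my $slot = slot_for($k);
    lock($$slot);
    $$slot = $v;
}

# Lock only this key's scalar and read its current value.
sub safe_get {
    my ($k) = @_;
    my $slot = $h{$k} or return;
    lock($$slot);
    return $$slot;
}

my @workers = map {
    threads->create(sub { my ($id) = @_; safe_set("k$id", $id * 10) }, $_)
} 1 .. 4;
$_->join for @workers;

print "k3 = ", safe_get('k3'), "\n";
```

Readers then have to dereference (or call safe_get) to see a value, which is the price of not keeping a separate hash of mutexes.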
Re^3: Avoid Locking Entire Hashes by jagan_1234 (Sexton) on Jun 14, 2011 at 22:59 UTC
Re^4: Avoid Locking Entire Hashes by BrowserUk (Patriarch) on Jun 15, 2011 at 08:09 UTC
Re^4: Avoid Locking Entire Hashes by BrowserUk (Patriarch) on Jun 15, 2011 at 04:22 UTC
Re^4: Avoid Locking Entire Hashes by jagan_1234 (Sexton) on Jun 15, 2011 at 17:45 UTC
Re^5: Avoid Locking Entire Hashes by BrowserUk (Patriarch) on Jun 15, 2011 at 18:27 UTC
Re^5: Avoid Locking Entire Hashes by ikegami (Patriarch) on Jun 15, 2011 at 18:29 UTC
Re^3: Avoid Locking Entire Hashes by ikegami (Patriarch) on Jun 13, 2011 at 21:48 UTC
"except of course it consumes a mutex variable per row." As requested. "BTW, I think the following line in your code is not thread safe;" Oh, true! The fix is good, but the "defined" checks are superfluous.
Re: Avoid Locking Entire Hashes by locked_user sundialsvc4 (Abbot) on Jun 14, 2011 at 16:10 UTC
I suggest that a simple mutex to control the entire hash structure is the only safe approach. If the hash table is a central “hot spot,” then perhaps what you really need to do is reconsider the use of threads at all, or at the very least reconsider the roles of the various threads in the application. The technical reason is that a hash is one great big data structure, and you need to protect the whole thing. The practical reason is that, if a superhighway narrows to a one-lane road, only one car can go through at a time, so it would be better to set aside one thread to do that job, or maybe just to have one thread. Traffic jams consume a tremendous amount of purely wasted time.
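The "set aside one thread to do that job" suggestion might be sketched as follows, assuming the core Thread::Queue module is used to pass updates. The $requests queue and the [key, value] message shape are illustrative choices, not something from the thread. The hash lives only in the owner thread, so it needs no locking at all; other threads just enqueue their updates.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $requests = Thread::Queue->new;

# The owner thread keeps the hash private and applies updates one at a time.
my $owner = threads->create(sub {
    my %h;
    while (defined(my $req = $requests->dequeue)) {
        my ($k, $v) = @$req;
        $h{$k} = $v;
    }
    return scalar keys %h;    # report how many keys were stored
});

# Producers never touch the hash; they only send [key, value] messages.
my @producers = map {
    threads->create(sub {
        my ($id) = @_;
        $requests->enqueue([ "t$id-$_", $_ ]) for 1 .. 100;
    }, $_)
} 1 .. 4;
$_->join for @producers;

$requests->enqueue(undef);    # sentinel: tells the owner to stop
my $count = $owner->join;
print "owner stored $count keys\n";
```

The queue itself is still a single point of contention, but producers only block for the duration of an enqueue, not for whatever work the owner does with the hash.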