jagan_1234 has asked for the wisdom of the Perl Monks concerning the following question:

I am writing a multithreaded program (say, large number of threads just to make the problem dramatic) where I need multiple threads to change hash values of hash keys in an atomic fashion. I understand that I can use shared variables and lock etc, but is there a way to lock just one key instead of the entire hash structure? It seems odd that there is not a straight forward way to lock just the key I am operating on, but have to lock the entire structure. What am I missing here?

our %h:shared; sub lock { my ($key, $val) = @_; # critical section { lock($h); # don't like this part, I want to lock($h{$key}); $h{$key} = $val; } }

Replies are listed 'Best First'.
Re: Avoid Locking Entire Hashes
by davido (Cardinal) on Jun 13, 2011 at 20:37 UTC

    Even some DB's such as SQLite don't do "row locking" (as opposed to full table locks). On MySQL myisam tables don't row-lock, whereas innodb tables will. If it's becoming an issue, you may need to look at performing all of your shared-hash writes grouped together, in as few places in the script as possible to minimize the possibility of other threads blocking during a lock. It would be a similar philosophy to the old adage, "Print seldom, print late." If that's not cutting it, you could migrate to a database with row-locking support.


    Dave

      Thanks for your response. What I don't understand is why is to so hard to give a more fine-grained locking support? In some sense, "lock" in my example takes an address (in some shared memory space) and puts a semaphore around it. Why should it be so hard to put a semaphore around the address space corresponding to the address space of the value of $h{$key}? Intuitively speaking, lock($h) and lock($h{$key}} both seem so similar in spirit.

Re: Avoid Locking Entire Hashes
by ikegami (Patriarch) on Jun 13, 2011 at 21:06 UTC

    The lock doesn't have to the be on the variable you are changing, so create a hash of mutexes.

    my %h : shared; my %mutexes : shared; sub get_mutex { my ($k) = @_; my $mutex_ref = $mutexes{$k}; return $mutex_ref if $mutex_ref; lock($mutexes); my $new_mutex : shared; return $mutexes{$k} ||= \$new_mutex; } sub safe_set { my ($k, $v) = @_; lock ${ get_mutex($k) }; $h{$k} = $v; }

    Update: Fixed error mentioned in reply.

      Thank you so much! That is a pretty nice solution, except of course it consumes a mutex variable per row. This drawback can be easily reduced to a smalller pool of mutexes rather than one per row. Thank again. BTW, I think the following line in your code is not thread safe;
      my $mutex = $mutexes{$k} ||= do { my $mutex : shared; \$mutex };
      We may need something like this:
      my $mutex = $mutexes{$k}; if (! defined $mutex) { lock($mutexes); if (!defined $mutexes{$k}) { my $new_mutex : shared; $mutexes{$k} = \$new_mutex $mutex = \$new_mutex } }
      This means that you have to lock the entire mutexes the first time you see it..
        That is a pretty nice solution, except of course it consumes ...

        ... an entire mirror data structure, completely unnecessarily.

        Why not store the values in your existing hash using references to shared scalars, and lock the individual scalars directly rather than via a proxy?

        sub safe_set { my $k = shift; my $v :shared = shift; lock $$h{ $k }; $h{$k} = \$v; }

        The reasons why you can't lock individual hash elements are quite involved, but they boil down to the facts that:

        • the values in a hash are not indexed directly by the keys themselves. And the keys aren't stored internally in scalars.
        • the values in shared hashes aren't themselves shared scalars by default.

        A full explanation would probably require the original author to explain, but it probably comes down to the path of least resistance.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        except of course it consumes a mutex variable per row.

        As requested.

        BTW, I think the following line in your code is not thread safe;

        oh, true! The fix is good, but the "defined" are superfluous.

Re: Avoid Locking Entire Hashes
by locked_user sundialsvc4 (Abbot) on Jun 14, 2011 at 16:10 UTC

    I suggest that a simple mutex to control the entire hash structure is the only safe approach.   If the hash table is a central “hot spot,” then perhaps what you really need to do is to reconsider the use of threads at all, or at the very least, to reconsider the roles of the various threads in the application.

    The technical reason is that... a hash is one great-big data structure, and you need to protect the whole thing.   The practical reason is that, if you have a one-lane road on a superhighway, only one car can possibly go through at one time and it would be better to set-aside one thread to do that job, and maybe to just have one thread.   Traffic-jams consume a tremendous amount of purely wasted time.