Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm trying to implement/adopt a "cache" for frequently accessed data which is currently loaded from and stored to files on each access/modification.

Situation

The data loaded from files is represented in hashes (one hash per file). The primary purpose of the "cache" would be to hide the reading/writing of files from the rest of the program, so that data retrieval and modifications stay reasonably fast even when there are high disk loads. The program is a forking server, so the "cache" needs to be shared across multiple processes.

Since I don't see much overlap between traditional Perl caching modules (CHI, Cache and similar) and my needs (essentially a shared in-memory database that reads and writes entries to files under specific conditions), I don't think using those would be wise.

When trying to implement this "cache" I've considered using some sort of shared-memory module which could store a hash of hashes. On that front, I've looked into IPC::MMA and IPC::Shareable, but neither seems to fit the bill. IPC::MMA can only store scalars in its hash structures, so I can't nest the hashes. IPC::Shareable has the problem of possible conflicts with its four-character glue (I need to share lots of relatively simple hashes) and might run out of usable shared memory segments.

I also looked at in-memory databases, but I'm not sure how that would affect memory usage (I imagine anything retrieved from a table will be copied), and all the databases I've looked at would need a ramdisk, since they don't support in-memory connections from multiple processes.

Question

Primarily, I'd like to ask whether you can recommend a Perl module that supports nesting hashes in shared memory and doesn't suffer from the limitations of IPC::Shareable. However, I'm also open to suggestions of alternative approaches to solving my main issue (the "cache").


Replies are listed 'Best First'.
Re: Sharing data "cache" between forked processes (MCE!)
by 1nickt (Canon) on Nov 23, 2018 at 14:37 UTC

    Hi, the correct solution depends on your specific needs, but marioroy's Perl Many-Core Engine offers several options. Please see MCE::Shared, MCE::Shared::Hash, MCE::Shared::Minidb, MCE::Shared::Cache.

    From what I understand from your post, you basically want a shared DB where individual keys can be handled as with a cache, but sub-keys can also be accessed. Presumably you also need to be able to search for a key or keys by the value(s) of one or more sub-keys. You might like:

    use strict;
    use warnings;
    use feature 'say';
    use Data::Dumper;
    use MCE::Shared;

    my $db = MCE::Shared->minidb();

    my %hash = ( problem => 'foo', technique => 'blorgle', answer => 41 );
    my %junk = ( problem => 'bla', technique => 'blargle' );

    $db->hset( my_key => %hash );
    $db->hset( junkey => %junk );    # sorry for the bad pun

    my $pid = fork;
    die 'Fork failed' if not defined $pid;

    if ( $pid == 0 ) {    # child
        $db->happend( my_key => ( problem => 'bar' ) );
        $db->hincr( my_key => 'answer' );
        $db->hset( my_key => ( technique => 'frobnicate' ) );
        exit;
    }

    # parent
    wait;
    my @rows = $db->select_href( ':hashes', ':WHERE answer > 0' );
    say Dumper \@rows;

    __END__
    Output:
    $ perl monks/1226220.pl
    $VAR1 = [
              [
                'my_key',
                {
                  'answer' => 42,
                  'problem' => 'foobar',
                  'technique' => 'frobnicate'
                }
              ]
            ];

    (A note from the doc that helps explain the unfamiliar query syntax: "Several methods take a query string for an argument. The format of the string is described below. In the context of sharing, the query mechanism is beneficial for the shared-manager process. It is able to perform the query where the data resides versus the client-process grep locally involving lots of IPC.")

    Hope this helps!


    The way forward always starts with a minimal test.
Re: Sharing data "cache" between forked processes
by hippo (Archbishop) on Nov 23, 2018 at 13:55 UTC
    all databases I've looked at would need a ramdisk, since they don't support in-memory connections from multiple processes.

    Doubtless I am misunderstanding here but does the memory engine of MariaDB not satisfy your requirements? I've used it for lower-latency stores in several projects (including FCGI-based access) without problems. Could you explain where this falls short for you?
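    For context, a MEMORY-engine table lives entirely in RAM but is reachable from any number of forked processes through ordinary connections. A minimal sketch of that from Perl with DBI (database name, credentials and table layout are made up for illustration):

    ```perl
    use strict;
    use warnings;
    use DBI;

    # Hypothetical connection details -- substitute your own.
    my $dbh = DBI->connect(
        'DBI:mysql:database=cachedb;host=localhost',
        'cacheuser', 'secret',
        { RaiseError => 1, AutoCommit => 1 },
    );

    # MEMORY tables are held in RAM and shared by all connections.
    $dbh->do(q{
        CREATE TABLE IF NOT EXISTS cache (
            k VARCHAR(64) NOT NULL PRIMARY KEY,
            v TEXT
        ) ENGINE=MEMORY
    });

    # Upsert an entry, then read it back from any process.
    $dbh->do( 'REPLACE INTO cache (k, v) VALUES (?, ?)', undef, 'foo', 'bar' );
    my ($val) = $dbh->selectrow_array(
        'SELECT v FROM cache WHERE k = ?', undef, 'foo' );
    ```

    Note that MEMORY tables are lost on server restart, so the periodic write-to-file step would still be needed for persistence.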

Re: Sharing data "cache" between forked processes
by cavac (Prior) on Nov 23, 2018 at 12:52 UTC

    Really depends on your needs. But just to shamelessly plug my own stuff: Interprocess messaging with Net::Clacks

    Net::Clacks implements real-time messaging as well as a memory-only cache. Basically, if you read a file, you could just store() it in Clacks as Base64. Structures could be encoded with JSON::XS + Base64. At least, that's how I'm doing it.

    If you want to handle the file loading/saving/deleting on the server side for some reason, it would be sort of trivial to implement. Just adding some flags to the OVERHEAD command handling in Net::Clacks::Server.pm should do the trick.
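    The JSON + Base64 round trip described above can be sketched as follows (JSON::PP is shown as the core stand-in for JSON::XS, which is a faster drop-in; the store() call is a hypothetical placeholder for the actual Clacks client call):

    ```perl
    use strict;
    use warnings;
    use JSON::PP qw(encode_json decode_json);    # JSON::XS exports the same functions
    use MIME::Base64 qw(encode_base64 decode_base64);

    # A nested structure, such as one of the per-file hashes.
    my %data = ( colour => 'blue', sizes => [ 1, 2, 3 ] );

    # Encode: hashref -> JSON -> Base64 (second arg '' suppresses newlines).
    my $payload = encode_base64( encode_json( \%data ), '' );

    # ...$clacks->store( 'my_key', $payload );    # hypothetical client call

    # Decode on the receiving side.
    my $roundtrip = decode_json( decode_base64($payload) );
    ```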

    perl -e 'use MIME::Base64; print decode_base64("4pmsIE5ldmVyIGdvbm5hIGdpdmUgeW91IHVwCiAgTmV2ZXIgZ29ubmEgbGV0IHlvdSBkb3duLi4uIOKZqwo=");'
Re: Sharing data "cache" between forked processes
by kschwab (Vicar) on Nov 23, 2018 at 13:53 UTC
Re: Sharing data "cache" between forked processes
by localshop (Monk) on Nov 26, 2018 at 05:43 UTC
    I've had good results using CHI, but I'd also first look at hippo's suggestion of solving this in the database: almost all databases allow either pinning tables to memory or using an in-memory engine. You can even put the backend DB on a RAM drive. If you're not interested in solving this at the persistence layer, though, I'd suggest looking at CHI as well as the other suggestions.

    CHI has been around a long time and, although it hasn't seen much updating recently, it's proven in many production environments. There's also some interest in porting it to Perl 6, which I assume is a good thing.
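    Since the OP needs the cache shared across forked processes, CHI's FastMmap driver (backed by Cache::FastMmap's mmap'ed file) is probably the relevant one. A minimal sketch, with path and expiry made up for illustration:

    ```perl
    use strict;
    use warnings;
    use CHI;

    # FastMmap keeps the data in an mmap'ed file, so forked
    # children all see the same cache. Path and expiry are
    # illustrative -- substitute your own.
    my $cache = CHI->new(
        driver     => 'FastMmap',
        root_dir   => '/tmp/mycache',
        expires_in => '10 min',
    );

    # Nested hashes are serialized transparently (Storable by default).
    $cache->set( 'file_foo' => { colour => 'blue', count => 42 } );
    my $data = $cache->get('file_foo');    # nested hash comes back intact
    ```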