tachyon has asked for the wisdom of the Perl Monks concerning the following question:

I am developing an app that is very DB intensive. By very I mean totally. I am currently caching all the static data I can. Although the DB queries are non-trivial, the return value I want is always a simple integer - an ideal case for a hash table cache. A complicating factor is that there are about 50 parallel processes all doing the same thing to handle the load, so the cache needs to be shared between processes, hence the desire to use shared memory to do it.

Not being one to reinvent the wheel I have looked at everything available on CPAN - basically Cache::MemoryCache, Cache::SharedMemoryCache and the underlying IPC::ShareLite module.

The main problem with what is available is that the shared memory module, Cache::SharedMemoryCache, is glacial. While it works, benchmarking shows it to be 2-3 orders of magnitude slower than Cache::MemoryCache. In fact it is way slower than raw DB access, which of course defeats the purpose. If anyone knows of an existing FAST way to cache SQL query results and share that cache I am all ears.

Because of the issues with speed, unwanted data serialization, inefficient data expiry and no easy way to limit memory usage, I have decided I probably need to roll my own module that does what I want - namely use a closed hash table to store hashed SQL along with the corresponding integer result. Naturally, as speed is the issue, this will essentially be in C with just enough Perl for the interface. To take advantage of CPU caching and place a finite limit on memory usage I plan to store the hash key, insertion time and data sequentially in memory. I also don't plan on explicit linked-list collision handling; simply letting new colliding data overwrite the old will be sufficient (the lost key/value then has to come from the DB, which is fine - I am storing the full hash for validation purposes). The trivial data size allows a wide (essentially inefficient) hash table where memory is traded off for speed, as always.
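Roughly what I have in mind - just an untested sketch, assuming 32-bit longs (hence the 12 bytes per slot); the table size and names are purely illustrative:

    #include <time.h>

    #define NSLOTS (1 << 20)        /* wide table: trade memory for speed */

    typedef struct {
        unsigned long key;          /* full hash of the SQL text       */
        unsigned long stamp;        /* insertion time, for expiry      */
        unsigned long value;        /* the cached integer result       */
    } slot;                         /* 12 bytes with 32-bit longs      */

    static slot *table;             /* will live in the shared segment */

    /* store: overwrite whatever occupies the bucket, no collision chains */
    void cache_put(unsigned long key, unsigned long value)
    {
        slot *s  = &table[key % NSLOTS];
        s->key   = key;
        s->stamp = (unsigned long)time(NULL);
        s->value = value;
    }

    /* fetch: the stored full key validates the hit; a mismatch or an
       expired stamp is simply a miss and the caller goes to the DB */
    int cache_get(unsigned long key, unsigned long max_age, unsigned long *value)
    {
        slot *s = &table[key % NSLOTS];
        if (s->key != key)
            return 0;                           /* miss */
        if ((unsigned long)time(NULL) - s->stamp > max_age)
            return 0;                           /* expired */
        *value = s->value;
        return 1;                               /* hit */
    }

Hashing the SQL text down to the 32-bit key is a separate question (anything cheap like CRC32 would do); the point is that a lookup is one array index and one compare.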

Anyway, enough background. First, I am unfamiliar with shared memory (but learning fast). What I would like to know is whether I really have to lock access to memory locations while they are written, given a data struct width of probably 12 bytes (long ints for hash key, time, and data), and what the potential consequences of not doing so are. The app runs on Linux (single-threaded, multi-process, but on a multiprocessor machine) so it is theoretically possible that two processes could write/write, write/read or read/read the same location literally simultaneously - at least as I understand it. Data corruption is tolerable to a degree, as corrupting the hash key would effectively return a cache miss, which is fine. Crashing the system is another issue. Any advice most welcome.
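For reference, this is the sort of thing I am looking at for the segment itself - again untested, the ftok() path is just a placeholder (it has to name an existing file), and error handling is abbreviated:

    #include <stdio.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    #define NSLOTS (1 << 20)

    typedef struct {                 /* same 12-byte slot as above */
        unsigned long key, stamp, value;
    } slot;

    static slot *table;

    int cache_attach(void)
    {
        key_t k = ftok("/tmp/sqlcache", 'S');   /* placeholder; file must exist */
        if (k == (key_t)-1) { perror("ftok"); return -1; }

        /* create the segment on first use, otherwise attach to the existing one */
        int shmid = shmget(k, NSLOTS * sizeof(slot), IPC_CREAT | 0600);
        if (shmid == -1) { perror("shmget"); return -1; }

        table = shmat(shmid, NULL, 0);
        if (table == (void *)-1) { perror("shmat"); return -1; }
        return 0;
    }

My reading so far is that an aligned 4-byte store is atomic on IA-32, but the three longs together are not, so without locking a reader could in principle see a fresh key paired with a stale value when two different queries hash to the same slot - happy to be corrected on that.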

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print


Re: System V shared Memory Perl Module (more of a C question than Perl) aka Efficient SQL result caching
by perrin (Chancellor) on Apr 03, 2003 at 14:45 UTC
    Hang on there, you just picked the wrong module. Cache::FileCache is much faster than the shared memory version, and others are faster still.

    Take a look at these: IPC::MM, BerkeleyDB, Cache::Mmap, MLDBM::Sync (with SDBM_File).

Re: System V shared Memory Perl Module (more of a C question than Perl) aka Efficient SQL result caching
by Lhamo Latso (Scribe) on Apr 03, 2003 at 07:11 UTC

    If you are using Oracle, the results are already cached in the database's SGA.

    Could you be more specific about your environment? OS? DB? Also, post the SQL query if you can. Optimization of SQL is the biggest factor in tuning, along with correct indexing and memory usage.

      OS is Red Hat 7.3. DB is MySQL because it has the fastest SELECT speed of any DB (so they say). Happy to use any DB - speed is the issue, not rigid data integrity/rollbacks or finance per se. Berkeley won't work due to its single reader/writer or multiple readers configuration. Server has 2GB RAM, dual processors and all the RAID tweaks to minimize seek time. MySQL/OS compile tweaks get around its ~800 query limit in stock form. Typical query is:

      SELECT integer from client_X WHERE key IN ( ?,?,?,? ) ORDER BY key DESC

      Primary key is key. IN grouping: 1 min, 10 max, average 3, typically 2-4.

      Peak query rate can potentially exceed 10,000 per second. The base code copes fine; the DB is not so happy at high loads.

      cheers

      tachyon

      s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

        I have the same sort of problem with an overloaded db doing queries where the results only change 5-10 times a day.

        I was thinking of only triggering my SELECT queries when the CGI that updates the data is used. So my script would run off a static file, and the file would only get updated when the data changed.

        Do you think I could use these memory modules instead, to get rid of the static file and store my results in memory?

        I'm on MySQL/Apache as well, and I am pretty sure that MySQL has a query cache.