Ytrew has asked for the wisdom of the Perl Monks concerning the following question:

Hi. I've got an optimization problem to solve, and I'm trying to decide on the best approach to solving it. Probably, someone on PerlMonks already knows the answer.

We've got an existing system which loads data by doing individual SQL queries for each record read, and caching results. We have multiple processors on our HP/UX system, so we take advantage of this by running multiple copies of our program, but the results are still far too slow.

I'd like to replace the current implementation with a system that keeps all the configuration data in memory: similar to the database server, but more efficient, and simpler.

To avoid wasting RAM, I'd like to keep a single copy of all the configuration data in memory and let every process that needs it read from it.

Three approaches suggest themselves to me: some sort of "ramdisk" type approach, SysV style shared memory, and memory-mapped files. I haven't worked with any of these on HP/UX, so I'm unsure of their relative merits. I know that SysV shared memory and memory-mapped files are available, but don't know if a ramdisk is a possibility. I do remember that ramdisks under Linux are (or were?) a kernel option that the system administrator needed to configure. I don't administer the target system, and I don't think re-building a kernel configuration for our production system would be welcomed for our project. So a ramdisk may not be an option, unless someone knows how to configure one under HP/UX without a kernel re-build.

As for mmap and shared memory options, both seem to have some overhead that I don't really need. I specifically don't need to do any tricky inter-process communication: I just want a common section of memory that can be read quickly.

Shared memory under perl seems to copy the data: is this true under C? Specifically, can I just write an XS module to search quickly through the shared memory section, extract the values I want, and return them to perl? Is there a CPAN module that I've overlooked which does this sort of thing?
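
For concreteness, here's roughly the kind of thing I mean: with the core shm* built-ins, every read seems to copy the requested bytes into an ordinary scalar (a throwaway example, not our real code):

    use IPC::SysV qw(IPC_PRIVATE);

    # Core Perl SysV shared memory: shmread() copies the bytes out of
    # the segment into a normal Perl scalar on every call.
    my $id = shmget(IPC_PRIVATE, 1024, 0600);
    defined $id or die "shmget: $!";

    shmwrite($id, "some config data", 0, 1024) or die "shmwrite: $!";

    my $buf;
    shmread($id, $buf, 0, 1024) or die "shmread: $!";   # $buf is a copy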

Memory mapping (via the mmap() system call) seems tied to a file, which may or may not be what I want. It seems to require an open() call and an mmap() call for each filehandle: does this imply that only processes that can share filehandles can share memory in this way? Or will open/mmap detect that the file has been mmapped already?

I don't ever need to write changes back to the file: will loading the file as read-only optimize for this, or will I pay the overhead of the unused write functionality anyway? Is it even significant?
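
For reference, what I'm picturing from the Perl side is roughly the following, using the Sys::Mmap module from CPAN (untested on HP/UX, and the file name is just a placeholder):

    use Sys::Mmap;

    # Read-only mapping: each process does its own open()+mmap(),
    # but the kernel should back them all with the same pages.
    open CONFIG, '<', '/path/to/config.bin' or die "open: $!";
    my $map;
    mmap($map, 0, PROT_READ, MAP_SHARED, CONFIG) or die "mmap: $!";

    # $map now behaves like an ordinary (read-only) string:
    my $header = substr $map, 0, 16;

    munmap($map);
    close CONFIG;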

Any suggestions or specific details on how shared memory and/or mmap() work would be appreciated.
--
Ytrew Q. Uiop

Re: How best to optimize shared data
by dragonchild (Archbishop) on Feb 04, 2005 at 14:58 UTC
    We've got an existing system which loads data by doing individual SQL queries for each record read, and caching results. We have multiple processors on our HP/UX system, so we take advantage of this by running multiple copies of our program, but the results are still far too slow.

    Your problem has nothing to do with configuration data or shared memory or any crap like that. Have you benchmarked where your bottlenecks are? It doesn't sound like you have.

    The good news is that I don't have to - your bottlenecks are in the SQL reads and writes. I will bet that you don't have good indices, that your loads are updating indices every time, and that you can increase your throughput 100-fold if you had a DBA consultant with 10 years' experience come in for 2 weeks and audit your system.

    A few items for you to look at:

    • Are your tables normalized?
    • Are you using the RDBMS's data extraction and loading features? They will be up to a thousand times faster than anything written in Perl.
    • Do you have indices on the tables you're reading? Have the tables you're reading been optimized?
    • Do you have indices on the tables you're writing to? If you disable them while writing, you can increase throughput 10-100x.
    • With your multiple processes, are you hitting mutexes that are negating all benefits of SMP? For example, many RDBMSes will not let two processes update the same table at once, especially if the rows are in the same page of memory. Likewise, some will not let you insert into the same table at once, but some will. And, even crazier, some are configurable.
    • Are you taking advantage of all the RDBMS features? For example, MySQL will let you insert up to 1000 records at once. Oracle won't, but has other features.
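
    For instance, with DBI against MySQL a multi-row insert is just a longer VALUES list, so you pay for one round trip instead of one per row (table and column names below are made up):

        use DBI;

        my $dbh = DBI->connect('dbi:mysql:database=test', 'user', 'password',
                               { RaiseError => 1 });

        my @rows = ( [ 1, 'foo' ], [ 2, 'bar' ], [ 3, 'baz' ] );

        # Build a (?,?),(?,?),(?,?) placeholder list and bind all rows at once.
        my $placeholders = join ',', ('(?,?)') x @rows;
        $dbh->do("INSERT INTO config (id, name) VALUES $placeholders",
                 undef, map { @$_ } @rows);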

    The overarching theme is Know your tools. It doesn't sound like you really understand them.

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      Your problem has nothing to do with configuration data or shared memory or any crap like that. Have you benchmarked where your bottlenecks are? It doesn't sound like you have.

      Yes, I've profiled the code. The bottlenecks are in the I/O. No surprise there. The SQL queries are right at the top, even before the file I/O.

      The good news is that I don't have to - your bottlenecks are in the SQL reads and writes.

      Actually, they're exclusively in the SQL reads. We don't have any SQL writes. If we did, I wouldn't have specified that all the configuration data was read only, now would I?

      I will bet that you don't have good indices, that your loads are updating indices every time, and that you can increase your throughput 100-fold if you had a DBA consultant with 10 years' experience come in for 2 weeks and audit your system.

      Well, we have indexes on all our key fields, and we don't do any writes. We don't load data into tables, so it's unlikely that the nonexistent loads are updating the table indexes. It might be possible that a DBA with 10 years' experience could give us a 100-fold improvement to our database, but I confess that I don't see how. I have neither the authority nor the budget to hire a DBA, so it's a moot point.

      I can get a 100-fold improvement by moving code out of the SQL tables, though. Profiling lookups on one of our tables using the existing code gives 670 lookups/second for the (poorly written, IMHO) database solution, versus 56,818/s for a direct Perl hash lookup. This is a 100-fold improvement, and it's one that I can implement.

      The overarching theme is Know your tools. It doesn't sound like you really understand them.

      Does Informix have a secret "go fast" switch I can flip? Failing that, I'm not sure what else to try (but I'm certainly open to specific ideas). That's why I went with my own system, having learned in school that relational databases are good for many reasons, but maximum speed isn't one of them.

      The simplest approach is just to throw everything into memory in a perl hash-like structure. This works, but load time becomes a factor, and we hit the per-process memory limit. Changing this would require a recompile of the HP/UX kernel, which doesn't sound like an option.

      So, I need to balance total memory use with lookup speed. Perl hashes on my system seem to expand memory usage at somewhere like 40:1: I've seen a 7MB flat text file turn into nearly a 100MB Perl hash. Using some smarter data management, I can get that down to a 5MB binary file including indexes. Reading from it using pure Perl is still ten times faster than the SQL solution, and that's before I start to play with XS.
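
      To give an idea of what I mean by smarter data management (the record layout below is just for illustration, not our real format): sorted fixed-width records plus seek() and a binary search already gets you most of the way in pure Perl:

          use strict;
          use warnings;

          # Hypothetical layout: records sorted by key, each a 16-byte
          # space-padded key followed by a 64-byte value.
          my ($KEY_LEN, $VAL_LEN) = (16, 64);
          my $REC_LEN = $KEY_LEN + $VAL_LEN;

          sub lookup {
              my ($fh, $key) = @_;
              my ($lo, $hi) = (0, int((-s $fh) / $REC_LEN) - 1);
              while ($lo <= $hi) {
                  my $mid = int(($lo + $hi) / 2);
                  seek $fh, $mid * $REC_LEN, 0 or die "seek: $!";
                  read $fh, my $rec, $REC_LEN or die "read: $!";
                  my $k = unpack "A$KEY_LEN", $rec;
                  if    ($k lt $key) { $lo = $mid + 1 }
                  elsif ($k gt $key) { $hi = $mid - 1 }
                  else  { return unpack "x$KEY_LEN A$VAL_LEN", $rec }
              }
              return;   # not found
          }

          open my $fh, '<', 'config.bin' or die "open: $!";
          binmode $fh;
          my $value = lookup($fh, 'some_key');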

      This solution seems do-able. Shared memory may not be the way to go, but do you see any other solutions? -- Ytrew

        I was about to recommend looking at Cache::SharedMemoryCache. And then I read it. It says to use a filesystem cache because the speed difference is trivial. Which tells me that simply bumping up your buffers for your database may solve your speed problems. Instead of throwing memory at a shared memory cache, throw that memory at the database, and see if that helps.

        Profiling lookups on one of our tables using the existing code gives 670 lookups/second for the (poorly written, IMHO) database solution, versus 56,818/s for a direct Perl hash lookup. This is a 100-fold improvement, and it's one that I can implement.

        Well, we have indexes on all our key fields, and we don't do any writes.

        Ok - you have indices, but are you using them? What is the execution plan for your queries? Why are you pulling all your data out instead of looking at it when you need it?

        There are sooo many things involved in optimizing this kind of thing that you really need an expert on the ground, so to speak, to really give you more than just very generic places to look.
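
        Since you mentioned Informix: you can have the engine write its query plans out and check whether those indices are actually being used. Roughly like this (DBD::Informix assumed; database, table, and column names made up):

            use DBI;

            my $dbh = DBI->connect('dbi:Informix:mydb', '', '', { RaiseError => 1 });

            # Ask the engine to dump optimizer plans (typically to sqexplain.out).
            $dbh->do('SET EXPLAIN ON');

            my $sth = $dbh->prepare('SELECT value FROM config WHERE key_field = ?');
            $sth->execute('some_key');

            # Then read the explain output: index lookup or sequential scan?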

        Being right, does not endow the right to be rude; politeness costs nothing.
        Being unknowing, is not the same as being stupid.
        Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
        Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      Good reply; listen to the man.
      Normalized tables and indexes are key!
      Properly implemented indexes took a script of mine from an 8-minute run-time down to a 1-2 minute run-time. Long story short, they can help a lot!

      Justin
Re: How best to optimize shared data
by Frantz (Monk) on Feb 04, 2005 at 15:06 UTC
    I have used IPC::Shareable with success to share data between client/server components.

    The hard part with this module is synchronizing all the processes.

    But if you only ever read and the data never changes, there is no synchronization problem.
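
    Something like this is all it takes (the glue key and size are just examples; one process creates and fills the hash, the others only ever read it):

        use IPC::Shareable;

        # Loader process: create the segment and populate it once.
        tie my %config, 'IPC::Shareable', 'cfg0',
            { create => 1, size => 5 * 1024 * 1024 };
        %config = load_config();          # your own loader goes here

        # Reader processes: attach to the same glue key and just read.
        tie my %cfg, 'IPC::Shareable', 'cfg0';
        my $value = $cfg{some_key};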

Re: How best to optimize shared data
by perrin (Chancellor) on Feb 04, 2005 at 19:40 UTC
    Don't try to write your own shared memory module from scratch -- there's no good reason to do it. Just use Cache::FastMmap or BerkeleyDB (not DB_File!) and they will handle the hard stuff and give you great performance.
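
    For example, with Cache::FastMmap the whole thing is only a few lines (the file name and size here are made up):

        use Cache::FastMmap;

        # Every process that points at the same share_file sees the same data.
        my $cache = Cache::FastMmap->new(
            share_file => '/tmp/config.cache',
            cache_size => '8m',
        );

        # A loader populates it once...
        my $config_record = { host => 'db1', timeout => 30 };   # any Perl structure
        $cache->set('some_key', $config_record);

        # ...and any other process can read it back:
        my $record = $cache->get('some_key');
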
Re: How best to optimize shared data
by eclark (Scribe) on Feb 04, 2005 at 23:19 UTC

    Have your script read all the configuration data into a Perl data structure and then fork() into separate processes. This way you let your OS deal with it. I don't know anything about HP-UX, but on Linux the memory pages containing your read-only data will remain shared across all processes until one process writes to a page, at which point that process gets a private copy of just the page it changed. If you're clever, you can have the child process monitor its memory usage, kill itself, and have the parent fork() another when too much of its memory is no longer shared.

    I have done this with Apache and mod_perl. Simply populate some globals through the apache config file.
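
    Outside of Apache, a rough sketch of the same preload-then-fork pattern looks like this (load_config() and do_work() stand in for your own code):

        use strict;
        use warnings;

        # Load everything once in the parent; the children share those
        # pages copy-on-write for as long as nobody modifies them.
        my %config = load_config();        # your own loader

        my @kids;
        for (1 .. 4) {                     # e.g. one worker per CPU
            my $pid = fork;
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                do_work(\%config);         # children only read %config
                exit 0;
            }
            push @kids, $pid;
        }
        waitpid $_, 0 for @kids;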