Jaap has asked for the wisdom of the Perl Monks concerning the following question:

When variables are declared with my (under strict), they are 'reset' every time my mod_perl registry script is run. How then do I save my data structure that took 10 seconds to build so that the next run doesn't have to rebuild it?

Also, my host has set up his apache2.conf so that I can run registry scripts, but not handler modules. Is that a major limitation?

Replies are listed 'Best First'.
Re: [mod_perl] when does it keep data in memory?
by stvn (Monsignor) on Nov 27, 2004 at 13:40 UTC

    Well I don't know the details of your set up, but this is how I have shared large datastructures across mod_perl/Apache processes in the past.

    First, you need to have a place for the data structure. As someone already mentioned, a package global is the way to go. It's really just this simple:

    package My::Big::Data::Structure;
    our $BigDataStructure = ...

    Then, assuming that you use My::Big::Data::Structure at the top of your registry script, you should be able to access your global like this:

    # make a local reference
    # to it to save typing
    my $big_data_struct = $My::Big::Data::Structure::BigDataStructure;

    Next, you need to ask your host to allow you to have a startup.pl file. This is a file which Apache will load upon server start-up. The Apache conf should have this in it:

    PerlRequire "path/to/startup.pl"
    It is really just a Perl file, nothing special. In it you can initialize your big data structure like this:

    use My::Big::Data::Structure;
    $My::Big::Data::Structure::BigDataStructure = build_big_data_structure();
    Of course the better way might be to encapsulate the big data structure into the My::Big::Data::Structure module, but that is another question.
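
    A minimal sketch of what that encapsulation might look like, assuming a hypothetical accessor get() and a hypothetical _build() that stands in for the real 10-second build. Neither name comes from the original post, and the whole thing is squeezed into one file so it runs on its own. The data lives in a file-scoped lexical inside the module and is built at most once per interpreter:

```perl
package My::Big::Data::Structure;
use strict;
use warnings;

# File-scoped lexical: lives as long as the interpreter, hidden from callers.
my $data;
our $BuildCount = 0;    # only here to show how often the build runs

# Hypothetical stand-in for the expensive 10-second build.
sub _build {
    $BuildCount++;
    return { map { ( $_ => $_ * $_ ) } 1 .. 1000 };
}

# Build on first use, then hand back the same reference every time.
sub get {
    $data = _build() unless defined $data;
    return $data;
}

package main;

my $first  = My::Big::Data::Structure::get();    # triggers the build
my $second = My::Big::Data::Structure::get();    # reuses the cached copy

print "builds: $My::Big::Data::Structure::BuildCount\n";
print "same reference: ", ( $first == $second ? "yes" : "no" ), "\n";
```

    With a module like this, startup.pl would only need to call get() once so that the build happens in the parent process before the children are forked.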

    You do need to be aware of a few things though. First, the memory this data-structure occupies will be "shared" between all the apache child processes. Which means as long as you don't change the data-structure (and I am assuming you aren't) there will only ever need to be a single copy of this data. If however, you change the data-structure, know that the underlying OS will perform a copy-on-write and then that Apache child process, and ONLY that Apache child process will get a new copy of the data with the change applied. It is important to note that no other Apache child process will see the change in the data-structure nor will the parent process from which it was copied. Basically, you don't want to change that data-structure or you're in for a world of trouble.
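
    The copy-on-write behaviour can be seen outside Apache with a plain fork. This is only a sketch under that assumption: %BigData is a made-up stand-in for the real structure, and the pipe exists only so the parent can report what the child saw.

```perl
use strict;
use warnings;

our %BigData = ( answer => 42 );    # built once, before forking

pipe( my $reader, my $writer ) or die "pipe failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ( $pid == 0 ) {
    # Child: the change lands in this process's private copy only.
    close $reader;
    $BigData{answer} = 99;
    print {$writer} $BigData{answer}, "\n";
    close $writer;
    exit 0;
}

# Parent: read what the child saw, then look at our own copy.
close $writer;
chomp( my $child_value = <$reader> );
close $reader;
waitpid( $pid, 0 );

print "child saw:  $child_value\n";
print "parent has: $BigData{answer}\n";
```

    The child reports 99 while the parent still has 42, which is the same thing that happens between Apache child processes.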

    You should also know that your Apache server will take an extra 10 seconds to start up, since your big data structure is now getting initialized at server startup. Be sure to tell your host about this, or they may think something is wrong the next time they restart your server and it seems to hang.

    -stvn
      stvn++, thanks for the nice explanation.
      I assume from your story that there is no (in-memory) way to make the child's changes visible to the other processes?
        I assume from your story that there is no (in-memory) way to make the child's changes visible to the other processes?

        You can try using some kind of shared memory caching. There are a number of CPAN modules for this, but I think it only works on a few OSes. When you are programming mod_perl like this you are basically programming a multi-process application, so you have all the usual interprocess communication (IPC) issues that go along with it.

        To be honest, probably the best thing would be to try and fit this all into a database though (if that was possible) since that will have the best concurrency and data integrity checks. But since I don't know the format and content of your data-structure I cannot say if that makes sense.

        -stvn
Re: [mod_perl] when does it keep data in memory?
by Juerd (Abbot) on Nov 27, 2004 at 12:47 UTC

    In general, when you absolutely do not want it to :)

    But less generally, package globals. As all subsequent requests are handled by the same eternal Perl interpreter, and nothing resets globals automatically, they're still around if you didn't reset them yourself.
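
    A rough illustration of the difference, assuming only that a registry script body ends up being run again inside the same interpreter for each request. The sub handle_request() is a hypothetical stand-in for that script body, not anything mod_perl provides:

```perl
use strict;
use warnings;

our $Requests = 0;    # package global: survives between calls
our $Cache;           # package global holding the expensive result

# Hypothetical stand-in for a registry script body, run once per request.
sub handle_request {
    my $local_count = 0;    # lexical: a fresh variable on every call
    $local_count++;
    $Requests++;
    $Cache ||= { built_at_request => $Requests };    # built only once
    return ( $local_count, $Requests, $Cache->{built_at_request} );
}

for ( 1 .. 3 ) {
    my ( $local, $global, $built ) = handle_request();
    print "lexical=$local global=$global cache built at request $built\n";
}
```

    The lexical is 1 on every call, while the global keeps counting and the cache is built only on the first call.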

    For something that takes 10 seconds, using a global isn't a great idea. Every Apache process will need to have its own copy, and will thus build it once. That means the global is only useful if you get more than a few visitors. But if you get more than a few visitors, you never want any normal page request to last 10 seconds.

    My advice is to build the thing externally and then let the request handler use the static information.
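
    One way to do that, sketched here with the core Storable module (my choice for the example; any serialization format would do). The temporary directory and file name are made up so the sketch can run by itself; a real setup would use a fixed path that both the builder and the web server know about.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);

# Hypothetical location, created only so this sketch is self-contained.
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/big_data.sto";

# Step 1, run outside the web server (for example from cron):
# do the slow build and write the result to disk.
my %built = map { ( "key$_" => $_ * 2 ) } 1 .. 1000;    # stand-in for the slow build
nstore( \%built, $file ) or die "cannot store to $file";

# Step 2, inside the request handler: load the prebuilt data instead of rebuilding it.
my $data = retrieve($file);
print "loaded ", scalar( keys %$data ), " keys; key21 => $data->{key21}\n";
```

    Whether loading is fast enough to do on every request depends on the size of the data, so it may still be worth keeping the loaded copy in a package global.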

    And whether not being able to use other handlers is a limitation depends only on what you want. If you want to use other handlers, it is certainly a limitation. If not, then I don't see how it could be limiting.

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

Re: [mod_perl] when does it keep data in memory?
by Arunbear (Prior) on Nov 27, 2004 at 13:36 UTC
    You can use Cache::Cache to save your data structure (this is what Mason uses for caching).

    Your host seems to be running Apache 2. mod_perl 2 (which is required if you are using Apache 2) is still (mostly) in development and AFAIK most of the handler modules on CPAN are targeted at mod_perl 1.x, so the fact that you are limited to registry scripts is probably not something to worry about.

Re: [mod_perl] when does it keep data in memory?
by matija (Priest) on Nov 27, 2004 at 11:53 UTC
    When variables are declared with my (under strict), they are 'reset' every time my mod_perl registry script is run. How then do I save my data structure that took 10 seconds to build so that the next run doesn't have to rebuild it?

    That's not your only problem. Even if you manage to keep the data in memory, you can't guarantee (in fact, on a busy server the chances are pretty slim) that the next request will hit the same server process. So you'll have the data in memory, but in another process, as out of reach as if you had it on the other side of the moon.

    Perhaps you'd be better off saving it in a database, a DB_File, or even a text file...

Re: [mod_perl] when does it keep data in memory?
by johnnywang (Priest) on Nov 27, 2004 at 23:56 UTC
    I'm in the same situation, where some of my data needs to be shared among all Apache processes and the data can change. For that I use IPC::Shareable to put the data in a shared memory segment. This has been running fine for me, but I don't like the fact that I have no control over the number of shared segments and the semaphores they use. I'm moving towards a database cache. Since IPC::Shareable internally uses Storable, which isn't that fast despite the fact that it's in memory, for my purpose MySQL gives about the same performance.
Re: [mod_perl] when does it keep data in memory?
by BrowserUk (Patriarch) on Nov 27, 2004 at 11:22 UTC

    I know naff all about mod_perl, but if my vars are reset, and you don't want that to happen, couldn't you put the datastructure (or a reference to it) into a global?


    Examine what is said, not who speaks.
    "But you should never overestimate the ingenuity of the sceptics to come up with a counter-argument." -Myles Allen
    "Think for yourself!" - Abigail        "Time is a poor substitute for thought"--theorbtwo         "Efficiency is intelligent laziness." -David Dunham
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
      global as in our or not declared at all? In other words: what is a global? (i feel so n00bish)

        A global is a global (except for a few special ones). Whether you keep strict quiet by using use vars qw[$theGlobal]; or our $theGlobal; shouldn't make any difference; you'll be using the same piece of RAM.

        That is, that's how it works in a non-mod_perl script. Again, I re-emphasise my earlier disclaimer here. I know nothing of mod_perl, nor how this would play with Apache2 (which uses threads?).
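
        A small sketch of the two declaration styles in a plain (non-mod_perl) script. The package name Demo and the variable names are made up for the example. Both styles end up as ordinary package variables, and the short name is just another way of reaching the fully qualified one:

```perl
use strict;
use warnings;

package Demo;

use vars qw($viaVars);    # older style: declares $Demo::viaVars
our $viaOur;              # newer style: lexically scoped alias to $Demo::viaOur

$viaVars = 'declared with use vars';
$viaOur  = 'declared with our';

# The short name and the fully qualified name refer to the same storage.
our $same_storage = ( \$viaOur == \$Demo::viaOur ) ? 1 : 0;

package main;

# Both are plain package variables, reachable by their full names.
print "$Demo::viaVars\n";
print "$Demo::viaOur\n";
print "short and qualified names share storage: ",
    ( $Demo::same_storage ? "yes" : "no" ), "\n";
```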

