http://qs1969.pair.com?node_id=60252

Item Description: Persistency for Perl data structures

Review Synopsis:

Storable is one of those modules that I use so much, I forget it's there. My day job involves an overly ambitious application builder. The designer (one of my co-workers or a customer of ours) writes a text definition of an application and runs it through our compiler (using Parse::RecDescent, which I'd review also if it weren't being replaced), which builds the Perl object representation of the application and stores it in a repository via DB_File.

When I first started working on the compiler, I wrote my own code to store and reconstitute objects in the repository. As it got more complex (and slow) I started to think this had to be a problem someone else had already solved. I went looking for help and discovered Storable (and CPAN along the way -- I was just a wee slip of a Perl coder then).

Storable makes this kind of thing trivial. If you have coded your own solution, as I had, don't be surprised if big stretches of Perl vanish into a few imported function calls. Here's all the code you need to turn an object into a scalar:

    use Storable qw(freeze thaw);
    ...
    $buffer = freeze($obj);
The $buffer scalar now contains a very compact representation of the object -- whether it was an array reference, a blessed hash or whatever. Drop that string into your favorite file, tied DBM database or SQL blob and you're done.

Retrieve that same scalar in some other stretch of code (or another program, as long as it has loaded all the necessary modules) and you can have your object back just as easily:

    $newInstance = thaw($buffer);

If the frozen buffer was a blessed reference, then so is the new instance, but it is not the same reference. Storable can be used to clone complex objects and structures this way, and even has convenience functions for that, such as dclone. (But you might want to look at Clone instead.)
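A minimal round trip, assuming a made-up class name My::Widget (nothing from the review's application), shows both points: thaw returns an object of the same class at a new address, and dclone performs the same freeze/thaw in a single call.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw dclone);

# A blessed hash standing in for an application object (hypothetical class).
my $obj = bless { name => 'widget', parts => [ 1, 2, 3 ] }, 'My::Widget';

# freeze() packs the whole structure into a single binary string...
my $buffer = freeze($obj);

# ...and thaw() rebuilds it: same class, same contents, new reference.
my $copy = thaw($buffer);
print ref($copy), "\n";                 # My::Widget
print "distinct\n" if $copy != $obj;    # a different object in memory
print "@{ $copy->{parts} }\n";          # 1 2 3

# dclone() does the freeze/thaw round trip in one call, giving a deep copy.
my $clone = dclone($obj);
push @{ $clone->{parts} }, 4;           # leaves $obj->{parts} untouched
print scalar @{ $obj->{parts} }, "\n";  # 3
```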

Storable's pod suggests that objects can inherit from it and use freeze and thaw as methods. I don't do that; instead I store and retrieve objects from the aforementioned tied DB_File database like so:

    sub store {
        my $obj = shift;
        my $key = $obj->key;
        $db{$key} = freeze($obj->freeze);   # the object's own freeze() strips run-time data
        return $key;
    }

    sub fetchFromDb {
        my ($key, $noWake) = @_;
        if (my $buf = $db{$key}) {
            my $obj = thaw($buf);
            # skip wake() if the caller wants the frozen state or thaw() failed
            return $noWake || !$obj ? $obj : $obj->wake;
        }
        return undef;
    }
(Code that checks if the database was opened for write and so on was omitted for cleaner lines and that sexy soft-spoken style.)

The two functions are in a module that hides the details of the database from the rest of the program. The store function in effect becomes a filter that transforms an object into its retrieval key. If the object has attributes that shouldn't be stored (run-time only information, say) then its special-built freeze method gets rid of them and returns $self. The fetch function can be used to retrieve the object in its frozen state, or (normally) will invoke a wake method to let the instance rebuild any run-time state it needs before it faces the world.
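A self-contained sketch of that store/fetch pattern follows. It assumes a hypothetical class My::Doc with the key, freeze and wake methods described above, and uses a plain in-memory hash in place of the tied DB_File database so it runs anywhere.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical class following the conventions described in the review.
package My::Doc {
    sub new { my ($class, %a) = @_; bless { id => $a{id}, title => $a{title} }, $class }
    sub key { $_[0]{id} }
    # Drop run-time-only data before storage, then return $self.
    sub freeze { my $self = shift; delete $self->{cache}; return $self }
    # Rebuild run-time state after retrieval.
    sub wake { my $self = shift; $self->{cache} = { built => 1 }; return $self }
}

# A plain hash stands in for the tied DB_File database (assumption).
my %db;

sub store {
    my $obj = shift;
    my $key = $obj->key;
    $db{$key} = freeze($obj->freeze);   # Storable::freeze of the stripped object
    return $key;
}

sub fetchFromDb {
    my ($key, $noWake) = @_;
    if (my $buf = $db{$key}) {
        my $obj = thaw($buf);
        return $noWake || !$obj ? $obj : $obj->wake;
    }
    return undef;
}

my $doc = My::Doc->new(id => 'd1', title => 'Spec');
$doc->{cache} = { stale => 1 };          # run-time data that must not be stored
my $key = store($doc);

my $frozen = fetchFromDb($key, 1);       # frozen state: no wake()
my $awake  = fetchFromDb($key);          # wake() rebuilt the cache
print exists $frozen->{cache} ? "has cache\n" : "no cache\n";   # no cache
print "$awake->{title} $awake->{cache}{built}\n";              # Spec 1
```

Note that the object's freeze method mutates the live object, matching the "gets rid of it and returns $self" behaviour described above.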

Okay, this is rapidly turning into a review of how I use Storable instead of what the module does, so back to the feature list.

Storable's documentation emphasizes the number of ways it will write and retrieve objects from files and other IO entities. If you use a file for each object (and remember that an "object" can be a simple hash or array too, no blessings required) then Storable will do all the work including opening and closing the files for you:

    store \%table, 'file';
    $hashref = retrieve('file');
To borrow more examples from the pod, you can use opened file handles too:
    store_fd \@array, \*STDOUT;
    nstore_fd \%table, \*STDOUT;
    $aryref = fd_retrieve(\*SOCKET);
    $hashref = fd_retrieve(\*SOCKET);
The "n" versions of store and store_fd use network byte ordering for binary values, making it reasonably safe to store and retrieve objects across architectures. The retrieval examples show fetching objects from an open socket -- Perl-based object servers, anyone?
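A runnable version of those pod snippets, assuming temporary files from File::Temp in place of 'file', STDOUT and SOCKET:

```perl
use strict;
use warnings;
use Storable qw(store retrieve nstore_fd fd_retrieve);
use File::Temp qw(tempfile);

my %table = (alpha => 1, beta => [ 2, 3 ]);

# One file per structure: Storable opens and closes the file itself.
my (undef, $path) = tempfile(UNLINK => 1);
store \%table, $path;
my $hashref = retrieve($path);
print $hashref->{beta}[1], "\n";          # 3

# An already-open handle, written in network byte order for portability.
my ($out, $path2) = tempfile(UNLINK => 1);
binmode $out;
nstore_fd \%table, $out;
close $out or die "close: $!";

# Any readable handle works for fd_retrieve, e.g. a file or a socket.
open my $in, '<:raw', $path2 or die "open $path2: $!";
my $copy = fd_retrieve($in);
close $in;
print "$copy->{alpha} @{ $copy->{beta} }\n";   # 1 2 3
```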

While feature-rich, Storable remains fast, much faster than my original code. It is implemented in C with a close eye on Perl internals to work swiftly and efficiently.

Storable has added quite a few features since I started using it; for example, a class can now define its own STORABLE_freeze and STORABLE_thaw hooks to implement what I did above at a lower level. Inside those hooks you can call predicates such as Storable::is_storing and Storable::is_retrieving to find out more about what Storable is doing and decide how your hook should act.
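A minimal sketch of such hooks, assuming a hypothetical My::Session class whose handle field is run-time state that should not be serialized. STORABLE_freeze returns the string to store; STORABLE_thaw receives an empty, already-blessed object of the same class and fills it in.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw dclone);

# Hypothetical class: 'user' is persistent, 'handle' is run-time only.
package My::Session {
    sub new {
        my ($class, $user) = @_;
        return bless { user => $user, handle => "conn-for-$user" }, $class;
    }

    # Called when serializing; $cloning is true under dclone().
    # Returns the serialized string (optionally followed by extra refs).
    sub STORABLE_freeze {
        my ($self, $cloning) = @_;
        return $self->{user};
    }

    # Called with an empty, already-blessed hash; fill it in here.
    sub STORABLE_thaw {
        my ($self, $cloning, $serialized) = @_;
        $self->{user}   = $serialized;
        $self->{handle} = "conn-for-$serialized";   # rebuild run-time state
        return;
    }
}

my $s      = My::Session->new('alice');
my $buffer = freeze($s);
print index($buffer, 'conn-for') < 0 ? "handle not stored\n" : "handle stored\n";  # handle not stored
my $back = thaw($buffer);
print "$back->{user} $back->{handle}\n";   # alice conn-for-alice
```

The same hooks also run under dclone, so cloned sessions get a fresh handle rather than a copy of the old one.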

Since CPAN now (optionally) uses Storable to store metadata, many Perl admins are aware of it, but might not be putting it to use in their own code. Consider this module any time you find yourself writing a loop to store a hash or array to a file. Storable "scales up" to more complex structures seamlessly, so you can use your favorite tricks without worrying about how you're going to write and retrieve it later.

Re (tilly) 1: Storable
by tilly (Archbishop) on Feb 22, 2001 at 21:43 UTC
    One useful trick is dclone, which freezes and thaws a structure in one call, producing a deep copy. I have found this useful, but there is one gotcha: currently I do not know of any package (Storable and Data::Dumper included) that can handle anonymous functions.

    Therefore if you want to make a deep copy of a structure that includes anonymous subs, you will still need to roll your own. (Not that it is very hard.)
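A sketch of both points, assuming default Storable settings: dclone croaks when the structure contains a code reference, while a small hand-rolled recursive copy (a hypothetical deep_copy helper, not tilly's code) duplicates hashes, arrays and scalar refs but shares the code references. It handles unblessed structures only.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my $data = {
    name     => 'button',
    on_click => sub { "clicked $_[0]" },
    tags     => [ 'a', 'b' ],
};

# With default settings Storable refuses CODE references.
my $ok = eval { dclone($data); 1 };
print $ok ? "cloned\n" : "dclone failed: $@";

# Hand-rolled deep copy for unblessed data: containers are duplicated,
# plain scalars are copied, and code refs are shared rather than cloned.
sub deep_copy {
    my ($x) = @_;
    my $type = ref $x;
    return $x if !$type || $type eq 'CODE';
    if ($type eq 'HASH')  { return +{ map { ($_ => deep_copy($x->{$_})) } keys %$x } }
    if ($type eq 'ARRAY') { return [ map { deep_copy($_) } @$x ] }
    if ($type eq 'SCALAR' || $type eq 'REF') {
        my $copy = deep_copy($$x);
        return \$copy;
    }
    die "deep_copy: unsupported reference type $type\n";
}

my $dup = deep_copy($data);
push @{ $dup->{tags} }, 'c';              # original keeps two tags
print scalar @{ $data->{tags} }, "\n";    # 2
print $dup->{on_click}->('ok'), "\n";     # clicked ok
```

Sharing the code refs is a deliberate choice: a closure's captured variables cannot be copied from Perl code, so both structures call the same subroutine.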

Re: Storable
by Tyke (Pilgrim) on Feb 23, 2001 at 16:52 UTC
    A trick that I've found useful in Tk apps is the following:
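    use Storable qw(store retrieve);
    use vars qw/$CONFIG $DATA/;
    
    BEGIN {
      $CONFIG = "$0.conf";
      $DATA = retrieve($CONFIG) if -e $CONFIG;   # reload last session's data
    }
    END {
      store($DATA, $CONFIG) if ref $DATA;        # store() croaks on a non-reference
    }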
    
    $DATA is a reference to a hash that contains the user-entered data. This little addition to the program means the data entered by the user in the last session becomes the default in this session. It saves a heck of a lot of typing, particularly during testing!
      An excellent technique which will find its way into my code very soon :)

      My variation was to have a human-readable config file, but keep the actual data in a hash written out by Storable. At startup the program checked the dates of the two files and re-parsed the config file if needed. I hadn't thought of using it to maintain user state though, nice job.
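The date-check variation can be sketched like this. The key = value format and the parse_conf and load_conf helpers are hypothetical; the point is only that the Storable cache is trusted while it is at least as new as the human-readable file.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

# Hypothetical parser for a simple "key = value" text format.
sub parse_conf {
    my ($file) = @_;
    open my $fh, '<', $file or die "open $file: $!";
    my %conf;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*#/ || $line !~ /\S/;   # skip comments, blank lines
        my ($k, $v) = $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/ or next;
        $conf{$k} = $v;
    }
    close $fh;
    return \%conf;
}

# -M gives file age in days, so a smaller value means a newer file.
sub load_conf {
    my ($conf_file, $cache_file) = @_;
    if (-e $cache_file && -M $cache_file <= -M $conf_file) {
        return retrieve($cache_file);       # cache is current
    }
    my $data = parse_conf($conf_file);      # text changed (or no cache yet)
    store($data, $cache_file);
    return $data;
}

my $dir = tempdir(CLEANUP => 1);
my ($conf, $cache) = ("$dir/app.conf", "$dir/app.cache");
open my $out, '>', $conf or die "open $conf: $!";
print $out "# demo\ncolor = blue\nsize = 3\n";
close $out;

my $first  = load_conf($conf, $cache);   # parses the text, writes the cache
my $second = load_conf($conf, $cache);   # served from the cache
print "$second->{color} $second->{size}\n";   # blue 3
```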

        ...or do both in one shot with Data::Dumper. Doing this prevented me from having to define a human-readable configuration-file syntax. Though I do process the Data::Dumper output slightly.

                - tye (but my friends call me "Tye")
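The Data::Dumper approach might look like the following sketch (the file name and Dumper settings are assumptions). Dumping under a variable name produces an assignment to $config, so do FILE returns the structure directly and the file stays hand-editable Perl.

```perl
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/app.conf.pl";   # hypothetical config file name

my $data = { color => 'blue', sizes => [ 1, 2, 3 ] };
{
    local $Data::Dumper::Indent   = 1;   # readable, modest indentation
    local $Data::Dumper::Sortkeys = 1;   # stable key order for hand edits and diffs
    open my $out, '>', $file or die "open $file: $!";
    # Writes "$config = { ... };" so the file is plain Perl source.
    print $out Data::Dumper->Dump([ $data ], [ 'config' ]);
    close $out;
}

# do FILE compiles and runs the file and returns the last value evaluated,
# here the hash reference assigned to $config.
my $loaded = do $file;
die "could not load $file: ", ($@ || $!), "\n" unless ref $loaded;
print "$loaded->{color} @{ $loaded->{sizes} }\n";   # blue 1 2 3
```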
What's wrong with Storable
by PetaMem (Priest) on Apr 16, 2004 at 18:15 UTC
    Hi,

    I can only wonder about the memory and CPU requirements...
    Saving a large hash (ca. 515,000 entries, each key about 7-10 characters long and each value a Tree::Nary structure) results in a file of ca. 330 MB, uncompressed.

    Restoring it takes on average 3.8 times as much memory as the file occupies on disk => ca. 1.2 GB!

    Saving is weird: after all 330 MB have been written, store takes another 2200 seconds of 100% CPU time on a 1.26 GHz P3, even though the complete file is already on disk.

    Moreover, while storing, store seems to allocate as much memory again as the data structure itself occupies. In the case above, over 2.1 GB of RAM is allocated! Gosh...

    Bye
     PetaMem
        All Perl:   MT, NLP, NLU

      Restoring it, takes an average of 3,8 times more memory than disk space => ca. 1,2GB!
      This is not surprising: Storable may store the data more space-efficiently than it is held in RAM. Think of malloc overhead, necessary struct alignment, and the size of an SV*.

      Saving is weird, after all the 330MB have been saved, store takes another 2200 seconds of 100% CPU time on a P3 1,26GHz even after the complete file is already on disk.
      Maybe global destruction is causing the slowness, possibly because of an inefficient malloc. For example, it has been reported that on FreeBSD, a perl built with the system malloc is very slow at freeing a large hash. The same is true of certain Linux malloc versions.
        Actually, it is what I would call a design flaw in the Storable module; it was discussed (and resolved) in detail on the perl5-porters list. Expect a patch to Storable RSN.

        Bye
         PetaMem
            All Perl:   MT, NLP, NLU