antiquark has asked for the wisdom of the Perl Monks concerning the following question:

In my production environment we have had what we believe to be a corrupt storable. I am unable to replicate the behaviour in Dev, which has made it hard to diagnose exactly.

The code has been working for a long time, and the change that made it break was deleting from the hash. Up until recently, the hash either stayed the same size, or grew.

The file is opened in read/write mode, and then store_fd writes to the file. As the hash is now (sometimes) smaller, it will write, say, 1000 bytes to this 2000-byte file. The trailing 1000 bytes are old, garbage data. In my test cases, when I retrieve the hash, the garbage data is ignored, as expected.

    open( $sf, "+< $self->{mod_state_filename}" );
    flock( $sf, LOCK_EX );
    $self->{mod_state} = fd_retrieve($sf);
    delete ($self->{mod_state}{"somekey"});
    seek( $sf, 0, 0 );
    store_fd( $self->{mod_state}, $sf );
    flock( $sf, LOCK_UN );
    close($sf);
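
The situation described above can be reproduced with a small sketch along these lines. The scratch file, key names, and value sizes are made up for illustration; this is not the production code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);
use Storable qw(store_fd fd_retrieve);

# Hypothetical test data: 20 keys with 100-byte values.
my %big = map { ( "key$_" => 'x' x 100 ) } 1 .. 20;

# Write the initial, larger hash to a scratch file.
my ( $fh, $path ) = tempfile( UNLINK => 1 );
binmode($fh);
store_fd( \%big, $fh ) or die "store_fd failed";
close($fh) or die "close failed: $!";
my $size_before = -s $path;

# Read it back, shrink it, and overwrite from offset 0 without truncating.
open( my $sf, '+<', $path ) or die "open failed: $!";
binmode($sf);
my $state = fd_retrieve($sf);
delete $state->{"key$_"} for 2 .. 20;
seek( $sf, 0, SEEK_SET ) or die "seek failed: $!";
store_fd( $state, $sf ) or die "store_fd failed";
close($sf) or die "close failed: $!";
my $size_after = -s $path;

# Retrieve once more; the stale tail is still present in the file.
open( my $in, '<', $path ) or die "open failed: $!";
binmode($in);
my $reread = fd_retrieve($in);
close($in);

printf "size before: %d, size after: %d, keys retrieved: %d\n",
    $size_before, $size_after, scalar keys %$reread;
```

The file size does not shrink after the second store_fd, which shows that the old bytes are still sitting at the end of the file.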

My questions:

  1. Should this work, or is it imperative that I truncate the file?
  2. Does the stored hash use some kind of file terminator character? If so, what is it?
  3. The above code, deleting and adding and deleting and adding, works perfectly in my test case. Can you suggest any test case sequence that might cause it to fail, due to the non-truncated file? (I know this is a really vague question, so feel free to ignore it).

Thanks, Brock

Re: Storable.pm - corrupt when saving to non-truncated file
by ikegami (Patriarch) on Nov 24, 2010 at 03:42 UTC

    It seems to me it would fail often and reliably if truncate was required.

    It is mentioned that (n)store_fd/fd_retrieve can write to a socket, which implies (but does not guarantee) that the format doesn't require EOF to detect the end of the data. And if it works for *_fd, I would be very surprised if it didn't work for the others.

    That's two reasons that make me think that truncating isn't required.

    PS - You really should binmode your handle.
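
    A minimal sketch of what binmode on both the writing and the reading handle could look like. The scratch file and the sample hash are assumptions for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Storable qw(store_fd fd_retrieve);

# Scratch file for the demonstration (hypothetical; use your own path).
my ( $out, $path ) = tempfile( UNLINK => 1 );

# Storable output is binary, so disable any newline translation
# or encoding layers before writing.
binmode($out) or die "binmode failed: $!";
store_fd( { answer => 42 }, $out ) or die "store_fd failed";
close($out) or die "close failed: $!";

# Do the same on the reading side before fd_retrieve.
open( my $in, '<', $path ) or die "open failed: $!";
binmode($in) or die "binmode failed: $!";
my $data = fd_retrieve($in);
close($in);

print "answer = $data->{answer}\n";
```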

Re: Storable.pm - corrupt when saving to non-truncated file
by roboticus (Chancellor) on Nov 24, 2010 at 12:37 UTC

    antiquark:

    Several things:

    • I've never used Storable before, but by reading the documentation, I'd expect that if the stuff at the end is the problem (and I don't think it is), then you could solve it by replacing your seek with closing the file and reopening it in write mode.
    • However, since it appears (also from the docs) that you can store and retrieve multiple objects from the file descriptor, having extra stuff at the end shouldn't be a problem, as the file format should be able to handle the end of an object with more stuff after it. You aren't trying to read multiple objects from the stream in any order you please, are you? If so, I'd expect extra seeks and reads to be problematic, as you could skip over any bookkeeping information Storable may use.
    • The docs also mention that the store functions have versions with an 'n' prefix to use network order. You're not using your stored data between machines are you? If so, you may want to use the 'n'-prefixed versions to ensure you're not being bitten by an endianness bug between CPUs.
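
    The first suggestion above (close the file and reopen it in write mode instead of seeking) might look roughly like this. The helper names load_state and save_state are hypothetical, not part of Storable.

```perl
use strict;
use warnings;
use Storable qw(store_fd fd_retrieve);

# Hypothetical helper: read one stored structure from $path.
sub load_state {
    my ($path) = @_;
    open( my $in, '<', $path ) or die "Cannot read $path: $!";
    binmode($in);
    my $state = fd_retrieve($in);
    close($in);
    return $state;
}

# Hypothetical helper: opening with '>' truncates the file,
# so no stale bytes can remain after the new data.
sub save_state {
    my ( $state, $path ) = @_;
    open( my $out, '>', $path ) or die "Cannot write $path: $!";
    binmode($out);
    store_fd( $state, $out ) or die "store_fd failed for $path";
    close($out) or die "Cannot close $path: $!";
    return;
}
```

    One caveat with this approach: closing the handle releases any flock held on it, so another process could modify the file between the read and the write.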

    OK... now that I've said that, I've gotten curious. I just kicked out a quickie example:

        #!/usr/bin/perl
        use strict;
        use warnings;

        my %hsh = ( Apple=>1, Banana=>7, Cherry=>42 );
        my $txt = "The quick red fox jumped over the lazy brown dog";
        my @ary = (17, 'Flugelhorn', 13.333);

        use Storable qw( nstore_fd );

        open my $SF, '>', 'tst_Storable.db' or die "Error: $!\n";
        nstore_fd \%hsh, $SF;
        nstore_fd \$txt, $SF;
        nstore_fd \@ary, $SF;
        close $SF;

    That writes some variables to the test file, and now to read and display the data:

        #!/usr/bin/perl
        use strict;
        use warnings;

        my %hsh;
        my $txt;
        my @ary;

        use Storable qw( fd_retrieve );

        open my $SF, '+<', 'tst_Storable.db' or die "Error: $!\n";
        %hsh = (%{fd_retrieve $SF});
        $txt = ${fd_retrieve $SF};
        @ary = @{fd_retrieve $SF};
        close $SF;

        print "Text: '$txt'\n";
        print "Array: (", join(', ', @ary), ")\n";
        print "Hash: (", join(', ', map { "$_=>$hsh{$_}" } keys %hsh), ")\n";

    So when I run it, I get the expected results:

        roboticus@Boink:~ $ perl tst_Storable_2.pl
        Text: 'The quick red fox jumped over the lazy brown dog'
        Array: (17, Flugelhorn, 13.333)
        Hash: (Cherry=>42, Banana=>7, Apple=>1)

    You can definitely store multiple objects with Storable, so once you read an object, it shouldn't matter that there's extra data. So I'm guessing that you need to do one of the following:

    • binmode your handle, as suggested by ikegami, or
    • add the 'n' prefix for inter-machine usage, or
    • stop seeking around in the file (if that's what you're doing).

    Personally, I think I'd just use "read" mode to read all the objects, and at the end of the program, open in "write" mode to store everything. The fact that you're using read/write leads me to suspect that you may be seeking around the file reading and writing objects and confusing Storable so it can't keep track of the state of the objects since its bookkeeping information isn't where it expects to find it.

    If that's what you're doing, then an ugly hack may be to write a sentinel value after (and possibly before) every value you may want to read randomly. That way, writing the sentinel may give Storable enough bookkeeping information about the end of the object (if that's what it's lacking).

    ...roboticus

Re: Storable.pm - corrupt when saving to non-truncated file
by locked_user sundialsvc4 (Abbot) on Nov 24, 2010 at 15:59 UTC

    I used to like Storable.

    I don’t, anymore.

    What has been remarkably satisfactory for me is ... YAML. (There are several flavors of it, including some modules that are “Pure Perl.”) One very nice advantage of it is that ... it is readable by humans. You can plainly see what it says. If storage space happens to be a genuine issue, it is also easily compressible by Zip/deflate and the like...

    With Storable, I was fairly awash with “stored” things that I couldn’t actually retrieve. When I started using YAML, the problems vanished and never returned. (Of course, all of the things that I actually need to store, are things that YAML can represent. Your Mileage May Vary.™)

      I used to like Storable. I don’t, anymore.
      Same story here.
      With Storable, I was fairly awash with “stored” things that I couldn’t actually retrieve.
      Same here. Migration between 32-bit and 64-bit architectures, or between little-endian and big-endian architectures, was a nightmare with Storable. Yes, I know nfreeze supposedly solved (some of) this, but not when one didn't use Storable this way right from the start.

      Now with JSON or YAML there are few good reasons to use Storable IMHO.
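
      As a rough sketch of the JSON route, using the core JSON::PP module. The helper names save_state_json and load_state_json are made up for illustration, and this only covers plain data (hashes, arrays, scalars), not blessed objects or code references.

```perl
use strict;
use warnings;
use JSON::PP;

# utf8: encode/decode bytes; canonical: sorted keys for stable output;
# pretty: human-readable indentation.
my $json = JSON::PP->new->utf8->canonical->pretty;

# Hypothetical helper: write the structure as JSON text.
# Opening with '>' truncates, so no stale tail can be left behind.
sub save_state_json {
    my ( $state, $path ) = @_;
    open( my $out, '>', $path ) or die "Cannot write $path: $!";
    binmode($out);
    print {$out} $json->encode($state) or die "Write to $path failed: $!";
    close($out) or die "Cannot close $path: $!";
    return;
}

# Hypothetical helper: slurp the file and decode it.
sub load_state_json {
    my ($path) = @_;
    open( my $in, '<', $path ) or die "Cannot read $path: $!";
    binmode($in);
    local $/;    # slurp mode: read the whole file at once
    my $text = <$in>;
    close($in);
    return $json->decode($text);
}
```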

      --
      No matter how great and destructive your problems may seem now, remember, you've probably only seen the tip of them. [1]
Re: Storable.pm - corrupt when saving to non-truncated file
by antiquark (Initiate) on Nov 24, 2010 at 22:34 UTC

    I also posted to the perl5-porters list, and got a reply saying I do need to truncate the file.

    I'm hoping to write a test-case today to get it to fail, and then implement the truncate to fix it.

    I'll comment again if I am successful.
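
    A sketch of what the truncate fix could look like, assuming the same read-modify-write pattern as in the original question. The sub name update_state and its arguments are hypothetical, and this has not been run against the production data.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);
use Storable qw(store_fd fd_retrieve);

# Hypothetical helper: remove $key from the hash stored in $filename,
# rewriting the file in place under an exclusive lock.
sub update_state {
    my ( $filename, $key ) = @_;

    open( my $sf, '+<', $filename ) or die "Cannot open $filename: $!";
    binmode($sf);
    flock( $sf, LOCK_EX ) or die "Cannot lock $filename: $!";

    my $state = fd_retrieve($sf);
    delete $state->{$key};

    # Rewind and overwrite from the start of the file.
    seek( $sf, 0, SEEK_SET ) or die "Cannot seek in $filename: $!";
    store_fd( $state, $sf ) or die "store_fd failed for $filename";

    # Cut the file off at the current position so that no bytes
    # from the previous, larger image remain after the new data.
    truncate( $sf, tell($sf) ) or die "Cannot truncate $filename: $!";

    # Closing the handle also releases the lock.
    close($sf) or die "Cannot close $filename: $!";
    return $state;
}
```

    The lock is held for the whole read-modify-write cycle, and the truncate happens before the handle is closed, so other lockers should never see a file with a stale tail.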

    Were I writing it today, I probably wouldn't use Storable, but some kind of database, like SQLite.