dshahin has asked for the wisdom of the Perl Monks concerning the following question:

If I use Storable or Data::Dumper to dump the contents of an object or data structure, how can I check the validity of the stored objects in a simple and reliable way? I thought about a checksum, but I'm pretty sure that's not a cross-platform solution, because CR/LF differences would cause the checksums to differ. Storable dumps a binary file, but how can I be sure the serialized objects are the same given the same data set? Won't the unordered way hashes are internally represented potentially differ from NT to UNIX, or even from machine to machine?

Re: verifying serialized objects
by saucepan (Scribe) on Dec 22, 2000 at 08:54 UTC
    What kinds of problems are you worried about detecting? I/O errors? Data::Dumper doesn't seem to do any I/O, so you should be fine as long as you are careful to do the error checking yourself when writing to disk (remember that both close and print have return values; see Fatal.pm for a way to reduce the workload a bit).
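The error checking described above might look roughly like the sketch below. The helper name `dump_to_file` is hypothetical (it is not part of Data::Dumper); the point is only that `open`, `print`, and `close` each return a value worth testing.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: write a Data::Dumper dump of $data to $path,
# dying with the OS error message if any I/O step fails.
sub dump_to_file {
    my ($data, $path) = @_;
    open(my $fh, '>', $path)    or die "open $path: $!";
    print {$fh} Dumper($data)   or die "print $path: $!";
    # close can fail too, e.g. when buffered output cannot be flushed
    close($fh)                  or die "close $path: $!";
    return 1;
}
```

Wrapping the call in `eval { ... }` lets the caller inspect `$@` instead of dying outright.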

    Storable's man page promises that it checks for I/O errors. I just checked this by creating a small RAM disk and running the following script:

    #!/usr/bin/perl
    use Storable;
    my @huge = ('aaaaa' .. 'azaaa');
    my $result = undef;
    eval { $result = store \@huge, '/mnt/mfs/bigfile' };
    $result = '(undef)' unless defined $result;
    print "Result:$result exception:$@\n";

    Sure enough, this displayed "Result:1" unless the RAM disk filled up, in which case it displayed "Result:(undef)". (As an aside, the man page says store will die on "serious" errors, but I wasn't able to induce this even by forcibly unmounting the filesystem in mid-write... whatever these serious errors are, I hope I never see one! :)

    If you are really paranoid (or running NFS with UDP checksums disabled), you could always try thawing your data back into a second variable and then either doing your own deep comparison or letting Storable do it for you (look for the $Storable::canonical option in the Storable man page).
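A minimal sketch of that round-trip check, assuming an in-memory `nfreeze`/`thaw` stands in for the store-to-disk step. The helper name `round_trip_ok` is made up for illustration; `$Storable::canonical`, `nfreeze`, and `thaw` are the real Storable names.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

# Hypothetical helper: returns true if $data survives a freeze/thaw
# round trip with an identical serialized form.
sub round_trip_ok {
    my ($data) = @_;
    local $Storable::canonical = 1;   # sort hash keys so equal data gives equal bytes
    my $frozen = nfreeze($data);      # serialize in network order
    my $copy   = thaw($frozen);       # rebuild into a second variable
    return $frozen eq nfreeze($copy); # compare serialized forms instead of walking the tree
}
```

Comparing the two frozen strings lets Storable do the deep comparison; to check what actually landed on disk, the same idea should work with `nstore` and `retrieve`.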

      I'm not really worried about the validity of the data, so much as I want an easy way to compare two objects without having to load them into memory. That is, I just want to look at a checksum of the object, and only load the object if the checksum says it has changed. That way I can avoid strain on the network and memory, unless it is merited.
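One way to get such a checksum, sketched under the assumption that Digest::MD5 is installed: freeze the object with `$Storable::canonical` set and hash the resulting bytes. The name `object_checksum` is hypothetical. The digest is only meaningful between compatible Storable versions, and a value stored as a number may serialize differently from the same value stored as a string.

```perl
use strict;
use warnings;
use Storable qw(nfreeze);
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: hex MD5 of the canonical, network-order
# serialization of $data.
sub object_checksum {
    my ($data) = @_;
    local $Storable::canonical = 1;   # hash keys are written in sorted order
    return md5_hex(nfreeze($data));
}
```

The checksum can be stored next to the serialized object, so a reader fetches the short digest first and loads the full object only when the digest has changed.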
Re: verifying serialized objects
by Dominus (Parson) on Dec 22, 2000 at 09:04 UTC
    Says dshahin:
    > Won't the unordered way hashes are internally represented
    > potentially differ from NT to UNIX, or even from machine to machine?
    No.

    It might vary from version of Perl to version of Perl, and in fact it did change between 5.005 and 5.6. But it will not vary from machine to machine or OS to OS.

Re: verifying serialized objects
by Fastolfe (Vicar) on Dec 22, 2000 at 08:28 UTC
    As the data is binary, you shouldn't have to worry one iota about newlines, because you'll be treating it as binary data (use binmode on Win32). Don't try to read the file using <FH>; use sysread. In addition, this bit from the Storable documentation may be useful to you:
    You can also store data in network order to allow easy sharing across multiple platforms, or when storing on a socket known to be remotely connected. The routines to call have an initial n prefix for network, as in nstore and nstore_fd. At retrieval time, your data will be correctly restored so you don't have to know whether you're restoring from native or network ordered data. Double values are stored stringified to ensure portability as well, at the slight risk of losing some precision in the last decimals.
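Putting the two points above together, a rough sketch: write the file with `nstore`, then checksum its raw bytes through a filehandle in `binmode`. The helper names `file_checksum` and `store_portable` are invented for this example, and Digest::MD5 is assumed to be available.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use Digest::MD5;

# Hypothetical helper: hex MD5 of a file's raw bytes.
sub file_checksum {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "open $path: $!";
    binmode($fh);                      # no CR/LF translation on Win32
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $hex;
}

# Hypothetical helper: store $data in network order with sorted hash
# keys, then return the checksum of the file that was written.
sub store_portable {
    my ($data, $path) = @_;
    local $Storable::canonical = 1;
    nstore($data, $path) or die "nstore to $path failed";
    return file_checksum($path);
}
```

Because the file is read as binary, line-ending conventions never enter the checksum, and `retrieve($path)` restores the data whether it was written in native or network order.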