in reply to Storing large data structures on disk

This takes less than 3 minutes to store a 3.2 GB AoA to disk, and about the same again to retrieve it (1e6 arrays averaging 100 integers each):

Update: If I use -O=5.8, which translates to somewhat over 630,000 arrays averaging 100 elements each and avoids pushing my machine into swapping, the time is 10 seconds to write and 11 to read.

#! perl -sw
use strict;
use Data::Dump qw[ pp ];
use Time::HiRes qw[ time ];
$|++;

our $O //= 3;    ## -O=n on the command line (via -s) sets the array count to 10**n

## Build the test data: 10**$O arrays of 1 .. 200 integers each
my @AoA;
$#AoA = 10 ** $O;
$AoA[ $_ ] = [ 1 .. 1+rand 200 ] for 0 .. $#AoA;
pp \@AoA if $O <= 2;

## Store: one length-prefixed binary record per inner array
my $start = time;
open O, '>:raw', 'junk26.bin' or die $!;
for ( 0 .. $#AoA ) {
    print O pack 'V/A*', pack 'V*', @{ $AoA[ $_ ] };    ## switched printf to print
}
close O;
printf "Store took %.6f secs\n", time() - $start;

## Retrieve: read the 4-byte length, then that many bytes, then unpack the integers
@AoA = ();
$start = time;
open I, '<:raw', 'junk26.bin' or die $!;
for ( 0 .. 10 ** $O ) {
    read( I, my $n, 4 );
    read( I, my $buf, unpack 'V', $n );
    $AoA[ $_ ] = [ unpack 'V*', $buf ];
}
close I;
printf "Retrieve took %.6f secs\n", time() - $start;
pp \@AoA if $O <= 2;

__END__
C:\test>junk26 -O=6
Store took 169.778000 secs
Retrieve took 170.926000 secs

Resultant file is 400 MB on disk and gzips to 6 MB.
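
For anyone puzzling over the record format, here is a minimal sketch of a single record round trip. It assumes an in-memory filehandle instead of junk26.bin, and the variable names (@row, $record, @back) are made up for illustration. The inner pack 'V*' turns the integers into a byte string; the outer 'V/A*' prefixes that string with its own length in bytes as a 4-byte little-endian integer, which is what the reader uses to know how much to read next.

```perl
use strict;
use warnings;

# Illustrative data only; not taken from the benchmark above.
my @row = ( 1, 2, 3 );

# Inner pack: each integer becomes 4 little-endian bytes (12 bytes here).
# Outer pack: 'V/' writes the byte length of the string that 'A*' then emits.
my $record = pack 'V/A*', pack 'V*', @row;
print length( $record ), " bytes\n";    # 4-byte length prefix + 12 bytes of payload

# Read it back the same way the retrieve loop does: length first, then payload.
open my $in, '<:raw', \$record or die $!;
read( $in, my $n, 4 ) == 4 or die 'short read on length';
my $len = unpack 'V', $n;
read( $in, my $buf, $len ) == $len or die 'short read on payload';
close $in;

my @back = unpack 'V*', $buf;
print "@back\n";                        # prints "1 2 3"
```

The payload is read back with read() and an explicit length rather than unpack 'V/A*', because 'A' strips trailing spaces and nulls on unpack, which would damage binary data.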


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

Re^2: Storing large data structures on disk
by roibrodo (Sexton) on May 31, 2010 at 17:22 UTC

    I have to admit this code is too complex for me - too many shortcuts I'm unfamiliar with, but I'm doing my best to understand it :) (I guess I should move to the newbies section...)

    Anyway, I can't get it to run - I get:

    Bareword found where operator expected at test2.pl line 18, near "printf O "%s", pack 'V/A"
      (Might be a runaway multi-line // string starting on line 8)
      (Do you need to predeclare printf?)
    Bareword found where operator expected at test2.pl line 18, near "', pack 'V"
      (Missing operator before V?)
    Global symbol "@AoA" requires explicit package name at test2.pl line 8.
    Global symbol "@AoA" requires explicit package name at test2.pl line 8.
    Global symbol "@AoA" requires explicit package name at test2.pl line 8.
    Global symbol "@AoA" requires explicit package name at test2.pl line 8.
    Global symbol "$start" requires explicit package name at test2.pl line 8.
    Global symbol "@AoA" requires explicit package name at test2.pl line 8.
    syntax error at test2.pl line 18, near "printf O "%s", pack 'V/A"
    Bad name after raw' at test2.pl line 26.
      Anyway, I can't get it to run

      As others have explained, switch our $O //= 2 to our $O ||= 2 for pre-5.10 perls.
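
      A minimal sketch of what that switch changes, in case it helps. The two helper subs are hypothetical, written only to compare the operators, and the default of 3 is just an example value. The sketch itself needs 5.10 or later because it uses //= for the comparison.

```perl
use strict;
use warnings;
use 5.010;    # // and //= are only available from perl 5.10 onwards

# Hypothetical helpers, only to compare the two defaulting styles.
sub default_dor { my $o = shift; $o //= 3; return $o }    # default only if undefined
sub default_or  { my $o = shift; $o ||= 3; return $o }    # default if false (undef, 0, '')

printf "undef: //= gives %s, ||= gives %s\n", default_dor( undef ), default_or( undef );
printf "0:     //= gives %s, ||= gives %s\n", default_dor( 0 ),     default_or( 0 );
```

      The two only differ for a defined-but-false value: with ||=, passing -O=0 would silently be replaced by the default, which should not matter for sizes you would actually want to test.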

      I have to admit this code is too complex for me - too many shortcuts

      If you have specific questions about particular lines of code, just ask. That is what this place is for.



        First of all, thanks again. I really appreciate the help from all you guys. I also appreciate how much more there is to know about this wonderful language.

        A few questions re. BrowserUk's code:

        1. When I run your code with -O=6, for example, it also prints the structure to the screen. This does not happen when I omit the -O=... switch. Why is that?

        2. What is the meaning of pp here? I read on CPAN that it is used to create standalone executables, but I don't understand the connection (and moreover, why do we pass the structure to it?).

        3. Can you explain the heart of the packing:

         printf O "%s", pack 'V/A*', pack 'V*', @{ $AoA[ $_ ] };;

        Here we print each array to the output file. What does the / between the V and the A stand for? I read that it means a count of the packed items, but where does its value come from? And why do we need the second pack?

        And one last question for now: when the data structure becomes too large to hold in memory all at once, is tying with MLDBM the preferred paradigm? What are the alternatives?

        Thank you!

      Did you use the download code link? You probably need perl 5.10 or later because of the defined-or assignment operator (//=).