
Anyway, I can't get it to run

As others have explained, switch our $O //= 2 to our $O ||= 2 for pre-5.10 perls.
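The two operators are not quite interchangeable, which matters if you ever pass -O=0. The following is a minimal sketch: //= (Perl 5.10 and later) only fills in a value that is undefined, while ||= replaces any false value, so an explicit 0 silently becomes 2. The sub names are made up for the illustration, and the script itself needs 5.10+ because it uses //=.

```perl
use strict;
use warnings;

# Hypothetical helpers: each returns what $O would end up as under one idiom.
sub with_defined_or { my ($v) = @_; $v //= 2; return $v }   # Perl 5.10+: replaces only undef
sub with_logical_or { my ($v) = @_; $v ||= 2; return $v }   # older perls: replaces undef, 0 and ''

for my $given ( undef, 0, 6 ) {
    printf "%-5s  //= -> %s   ||= -> %s\n",
        defined $given ? $given : 'undef',
        with_defined_or($given), with_logical_or($given);
}
```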

I have to admit this code is too complex for me - too many shortcuts

If you have specific questions about particular lines of code, just ask. That is what this place is for.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

Re^4: Storing large data structures on disk
by roibrodo (Sexton) on May 31, 2010 at 19:37 UTC

    First of all, thanks again. I really appreciate the help from all you guys. I also appreciate how much more there is to know about this wonderful language.

    A few questions re. BrowserUk's code:

    1. When I run your code passing -O=6, for example, it also prints the structure to the screen. This does not happen when I omit the -O=... switch. Why is that?

    2. What is the meaning of pp here? I read on CPAN that it is used to create standalone executables, but I don't understand the connection (and, moreover, why do we pass the structure to it?).

    3. Can you explain the heart of the packing:

     printf O "%s", pack 'V/A*', pack 'V*', @{ $AoA[ $_ ] };;

    Here we print each array to the output file. What does the / between the V and the A stand for? I read that it means a count of the packed items, but where does its value come from? And why do we need the second pack?

    And one last question for now: when the data structure becomes too large to store it all in memory, is tying with MLDBM the preferred paradigm? What are the alternatives?

    Thank you!

      1. When I run your code passing -O=6, for example, it also prints the structure to the screen.

        It should only dump the structure to the screen if -O is 2 or less; see the lines that end in "if $O <= 2;". If that is not what you see, there is something wrong with your copy of the code.

        I added that so that I could quickly check that what got unpacked was the same as what was packed. For small examples only.
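For anyone reconstructing that guard, here is a hypothetical sketch. It assumes the script picks up -O through perl's -s switch (which is what the "-O=6" command-line form suggests, and why $O is declared with our rather than my); the sample data is made up so that its size grows with $O.

```perl
#!/usr/bin/perl -s
use strict;
use warnings;
use Data::Dumper;

# Assumption: run as "perl script.pl -O=6"; the -s switch on the #! line
# turns -O=6 into the package variable $main::O, hence "our".
our $O;
$O ||= 2;    # default when -O is absent (||= also works before 5.10)

# Hypothetical test data: 2**$O sub-arrays of increasing length.
my @AoA = map { [ 1 .. $_ ] } 1 .. 2 ** $O;

# Debug aid: dump the structure only when it is small enough to eyeball.
print Dumper( \@AoA ) if $O <= 2;

printf "O=%d, %d sub-arrays\n", $O, scalar @AoA;
```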

      2. What is the meaning of pp here?

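        If you look at the third line of the code, you'll see "use Data::Dump qw[ pp ];;". Here pp stands for "pretty print" and is Data::Dump's equivalent of Data::Dumper's Dumper() function. It is unrelated to the pp command from PAR::Packer that builds standalone executables; we pass it the structure so that it can display it.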
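A small side-by-side sketch of the two dumpers, using made-up sample data. Data::Dumper ships with Perl; Data::Dump comes from CPAN and may not be installed, so the sketch only calls pp() if the module loads. Note that Dumper() just returns a string for you to print, whereas pp() called in void context (as in a bare "pp \@AoA") prints to STDERR.

```perl
use strict;
use warnings;
use Data::Dumper;    # core module

my @AoA = ( [ 1, 2, 3 ], [ 4, 5 ] );   # hypothetical sample data

# Data::Dumper: Dumper() returns a string; printing it is up to you.
$Data::Dumper::Indent = 1;
print Dumper( \@AoA );

# Data::Dump (CPAN): pp() returns a compact Perl-source representation.
if ( eval { require Data::Dump; 1 } ) {
    print Data::Dump::pp( \@AoA ), "\n";
}
else {
    print "Data::Dump is not installed\n";
}
```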

      3. Can you explain the heart of the packing:  printf O "%s", pack 'V/A*', pack 'V*', @{ $AoA[ $_ ] };;

        Okay. First off, update your copy of the code from the original node, where I've switched it from printf to print.

        The guts of the thing is two calls to pack.

        • pack 'V*', @{ $AoA[ $_ ] };

          It goes through the array @AoA one element at a time (with $_ set to 0 .. $#AoA), getting the reference to each sub-array.

          The @{ ... } bit dereferences that array reference, expanding it to the contents of the sub-array.

          The pack template "V*" says: pack all the values in that list as 32-bit unsigned little-endian integers into a binary string, and return that string.

        • pack 'V/A*', ...

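          The second pack template, "V/A*", says: return the input binary string ("A*") prefixed with its length in bytes as a 32-bit unsigned integer ('V'). The '/' is what links the two: pack works out that count itself from the length of the string, which is where the value comes from.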

          And the print writes that out to the file.
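To see what the two packs produce, here is a minimal sketch on made-up rows that prints the sizes involved; each value costs 4 bytes and each record carries a 4-byte length prefix. One caution if you ever unpack such a record in one go: use 'V/a*' rather than 'V/A*', because on unpack 'A' strips trailing spaces and nulls, which would eat the zero high bytes of a small final integer.

```perl
use strict;
use warnings;

my @AoA = ( [ 1, 2, 3 ], [ 70000 ], [] );   # hypothetical sample data

for my $i ( 0 .. $#AoA ) {
    # Inner pack: every value becomes 4 bytes (32-bit unsigned, little-endian).
    my $body = pack 'V*', @{ $AoA[ $i ] };

    # Outer pack: '/' makes pack compute length($body) itself and write it
    # as a 'V' in front of the bytes of $body.
    my $record = pack 'V/A*', $body;

    my $prefix = unpack 'V', $record;    # the first 4 bytes of the record
    printf "row %d: %d values, body %d bytes, record %d bytes, prefix %d\n",
        $i, scalar @{ $AoA[ $i ] }, length $body, length $record, $prefix;
}
```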

        As your sub-arrays are variable-sized, we need the prefix count so that we know how many bytes of the file to read back into each sub-array when retrieving it.
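The retrieval side might look like the following sketch. It is not the original node's reading code: it writes made-up rows the same way, to an in-memory filehandle so that it runs without touching the disk, then reads the 4-byte prefix, reads exactly that many bytes, and unpacks them back into a sub-array. The :raw layer matters for real files on Windows.

```perl
use strict;
use warnings;

my @AoA = ( [ 1, 2, 3 ], [ 42 ], [ 7, 8 ] );   # hypothetical sample data

# Write: an in-memory "file" stands in for the real output file.
my $file = '';
open my $out, '>:raw', \$file or die "open: $!";
print {$out} pack 'V/A*', pack 'V*', @{ $AoA[ $_ ] } for 0 .. $#AoA;
close $out;

# Read back: the length prefix first, then exactly that many bytes.
my @copy;
open my $in, '<:raw', \$file or die "open: $!";
while ( read( $in, my $prefix, 4 ) == 4 ) {
    my $len = unpack 'V', $prefix;
    my $got = read( $in, my $body, $len );
    die "short read" unless defined $got && $got == $len;
    push @copy, [ unpack 'V*', $body ];
}
close $in;

print join( ' ', map { "[@$_]" } @copy ), "\n";
```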

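        Note: You might prefer to use 'N' (big-endian, "network" order) rather than 'V' (little-endian) if that is more natural on your platform. Whichever you choose, read the file back with the same template you wrote it with.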
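The difference between the two templates is only the byte order, as this small sketch with an arbitrary value shows; mixing them up does not fail loudly, it just yields a different number.

```perl
use strict;
use warnings;

my $n = 258;   # 0x00000102, an arbitrary example value

# The four bytes each template produces, shown as hex.
printf "V: %s\n", unpack( 'H*', pack( 'V', $n ) );   # little-endian: 02010000
printf "N: %s\n", unpack( 'H*', pack( 'N', $n ) );   # big-endian:    00000102

# Writing with one template and reading with the other gives a wrong value.
printf "written with V, read with N: %d\n", unpack( 'N', pack( 'V', $n ) );
```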



        Thanks again. One last question for now (I guess I added it to the previous post while you were replying): when the data structure becomes too large to store it all in memory, is tying with MLDBM the preferred paradigm? What are the alternatives?