Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:

All,
As my ever expanding circle of knowledge increases, I begin to examine things that I took for granted previously. Perl is a wonderful language that worries about a lot of things for you. My current quest is to assimilate memory saving techniques.

It may seem counter intuitive at first, but a great methodology I like to use to learn new things is throw out my tool box and solve an old problem. Plato said "Necessity is the mother of invention". In other words, we tend not to explore when what we have works. This is only a catalyst process. Unless the new tool is actually superior, I use the old tool to solve the problem when it comes up. The new tool usually fits better somewhere else yet undiscovered.

In this particular case, I am trying to conserve as much space as possible in what would otherwise be a hash of arrays (HoA). Since I know inadvance exactly how many bytes total each array needs to be, I was hoping to use the equivalent of a C struct with pack/unpack resulting in a single hash. The trouble is that I need to include a reference and have it still work when it is unpacked.

As you can see, it is unlikely that this solution is a good fit to this problem. This doesn't mean it isn't worth pursuing since it is only an intermediate process. In fact, I doubt any process that converts a stringified reference back into a reference is very safe but the hunt for knowledge must go on. Does anyone know of a canned solution for this? In the chatterbox, tye indicated he believed there was an XS module but I couldn't find it. japhy is off working on one because it intrigued him as well.

While the problem outlined is my main focus at the moment, any memory saving/controlling techniques would be appreciated.

Cheers - L~R

Replies are listed 'Best First'.
Re: Techniques On Saving Memory
by BrowserUk (Patriarch) on Mar 09, 2005 at 18:55 UTC

    Take a look at Devel::Pointer. It has routines for unsmashing smashed references. It works, and I have used it to do something very similar to what you are attempting.

    One problem is that you also need to store the type of ref (scalar/array/hash/code) along with the smashed address. That screws the otherwise convenient 32-bit / element storage requirement making the math messier than it ought to be.

    The second is you also need to store unsmashed copies of the references so that perl will not GC the elements to which you are storing smashed refs.

    That duplication of storage negates much of the benefit, combined with the fact that you are now responsible for managing the memory instead of allowing Perl to take care of it, makes for a heavier and riskier module than might otherwise be the case.

    I have the germs of an idea for dealing with that also, but the tools are not available to me to implement it (yet).


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
      you also need to store unsmashed copies of the references so that perl will not GC the elements to which you are storing smashed refs

      Ooo, ouch. Good point.

      I think you will find, case after case, that by the time you're done "improving" a perl feature, you're going to have a solution that is as complex (in memory and time and code and shortcomings) as the perl implementation itself.

      --
      [ e d @ h a l l e y . c c ]

        I've been trying to get this idea right for almost two years now. On two occasions I've even reached the point where I thought I was ready to let code loose on the world, but then I discovered something which I wasn't able to handle without duplicating the references in a hash, which completely negates any benefit at all.

        As I said, I have the germ of a solution, but as I am entirely uncomfortable in using XS, or even XS modules in ways that move beyond their authors intent, it will remain just an idea until I can think of a way of proving it's safety.


        Examine what is said, not who speaks.
        Silence betokens consent.
        Love the truth but pardon error.
      BrowserUk,
      The second is you also need to store unsmashed copies of the references so that perl will not GC the elements to which you are storing smashed refs. That duplication of storage negates much of the benefit,...

      tye was quick to point out an easy solution for the problem was to use a parallel hash as a lookup table. I admitted it was a good straight forward answer but defeated my purpose. On the other hand if an anonymous sub can be made to take up less space than the anonymous array:

      sub compact { my $ref = shift; my $pck; # Packed information except reference # purposefully avoiding dispatch table # This keeps us at HoC return sub { if ( $_[0] eq 'data' ) { return $ref }; elsif ( $_[0] eq 'blah' ) { return unpack('A1', $pck) } # where each unpack format gets desired data }; } $hash{foo} = compact( [1..10] ); $hash{foo}->( 'data' )[3] = 42;

      The interface could use work and it doesn't really explore any uncharted territory for me. It also is still not a single hash. We are just trading anonymous subs for anonymous arrays.

      Cheers - L~R

      My module (now v0.02!) doesn't seem to suffer from the garbage collection problem, and it doesn't require you tell it what kind of reference you expect back. It's not pure Perl (unlike Devel::Pointer::PP), but it is easier to use.

      (Update: I misinterpreted the garbage collection issue.)

      _____________________________________________________
      Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
      How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
      BrowserUk,
      Thinking out loud here with only the most elementary understanding of Perl internals - why can't you fake a closure? Basically increase the ref count without actually making the closure. You GC it by decreasing it. As long as you control the creation/modification/deletion of the hash keys and their values, I don't see an obvious problem?

      Cheers - L~R

        ... why can't you fake a closure?...

        Sorry, but I don't even begin to see how to fake a closure? And how would you unfake it?

        As long as you control the creation/modification/deletion of the hash keys and their values, ...

        What hash? Why use a hash and how would that save memory if you do the Tie::RefHash thing of storing the smashed and unsmashed references?

        If you do the RefHash thing, you don't need to do any controlling--once you delete the (value) reference, Perl takes care of the rest. But trading a hash for an array is not going to save memory.


        Examine what is said, not who speaks.
        Silence betokens consent.
        Love the truth but pardon error.
Re: Techniques On Saving Memory
by japhy (Canon) on Mar 09, 2005 at 18:45 UTC
    Voila! This was written in about 30 minutes (well, 5 minutes of re-reading the XS docs, 10 minutes of actual coding time, and 15 minutes of writing docs and tests and what-not). (Update: another 10 minutes produced version 0.02, which makes one function do the job of two.) Limbic~Region was asking about such a module on the CB, and when I described what I thought such a module should do, I decided to try writing it.

    Devel::RecoverRef is a module which takes a stringified or numified version of a reference (objects included!) and returns the reference (or object!) that is represented therein. There's not a lot of protection, though. You'll probably dump core if you make stuff up. Scrutinize, and I'll upload it to CPAN by the end of the week.

    _____________________________________________________
    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
      Is there some reason you didn't just write this as a wrapper over Devel::Pointer or Devel::Pointer::PP? They already handle the unsmashing part - you just add the part to get the hex address out of strings like "Some::Class=HASH(0x81471c8)". Or why didn't you just ask either Simon or me to update our modules? Its probably a feature that should already be in there.
        I hadn't known of those modules until after I wrote mine. They were mentioned in other nodes in this thread. Now that I know they exist, there's no need for me to release mine; you and Simon just need to simplify your modules.
        _____________________________________________________
        Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
        How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re: Techniques On Saving Memory
by bart (Canon) on Mar 09, 2005 at 19:16 UTC
    I'm not sure what you mean by "reference", but pack/unpack do contain support for two kinds of templates, "P" and "p", that can be used to reference/dereference some block of memory. So if what you want is a (zero-terminated) string or a packed, fixed length structure, you can use one of these. You must be really careful not to move your strings containing the packed data around, or you'll end up dereferencing the wrong memory block. So no changing the length of the scalar holding that block, once you've started packing.

    I thought of using substr to point inside a string, but it doesn't work; but the following gives some form of workaround:

    $x = "one\0two\0"; $b = pack "p", $x; # pointer to "one" $b .= pack 'I', 4 + unpack 'I', $b; # pointer to "two" = substr($x, 4) printf "length: %i\n", length $b; printf "%i %i\n", unpack "II", $b; printf "%s\n", $_ for unpack "pp", $b;
    Result:
    length: 8
    2279940 2279944
    one
    two
    
    Like I said, $x contains the packed data, so don't move that around.

    For unpacking a fixed length string, you can use "P12" for a 12 byte string, for example. Appending the following snippet to the above code, I get the following additional result:

    printf "%s\n", $_ for unpack "P2P2", $b;
    on
    tw
    

    Note: the unpacked data is a copy of the data, not a reference into the same, original data.

Re: Techniques On Saving Memory
by Roy Johnson (Monsignor) on Mar 09, 2005 at 19:17 UTC
    Looking at the problem differently, since you know how big each array is, you can use a hash and one array. In the hash, you store the offset to the beginning of the desired (conceptual) row in the array.
    use strict; use warnings; my %hash; my @array; my $row_length = 5; my $next_available = 0; for (qw(cow pig donkey)) { $hash{$_} = $next_available; for my $col (0..($row_length - 1)) { $array[$next_available + $col] = "$_ $col"; } $next_available += $row_length; } # What would have been $hash{pig}[3]: print $array[$hash{pig}+3];
    You could even make a class implementing hashes of fixed-size arrays and give it an indexing method. I'm not going to. :-)

    Caution: Contents may have been coded under pressure.
Re: Techniques On Saving Memory
by diotalevi (Canon) on Mar 09, 2005 at 23:44 UTC

    Most of the discussion by other people seems to be about how to represent the same data in a smaller space. Here is another idea - instead of storing stringifid references, store them as packed data: pack( 'N', $str =~ /x([A-F\d]+)/ ? hex( $1 ) : die ). Your string is now four bytes long instead of at least twice that. You'd unsmash it using some "normal" method. You could also compress your data. I've stored things in memory as six bit characters when I knew I only wanted to use a limited character set and wanted to get the 25% space savings. Similarly with numbers, packing them can yield a shorter binary value than they are when represented as strings.

    You shoul also consider whether your algorithm can be changed so that less data has to be in memory at any given time. This is a really good reason to check out Dominus new book Higher Order Perl to get some training on using streams and iterators. Iterators are useful tools to use when altering your code so that less stuff is actually present at any given moment.

    Offloading things to disk is another way to save memory. Use BerkeleyDB to get a good disk-bound hash.