John M. Dlugosz has asked for the wisdom of the Perl Monks concerning the following question:

I'm planning to have a Package variable that is a hash, say %BigHash, that I will create only if it is needed. Perhaps I'll read it in from another file.

A hash can't be undef like a scalar can. What's an efficient and elegant way to test to see if I've loaded it already? If I used a ref instead of a hash variable, I could use undef as opposed to a hash ref. But I plan to make the thing available as a public variable too, and there's really no need for it to be a reference other than this feature.

If it's not defined at all, I should be able to check for exists in the Stash, right? Is that efficient, like checking the keys of a regular hash? I'm just curious at this point, since I suppose that even so, it's only as efficient as a hash exists, while normal named variables are even more efficient.

So my current plan is to use a separate lexical variable to keep track of whether it's been loaded, since that is fastest to check. But it bugs me to use a separate variable, and I still wonder if there is a better (or at least "cooler") way.
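For reference, the plan described above might look like the following minimal sketch. The package name `MyData`, the helper `_read_big_hash`, and the sample data are all hypothetical stand-ins for whatever actually reads the other file.

```perl
package MyData;
use strict;
use warnings;

our %BigHash;        # public package variable, empty until first needed
my $loaded = 0;      # private flag: has %BigHash been populated yet?

# Hypothetical loader; a real one might read the data from another file.
sub _read_big_hash {
    return (alpha => 1, beta => 2);
}

# Populate %BigHash on the first call, do nothing on later calls.
sub ensure_loaded {
    return if $loaded;               # cheap scalar check
    %BigHash = _read_big_hash();
    $loaded  = 1;
    return;
}

sub lookup {
    my ($key) = @_;
    ensure_loaded();
    return $BigHash{$key};
}

package main;

print MyData::lookup('alpha'), "\n";
```

Code that reads `%MyData::BigHash` directly, without going through `lookup`, would have to call `ensure_loaded` itself; that is the price of keeping the variable public.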

Replies are listed 'Best First'.
Re: lazy creation of a hash
by moritz (Cardinal) on May 12, 2011 at 12:14 UTC
    Do you actually need to distinguish the empty hash from an "undefined" state? If not, just use a scalar truth test, which returns false only if the hash is empty.
      Yes, in this case I can be sure that the populated hash is not empty.

      I suppose that should be fast/efficient if done in boolean context.
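A minimal sketch of that truth test used as the lazy-load trigger, assuming the populated hash is never empty. The function names and the sample data are hypothetical; `$load_count` exists only to show that the load happens once.

```perl
use strict;
use warnings;

our %BigHash;
my $load_count = 0;

# Hypothetical loader standing in for "read it in from another file".
sub load_big_hash {
    $load_count++;
    %BigHash = (alpha => 1, beta => 2);
    return;
}

sub big_hash_value {
    my ($key) = @_;
    load_big_hash() unless %BigHash;   # hash in boolean context: false only when empty
    return $BigHash{$key};
}

print big_hash_value('alpha'), "\n";
print big_hash_value('beta'),  "\n";
print "loads: $load_count\n";
```

If the loader could legitimately produce an empty hash, this test would re-run it on every call, which is the case where a separate flag is still needed.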

Re: lazy creation of a hash
by LanX (Saint) on May 12, 2011 at 14:31 UTC
    What's wrong with defined? OK, it's deprecated...

    DB<102> p defined %h
    DB<103> %h=(a=>1)
    DB<104> p defined %h
    1
    DB<105> my %h=(a=>1); print defined %h
    1

    Cheers Rolf

    UPDATE: Here is what the docs say:

    Use of defined on aggregates (hashes and arrays) is deprecated. It used to report whether memory for that aggregate has ever been allocated. This behavior may disappear in future versions of Perl. You should instead use a simple test for size:

    if (@an_array) { print "has array elements\n" }
    if (%a_hash) { print "has hash members\n" }

    UPDATE:

    I think if you really need to distinguish between an empty hash and a non-existent hash, it can only be a package var. In this case checking the STASH should do.

    UPDATE: I have to change my mind again... exists would check for a glob...
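To illustrate the difference between "a glob exists" and "the glob has a hash in it", here is a rough sketch. The helper `stash_has_hash` and the package name `Demo` are made up for the example. It only gives a useful answer when the hash is created at run time through a symbolic name, because naming `%Demo::Lazy` literally anywhere in the source creates the glob at compile time.

```perl
use strict;
use warnings;

# Report whether package $pkg has a glob called $name with a HASH slot filled.
sub stash_has_hash {
    my ($pkg, $name) = @_;
    no strict 'refs';
    my $stash = \%{"${pkg}::"};                # the package's symbol table
    return 0 unless exists $stash->{$name};    # no entry of any kind
    my $entry = $stash->{$name};
    return 0 unless ref(\$entry) eq 'GLOB';    # entry is not a full glob
    return defined *{$entry}{HASH} ? 1 : 0;    # glob exists; does it hold a hash?
}

print stash_has_hash('Demo', 'Lazy'), "\n";    # checked before creation

{
    no strict 'refs';
    %{"Demo::Lazy"} = (a => 1);                # created at run time by name
}

print stash_has_hash('Demo', 'Lazy'), "\n";    # checked after creation
```

A plain `exists $Demo::{Lazy}` would also be true if only `$Demo::Lazy` or `@Demo::Lazy` had been used, which is why the sketch looks at the HASH slot as well.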

Re: lazy creation of a hash
by jpl (Monk) on May 12, 2011 at 12:16 UTC
    if (keys(%BigHash))
    will be false iff the hash is empty. Assuming the hash won't be empty after it is loaded, that should do the job.

    Update: moritz is right

    if (%BigHash) {
    should do the same job.
Re: lazy creation of a hash
by LanX (Saint) on May 12, 2011 at 15:32 UTC
    > So my current plan is to use a separate lexical variable to keep track of whether it's been loaded, since that is fastest to check. But it bugs me to use a separate variable, and I still wonder if there is a better (or at least "cooler") way.

    If a test for emptiness is not enough for you, you could flag the state of your hash by blessing it by default into a pseudo-package "EMPTY" or "Unloaded".²

    DB<123> print ref \%h
    HASH
    DB<124> bless \%h, EMPTY
    DB<125> print ref \%h
    EMPTY
    DB<126> delete ${main::}{h}
    DB<127> print ref \%h
    HASH

    But be careful to destroy the empty hash before loading your data; unfortunately there is no "unbless" mechanism.¹

    There are also tie, tied and untie, but then those pseudo-packages need to exist and to have special methods and might slow down the use of your hash.

    Cheers Rolf

    1) of course you can also bless it to "Loaded" if it doesn't interfere with your plans...

    2) or you just use one package with a method isLoaded() and store the state in a class variable.

    if ( (\%h)->isLoaded() ) {...}
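One way footnote 2 could be fleshed out, as a sketch only. The package name `LazyState` and the method `markLoaded` are hypothetical, and the state is kept in a class-level hash keyed by the address of each blessed hash. Since `->` binds more tightly than `\`, the reference is parenthesized before the method call.

```perl
use strict;
use warnings;

package LazyState;
use Scalar::Util qw(refaddr);

my %is_loaded;    # class-level state, keyed by the address of each hash

sub isLoaded {
    my ($self) = @_;
    return $is_loaded{ refaddr($self) } ? 1 : 0;
}

sub markLoaded {
    my ($self) = @_;
    $is_loaded{ refaddr($self) } = 1;
    return $self;
}

package main;

our %h;
bless \%h, 'LazyState';    # the hash itself carries the class

unless ( (\%h)->isLoaded ) {
    %h = (a => 1);         # list assignment keeps the blessing
    (\%h)->markLoaded;
}

print( (\%h)->isLoaded, "\n" );
```

Unlike the EMPTY/Loaded re-blessing trick, this keeps a single package and never needs to "unbless" anything, at the cost of an extra method call per check.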

Re: lazy creation of a hash
by MidLifeXis (Monsignor) on May 12, 2011 at 12:51 UTC

    From a basic REPL script:

    %foo; scalar(%foo)
    0
    %foo = (a=>1, b=>2, c=>3); scalar(%foo)
    3/8
    %foo = undef; scalar(%foo);
    1/8
    %foo = (); scalar(%foo);
    0

    Does that help?

    --MidLifeXis
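The session above can be replayed as a small script. This is a sketch that uses the boolean test and a key count instead of printing `scalar(%foo)`, because the `3/8` style of output depends on the Perl version (from 5.26 on, a hash in scalar context yields the number of keys).

```perl
use strict;
use warnings;

my %foo;
print "declared only:   ", (%foo ? "loaded" : "empty"), "\n";

%foo = (a => 1, b => 2, c => 3);
print "three pairs:     ", (%foo ? "loaded" : "empty"), "\n";

{
    no warnings;      # silence the odd-number-of-elements warning
    %foo = undef;     # not a reset: stores one pair, key "" with value undef
}
my $keys_after_undef = keys %foo;
print "after '= undef': ", (%foo ? "loaded" : "empty"), " ($keys_after_undef key)\n";

%foo = ();            # this is what actually empties the hash
print "after '= ()':    ", (%foo ? "loaded" : "empty"), "\n";
```

The practical point for lazy loading is that `%foo = undef` leaves the hash looking "loaded", so a reset should use `%foo = ()` or `undef %foo`.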

Re: lazy creation of a hash
by locked_user sundialsvc4 (Abbot) on May 12, 2011 at 16:55 UTC

    As an aside, you can start with an undefined variable and start referencing a bunch of keys (as though a nested set of “hashes containing hashrefs” with these particular keys already existed), and, miraculously, one will appear.
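This behaviour is usually called autovivification. A short sketch, with made-up key names:

```perl
use strict;
use warnings;

my $config;                               # starts out undefined
$config->{servers}{web}{port} = 8080;     # intermediate hashes spring into existence

print ref($config), "\n";                 # $config now holds a hash reference
print $config->{servers}{web}{port}, "\n";

# Caveat: merely reading a nested key also creates the intermediate levels.
my $missing = $config->{clients}{mobile}{port};
print exists $config->{clients}
    ? "clients was created by the read\n"
    : "clients is absent\n";
```

That caveat matters for the original question: a stray nested read can make a top-level hash non-empty, which would defeat an "is it empty?" test for lazy loading.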

    When you say “a public variable,” however, I immediately think that you should strictly be using a class for this. You don’t want code that is dependent upon the arrangement and use of this variable to be scattered willy-nilly across the system: you want it to be in one place, and with a clearly defined abstract interface. These methods specify what the rest of the application wants to do (or wants to know), and (only...) this one package implements how it is done.

    “Efficiency?” Schmefficiency! Once you pass a billion clock-cycles per second, no one can hear you scream. If you need a few more gigabytes, grab ’em at the grocery checkout stand ... Maintainability trumps all.

      My last $dayjob concerned making servers more efficient. With 200 servers in the farm and counting, adding a little RAM does become expensive, and reducing the need for more hardware by 10% becomes worth an engineer's salary.
Re: lazy creation of a hash
by scorpio17 (Canon) on May 12, 2011 at 13:26 UTC

    Instead of a global hash, consider creating a singleton object having get() and set() methods.
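A bare-bones sketch of that suggestion. The class name `BigData`, the private `_load` and `_data` methods, and the sample data are hypothetical; the data is only built the first time `get` or `set` needs it.

```perl
package BigData;
use strict;
use warnings;

my $instance;    # the one and only object

sub instance {
    my ($class) = @_;
    $instance //= bless({ data => undef }, $class);
    return $instance;
}

# Hypothetical loader standing in for reading the real data source.
sub _load { return (alpha => 1, beta => 2) }

# Build the underlying hash on first use, then keep returning it.
sub _data {
    my ($self) = @_;
    $self->{data} //= +{ $self->_load };
    return $self->{data};
}

sub get { my ($self, $key) = @_;         return $self->_data->{$key} }
sub set { my ($self, $key, $value) = @_; return $self->_data->{$key} = $value }

package main;

print BigData->instance->get('alpha'), "\n";
BigData->instance->set(gamma => 3);
print BigData->instance->get('gamma'), "\n";
```

Here the "is it loaded?" state is simply whether `$self->{data}` is defined, so an empty data set is handled correctly too.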

      And exists, and keys, and when I'm all done it might as well be a tied hash since that is the API I want to provide.
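A sketch of what such a lazily loading tied hash might look like, built on the core `Tie::ExtraHash` class from `Tie::Hash`. The class name `LazyTiedHash` and the loader callback are hypothetical. Only the methods shown trigger the load; DELETE, CLEAR and SCALAR would need the same treatment in real use.

```perl
use strict;
use warnings;

package LazyTiedHash;
require Tie::Hash;                  # core module; also defines Tie::ExtraHash
our @ISA = ('Tie::ExtraHash');      # object is [ \%storage, @extra_tie_args ]

# Run the loader (the extra tie argument, slot 1) once, on first access.
sub _load {
    my ($self) = @_;
    my $loader = $self->[1] or return;
    $self->[1] = undef;
    %{ $self->[0] } = $loader->();
    return;
}

sub FETCH    { my $self = shift; $self->_load; return $self->SUPER::FETCH(@_) }
sub STORE    { my $self = shift; $self->_load; return $self->SUPER::STORE(@_) }
sub EXISTS   { my $self = shift; $self->_load; return $self->SUPER::EXISTS(@_) }
sub FIRSTKEY { my $self = shift; $self->_load; return $self->SUPER::FIRSTKEY(@_) }

package main;

my $loads = 0;
tie my %BigHash, 'LazyTiedHash', sub { $loads++; return (alpha => 1, beta => 2) };

print "loads before first use: $loads\n";
print "alpha = $BigHash{alpha}\n";
print "keys: ", join(',', sort keys %BigHash), "\n";
print "loads after use: $loads\n";
```

Callers see an ordinary hash, but every access goes through a method call, which is the slowdown mentioned earlier in the thread.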
Re: lazy creation of a hash
by anonymized user 468275 (Curate) on May 12, 2011 at 13:22 UTC
    limit test access to the first pair of keys ordered by Perl internally, e.g.:
    %h or $object -> init( \%h );
    Updated: "each" removed, owing to a comment from moritz. This becomes the code for my next idea: let Perl optimise the test its own sweet way. It seems likely to be optimised; I even doubt it is really necessary to put "scalar" in front.

    One world, one people

      NOOO.
      my %foo = (a => 1, b => 2);
      for (1..10) {
          each %foo or print "re-init $_\n";
      }
      __END__
      re-init 3
      re-init 6
      re-init 9

      each uses an iterator tied to the variable. If you use it, you interfere with other (and even your own) usages of that iterator.

      (re the modified form) Right, using it in boolean context allows it to know I really care only if it is empty or not, and don't need to count the actual values.
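By way of contrast with the `each` version, a small sketch showing that the boolean test can be used even in the middle of an `each` loop without disturbing the iterator. The variable names are made up.

```perl
use strict;
use warnings;

my %foo = (a => 1, b => 2, c => 3);
my $seen     = 0;
my $nonempty = 0;

while (my ($key, $value) = each %foo) {
    $seen++;
    $nonempty++ if %foo;    # boolean test: leaves the each iterator alone
}

print "visited $seen of ", scalar(keys %foo), " keys\n";
print "emptiness test passed $nonempty times\n";
```

Note that `keys %foo` does reset the iterator, so it is called here only after the loop has finished.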