agname has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have been using state to cache a hashref and I am unsure whether this is good usage. It does however work.

Here is an example of usage:

#!/usr/bin/perl
use v5.10;
use strict;
use warnings;

sub example_sub {
    my $cfg = cfg_cache();
    say $cfg->{x};
}

sub cfg_cache {
    state $cfg = shift;
    return $cfg;
}

my $cfg = { x => "cat"};
cfg_cache($cfg);

## "cat" gets printed
example_sub();

## "mouse" gets printed.
$cfg->{x} = "mouse";
example_sub();

## but if I reset $cfg "mouse" still gets printed
$cfg = { x => "rat"};
example_sub();

example_sub() always gets whatever the current contents of $cfg should be as long as $cfg is not reset.

Without reading from cfg_cache() example_sub() has no access to the $cfg hashref.

The state perldoc says "variables will never be reinitialized" but it doesn't cover what happens if the variable is a reference.

Is this good practice?

I am using Perl 5.18 on Debian Unstable, but using state this way works from v5.10 onwards.

Replies are listed 'Best First'.
Re: caching hashrefs in state
by Laurent_R (Canon) on Apr 22, 2014 at 18:58 UTC

    The state perldoc says "variables will never be reinitialized" but it doesn't cover what happens if the variable is a reference.

    So long as you have a reference which can be accessed through a subroutine, the data referenced by it will be accessible even if it is an anonymous hash, since the garbage collector will see that existing reference.

    Personally, I tend to prefer using closures to obtain the same effect, but this has a lot to do with the fact that several of the environments where I have to work still run Perl 5.8, in which the state qualifier was not available. I do not see why using the state qualifier would not be good practice.
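
The closure approach mentioned above might look like the following minimal sketch. It assumes a bare block around a private lexical (here called $cached, a name chosen for illustration) and reuses the OP's cfg_cache() name. One difference from "state $cfg = shift;" is that this version keeps accepting an argument until it has received a defined value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical closure-based equivalent of the OP's cfg_cache(); it needs
# no "state" keyword, so it also runs on Perl 5.8.
{
    my $cached;    # private to this block, persists between calls

    sub cfg_cache {
        # Keep only the first defined argument that is passed in.
        $cached = shift unless defined $cached;
        return $cached;
    }
}

my $cfg = { x => "cat" };
cfg_cache($cfg);                 # first call stores the reference

$cfg->{x} = "mouse";             # change the hash through the outer reference
print cfg_cache()->{x}, "\n";    # the cached reference sees the same hash

$cfg = { x => "rat" };           # rebind the outer variable to a new hash
print cfg_cache()->{x}, "\n";    # the cache still points at the first hash
```

Both print statements show "mouse", which matches the behaviour of the state version in the original question.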

Re: caching hashrefs in state
by Anonymous Monk on Apr 22, 2014 at 19:26 UTC
    Is this good practice?

    I don't see anything directly wrong with it. It's just not clear to me from what you write whether some other construct might be more fitting in your situation. Maybe this is an XY Problem - what exactly are you trying to achieve by "caching" the config hashref like this? How do you want your code to behave when individual config values are changed, or the entire config is replaced?

    state $cfg never gets re-initialized, so it will always refer to the same hash, and cfg_cache() will always return that same hashref. The contents of the config hash being referred to can still be freely manipulated by anyone holding a reference to the original hash (that's what you're seeing when your code prints "mouse") or anyone calling cfg_cache().

    Perhaps you have over-simplified your example code a bit but I don't see the difference between your method and using a (readonly) global variable?

    If you're interested in protecting the config hash against (accidental) changes, see Readonly and/or lock_keys from Hash::Util.
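
As a rough illustration of the Hash::Util suggestion, the sketch below locks a config hash in two stages. It assumes the core functions lock_keys and lock_hash; the variable names $added and $changed are only for illustration.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys lock_hash);

my $cfg = { x => "cat" };

# lock_keys() fixes the set of keys; existing values can still be changed.
lock_keys(%$cfg);
$cfg->{x} = "mouse";                          # allowed
my $added = eval { $cfg->{y} = "dog"; 1 };    # dies: "y" is not an allowed key
print "adding a key failed\n" unless $added;

# lock_hash() additionally makes the existing values read-only.
lock_hash(%$cfg);
my $changed = eval { $cfg->{x} = "rat"; 1 };  # dies: the value is read-only
print "changing a value failed\n" unless $changed;

print $cfg->{x}, "\n";                        # the value set before lock_hash()
```

Note that the lock applies to the hash itself, so it holds no matter which reference is used to reach it.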

      Just to expand on what Laurent_R and Anonymonk have said, I would say the OPed question has less to do with the peculiarities of state variables than with the fecund peculiarities of references.

      From the OP:
      my $cfg = { x => "cat"};

      cfg_cache($cfg);

      When these two statements from the OPed code finish execution, there are two references to the underlying { x => "cat" } anonymous hash object: one in the $cfg lexical variable in the main-line code, another in the $cfg state variable in the scope of the cfg_cache() function. The underlying object may be accessed and changed via either reference.

      The contents of the config hash being referred to can still be freely manipulated by anyone holding a reference to the original hash (that's what you're seeing when your code prints "mouse") or anyone calling cfg_cache().

      This point deserves to be emphasized. The intent of the practice exemplified in the OPed code may be to achieve some kind of immutable data structure, but it does not achieve that. This can be seen from case E of the example below, in which the original hash object is altered through the reference returned by S(), the equivalent of cfg_cache().

      c:\@Work\Perl\monks>perl -wMstrict -le
      "use feature 'state';
       ;;
       my $hashref = { x => 'cat' };
       S($hashref);
       ;;
       print 'A1: ', S('trash')->{x};
       print 'A2: ', $hashref->{x};
       ;;
       $hashref->{x} = 'mouse';
       print 'B1: ', S('junk')->{x};
       print 'B2: ', $hashref->{x};
       ;;
       undef $hashref;
       print 'C1: ', S('dreck')->{x};
       print 'C2: ', $hashref->{x};
       ;;
       $hashref = { x => 'snark' };
       print 'D1: ', S('bilge')->{x};
       print 'D2: ', $hashref->{x};
       ;;
       S('cruft')->{x} = 'oliphaunt';
       print 'E1: ', S('fudge')->{x};
       print 'E2: ', $hashref->{x};
       ;;
       sub S { state $ref = shift; return $ref; }
      "
      A1: cat
      A2: cat
      B1: mouse
      B2: mouse
      C1: mouse
      Use of uninitialized value in print at -e line 1.
      C2:
      D1: mouse
      D2: snark
      E1: oliphaunt
      E2: snark
      Is this good practice?

      It's always good practice to understand references if you use them. :)

Re: caching hashrefs in state
by mhearse (Chaplain) on Apr 22, 2014 at 22:08 UTC
    I tend to roll even small scripts into a module and set the needed values as object attributes. These persist across method calls and are initialized only once.

    my $self = shift;
    $self->{attr_name} = $your_data_structure;
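
A fuller sketch of that idea might look like this. The package name My::App, the cfg attribute, and the method names are all hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical class: the name My::App and its methods are illustrative only.
package My::App;

sub new {
    my ($class, %args) = @_;
    # The config is stored once, at construction time, as an attribute.
    my $self = { cfg => $args{cfg} };
    return bless $self, $class;
}

sub example_method {
    my $self = shift;
    return $self->{cfg}{x};    # every method reaches the config via $self
}

package main;

my $app = My::App->new(cfg => { x => "cat" });
print $app->example_method, "\n";
```

The object still holds an ordinary reference to the caller's hash, so the sharing behaviour discussed elsewhere in this thread applies here too.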
Re: caching hashrefs in state
by Anonymous Monk on Apr 23, 2014 at 02:40 UTC

    Is this good practice?

    not really -- if you're abstracting away a configuration singleton, don't allow direct access to the underlying hash (the state-d hash)

    if you're going to allow direct access, just make it  our %hash
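
One way to hide the underlying hash, as a minimal sketch only: the accessor name cfg_get and its calling convention are assumptions for illustration, not something taken from the original code.

```perl
use strict;
use warnings;
use feature 'state';

# Hypothetical accessor: cfg_get() is an illustrative name, not from the thread.
# The first call stores a private copy of the settings; later calls only read.
sub cfg_get {
    my ($key, $init) = @_;
    state $cfg = { %{ $init || {} } };    # initialized once, on the first call
    return $cfg->{$key};                  # hands out values, never the hashref
}

my $settings = { x => "cat" };
cfg_get(x => $settings);                  # first call seeds the cache

$settings->{x} = "mouse";                 # changes only the caller's hash
print cfg_get("x"), "\n";                 # the private copy is unaffected
```

This prints "cat". The copy is shallow, so values that are themselves references would still be shared with the caller.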

    Also, like the other guy said, state $cfg never gets re-initialized.

    the state $cfg inside sub cfg_cache has nothing to do with the my $cfg outside
    $state_cfg doesn't get reinitialized; it holds a copy of the reference that was in $my_cfg. if you assign $my_cfg a new reference, $state_cfg will not be affected; it will still hold the original reference

    The state perldoc says "variables will never be reinitialized" but it doesn't cover what happens if the variable is a reference.

    Sure it does, a reference is still just a scalar value, and the variable holding it won't be reinitialized

    $ perl -le " use v5.10; sub f { state $f = $_[0]; warn qq{ $f # @_ }; $f; } f({},$_) for 1 .. 4 "
    HASH(0x3f9b34) # HASH(0x3f9b34) 1 at -e line 1.
    HASH(0x3f9b34) # HASH(0x3f9b44) 2 at -e line 1.
    HASH(0x3f9b34) # HASH(0x3f9b94) 3 at -e line 1.
    HASH(0x3f9b34) # HASH(0x3f9b44) 4 at -e line 1.

    state $f, once initialized, doesn't get reinitialized; state $f = $_[0]; only executes ONCE, the very first time

    However, a separate assignment statement is executed on every call

    $ perl -le " use v5.10; sub f { state $f; $f = $_[0]; warn qq{ $f # @_ }; $f; } f({},$_) for 1 .. 4 "
    HASH(0x3f9b44) # HASH(0x3f9b44) 1 at -e line 1.
    HASH(0x3f9b54) # HASH(0x3f9b54) 2 at -e line 1.
    HASH(0x3f9ba4) # HASH(0x3f9ba4) 3 at -e line 1.
    HASH(0x3f9b44) # HASH(0x3f9b44) 4 at -e line 1.

Re: caching hashrefs in state
by kcott (Archbishop) on Apr 23, 2014 at 11:02 UTC

    G'day agname,

    Welcome to the monastery.

    The "state $cfg" is assigned the "{ x => "cat"}" hashref: it's a reference that will print something like "HASH(0xffffff)". Changing the value of the key "x" doesn't change the reference. The hashref "{ x => "rat" }" is a completely different value (perhaps "HASH(0x777777)"); whatever you do to this has no bearing whatsoever on the "state $cfg". Consider this example:

    #!/usr/bin/env perl

    use 5.010;
    use strict;
    use warnings;

    my $cfg = { x => 'cat' };
    say 'INITIAL: ', $cfg;
    example_sub();

    $cfg->{x} = 'mouse';
    say 'CHANGE1: ', $cfg;
    example_sub();

    $cfg = { x => 'rat' };
    say 'CHANGE2: ', $cfg;
    example_sub();

    sub example_sub {
        say cfg_cache()->{x};
    }

    sub cfg_cache {
        state $cfg = $cfg;
        say 'CACHED: ', $cfg;
        return $cfg;
    }

    Output:

    INITIAL: HASH(0x7ff0f8802ee8)
    CACHED: HASH(0x7ff0f8802ee8)
    cat
    CHANGE1: HASH(0x7ff0f8802ee8)
    CACHED: HASH(0x7ff0f8802ee8)
    mouse
    CHANGE2: HASH(0x7ff0f8829c38)
    CACHED: HASH(0x7ff0f8802ee8)
    mouse

    Note how CHANGE2 is a different reference from all the others.

    This type of code is likely to become bug-ridden and a maintenance nightmare: this is not a good practice!

    If you want all changes involving $cfg to be reflected in what's returned from the cache, don't store the hashref; store a reference to $cfg itself. Consider this change to cfg_cache():

    sub cfg_cache {
        state $cfg_ref = \$cfg;
        say 'CACHED: ', $$cfg_ref;
        return $$cfg_ref;
    }

    Output:

    INITIAL: HASH(0x7fd539802ee8)
    CACHED: HASH(0x7fd539802ee8)
    cat
    CHANGE1: HASH(0x7fd539802ee8)
    CACHED: HASH(0x7fd539802ee8)
    mouse
    CHANGE2: HASH(0x7fd539829c38)
    CACHED: HASH(0x7fd539829c38)
    rat

    Now the final CACHED reference is the same as the CHANGE2 reference. Given the "state $cfg_ref" value (after dereferencing) is identical to the "my $cfg" value, whose scope is the entire script, its use in this fashion is questionable as it appears to serve no useful purpose: why not just use the "my $cfg"?

    If, on the other hand, you wanted to cache an initial value and allow no changes at all, create a new hashref with the initial keys and values of $cfg:

    sub cfg_cache {
        state $cfg_ref = { %$cfg };
        say 'CACHED: ', $cfg_ref;
        return $cfg_ref;
    }

    Output:

    INITIAL: HASH(0x7fd88c002ee8)
    CACHED: HASH(0x7fd88c029c50)
    cat
    CHANGE1: HASH(0x7fd88c002ee8)
    CACHED: HASH(0x7fd88c029c50)
    cat
    CHANGE2: HASH(0x7fd88c029c38)
    CACHED: HASH(0x7fd88c029c50)
    cat

    Now the CACHED reference is the same throughout; it's also different from whatever the others are. This is potentially a better use than the others.
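
One caveat worth illustrating: a copy made with { %$cfg } is shallow, so nested structures would still be shared with the original. The sketch below assumes a config with a nested hash (the nested and y keys are invented for this example) and uses dclone from the core Storable module as one possible way to get a fully independent copy.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my $cfg = { x => "cat", nested => { y => "dog" } };

my $shallow = { %$cfg };       # copies top-level keys; nested refs are shared
my $deep    = dclone($cfg);    # copies the whole structure recursively

$cfg->{x}         = "mouse";
$cfg->{nested}{y} = "wolf";

print $shallow->{x}, "\n";            # top-level value was copied
print $shallow->{nested}{y}, "\n";    # nested hash is shared with $cfg
print $deep->{nested}{y}, "\n";       # deep copy is fully independent
```

This prints "cat", "wolf" and "dog" in that order. For a flat config like the one in the original question, the shallow copy is sufficient.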

    If you provided a more concrete idea of the context in which you've chosen to use state variables, we could probably provide a better response with respect to whether you're employing good practices.

    -- Ken