In a recent discussion about dynamic variables (${'foo'.$i}) on the activeperl@listserv.activestate.com mailing list, someone said:

dynamic scalars can sometimes be a better solution than hashes (in term of speed and memory usage) if you don't need to be able to retrieve their names (with a key or an each statement), but only want them to be accessible when called with the proper variable name.

I thought he was wrong, because package variables also live in an (almost) ordinary hash. (Try print join("\n", keys %main::), "\n";) And since this hash doesn't contain the scalars directly, but GLOBs, the memory footprint is actually bigger if you use plenty of "dynamic" variables than if you use an ordinary hash.
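A minimal sketch of that claim, assuming a variable named foo1 created purely for illustration: the symbolic reference creates an entry in %main::, and that entry is a typeglob rather than the scalar itself.

```perl
use strict;
use warnings;

# Create a package variable through a symbolic reference
# ("foo1" is just an illustrative name).
{
    no strict 'refs';
    ${ 'foo' . 1 } = 'value';
}

# The symbol table of package main is the hash %main::.
my $in_stash = exists $main::{foo1} ? 1 : 0;

# The entry stored under that key is a typeglob, not the scalar.
my $entry_type = ref \$main::{foo1};

# The scalar is reached through the glob's SCALAR slot.
my $value = ${ *{ $main::{foo1} }{SCALAR} };

print "in stash: $in_stash, entry type: $entry_type, value: $value\n";
```

So every "dynamic" scalar costs a hash entry plus a glob plus the scalar, where a plain hash needs only the entry and the scalar.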

But I did not want to post that without support, so I tried this script:

    for(my $i=0; $i < 100000; $i++) {
        $hash{'foo'.$i} = '';    # using a hash
    #   ${'foo'.$i} = '';        # using dynamic variables
    }
    print "done\n";
    <STDIN>;

And the difference was even bigger than I expected (well, especially thanks to the fact that the values were so small). The hash solution needed 15,292KB of RAM, while the dynamic variables needed 41,380KB (according to the Task Manager on my Win2k Server).

The speed of the dynamic variables was also worse than that of the hash:

    #!perl
    use Benchmark;

    my %hash;
    for(my $i=0; $i < 1000; $i++) {
        $hash{'foo'.$i} = 0;
        ${'foo'.$i} = 0;
    }

    sub useHash {
        my %hash;
        for(my $i=0; $i < 1000; $i++) {
            $hash{'foo'.$i}++;
        }
    }

    sub useVars {
        my %hash;
        for(my $i=0; $i < 1000; $i++) {
            ${'foo'.$i} ++;
        }
    }

    timethese 1000, {
        useHash => \&useHash,
        useVars => \&useVars,
    };
    __END__
    Benchmark: timing 1000 iterations of useHash, useVars...
       useHash:  3 wallclock secs ( 2.68 usr +  0.00 sys =  2.68 CPU) @ 372.58/s (n=1000)
       useVars:  4 wallclock secs ( 3.79 usr +  0.00 sys =  3.79 CPU) @ 264.20/s (n=1000)

So the dynamic variables are not only more dangerous, but also slower and more memory-hungry. Not that the other reasons to avoid them weren't enough already, but it can't hurt to (try to) destroy one more myth ;-)

Jenda
Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
   -- Rick Osborne

P.S.: Yes I know I could have used an array, but I wanted to give the dynamic variables a fair chance. :-)

Replies are listed 'Best First'.
Re: Symbolic refs aka. dynamic variables again
by eduardo (Curate) on Apr 24, 2003 at 19:41 UTC
    Ok. I'm confused. I hope one of the more knowledgeable internals monks is able to explain why storing a variable in the symbol table is so much more expensive than in a hash. My gut instinct is that it has something to do with the fact that storing something in the symbol table involves more overhead due to reserving a GLOB or something along those lines, but I just tested it on my box, and it's 3 times the memory for a symbolic ref than it is for a hash value. Is it something along those lines? Is the fact that the container is the symbol table instead of an actual "container object" causing perl to do more work that a pedestrian monk such as myself wouldn't usually encounter? Am I at least on the right track?

      I'm not going to pretend deep knowledge of Perl internals, but the typeglob structure allocates sufficient space to point to ...

      • a scalar value - $foo
      • an array value - @foo
      • a hash value - %foo
      • a subroutine - &foo
      • a filehandle - foo
      • a format - foo

      Whereas each hash key points directly to a scalar value (SV)
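A small sketch of those slots, assuming a throwaway name foo and made-up values: a single glob hands back a separate reference for each kind of thing stored under that name, using the *foo{THING} syntax.

```perl
use strict;
use warnings;

# Four different things sharing the one name "foo" (illustrative values).
our $foo = 'scalar';
our @foo = (1, 2, 3);
our %foo = (key => 'value');
sub foo { 'sub' }

# One glob, *foo, holds a reference to each of them in its own slot.
my %slot = (
    SCALAR => ${ *foo{SCALAR} },
    ARRAY  => scalar @{ *foo{ARRAY} },             # number of elements
    HASH   => join(',', keys %{ *foo{HASH} }),     # the hash's keys
    CODE   => *foo{CODE}->(),                      # return value of foo()
);

print "$_ => $slot{$_}\n" for sort keys %slot;
```

A hash value, by contrast, is only ever the one scalar.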

        you forgot one slot in the glob, the glob itself - *foo

        ~Particle *accelerates*

      Well, for one thing, variables in the symbol table are globals. It has long been known that lexically-scoped (my) variables use less memory than globals because they don't have to do as much.
Re: Symbolic refs aka. dynamic variables again
by particle (Vicar) on Apr 24, 2003 at 20:13 UTC

    i think your benchmark is a little off, since it's comparing global scalars to a lexical hash. hopefully this one's more accurate (but i'm no expert.)

    here, global vars win in speed, as you quoted. i did not test memory requirements.

    Update: of course, i meant that lexicals win... what was i looking at when i typed that?

        #!perl
        use Benchmark;

        my %l_hash;

        sub use_global_hash  { ++$hash{ 'foo' . $_ } for 0 .. 1000 }
        sub use_global_vars  { ++${ 'foo' . $_ } for 0 .. 1000 }
        sub use_lexical_hash { ++$l_hash{ 'foo' . $_ } for 0 .. 1000 }

        timethese -10, {
            use_global_hash  => \&use_global_hash,
            use_global_vars  => \&use_global_vars,
            use_lexical_hash => \&use_lexical_hash,
        };
        __END__
        > t-b-symref.pl
        Benchmark: running use_global_hash, use_global_vars, use_lexical_hash for at least 10 CPU seconds...
        use_global_hash: 11 wallclock secs (10.58 usr +  0.00 sys = 10.58 CPU) @ 744.83/s (n=7884)
        use_global_vars: 10 wallclock secs (10.54 usr +  0.02 sys = 10.56 CPU) @ 561.35/s (n=5925)
        use_lexical_hash: 11 wallclock secs (10.65 usr +  0.00 sys = 10.65 CPU) @ 794.74/s (n=8460)

    ~Particle *accelerates*

      Nah - your results are way too close. On slower hardware lexicals are clearly fastest, globals come in second and symbolic refs in last place. This is all easily explained in terms of involved opnodes. Symbolic refs add a handful of nodes, and globals are one opnode heavier than lexicals.

          Benchmark: running use_global_hash, use_global_vars, use_lexical_hash, each for at least 10 CPU seconds...
          use_global_hash: 11 wallclock secs (10.45 usr +  0.00 sys = 10.45 CPU) @ 112.63/s (n=1177)
          use_global_vars: 11 wallclock secs (10.55 usr +  0.00 sys = 10.55 CPU) @ 83.32/s (n=879)
          use_lexical_hash: 11 wallclock secs (10.48 usr +  0.00 sys = 10.48 CPU) @ 120.71/s (n=1265)
                             Rate use_global_vars use_global_hash use_lexical_hash
          use_global_vars  83.3/s              --            -26%             -31%
          use_global_hash   113/s             35%              --              -7%
          use_lexical_hash  121/s             45%              7%               --

      > here, global vars win in speed, as you quoted.

      I think you meant "global vars LOSE in speed". Using global dynamic variables you managed only 561.35 iterations per second (each iteration being 1001 increments), while with the global hash you managed 744.83 and with a lexical hash 794.74 iterations per second.

      If you use timethese() with a negative first argument, the times per se are useless. You have to divide the number of iterations by the time to get anything meaningful.
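As a rough sketch of that division, assuming Benchmark's countit() (which, like a negative count, runs the code for a minimum amount of CPU time) and a short 1-second run chosen only to keep the example quick:

```perl
use strict;
use warnings;
use Benchmark qw(countit);

my %hash;

# Run the code for at least 1 CPU second (short, just for illustration).
my $t = countit(1, sub { ++$hash{ 'foo' . $_ } for 0 .. 1000 });

my $iters = $t->iters;    # how many times the sub was called
my $cpu   = $t->cpu_p;    # user + system CPU time of this process
my $rate  = $iters / $cpu;

printf "%d iterations in %.2f CPU secs = %.2f/s\n", $iters, $cpu, $rate;
```

The rate, not the elapsed time, is the number to compare between variants; it corresponds to the "@ .../s" figure that timethese() prints.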

      Anyway, you are right: I should have included the global hash in the benchmark as well.

      Jenda
