powerman has asked for the wisdom of the Perl Monks concerning the following question:

Hi!

It looks like hashes (and probably arrays too?) that have been used slow down some operations (like keys()), even after they have been emptied... and the only way to work around this slowdown is to use undef() to clean things up.

I haven't found a description of this effect in the docs, so maybe this is a bug ... or please point me to an explanation of this feature in the docs. :-)

Perl 5.8.2, Glibc 2.2.5, Linux 2.4.22

WBR, Alex.

    #!/usr/bin/perl
    use Benchmark qw(:all);

    %h_delete = %h_list = %h_undef = (0 .. 50000);

    delete @h_delete{keys %h_delete};
    %h_list = ();
    undef %h_undef;

    timethese 10000, {
        delete => '1 for keys %h_delete',
        list   => '1 for keys %h_list',
        undef  => '1 for keys %h_undef',
    };

    __END__
    Benchmark: timing 10000 iterations of delete, list, undef...
        delete:  5 wallclock secs ( 2.43 usr +  0.00 sys =  2.43 CPU) @ 4115.23/s (n=10000)
          list:  5 wallclock secs ( 2.44 usr +  0.00 sys =  2.44 CPU) @ 4098.36/s (n=10000)
         undef:  0 wallclock secs ( 0.02 usr +  0.00 sys =  0.02 CPU) @ 500000.00/s (n=10000)
                (warning: too few iterations for a reliable count)

Replies are listed 'Best First'.
Re: undef speedup ?!
by blokhead (Monsignor) on Feb 08, 2004 at 19:09 UTC
    I'm a Perl-internals newbie, so the following may be invalid, inaccurate, or just plain wrong.. but here's what seems to be going on, from what I can tell:
        %h_delete = %h_list = %h_undef = (0 .. 50000);

        delete @h_delete{keys %h_delete};
        %h_list = ();
        undef %h_undef;

        use Devel::Peek;
        Dump \%h_delete;
        Dump \%h_list;
        Dump \%h_undef;

    Output:

    All the hashes have KEYS = 0, and you would think Perl would be smart enough to notice this when you call keys. The only difference in the 3rd one (%h_undef) is that it has no ARRAY pointer (with MAX set accordingly). I would guess that keys walks that entire bucket array looking for entries, and that's where the slowdown is. This seems to be the case from looking at Perl_hv_iternext_flags. A definitive explanation from someone more versed in the internals is welcome ;)

    I don't see why the KEYS = 0 case couldn't be optimized; perhaps you should file a bug and see!
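    Update: the same KEYS/MAX difference can be checked programmatically, without parsing Devel::Peek output, via the B module (a sketch; on most perls an undef'd hash's MAX drops back to the default, while a delete-emptied hash keeps its large bucket array):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use B ();

        my (%by_delete, %by_undef);
        %by_delete = %by_undef = (1 .. 50000);

        delete @by_delete{ keys %by_delete };   # empty the hash, keep its buckets
        undef %by_undef;                        # empty the hash, free its buckets

        # B::HV exposes KEYS (entry count) and MAX (bucket-array size - 1).
        printf "delete: KEYS=%d MAX=%d\n",
            B::svref_2object(\%by_delete)->KEYS,
            B::svref_2object(\%by_delete)->MAX;
        printf "undef:  KEYS=%d MAX=%d\n",
            B::svref_2object(\%by_undef)->KEYS,
            B::svref_2object(\%by_undef)->MAX;

    Both hashes report KEYS=0, but MAX stays large for the delete-emptied hash, which is the array that keys would have to walk.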

    blokhead

      "I'm a Perl-internals newbie..."

      Not so NB, nice answer. ;-P

      It seems that this is an optimization bug, or an optimization that simply hasn't been done yet.

      I recommend sending a bug report via perlbug.

      Graciliano M. P.
      "Creativity is the expression of the liberty".

      Thanks!

      I've tested this issue under perl 5.6.1 with exactly the same results. I've submitted a perlbug report - let's see what they reply...

      WBR, Alex.

Re: undef speedup ?!
by davido (Cardinal) on Feb 09, 2004 at 04:16 UTC
    Just a couple things I thought might be interesting to point out...

    First, if you turn on warnings with "use warnings;", you'll find that your initialization of those hashes has an odd number of elements. You can correct that by changing it from (0 .. 50000) to (1 .. 50000).

    Next, if you turn on strictures with "use strict;", and add my (%h_delete, %h_list, %h_undef); at the appropriate place near the top of the script, the benchmark timings all start looking a lot more like the speed of your 'undef' test.

    So this appears to be an artifact relevant only when wielding global variables.


    Dave

      add my (%h_delete, %h_list, %h_undef); at the appropriate place near the top of the script,
      That would not be a smart thing to do. You do know what the effect of my is, don't you? It makes its argument(s) lexical variables. Meaning that those variables are visible only in the current block, and not from other blocks or files. Which means, not from the Benchmark.pm file either. The Benchmarking code will access the package variables %h_delete, %h_list, %h_undef - just like the program would if you leave out the my.

      The important lesson to be learned is that one should never blindly apply my to variables. Always know what you are doing, and always consider the consequences.
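      A minimal sketch of the effect (variable names are illustrative): code compiled before a my() declaration - like a string that Benchmark.pm evals inside its own file - cannot see that lexical, and silently falls back to the (empty) package variable:

          #!/usr/bin/perl
          use strict;
          use warnings;

          # Compiled BEFORE the my() below, so - like Benchmark.pm's string
          # eval - it cannot see the caller's lexicals, only package vars.
          sub eval_elsewhere {
              my ($code) = @_;
              no strict 'vars';      # let the eval fall back to package variables
              return eval $code;
          }

          my  @lexical = (1, 2, 3);  # file-scoped lexical
          our @package = (1, 2, 3);  # package variable @main::package

          print eval_elsewhere('scalar @package'), "\n";  # 3: finds @main::package
          print eval_elsewhere('scalar @lexical'), "\n";  # 0: empty @main::lexical

      So the string-based timings were looping over empty package hashes, which is why they all looked like the undef case.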

      Abigail

        Well, you got me that time. Of course I know what 'my' does. I didn't consider that this format evaluates the code in a completely different lexical scope:

        timethese 10000, { this => ' ...... ', that => ' ...... ' };

        When benchmarking I'm accustomed to using the sub format, like this:

        timethese 10000, { this => sub { ..... }, that => sub { ..... } }

        ...which follows a more natural lexical scoping, where the subs are still executed within the same scope as the lexical block in which they're called.

        For the record, the following code now matches the OP's observed behavior:

            use strict;
            use warnings;
            use Benchmark;

            my (%h_delete, %h_list, %h_undef);
            %h_delete = %h_list = %h_undef = (1 .. 50000);

            delete @h_delete{keys %h_delete};
            %h_list = ();
            undef %h_undef;

            timethese 10000, {
                delete => sub { 1 for keys %h_delete },
                list   => sub { 1 for keys %h_list },
                undef  => sub { 1 for keys %h_undef },
            };


        Dave

Re: undef speedup ?!
by Anonymous Monk on Feb 09, 2004 at 03:32 UTC

    I don't recall an exact discussion of this, but there are some related comments (probably in the Camel book) to the effect that delete keeps the structure of the hash allocated (which saves time if you need to refill it), whereas undef destroys the structure.

    If you make the (reasonable for this data type) assumption that keys does a linear scan of the structure to find valid keys, then the behaviour is explicable and even expected.
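    That assumption can be illustrated with a toy model (a deliberate simplification, not perl's actual hash implementation): collecting the keys must visit every bucket slot, so the cost is proportional to the number of buckets, not the number of keys.

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Toy model: a "hash" is an arrayref of bucket chains, each chain
        # holding [key, value] pairs. Returns the keys plus a count of how
        # many slots had to be scanned to find them.
        sub model_keys {
            my ($buckets) = @_;
            my (@keys, $slots_scanned);
            for my $chain (@$buckets) {
                $slots_scanned++;                      # every slot is visited...
                push @keys, map { $_->[0] } @$chain;   # ...even if it is empty
            }
            return (\@keys, $slots_scanned);
        }

        my $after_delete = [ map { [] } 1 .. 65536 ];  # 0 keys, big bucket array kept
        my $after_undef  = [ map { [] } 1 .. 8 ];      # 0 keys, bucket array released

        my ($k1, $slow) = model_keys($after_delete);   # scans 65536 slots for 0 keys
        my ($k2, $fast) = model_keys($after_undef);    # scans 8 slots for 0 keys
        print "delete-model: $slow slots; undef-model: $fast slots\n";

    Both model hashes yield zero keys, but the delete-style one does thousands of times more work per keys() call - matching the benchmark above.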