powerman has asked for the wisdom of the Perl Monks concerning the following question:

Hi!

It looks like hashes (and probably arrays too?) that have been used slow down some operations (like keys()), even after they have been emptied... and the only way to work around this slowdown is to use undef() to clean things up.

I haven't found a description of this effect in the docs, so maybe this is a bug ... or please point me to an explanation of this feature in the docs. :-)

Perl 5.8.2, Glibc 2.2.5, Linux 2.4.22

WBR, Alex.

    #!/usr/bin/perl
    use Benchmark qw(:all);

    %h_delete = %h_list = %h_undef = (0 .. 50000);

    delete @h_delete{keys %h_delete};
    %h_list = ();
    undef %h_undef;

    timethese 10000, {
        delete => '1 for keys %h_delete',
        list   => '1 for keys %h_list',
        undef  => '1 for keys %h_undef',
    };

    __END__
    Benchmark: timing 10000 iterations of delete, list, undef...
        delete:  5 wallclock secs ( 2.43 usr +  0.00 sys =  2.43 CPU) @ 4115.23/s (n=10000)
          list:  5 wallclock secs ( 2.44 usr +  0.00 sys =  2.44 CPU) @ 4098.36/s (n=10000)
         undef:  0 wallclock secs ( 0.02 usr +  0.00 sys =  0.02 CPU) @ 500000.00/s (n=10000)
                (warning: too few iterations for a reliable count)

Replies are listed 'Best First'.
Re: undef speedup ?!
by blokhead (Monsignor) on Feb 08, 2004 at 19:09 UTC
    I'm a Perl-internals newbie, so the following may be invalid, inaccurate, or just plain wrong.. but here's what seems to be going on, from what I can tell:
        %h_delete = %h_list = %h_undef = (0 .. 50000);

        delete @h_delete{keys %h_delete};
        %h_list = ();
        undef %h_undef;

        use Devel::Peek;
        Dump \%h_delete;
        Dump \%h_list;
        Dump \%h_undef;

    Output:

    All the hashes have KEYS = 0, and you would think Perl would be smart enough to notice this when you call keys. The only difference in the 3rd one (%h_undef) is that it has no ARRAY pointer (with MAX set accordingly). I would guess that keys walks that entire bucket array looking for entries, and that's where the slowdown is. This seems to be the case from looking at Perl_hv_iternext_flags. A definitive explanation from someone more versed in the internals is welcome ;)

    I don't see why the KEYS = 0 case couldn't be optimized; perhaps you should file a bug and see!
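    Update: the same KEYS/MAX difference can be checked programmatically, without parsing Devel::Peek output, via the B module (a sketch; on most perls an undef'd hash's MAX drops back to the default, while a delete-emptied hash keeps its large bucket array):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use B ();

        my (%by_delete, %by_undef);
        %by_delete = %by_undef = (1 .. 50000);

        delete @by_delete{ keys %by_delete };   # empty the hash, keep its buckets
        undef %by_undef;                        # empty the hash, free its buckets

        # B::HV exposes KEYS (entry count) and MAX (bucket-array size - 1).
        printf "delete: KEYS=%d MAX=%d\n",
            B::svref_2object(\%by_delete)->KEYS,
            B::svref_2object(\%by_delete)->MAX;
        printf "undef:  KEYS=%d MAX=%d\n",
            B::svref_2object(\%by_undef)->KEYS,
            B::svref_2object(\%by_undef)->MAX;

    Both hashes report KEYS=0, but MAX stays large for the delete-emptied hash, which is the array that keys would have to walk.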

    blokhead

      "I'm a Perl-internals newbie..."

      Not so NB, nice answer. ;-P

      It seems that this is an optimization bug, or an optimization that simply hasn't been done yet.

      I recommend sending a bug report via perlbug.

      Graciliano M. P.
      "Creativity is the expression of the liberty".

      Thanks!

      I've tested this issue under perl 5.6.1 with exactly the same results. I've submitted a perlbug report - let's see what they reply...

      WBR, Alex.

Re: undef speedup ?!
by davido (Cardinal) on Feb 09, 2004 at 04:16 UTC
    Just a couple things I thought might be interesting to point out...

    First, if you turn on warnings with "use warnings;", you'll find that your initialization of those hashes has an odd number of elements. You can correct that by changing it from (0 .. 50000) to (1 .. 50000).

    Next, if you turn on strictures with "use strict;", and add my (%h_delete, %h_list, %h_undef); at the appropriate place near the top of the script, the benchmark timings all start looking a lot more like the speed of your 'undef' test.

    So this appears to be an artifact relevant only when wielding global variables.


    Dave

      add my (%h_delete, %h_list, %h_undef); at the appropriate place near the top of the script,
      That would not be a smart thing to do. You do know what the effect of my is, don't you? It makes its argument(s) lexical variables. Meaning that those variables are visible only in the current block, and not from other blocks or files. Which means, not from the Benchmark.pm file either. The Benchmarking code will access the package variables %h_delete, %h_list, %h_undef - just like the program would if you leave out the my.

      The important lesson to be learned is that one should never blindly apply my to variables. Always know what you are doing, and always consider the consequences.
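      A minimal sketch of the effect (variable names are illustrative): code compiled before a my() declaration - like a string that Benchmark.pm evals inside its own file - cannot see that lexical, and silently falls back to the (empty) package variable:

          #!/usr/bin/perl
          use strict;
          use warnings;

          # Compiled BEFORE the my() below, so - like Benchmark.pm's string
          # eval - it cannot see the caller's lexicals, only package vars.
          sub eval_elsewhere {
              my ($code) = @_;
              no strict 'vars';      # let the eval fall back to package variables
              return eval $code;
          }

          my  @lexical = (1, 2, 3);  # file-scoped lexical
          our @package = (1, 2, 3);  # package variable @main::package

          print eval_elsewhere('scalar @package'), "\n";  # 3: finds @main::package
          print eval_elsewhere('scalar @lexical'), "\n";  # 0: empty @main::lexical

      So the string-based timings were looping over empty package hashes, which is why they all looked like the undef case.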

      Abigail

        Well, you got me that time. Of course I know what 'my' does. I didn't consider that this format evaluates the code in a completely different lexical scope:

        timethese 10000, { this => ' ...... ', that => ' ...... ' };

        When benchmarking I'm accustomed to using the sub format, like this:

        timethese 10000, { this => sub { ..... }, that => sub { ..... } }

        ...which follows a more natural lexical scoping, where the subs are still executed within the same scope as the lexical block in which they're called.

        For the record, the following code now matches the OP's observed behavior:

            use strict;
            use warnings;
            use Benchmark;

            my (%h_delete, %h_list, %h_undef);
            %h_delete = %h_list = %h_undef = (1 .. 50000);

            delete @h_delete{keys %h_delete};
            %h_list = ();
            undef %h_undef;

            timethese 10000, {
                delete => sub { 1 for keys %h_delete },
                list   => sub { 1 for keys %h_list },
                undef  => sub { 1 for keys %h_undef },
            };


        Dave

Re: undef speedup ?!
by Anonymous Monk on Feb 09, 2004 at 03:32 UTC

    I don't recall an exact discussion of this, but there are some related comments (probably in the Camel book) to the effect that delete keeps the structure of the hash allocated (which saves time if you need to refill it), whereas undef destroys the structure.

    If you make the (reasonable for this data type) assumption that keys does a linear scan of the structure to find valid keys, then the behaviour is explicable and even expected.
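    That assumption can be illustrated with a toy model (a deliberate simplification, not perl's actual hash implementation): collecting the keys must visit every bucket slot, so the cost is proportional to the number of buckets, not the number of keys.

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Toy model: a "hash" is an arrayref of bucket chains, each chain
        # holding [key, value] pairs. Returns the keys plus a count of how
        # many slots had to be scanned to find them.
        sub model_keys {
            my ($buckets) = @_;
            my (@keys, $slots_scanned);
            for my $chain (@$buckets) {
                $slots_scanned++;                      # every slot is visited...
                push @keys, map { $_->[0] } @$chain;   # ...even if it is empty
            }
            return (\@keys, $slots_scanned);
        }

        my $after_delete = [ map { [] } 1 .. 65536 ];  # 0 keys, big bucket array kept
        my $after_undef  = [ map { [] } 1 .. 8 ];      # 0 keys, bucket array released

        my ($k1, $slow) = model_keys($after_delete);   # scans 65536 slots for 0 keys
        my ($k2, $fast) = model_keys($after_undef);    # scans 8 slots for 0 keys
        print "delete-model: $slow slots; undef-model: $fast slots\n";

    Both model hashes yield zero keys, but the delete-style one does thousands of times more work per keys() call - matching the benchmark above.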