thalej has asked for the wisdom of the Perl Monks concerning the following question:

Hello. Whenever I use undef on a large hash, everything that follows the undef becomes slower. Here is an example:
########################
#! /usr/local/bin/perl
use strict;
{
    my $i = 0;
    my $testHash1;
    my %testHash2;
    my $max = 1000000;
    my $startTime = time;
    print "Loading hash 1...\n";
    for($i = 0; $i <= $max; $i++){
        $testHash1{$i} = "THIS IS MY $i PROTEST SONG";
        if($i%100000==0){
            print "Loaded $i\n";
        }
    }
    my $stopTime = time;
    printf("Hash 1 Rows/Sec: %.01f\n",$i/($stopTime-$startTime));
    undef %testHash1;
    $i = 0;
    $startTime = time;
    print "Loading hash 2...\n";
    for($i = 0; $i <= $max; $i++){
        $testHash2{$i} = "THIS IS MY $i PROTEST SONG";
        if($i%100000==0){
            print "Loaded $i\n";
        }
    }
    $stopTime = time;
    printf("Hash 2 Rows/Sec: %.01f\n",$i/($stopTime-$startTime));
}
##############
On my machine, the inserts to %testHash1 take place at about 250,000 recs/second. The second batch, into %testHash2, inserts at a rate of about 9000 recs/second. If I remove the undef %testHash1, the second batch of inserts is just as fast as the first. I have tried emptying the hash in different ways (%testHash1 = ();...) but the results are always the same. What am I doing wrong? Thanks, Thalej
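
For reference, the emptying variants look like this (a sketch; the delete-slice line is an additional common idiom, not quoted from the post):

    undef %testHash1;                      # release the hash entirely, buckets included
    %testHash1 = ();                       # empty the hash, typically keeping its allocated buckets
    delete @testHash1{keys %testHash1};    # delete every key individually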

2005-10-12 Retitled by g0n, as per Monastery guidelines
Original title: 'perl undef'

Re: Use of undef slowing down perl?
by pboin (Deacon) on Oct 12, 2005 at 13:59 UTC

    Just as a data point, I ran this on my workstation. Perl v5.8.7 built for i486-linux-gnu-thread-multi on Debian.

    I ran it a bunch of times, with the "undef %testHash1;" in and out. The rate varied from 111,000/sec to 250,000/sec, but never anywhere remotely close to your 9000. So, you may have issues with your particular release or platform, but this does not seem to be common to all installations.

    Update:

    Doing some quick reading, I checked out recipe 5.13 in the Perl Cookbook. It has a sentence that might shed some light: "Perl already shares keys between hashes, so if you already have a hash with "Apple" as a key, Perl won't need to allocate memory for another copy of "Apple" when you add an entry whose key is "Apple" to another hash."

    Since both hashes are built with the same keys (and the same "PROTEST SONG" values), I'd be most interested if you'd re-run your benchmark with an entirely different pattern for your second hash's keys...
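
    Something like this would exercise that (a sketch; the "alt-" key prefix is only an illustrative choice):

        #!/usr/local/bin/perl
        use strict;
        use warnings;

        # Same benchmark, but hash 2 uses a key pattern ("alt-$i") that
        # shares nothing with hash 1's keys, so Perl's shared string
        # table cannot reuse them.
        my (%testHash1, %testHash2);
        my $max = 1000000;

        my $startTime = time;
        $testHash1{$_} = "THIS IS MY $_ PROTEST SONG" for 0 .. $max;
        printf("Hash 1 Rows/Sec: %.01f\n", $max / ((time - $startTime) || 1));

        undef %testHash1;

        $startTime = time;
        $testHash2{"alt-$_"} = "THIS IS MY $_ PROTEST SONG" for 0 .. $max;
        printf("Hash 2 Rows/Sec: %.01f\n", $max / ((time - $startTime) || 1));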

    Another Update:

    After seeing Ikegami's results on Windows, I decided to try it on an old NT4 box. Perl version is ActiveState build 813 of Perl 5.8.7. With the undef in place, keys per second were 47619 and 66667. Weird.

Re: perl undef
by ikegami (Patriarch) on Oct 12, 2005 at 14:36 UTC

    I get the same results. I'm using ActivePerl v5.6.1 on Win2k on a machine with little memory available. From the results below, I'm guessing you need less than 100MB free.

    Hash 1 Rows/Sec: 62500.1
    Hash 2 Rows/Sec: 4032.3

    But the undef appears to be working nonetheless. I added a <STDIN> at the very top of the program, before the undef, and at the very end of the program, and noted the size of perl's process in Task Manager before pressing Enter each time:

    Start of execution: <2MB
    After one hash:     123MB
    After two hashes:   131MB
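
    A minimal sketch of that instrumentation (the prompt messages are illustrative):

        #!/usr/local/bin/perl
        use strict;

        print "start of execution -- note process size, press Enter\n";
        <STDIN>;
        my %testHash1;
        $testHash1{$_} = "THIS IS MY $_ PROTEST SONG" for 0 .. 1000000;
        print "after one hash -- note process size, press Enter\n";
        <STDIN>;
        undef %testHash1;
        my %testHash2;
        $testHash2{$_} = "THIS IS MY $_ PROTEST SONG" for 0 .. 1000000;
        print "after two hashes -- note process size, press Enter\n";
        <STDIN>;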

    Update: On a machine with more memory:

    ActivePerl v5.8.0, WinXP, 640MB available
    Hash 1 Rows/Sec: 250000.3
    Hash 2 Rows/Sec: 250000.3
    <2MB  154MB  157MB

    ActivePerl v5.6.1, WinXP, 640MB available
    Hash 1 Rows/Sec: 200000.2
    Hash 2 Rows/Sec: 200000.2
    <2MB  155MB  157MB

    The slowdown is definitely related to virtual memory usage, but I don't know why it only affects the second pass.

Re: Use of undef slowing down perl?
by davidrw (Prior) on Oct 12, 2005 at 13:34 UTC
    (Note the typo -- it needs to be my %testHash1.) But yeah, seems weird... though on my system I got 66666.7 and 71428.6, which are comparable (and the second is faster).

    As I understand it, undef'ing half-frees the memory -- that block is still reserved by perl (i.e., it is still not available to other processes on the machine), but it can now be re-used within the perl application. So one would think the total memory footprint wouldn't really increase at all during the second insert. How that relates to run-time speed, I'm not sure...
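
    One way to see that the hash itself really is emptied even though the process keeps the space (a sketch, assuming the CPAN module Devel::Size is installed):

        #!/usr/local/bin/perl
        use strict;
        use warnings;
        use Devel::Size qw(total_size);   # CPAN module, not core

        my %h;
        $h{$_} = "THIS IS MY $_ PROTEST SONG" for 0 .. 100000;
        printf("before undef: %d bytes\n", total_size(\%h));
        undef %h;    # contents go back to perl's allocator, but the
                     # OS-level footprint of the process usually stays
        printf("after undef:  %d bytes\n", total_size(\%h));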

    My other observation was that the undef took just about as long (if not longer) to execute as the whole first set of inserts...

    Update: corrected the typo in my typo correction

      Based on the way he uses testHash1, it should be my %testHash1, not my $testHash1 ;-)

      Hello davidrw. What platform (unix/linux/windows) and version of perl are you running? Thanks for your reply! Thalej
Re: Use of undef slowing down perl?
by diotalevi (Canon) on Oct 12, 2005 at 15:17 UTC

    The next time you do this, try using the Benchmark module. It handles all the grunt work of comparing two snippets.

      Benchmark isn't useful here; we don't have two snippets. Benchmark finds the average of multiple runs of different code, whereas we want the times of two runs of the same code.

      The problem at hand is not a comparison of the performance of two snippets; it is a slowdown (presumably virtual-memory related) on the second run, even though that run should be reusing the first run's memory.

        No, wrong. You want to compare the average and total speed of insertion before and after that undef(%foobar) operation. This is clearly something I'd use Benchmark for. Perhaps there's some big time hit when the hash is first re-used, but it goes away as the hash is used more.
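
        For instance, a sketch using Benchmark's timethis (the titles are illustrative):

            #!/usr/local/bin/perl
            use strict;
            use warnings;
            use Benchmark qw(timethis);

            my %h;
            # Time the identical fill loop before and after the undef.
            timethis(1, sub { $h{$_} = "THIS IS MY $_ PROTEST SONG" for 0 .. 1000000 },
                     'fill before undef');
            undef %h;
            timethis(1, sub { $h{$_} = "THIS IS MY $_ PROTEST SONG" for 0 .. 1000000 },
                     'fill after undef');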