fizbin has asked for the wisdom of the Perl Monks concerning the following question:

(I wish there were a synonym for array that began with "p")

In answering this node, I ran across the following odd performance behavior. I'd like someone who seriously understands perl internals to help me figure out what's going on.

The basic puzzle is this: it appears that I can construct two arrays which are identical and yet doing [0] on one array is much faster (by 10%) than doing [0] on the other. However, which array has faster access changes depending on which element I access and on some details of how the array is constructed.

So here's the code:

# This data format is related to the original problem # read this as "17 copies of 43", "33 copies of 21", etc. my $str = shift || "17:43:33:21:23:19:27:6"; my @a1 = (); my @a2 = (); my @a = split /:/, $str; while ( my ( $p, $ad ) = splice @a, 0, 2 ) { push @a1, map {$ad} (1..$p); push @a2, ($ad) x $p; } use Benchmark 'cmpthese'; print "Accessing element 0:\n"; cmpthese(-1, { made_by_map => sub { $a1[0]; }, made_by_x => sub { $a2[0]; } }); print "\nAccessing element 50:\n"; cmpthese(-1, { made_by_map => sub { $a1[50]; }, made_by_x => sub { $a2[50]; } }); print "\nAccessing element 99:\n"; cmpthese(-1, { made_by_map => sub { $a1[99]; }, made_by_x => sub { $a2[99]; } }); print "\nAccessing random element:\n"; cmpthese(-1, { made_by_map => sub { $a1[rand 100]; }, made_by_x => sub { $a2[rand 100]; } });
And here's the results of running that test code without any arguments:
Accessing element 0: Rate made_by_x made_by_map made_by_x 7554581/s -- -11% made_by_map 8479758/s 12% -- Accessing element 50: Rate made_by_x made_by_map made_by_x 7788171/s -- -4% made_by_map 8095273/s 4% -- Accessing element 99: Rate made_by_map made_by_x made_by_map 8058884/s -- -9% made_by_x 8882975/s 10% -- Accessing random element: Rate made_by_map made_by_x made_by_map 1724352/s -- -2% made_by_x 1758653/s 2% --
Now, if the performance figures were all within 1-2%, I might be able to dismiss the result as noise. But clearly, they aren't. They're large differences, and the results are consistent each time I run the code. (I'm using perl 5.8.2 as distributed with cygwin)

Running the code with different arguments yields different results. For example, try:

perl test.pl 100:100
or
perl test.pl `perl -e 'print join(q[:], (1) x 200)'`
Update: Never mind, folks - it's just noise. Nothing to see here; move along.
-- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/

Replies are listed 'Best First'.
Re: Perl array performance puzzle
by dave_the_m (Monsignor) on Jul 30, 2004 at 16:13 UTC
    It is just noise. Or rather, stuff like the effects of cache alignment etc. Try swapping a1 and a2 round in the benchmark calls and you'll find you get a whole new set numbers, with no obvious correlation of one array consistenly being faster than the other.

    Dave.

    D'oh!
    by fizbin (Chaplain) on Jul 30, 2004 at 16:22 UTC
      Mea culpa.

      You're right - it is just noise, though I don't get different numbers from swapping things around in the code. I do however get very different numbers based on what other programs are open at the same time on my laptop. Before, I had just been running the same program in a loop with the laptop otherwise idle.

      -- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/
Re: Perl array performance puzzle
by BrowserUk (Patriarch) on Jul 30, 2004 at 16:29 UTC

    And if swapping the ordering of the tests doesn't convince you, then try constructing both arrays using the same method.

    You'll find that the benchmark shows a difference even when they are both constructed the same way.

    It's worth bearing in mind that given the huge numbers of iterations being performed, even the largest (12%) difference shown, represents a difference of 0.000,000,014 of a second which hardly makes it a "large difference".


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon
Re: Perl array performance puzzle
by waswas-fng (Curate) on Jul 30, 2004 at 16:24 UTC
    Map will preallocate memory required for the array in a larger chunk than pushing into an array. It depends on how much real memory is assigned and allocated on your system and how fragmented it is. What does pre sizing the array for both map and push type builds do to your benchmark?

    @a1= (); @a2= (); $#a1 = 1000000; $#a2 = 1000000;


    -Waswas

      I would expect that to only change the time it takes to build the array, not to access an element.

      ----
      send money to your kernel via the boot loader.. This and more wisdom available from Markov Hardburn.