Itatsumaki has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,
In a recent node there was some discussion about the different ways to get the count of the number of elements in an array. In that thread I expressed my personal preference for scalar(@array) over $#array + 1 or $size = @array. I didn't have any performance timings, so I figured I should see if there really is a difference. This is my timing code:

use strict; use Benchmark; my @array = (1..10000000); timethese(100000000, { 'scalar' => \&size_scalar, 'index' => \&size_index, 'context' => \&size_context } ); sub size_context { my $val1 = @array; } sub size_index { my $val2 = $#array + 1; } sub size_scalar { my $val3 = scalar(@array); }

The results (formatted to look a bit nicer):
context: 30.52 usr + 0.00 sys = 30.52 CPU @ 3277076.85/s<br> index: 42.11 usr + 0.00 sys = 42.11 CPU @ 2374732.84/s<br> scalar: 53.44 usr + 0.00 sys = 53.44 CPU @ 1871327.52/s

So, two questions here. First, is my test doing a reasonable job of timing? I don't do benchmarks like this often, so I'm worried I've hashed it up. Second, is there any obvious reason why the difference? Is scalar(@array) is slower because it requires a function call?

-Tats

Replies are listed 'Best First'.
•Re: Timing of Array-Size Determination Methods
by merlyn (Sage) on Jun 02, 2003 at 16:42 UTC
    Your benchmark is sound, but look at what it revealed... sure, you can save 1.5 microseconds by dropping the word "scalar", but even if you executed that one million times, that's still less than two seconds of time, and if it helps a programmer figure out what the code is doing to save those two seconds, you're still ahead.

    Use what's clear. Do not prematurely optimize. Let your default be to optimize for programmer understanding.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      While I completely agree with your sentiment, sometimes it's not premature. Prematurely scolding about premature optimization is, well, premature.

      If a snippet of code is utilized in some tight loop, perhaps it IS warranted, and knowing the difference was useful information. It's only 1.5 microseconds, but it's 50% of that particular comparison after Benchmark amortizes the sub calls. That MAY be significant to SOME loops.

      While one really shouldn't rely on Perl for hyper-tight loops, sometimes you still want to knock a second or two off of an otherwise sound algorithm. And in the case of @foo vs scalar @foo, I'd say any usefully-competent Perl maintainer will know what's meant by the code.

      --
      [ e d @ h a l l e y . c c ]

      Fair enough, but in my mind:

      1. I would have never known if there was a significant difference w/o doing the benchmark
      2. Even if the difference isn't of practical importance, understanding why it happens can be a great learning tool

      -Tats
        Given how small the amounts of time are that we're talking about here, I don't think you could call this difference significant, except in the statistical sense.
Re: Timing of Array-Size Determination Methods
by Juerd (Abbot) on Jun 02, 2003 at 16:56 UTC

    sub size_index { my $val2 = $#array + 1; }

    Just... don't... do... that!! Express what you *mean* or use some obfuscational language like C, (non-parrot) assembler or Java. If you want to express "if not" then type "if not". If you mean "unless", then use "unless". If you mean array size then DON'T say "index of the last element plus one".

    Is scalar(@array) is slower because it requires a function call?

    Yes. But don't let its being slower stop you from using it. 1.8 million per second or 3.2 million per second -- if that matters, Perl is not the right tool for the job.

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

      I quote perldoc -f scalar:
      Because "scalar" is unary operator
      So no it's not a function call at all, even if looks like one. (Just like split.)
      $ perl -MO=Terse -e'scalar @_' LISTOP (0x8133508) leave [1] OP (0x8133370) enter COP (0x8133400) nextstate UNOP (0x81333c0) scalar UNOP (0x8133630) rv2av [1] SVOP (0x8133540) gv GV (0x8124334) *_ -e syntax OK $ perl -MO=Terse -e'$#_+1' LISTOP (0x8133440) leave [1] OP (0x8128040) enter COP (0x8133400) nextstate BINOP (0x8133508) add [2] UNOP (0x81333c0) av2arylen UNOP (0x8133630) rv2av [1] SVOP (0x8133540) gv GV (0x8124334) *_ SVOP (0x8133370) const IV (0x8132bdc) 1 -e syntax OK
      Certainly looks like $#arr + 1 loses that argument as well. Not that it mattered in the first place..

      Makeshifts last the longest.

        All functions in perlfunc are in fact operators, I think. I believe the term "unary operator" is also used for ($)-prototyped functions, like how "list operator" is for (@)-prototyped (or prototypeless) functions.

        Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

Re: Timing of Array-Size Determination Methods
by hardburn (Abbot) on Jun 02, 2003 at 16:41 UTC

    is my test doing a reasonable job of timing?

    Nothing pops out at me.

    Is scalar(@array) is slower because it requires a function call?

    That would be my guess, yes. It has to build the entire @_ variable from your huge array input. That alone is going to slow it down.

    But at the level of almost 2 million per second in the slowest case, I really wouldn't worry about it (:

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    Note: All code is untested, unless otherwise stated

Re: Timing of Array-Size Determination Methods
by BrowserUk (Patriarch) on Jun 03, 2003 at 12:32 UTC

    One way of demonstrating cause of the apparent inefficiency of the $#... compared to scalar @... is to reverse the premise. What if what you wanted was the last index and not the size? Then you could use the same 3 methods to arrive at your solution.

    use strict; use Benchmark qw[cmpthese]; my @array = (1..100); cmpthese(1000000, { 'scalar' => \&last_index_scalar, 'index' => \&last_index, 'context'=> \&last_index_context }); sub last_index_context { my $val1 = @array-1; } sub last_index { my $val2 = $#array; } sub last_index_scalar { my $val3 = scalar(@array)-1; } __DATA__ P:\test>test Rate scalar context index scalar 207641/s -- -4% -17% context 217014/s 5% -- -14% index 251509/s 21% 16% --

    Now it becomes fairly obvious that it isn't that $# is "so slow", but that the process of doing an extra addition (or subtraction) in a very tight loop of an otherwise very fast operation, is enough to completely distort the results. It also demonstrates that knowing which is the "right way" has benefits beside clarity.

    This single micro-optimisation isn't going to make a huge difference in 98% of programs. However, combining the effects of this one, with half a dozen others--passing references to arrays, rather than the arrays themselves; using hash for lookups rather than grepping an array; avoiding excessive backtracking in your regexes; moving as much of the processing into the built-ins rather using explicit loops; and a whole host of other, (individually micro-), optimisations--spread liberally through an application that needs to process large volumes of data very fast in tight loops, and the effect can become significant.

    I'm talking about the combined effect of the savings made by simply using the correct technique or construct in the right place, rather than resorting to obscure or "tricky" optimisations

    Overall, it can mean the difference between being able to use perl for that application, rather than being forced to move to something icky like C just for the sake of simply not knowing.

    BTW. I'm mightily impressed with your allocation of a 10 million element array, and your patience while it was allocated :). Allocating and initialising approx. 65+ MBs of data takes a while on my old machine, but in truth, I doubt it made a jot of difference to the outcome of your benchmark over using a 100 element, or even a 2-element array, as I beleive, though I haven't verified, that the size of the array (and by implcation the last index) is retrieved from storage rather than scanned for or calculated.

    One possible explaination for why $#... is slightly slower than scalar @.... (implicit or explicit), is that there probably has to be additonal(sic) internal calculation to arrive at $#... Probably 2 actually. One to adjust the stored size by -1 to allow for the zeroth element, and the second to add any setting of the deprecated $[ for legacy reasons. The fact that internally, perl has to subtract 1 and then add $[ (or even test it for non-zero, though that probably would be a wasted optimisation!), and then you have to add on back, Means overall 3 extra instruction at least by using the 'wrong' method.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


Re: Timing of Array-Size Determination Methods
by Itatsumaki (Friar) on Jun 02, 2003 at 19:13 UTC

    Thanks to everybody for the responses. The main reason why i posted this why I posted this wasn't to show a statistically significant difference between the three methods (although there does seem to be one). It was more to inquire as to why that difference exists. Certainly we're talking on the order of milliseconds in most programs, as Meryln pointed out.

    Nevertheless, using the "context" method as a base, the other methods are 20-40% slower, and I'm not familiar enough with PERL to rationalize that in terms of the internals.

    hardburn pointed out that the construction of the @_ variable in the scalar() call is probably a major factor. That seems reasonable, and I could imagine testing it by passing an arrayref to a custom-written version of scalar. But I don't really understand why $#array is so much slower?

    -Tats