ribasushi has asked for the wisdom of the Perl Monks concerning the following question:

Greetings honorable Monks,

I am doing some massive data processing, and in my quest for additional speed I wrote the following to check how well perl actually performs:
use Benchmark qw(:all);

my $a = 10;
my $b = 20;

cmpthese( -2, {
    'Direct'     => sub { my $c = $a + $b },
    'Sub'        => sub { my $c = add( $a, $b ) },
    'Method'     => sub { my $c = __PACKAGE__->meth_add( $a, $b ) },
    'Method+sub' => sub { my $c = __PACKAGE__->meth_sub_add( $a, $b ) },
    'Sub^2'      => sub { my $c = add2( $a, $b ) },
    'Method^2'   => sub { my $c = __PACKAGE__->meth2( $a, $b ) },
});

sub add          { return $_[0] + $_[1]; }
sub add2         { return add( @_ ); }
sub meth_add     { shift; return $_[0] + $_[1]; }
sub meth_sub_add { shift; return add( @_ ); }
sub meth2        { return $_[0]->meth_add( $_[1], $_[2] ); }
And here are the results I get:
                Rate Method^2 Method+sub     Sub^2    Method       Sub    Direct
Method^2    500713/s       --       -22%      -37%      -44%      -58%      -91%
Method+sub  643967/s      29%         --      -19%      -27%      -46%      -88%
Sub^2       797484/s      59%        24%        --      -10%      -33%      -85%
Method      886331/s      77%        38%       11%        --      -26%      -84%
Sub        1191561/s     138%        85%       49%       34%        --      -78%
Direct     5397078/s     978%       738%      577%      509%      353%        --
Now I understand how method dispatch works and why it is slow, but I do not understand why subroutines suffer almost the same penalty. Is there some kind of perl compile flag, or some hidden optimization that can improve these numbers (which _really_ add up in a complex code flow)?

Thanks

Peter

Replies are listed 'Best First'.
Re: Performance issues with subroutines/methods
by shmem (Chancellor) on Jun 17, 2007 at 19:56 UTC
    With your benchmark, you are counting the overhead of calling subs in various ways. If you access a memory cell directly (in machine language), that will be significantly faster than calling a function which computes the location of that cell and then accesses it. The same applies here. Subroutines suffer essentially the same performance penalty as method calls, since method calls are just sub calls.

    In perl, calling a subroutine means not only jumping to some code segment in memory, but also building up copious data structures by which the perl engine keeps track of context, and tearing those stack frames down again on exit. There is some optimization going on under the hood: a sub declared with an empty () prototype which returns a constant scalar and does not rely on dynamic data or call other subs will have its result inlined. But apart from that, calling subs just involves the overhead of, precisely, calling subs :-)
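    A minimal sketch of the inlining described above (the names PI and pi are illustrative, not from the thread): a sub with an empty () prototype whose body is a constant expression is folded in at compile time, while an ordinary sub pays the full call overhead every time.

```perl
use strict;
use warnings;

# Empty () prototype + constant body: eligible for compile-time
# inlining, so uses of PI below cost no more than a literal.
sub PI () { 3.14159 }

# An ordinary sub: a real call happens at runtime on every use.
sub pi { 3.14159 }

my $area_inlined = PI * 2 ** 2;     # PI folded in at compile time
my $area_called  = pi() * 2 ** 2;   # genuine sub call

print "$area_inlined $area_called\n";
```

    Both compute the same value; only the call overhead differs.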

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: Performance issues with subroutines/methods
by diotalevi (Canon) on Jun 17, 2007 at 21:18 UTC

    Something else to note - you've got the overhead of modifying @_ with those shift() calls. At first, @_ is only a view onto the stack; if you modify it, perl has to go and create the array and make it real. Drop the shift()s and see if that does anything for you.
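    A hedged sketch of that suggestion (the class and sub names are illustrative): instead of shift-ing the invocant off @_, leave @_ untouched and index past the invocant directly.

```perl
use strict;
use warnings;

package Calc;

# Modifies @_: shift removes the invocant before the addition.
sub add_with_shift {
    my $self = shift;
    return $_[0] + $_[1];
}

# Leaves @_ untouched: the invocant stays in $_[0], so the
# operands are read from $_[1] and $_[2].
sub add_no_shift {
    return $_[1] + $_[2];
}

package main;

print Calc->add_with_shift( 10, 20 ), "\n";   # 30
print Calc->add_no_shift( 10, 20 ), "\n";     # 30
```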

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      I didn't think shift reifies @_. But getting rid of the shift statement & op helps anyway.

        I guess I really don't know which modifications to @_ trigger reification. I thought any change would, and Gryphon said something to that effect once. I didn't investigate it further.

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: Performance issues with subroutines/methods
by ysth (Canon) on Jun 18, 2007 at 02:38 UTC
    As shmem says, subroutine/method calls are pretty expensive. You can shave some off by removing the shift, as diotalevi suggests. Removing the return shaves off some more:
    use Benchmark qw(:all);

    my $a = 10;
    my $b = 20;

    cmpthese( -2, {
        'Sub'      => sub { my $c = add( $a, $b ) },
        'Method'   => sub { my $c = __PACKAGE__->meth_add( $a, $b ) },
        'Sub^2'    => sub { my $c = add2( $a, $b ) },
        'Method^2' => sub { my $c = __PACKAGE__->meth2( $a, $b ) },
        'Method^3' => sub { my $c = __PACKAGE__->meth3( $a, $b ) },
    });

    sub add      { return $_[0] + $_[1]; }
    sub add2     { $_[0] + $_[1] }
    sub meth_add { shift; return $_[0] + $_[1]; }
    sub meth2    { shift; $_[0] + $_[1] }
    sub meth3    { $_[1] + $_[2] }
                   Rate   Method Method^2 Method^3      Sub    Sub^2
    Method     892013/s       --      -5%     -12%     -23%     -30%
    Method^2   936229/s       5%       --      -7%     -20%     -27%
    Method^3  1008242/s      13%       8%       --     -13%     -21%
    Sub       1165141/s      31%      24%      16%       --      -9%
    Sub^2     1275428/s      43%      36%      27%       9%       --
Re: Performance issues with subroutines/methods
by BrowserUk (Patriarch) on Jun 18, 2007 at 14:54 UTC

    As others have identified, there is a fairly high, fixed overhead for calling subroutines (and therefore methods) in Perl (or any dynamic language). Beyond moving to a compiled language, there are only two things you can do.

    1. Avoid subroutine calls.
      • Inline them.
      • Memoize them.
    2. Avoid short subroutine calls.

      The overhead is fixed. Its effect on performance is most pronounced when the body of the subroutine does very little. Adding a little 'do nothing, but expensively' code to your benchmark:

      sub add {   ## Other subs modified in the same way.
          my $temp = $_[0] ** int( 1 / $_[1] );
          return $_[0] + $_[1];
      }

      Results:

      ## original
      C:\test>junk
                     Rate Method^2   Method Method^3      Sub    Sub^2
      Method^2   720070/s       --      -4%     -12%     -28%     -32%
      Method     747754/s       4%       --      -9%     -25%     -30%
      Method^3   820126/s      14%      10%       --     -18%     -23%
      Sub        998208/s      39%      33%      22%       --      -6%
      Sub^2     1063462/s      48%      42%      30%       7%       --

      ## Modified
      C:\test>junk
                     Rate   Method Method^3 Method^2      Sub    Sub^2
      Method     488925/s       --      -2%      -2%     -14%     -23%
      Method^3   497153/s       2%       --      -1%     -13%     -22%
      Method^2   500096/s       2%       1%       --     -12%     -21%
      Sub        569519/s      16%      15%      14%       --     -10%
      Sub^2      635516/s      30%      28%      27%      12%       --

      Once the subs do more, the apparent cost of the calls is reduced.

      Of course, that flies in the face of OO doctrine that suggests that methods should be short.

      The costs are especially high if you choose to avoid direct access to your instance data and wrap all accesses in getter/setter subs.
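      As a minimal sketch of that point (the Point class is invented for illustration): every attribute read through a getter costs a method dispatch plus a sub call, while direct hash access costs neither, at the price of coupling the caller to the object's internals.

```perl
use strict;
use warnings;

package Point;

sub new { my ( $class, %args ) = @_; return bless {%args}, $class }

# Conventional combined getter/setter: every read is a method call.
sub x {
    my $self = shift;
    $self->{x} = shift if @_;
    return $self->{x};
}

package main;

my $p = Point->new( x => 3 );

# Through the accessor: method dispatch + sub call per access.
my $via_accessor = $p->x;

# Direct hash access: no call overhead, but the caller now depends
# on the object's internal representation.
my $direct = $p->{x};

print "$via_accessor $direct\n";
```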

      You can further exacerbate the costs by using one of the many 'safe' object implementations that add another level or two of indirection to the object and method resolution chains. All the 'inside out object' implementations fall into this category.

      If you use one of the canned parameter validation mechanisms that use tied variables as a part of their implementation, you compound the costs again.

      If you operate in an environment that is IO bound, or where performance is not an issue, these techniques are very effective and useful. But in a cpu-bound, performance constrained environment, you have to take the decision as to which is the higher priority.
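      The 'memoize them' option listed above can be sketched with the core Memoize module (the naive Fibonacci function here is illustrative, not from the thread): a pure, expensive sub is wrapped so repeat calls with the same argument cost one cache lookup instead of a recomputation.

```perl
use strict;
use warnings;
use Memoize;

# A deliberately expensive pure function: naive recursive Fibonacci.
sub fib {
    my $n = shift;
    return $n if $n < 2;
    return fib( $n - 1 ) + fib( $n - 2 );
}

# Replace fib with a caching wrapper. Because the recursive calls
# also go through the wrapper, even the first fib(30) is fast.
memoize('fib');

print fib(30), "\n";
```

      Memoization only applies when the sub is pure (same arguments always produce the same result) and has no side effects the caller relies on.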


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      You can further exacerbate the costs by using one of the many 'safe' object implementations that add another level or two of indirection to the object and method resolution chains. All the 'inside out object' implementations fall into this category.
      I don't think the latter claim is quite correct. In fact, it appears that the handling of Object::InsideOut objects can be faster than legacy blessed hash handling. Cheers.

        Array-based objects are faster than hash-based objects. You don't need to use a module to get array-based objects.
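        A minimal sketch of a hand-rolled array-based object, no module required (the class and field names are invented for illustration): attributes live at fixed array indices, which avoids the hash-key lookup hash-based objects pay on every access.

```perl
use strict;
use warnings;

package PointArray;

# Field positions as compile-time constants; array indexing skips
# the per-access hash-key lookup of a blessed-hash object.
use constant { X => 0, Y => 1 };

sub new {
    my ( $class, $x, $y ) = @_;
    return bless [ $x, $y ], $class;
}

sub x_coord { $_[0][X] }
sub y_coord { $_[0][Y] }

package main;

my $p = PointArray->new( 3, 4 );
print $p->x_coord + $p->y_coord, "\n";   # prints 7
```

        The trade-off is that subclasses must coordinate field indices, which is one reason modules exist to manage this for you.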

        I see the POD claims that its hash-based objects are still faster than conventional hash-based objects. I'd like to see the benchmark.

        Given the amount of work this module is doing behind the covers, I would need to see the benchmark.

        And shareable between threads, that's a really clever trick.


Re: Performance issues with subroutines/methods
by jbert (Priest) on Jun 18, 2007 at 13:39 UTC
    Note that tweaking the subroutine calling will only help if your application is CPU bound and you are making lots of calls to short subroutines. If you're doing heavy data processing, your app might be I/O bound or memory-starved.

    If you are being affected by subroutine calling speed, it's often the case that you have a single subroutine which is being called many times. A profiler should make this easy to see.

    If you only have a few call sites (places where you call this function) you can simply remove the function call overhead by manually inlining the function in those places (with appropriate comments). Of course, this isn't good coding style in general, but optimising for speed generally involves de-optimising for maintainability.
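    A before/after sketch of that manual inlining (the scale() sub and the hot loop are invented for illustration): the sub body is copied into the call site, with a comment so the duplication stays traceable.

```perl
use strict;
use warnings;

sub scale { return $_[0] * 1.1 }

my @data = ( 1 .. 1000 );

# Before: one sub call per element in the hot loop.
my $total_called = 0;
$total_called += scale($_) for @data;

# After: the body of scale() inlined by hand. The comment records
# where the expression came from, for maintainability.
my $total_inline = 0;
$total_inline += $_ * 1.1 for @data;   # inlined: scale()

print "$total_called $total_inline\n";
```

    Both loops produce the same total; only the call overhead differs, and as noted above the saving only matters when the call count is large.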