Re^3: Are monks hibernating?

Replies are listed 'Best First'.
Re^4: Are monks hibernating? by BrowserUk (Patriarch) on Feb 17, 2007 at 01:25 UTC
...the first time I've heard that the major bottle neck in Perl is that sub/methods calls are slow It's common knowledge that Perl is significantly slower than straight C, but are you saying that the bulk of that slow down occurs due to sub/method calls being slow? I never said "the major bottleneck". And of course it is slower than straight C. It could not be otherwise. However, here is a partial breakdown of the costs of subroutine and method calls, and various forms of OO: Rate OO_7 OO_8 OO_10 OO_9 OO_6 OO_5 OO_4 OO_3 OO_2 OO_1 sub2 s +ub1 inline OO_7 70.4/s -- -0% -43% -44% -69% -70% -81% -88% -89% -91% -92% - +95% -99% OO_8 70.5/s 0% -- -43% -44% -69% -70% -81% -88% -89% -91% -92% - +95% -99% OO_10 124/s 76% 76% -- -1% -45% -46% -66% -79% -81% -84% -85% - +91% -98% OO_9 126/s 79% 79% 1% -- -44% -46% -66% -79% -81% -84% -85% - +91% -98% OO_6 227/s 222% 222% 83% 80% -- -2% -38% -62% -65% -71% -73% - +84% -96% OO_5 232/s 229% 229% 87% 84% 2% -- -37% -61% -65% -70% -73% - +83% -96% OO_4 367/s 421% 420% 195% 191% 62% 58% -- -39% -44% -53% -57% - +74% -93% OO_3 600/s 752% 752% 383% 376% 165% 159% 64% -- -9% -23% -29% - +57% -88% OO_2 656/s 832% 831% 428% 421% 189% 183% 79% 9% -- -16% -23% - +53% -87% OO_1 782/s1010% 1009% 529% 520% 244% 237% 113% 30% 19% -- -8% - +44% -85% sub2 847/s1103% 1102% 582% 572% 273% 266% 131% 41% 29% 8% -- - +39% -84% sub1 1388/s1871% 1869%1017%1001% 512% 499% 278% 131% 112% 78% 64% + -- -73% inline5179/s7252% 7246%4067%4009%2182%2134%1312% 763% 689% 562% 511% 2 +73% -- [download] Note the big step changes between `inline` code and `sub1`; the code runs nearly 3 times more slowly. The 'body' of the code, a single integer increment is not worth a sub, but it is designed to emphasis the cost of the calls, and OO support code. `sub1` & `sub2`; name the parameters and your code runs 5 times slower. `OO_3` & `OO_4`; use getters and setters and your code is now 13 times slower. `OO_6` & `OO_9`; use Best Practices and your code is 40 times slower. `OO_10` & `OO_8`; admittedly far better than traditional InsideOut objects which result in code that runs 72 times more slowly. Of course this is 'worst case'. If your application is IO-bound, or DB-bound, or only creates a few big objects, with a few large methods and only runs for a few seconds; then none of this is of the slightest concern to you. But if your applications are running cpu-bound over large volumes of data. If you create lots of small objects with small methods and call them lots of times. If your program runs for a few hours when coded inline, or as a few large subroutines. Then if you properly OO-ize it using Class::Std, it could end up running for a day or more for every hour the original ran. Or to look at it another way, if you take an app that uses properly structured OO approach using Class::Std and you bastardize it to use simple subs and inline code blocks. If the original ran for 6 hours, the destructured version might end up taking 10 minutes! The point is that even a few percent improvement in the costs of function calling would benefit every Perl application, and could have profound affect on the performance for some types of applications. Benchmark code Read more... (8 kB) Another way of looking at it is to take a heavily recursive function (Ackermann,) and re-write it to use exactly the same algorithm--repeating all the same motions, stacking and unstacking the same amount of data; building and unwinding the same number of code block frames--but do it without calling a subroutine. Ie. A recursive algorithm coded manually using a loop and a stack. This shows that using the identical 'recursive' algorithm, but avoiding the sub calls, `Ack( 3, 0 )` runs twice as fast. `Ack( 3, 8 )` runs 4 times as fast. And by the time you get to `Ack( 3, 9 )` it's a whopping 12 1/2 times faster. Note: There are no tricks here. No memoizations, or reduced recursion levels. When the truely recursive code stacks a parameter, this also stack a parameter. When it pops a parameter, this pops one. When it builds a block (sub) frame, this builds a block (while) frame. In as far as it is possible to emulate recursion through iteration this does so. It really shouldn't be possible to emulate what Perl does when it calls a subroutine, in unoptimized, Perl code, and beat it quite so emphatically. `c:\test>b-recursive-nosub.pl Ack(3,0): 5 Ack(3,0): 5 1 trial of Recursive Ack 3,0 (227us total) 1 trial of Iterative Ack 3,0 (111us total) c:\test>b-recursive-nosub 8 Ack(3,8): 2045 Ack(3,8): 2045 1 trial of Recursive Ack 3,8 (13.698s total) 1 trial of Iterative Ack 3,8 (3.328s total) c:\test>b-recursive-nosub 9 Ack(3,9): 4093 Ack(3,9): 4093 1 trial of Recursive Ack 3,9 (162.104s total) 1 trial of Iterative Ack 3,9 (13.234s total)` [download] The benchmark Read more... (1295 Bytes) At this point I had hoped to try and produce an annotated assembler level trace of the transitions into and out of a subroutine. I got part way (a couple of hours), through producing that and had finger trouble and managed to throw it all away. Forgive me for not starting over but it is incredibly tedious and requires a lot of concentration, and I'm simply not ready to face that again just yet. What I will say is, based on my inspection (on my Win32/AS 811 combination), of the disassembled C that is run when a Perl level subroutine is called, there appears to be considerable scope for optimisation. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply] [d/l] [select]

Replies are listed 'Best First'.

Re^4: Are monks hibernating?
by BrowserUk (Patriarch) on Feb 17, 2007 at 01:25 UTC

...the first time I've heard that the major bottle neck in Perl is that sub/methods calls are slow It's common knowledge that Perl is significantly slower than straight C, but are you saying that the bulk of that slow down occurs due to sub/method calls being slow?

I never said "the major bottleneck". And of course it is slower than straight C. It could not be otherwise.

However, here is a partial breakdown of the costs of subroutine and method calls, and various forms of OO:

        Rate OO_7 OO_8 OO_10 OO_9 OO_6 OO_5 OO_4 OO_3 OO_2 OO_1 sub2 s
+ub1 inline
OO_7  70.4/s   --  -0%  -43% -44% -69% -70% -81% -88% -89% -91% -92% -
+95%   -99%
OO_8  70.5/s   0%   --  -43% -44% -69% -70% -81% -88% -89% -91% -92% -
+95%   -99%
OO_10  124/s  76%  76%   --   -1% -45% -46% -66% -79% -81% -84% -85% -
+91%   -98%
OO_9   126/s  79%  79%   1%    -- -44% -46% -66% -79% -81% -84% -85% -
+91%   -98%
OO_6   227/s 222%  222% 83%   80%  --   -2% -38% -62% -65% -71% -73% -
+84%   -96%
OO_5   232/s 229%  229%  87%  84%   2%   -- -37% -61% -65% -70% -73% -
+83%   -96%
OO_4   367/s 421%  420% 195% 191%  62%  58%   -- -39% -44% -53% -57% -
+74%   -93%
OO_3   600/s 752%  752% 383% 376% 165% 159%  64%   --  -9% -23% -29% -
+57%   -88%
OO_2   656/s 832%  831% 428% 421% 189% 183%  79%   9%   -- -16% -23% -
+53%   -87%
OO_1   782/s1010% 1009% 529% 520% 244% 237% 113%  30%  19%   --  -8% -
+44%   -85%
sub2   847/s1103% 1102% 582% 572% 273% 266% 131%  41%  29%   8%   -- -
+39%   -84%
sub1  1388/s1871% 1869%1017%1001% 512% 499% 278% 131% 112%  78%  64%  
+ --   -73%
inline5179/s7252% 7246%4067%4009%2182%2134%1312% 763% 689% 562% 511% 2
+73%     --
[download]

Note the big step changes between

inline code and sub1; the code runs nearly 3 times more slowly.
The 'body' of the code, a single integer increment is not worth a sub, but it is designed to emphasis the cost of the calls, and OO support code.
sub1 & sub2; name the parameters and your code runs 5 times slower.
OO_3 & OO_4; use getters and setters and your code is now 13 times slower.
OO_6 & OO_9; use Best Practices and your code is 40 times slower.
OO_10 & OO_8; admittedly far better than traditional InsideOut objects which result in code that runs 72 times more slowly.

Of course this is 'worst case'. If your application is IO-bound, or DB-bound, or only creates a few big objects, with a few large methods and only runs for a few seconds; then none of this is of the slightest concern to you.

But if your applications are running cpu-bound over large volumes of data. If you create lots of small objects with small methods and call them lots of times. If your program runs for a few hours when coded inline, or as a few large subroutines. Then if you properly OO-ize it using Class::Std, it could end up running for a day or more for every hour the original ran.

Or to look at it another way, if you take an app that uses properly structured OO approach using Class::Std and you bastardize it to use simple subs and inline code blocks. If the original ran for 6 hours, the destructured version might end up taking 10 minutes!

The point is that even a few percent improvement in the costs of function calling would benefit every Perl application, and could have profound affect on the performance for some types of applications.

Benchmark code

Read more... (8 kB)

Another way of looking at it is to take a heavily recursive function (Ackermann,) and re-write it to use exactly the same algorithm--repeating all the same motions, stacking and unstacking the same amount of data; building and unwinding the same number of code block frames--but do it without calling a subroutine. Ie. A recursive algorithm coded manually using a loop and a stack.

This shows that using the identical 'recursive' algorithm, but avoiding the sub calls, Ack( 3, 0 ) runs twice as fast. Ack( 3, 8 ) runs 4 times as fast. And by the time you get to Ack( 3, 9 ) it's a whopping 12 1/2 times faster. Note: There are no tricks here. No memoizations, or reduced recursion levels. When the truely recursive code stacks a parameter, this also stack a parameter. When it pops a parameter, this pops one. When it builds a block (sub) frame, this builds a block (while) frame. In as far as it is possible to emulate recursion through iteration this does so.

It really shouldn't be possible to emulate what Perl does when it calls a subroutine, in unoptimized, Perl code, and beat it quite so emphatically.

c:\test>b-recursive-nosub.pl
Ack(3,0): 5
Ack(3,0): 5
1 trial of Recursive Ack 3,0 (227us total)
1 trial of Iterative Ack 3,0 (111us total)

c:\test>b-recursive-nosub 8
Ack(3,8): 2045
Ack(3,8): 2045
1 trial of Recursive Ack 3,8 (13.698s total)
1 trial of Iterative Ack 3,8 (3.328s total)

c:\test>b-recursive-nosub 9
Ack(3,9): 4093
Ack(3,9): 4093
1 trial of Recursive Ack 3,9 (162.104s total)
1 trial of Iterative Ack 3,9 (13.234s total)
[download]

The benchmark