Re: Optimizing the bejeezus out of a sub
by dws (Chancellor) on Jun 24, 2003 at 07:15 UTC
Any ideas where to look for that last little bit of performance?
The first thing that jumped out when I looked over your code is the loop. It looks like something transcribed from C. Instead of
for(;@_;shift) {
...
... @_[0] ...
try doing something like
foreach ( @_ ) {
...
... $_ ...
This plays to Perl's strengths (and saves you a few array subscripts).
Or, if you want to be explicit,
foreach my $section ( @_ ) {
...
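For a rough idea of the difference, here is a minimal, self-contained Benchmark sketch (the data and sub names are invented for illustration; the original code shifts @_ itself rather than a copy):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @data = (1 .. 100);

# C-style loop that consumes a copy of the list, as in the original code
sub sum_c_style {
    my @copy = @_;
    my $sum  = 0;
    for ( ; @copy ; shift @copy ) {
        $sum += $copy[0];
    }
    return $sum;
}

# Idiomatic foreach over the implicit $_ alias
sub sum_foreach {
    my $sum = 0;
    $sum += $_ foreach @_;
    return $sum;
}

cmpthese( 10_000, {
    c_style => sub { sum_c_style(@data) },
    foreach => sub { sum_foreach(@data) },
} );
```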
I wrote about the C-style for() in Re: Re: Believably slow... It is slightly, though not significantly, slower than the foreach-style loop. I could only remove a few operations from your code; I noted each spot with an ALTER comment. I think your time is consumed by those print() statements: I/O is always expensive.
## Print the given template sections to the supplied filehandle
sub printto
{
my $self = shift;
my $fh = shift;
my $ret = 0;
my($val,$sec,$v);
# ALTER - foreach over C-style for
foreach my $sec_k (@_)
{
$sec = $self->[_sec]{$sec_k};
# ALTER - drop unnecessary defined()
unless ( $sec )
{
# ALTER - remove indirect object syntax
$fh->print( $self->_nosuchsec($sec_k) );
next;
}
$ret++;
foreach $v (@$sec)
{
if ($v->[_type] == type_text)
{
# ALTER - remove indirect object syntax
$fh->print( $v->[_contents] );
}
else
{
$val = $self->[_assign]{$v->[_contents]};
$fh->print(
defined $val
? $val
: $self->_nosuchvar($val,"\$".$val) );
}
}
}
return $ret;
}
Thanks for the suggestions! Here's what I found...
I already changed the loops to use foreach; that seemed to be a little faster, and a lot more readable, so all benchmarks already have that in it. Here's what the numbers looked like before the other two changes:
Total Elapsed Time = 43.18577 Seconds
User+System Time = 42.69577 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
48.8 20.83 20.710 130000 0.0002 0.0002 FTS::printto
Changing if(!defined($sec)) to unless($sec) seemed to slow printto down just a hair. This could just be measurement error; for some reason other parts of the code apparently sped up with this change, which again points to measurement error...
Total Elapsed Time = 41.62356 Seconds
User+System Time = 41.09356 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
51.3 21.09 20.905 130000 0.0002 0.0002 FTS::printto
Changing print $fh to $fh->print seemed to slow things down significantly. Using IO::Handle::print changes whom the print calls get charged to, so the ExclSec for printto goes down, but that's more than made up for by all the calls to IO::Handle::print, and the CumulS and overall time go way up...
Total Elapsed Time = 51.91479 Seconds
User+System Time = 51.32479 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
41.5 21.31 18.510 281000 0.0000 0.0000 IO::Handle::print
38.9 19.97 35.589 130000 0.0002 0.0003 FTS::printto
foreach my $section ( @_ ) { ... } doesn't copy the value from @_ into $section. Inside the loop, $section is merely an alias to the currently selected element of the @_ array. You can even modify @_ by assigning a new value to $section. Example:
# adds 1 to all elements of array
foreach my $elem (@arr) {
$elem ++;
}
--
Ilya Martynov, ilya@iponweb.net
CTO IPonWEB (UK) Ltd
Quality Perl Programming and Unix Support
UK managed @ offshore prices - http://www.iponweb.net
Personal website - http://martynov.org
Re: Optimizing the bejeezus out of a sub
by hossman (Prior) on Jun 24, 2003 at 07:35 UTC
dprof says this sub is running 50% of the time for my application, so it looks like my best candidate for optimization.
Well, your application has to spend 50% of its time somewhere.
If you've got 50 methods, and you're making 50 calls to each, and only one of those methods accounts for 50% of your execution time, then you've got something to look into. But if your program consists of nothing more than some initialization, some config-file parsing, a quick select from a DB, and then a single call to this method ... 50% is kicking some ass.
Numbers are meaningless without units, and percentages are meaningless without context.
My point is that if I'm going to optimize anything, this is the sub to spend my time on. The other two candidates I've already tweaked, and even if I doubled their performance I'd still get only a modest gain. A clever insight on this sub could make a much bigger difference.
%Time ExclSec CumulS #Calls sec/call Csec/c Name
48.8 20.83 20.710 130000 0.0002 0.0002 FTS::printto
17.5 7.499 7.490 10000 0.0007 0.0007 FTS::addstring
15.3 6.569 6.450 120000 0.0001 0.0001 FTS::assign
6.77 2.892 42.220 10000 0.0003 0.0042 main::__ANON__
2.55 1.090 43.310 10002 0.0001 0.0043 Benchmark::__ANON__
2.44 1.040 1.030 10000 0.0001 0.0001 main::cleverlinks
2.20 0.940 8.410 10000 0.0001 0.0008 FTS::parsehandle
1.45 0.620 0.610 10000 0.0001 0.0001 IO::File::open
1.38 0.590 1.650 10000 0.0001 0.0002 IO::File::new
1.38 0.590 10.620 10000 0.0001 0.0011 FTS::parsefile
1.03 0.440 11.410 10000 0.0000 0.0011 FTS::new_fromfile
0.94 0.400 0.390 10000 0.0000 0.0000 FTS::new
0.77 0.330 0.320 10000 0.0000 0.0000 Symbol::gensym
0.42 0.180 0.480 10000 0.0000 0.0000 IO::Handle::new
0.07 0.030 0.010 20000 0.0000 0.0000 IO::Handle::DESTROY
Here's another thought: how about not calling your printto() and assign() methods quite so often?
There's overhead in calling the subs in the first place. Doubly so for method calls. Cut down on the number of calls.
Examine your algorithms to determine why it's necessary to call FTS::printto()/assign() an order of magnitude more times than anything else in your program. Can the work be put off, done in batches? Can you leave yourself helpful references in the data so that less looping/searching is necessary? (Fewer arrays, more hashes). Is there an opportunity to memoize functions somewhere? Can you prepare the data better during FTS::parsefile (or new_fromfile or something) so that it's more easily dealt with later on?
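One of those ideas, memoization, can be sketched with the core Memoize module (expensive_lookup here is a hypothetical stand-in, not an FTS method):

```perl
use strict;
use warnings;
use Memoize;

# A stand-in for any pure function that is called repeatedly
# with the same arguments (hypothetical, for illustration only).
sub expensive_lookup {
    my ($key) = @_;
    # ... imagine real work here ...
    return length($key) * 2;
}

# Subsequent calls with the same arguments are served from a cache.
memoize('expensive_lookup');

print expensive_lookup("header"), "\n";    # computed: prints 12
print expensive_lookup("header"), "\n";    # cached:   prints 12
```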
Beware of premature optimization
Re: Optimizing the bejeezus out of a sub
by ant9000 (Monk) on Jun 24, 2003 at 07:16 UTC
It's already bare-bones as far as I can tell. The only idea that comes to mind is to delay the print calls by building up a message string and printing it just once at the end of the sub. Depending on the amount of memory available (and the output expected), that might get you a tiny bit of speed if your $fh is opened on disk.
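The suggestion amounts to something like this (a sketch with made-up fragments, not the actual FTS internals):

```perl
use strict;
use warnings;

my $fh        = \*STDOUT;                        # any output filehandle
my @fragments = ( 'Hello, ', 'world', "!\n" );   # hypothetical template pieces

# Accumulate the output in a scalar and print once, instead of
# calling print for every fragment.
my $buf = '';
$buf .= $_ for @fragments;
print {$fh} $buf;    # one print call instead of three
```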
Thanks, that did speed it up a little bit!
Before:
%Time ExclSec CumulS #Calls sec/call Csec/c Name
49.6 21.58 21.525 130000 0.0002 0.0002 FTS::printto
After:
%Time ExclSec CumulS #Calls sec/call Csec/c Name
48.8 20.83 20.710 130000 0.0002 0.0002 FTS::printto
Does anybody know why this seems to make a difference? The print calls are of fairly short strings, and the filehandle I'm printing to is buffered (strace shows that each iteration is generating only one system call). I thought that printing a small string to a buffered filehandle would just append the string to the stdio buffer, which would be as fast as or faster than me buffering myself.
Re: Optimizing the bejeezus out of a sub
by BrowserUk (Patriarch) on Jun 24, 2003 at 13:10 UTC
The only thing left inside the routine that might be trimmable is the inner foreach loop. If the list of aliases generated by @$sec is of any size, then you may be paying a time penalty for building that list, and might realise some saving by using indexes to iterate over the array:
for (0..$#{$sec}) { $sec->[$_][_type]... }
Also, if _type, _sec, and _contents aren't already defined as constants (which are usually UPPERCASE by convention), then making them constants may yield another minor improvement.
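A sketch of that idea using the core constant pragma (the field positions here are invented for illustration):

```perl
use strict;
use warnings;

# Hypothetical field indexes. Constants are resolved at compile
# time, so $v->[_TYPE] compiles down to a literal array subscript.
use constant {
    _TYPE     => 0,
    _CONTENTS => 1,
};

my $v = [ 'text', 'Hello' ];
print $v->[_TYPE], " => ", $v->[_CONTENTS], "\n";    # prints "text => Hello"
```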
However, given that each call to the routine takes an average of only 0.000165 seconds, there isn't exactly a lot of fat to be trimmed within the routine. The fact that you are spending 50% of your program's time, a little over 20 seconds, in there comes down to the fact that you are calling it 130,000 times.
There are a few possibilities that might improve things.
If there is no possibility of reducing the 130,000 calls to the routine, then you might save some by not calling the routine! That is to say, if you inlined the sub at the point(s) of call, you would save the cost of building the stack frame and making the call.
Whether this is
- possible, given the architecture of the application, and
- desirable, given the probable redundancy that inlining might create and all the caveats that go with it,
are both questions you must decide given your requirements. How much benefit is realisable will depend on the nature of the code that calls this routine.
As an attempt to give some indication, the following (unrepresentative and possibly misrepresentative) benchmark shows that under some circumstances, inlining and/or deferring the printing, can yield some possibly significant benefits.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
Re: Optimizing the bejeezus out of a sub
by xdg (Monsignor) on Jun 24, 2003 at 14:38 UTC
The one thing that jumps out at me quickly is that the object structure has you doing a lot of dereferencing, array lookups, and hash lookups to get to your actual content. Consider whether there are places you could benefit from creating temporary references deeper into your structure before the loop, avoiding dereferencing through $self->[xxxx] each time through.
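For instance, hoisting a reference to an inner hash out of the loop might look like this (the data layout is hypothetical, merely shaped like the one in the thread):

```perl
use strict;
use warnings;

# Hypothetical object layout: an array-based object holding a hash of vars.
my $self = [ { vars => { name => 'World' } } ];

# Instead of dereferencing $self->[0]{vars}{...} on every iteration,
# grab a reference to the inner hash once, before the loop:
my $assign = $self->[0]{vars};

for my $key (qw(name name name)) {
    print $assign->{$key}, "\n";    # one hash lookup per iteration
}
```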
xdg
Re: Optimizing the bejeezus out of a sub
by Elian (Parson) on Jun 24, 2003 at 18:59 UTC
I dunno--you're clocking in at 200 microseconds an invocation. That's really not bad...
It's not going to make a huge difference, but you might want to make sure that autoflush is off for the filehandle in $fh. It won't save much, but everything counts, I suppose.
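Checking that might look like this (a sketch; File::Temp stands in for the real output file):

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempfile);

my ( $fh, $fname ) = tempfile();

$fh->autoflush(0);    # keep buffering on (the default for regular files)
print {$fh} "buffered output\n";
close $fh;            # the buffer is flushed here, in one write
```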
Re: Optimizing the bejeezus out of a sub
by rdfield (Priest) on Jun 24, 2003 at 12:46 UTC
|
I remember BrowserUk doing some performance work on parameter passing, and the conclusion (IIRC) was to refer to the parameter array directly, rather than shifting. rdfield
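The idea is to index @_ directly instead of shifting (a sketch; whether it actually wins depends on the workload):

```perl
use strict;
use warnings;

# shift copies each argument out of @_:
sub with_shift {
    my $self = shift;
    my $fh   = shift;
    return "$self/$fh";
}

# Indexing @_ avoids the copies. Beware: the elements of @_ are
# aliases to the caller's variables, so don't modify them casually.
sub no_shift {
    return "$_[0]/$_[1]";
}

print with_shift( 'a', 'b' ), "\n";    # prints "a/b"
print no_shift( 'a', 'b' ), "\n";      # prints "a/b"
```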
Re: Optimizing the bejeezus out of a sub
by tall_man (Parson) on Jun 24, 2003 at 15:42 UTC
Is this really what you want to do? If $val is defined, print it. Otherwise, pass undef and a string concatenated with undef to a subroutine. I suspect your code has never branched to that case, since it is almost certainly a bug.
print $fh defined($val) ? $val
: $self->_nosuchvar($val,"\$".$val);
print $fh defined($val) ? $val
: $self->_nosuchvar($v->[_contents],
"\$".$v->[_contents]);
Thanks!
Re: Optimizing the bejeezus out of a sub
by tilly (Archbishop) on Jun 24, 2003 at 17:54 UTC
A number of comments, most of which shouldn't affect performance, but a couple of which might.
- Why are you predeclaring $val, $sec, and $v? Move the declarations inline. I doubt there is any performance difference, but it keeps someone (like me) from wondering whether, for instance, $v is a global.
- Try assigning scalar(@_) to $ret, then decrementing $ret on skipped sections. Theoretically faster, though probably imperceptibly so.
- You could select $fh temporarily, rather than naming it in every print.
- Why are you writing your own templating engine? I would be inclined to use one of the many ones out there already, several of which have been optimized far more than I would feel like doing it.
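The select() idea from the list above can be sketched like this (using an in-memory filehandle for illustration):

```perl
use strict;
use warnings;

# In-memory filehandle so the demo is self-contained (perl 5.8+).
open my $fh, '>', \my $out or die "open: $!";

my $old = select($fh);    # make $fh the default output handle
print "no explicit filehandle needed\n";    # goes to $fh
print "another line\n";                     # so does this
select($old);             # restore the previous default (usually STDOUT)

close $fh;
```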
1: I did this to see if it helped performance. It didn't, but I didn't change it back before posting.
2-3: I'll try these and see if they do anything interesting.
4: Yeah, I know. I've got a seriously hacked up copy of CGI::FastTemplate that I've been using for years, so it has a style that I've gotten used to and, more importantly, that the Web page designers I work with have gotten used to. It's also a PITA to maintain, so I rewrote it to be more legible (and I hoped faster, but it turns out not). I'm trying to decide if I should keep my hacked up copy of CGI::FastTemplate, use my rewritten replacement, or just use another engine. That got me started benchmarking different templating engines, which led to optimizing this sub. Maybe when I've done a little more research I'll Seek some Wisdom on the merits of the different templating engines...
Before:
%Time ExclSec CumulS #Calls sec/call Csec/c Name
48.8 20.83 20.710 130000 0.0002 0.0002 FTS::printto
After 2
%Time ExclSec CumulS #Calls sec/call Csec/c Name
51.2 23.74 23.620 130000 0.0002 0.0002 FTS::printto
3: Ditto.
After 3
%Time ExclSec CumulS #Calls sec/call Csec/c Name
49.5 21.00 20.880 130000 0.0002 0.0002 FTS::printto
Re: Optimizing the bejeezus out of a sub
by particle (Vicar) on Jun 24, 2003 at 12:55 UTC
once you're done with the suggestions above, don't forget to add sub bejeezus () {} to get that bejeezus right out of there!
~Particle *accelerates*