Re: Optimizing the bejeezus out of a sub
by dws (Chancellor) on Jun 24, 2003 at 07:15 UTC
Any ideas where to look for that last little bit of performance?
The first thing that jumped out when I looked over your code is the loop. It looks like something transcribed from C. Instead of
for(;@_;shift) {
...
... @_[0] ...
try doing something like
foreach ( @_ ) {
...
... $_ ...
This plays to Perl's strengths (and saves you a few array subscripts).
Or, if you want to be explicit,
foreach my $section ( @_ ) {
...
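For a rough idea of the difference, here is a minimal, self-contained Benchmark sketch (the data and sub names are invented for illustration; the original code shifts @_ itself rather than a copy):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @data = (1 .. 100);

# C-style loop that consumes a copy of the list, as in the original code
sub sum_c_style {
    my @copy = @_;
    my $sum  = 0;
    for ( ; @copy ; shift @copy ) {
        $sum += $copy[0];
    }
    return $sum;
}

# Idiomatic foreach over the implicit $_ alias
sub sum_foreach {
    my $sum = 0;
    $sum += $_ foreach @_;
    return $sum;
}

cmpthese( 10_000, {
    c_style => sub { sum_c_style(@data) },
    foreach => sub { sum_foreach(@data) },
} );
```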
I wrote about the C-style for() in Re: Re: Believably slow... It is slightly, though not significantly, slower than the foreach-style loop. I could only remove a few operations from your code; I noted each spot with an ALTER comment. I think your time is consumed by those print() statements: I/O is always expensive.
## Print the given template sections to the supplied filehandle
sub printto
{
my $self = shift;
my $fh = shift;
my $ret = 0;
my($val,$sec,$v);
# ALTER - foreach over C-style for
foreach my $sec_k (@_)
{
$sec = $self->[_sec]{$sec_k};
# ALTER - drop unnecessary defined()
unless ( $sec )
{
# ALTER - remove indirect object syntax
$fh->print( $self->_nosuchsec($sec_k) );
next;
}
$ret++;
foreach $v (@$sec)
{
if ($v->[_type] == type_text)
{
# ALTER - remove indirect object syntax
$fh->print( $v->[_contents] );
}
else
{
$val = $self->[_assign]{$v->[_contents]};
$fh->print(
defined $val
? $val
: $self->_nosuchvar($val,"\$".$val) );
}
}
}
return $ret;
}
Thanks for the suggestions! Here's what I found...
I already changed the loops to use foreach; that seemed to be a little faster, and a lot more readable, so all benchmarks already have that in it. Here's what the numbers looked like before the other two changes:
Total Elapsed Time = 43.18577 Seconds
User+System Time = 42.69577 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
48.8 20.83 20.710 130000 0.0002 0.0002 FTS::printto
Changing if(!defined($sec)) to unless($sec) seemed to slow printto down just a hair. This could just be measurement error; for some reason other parts of the code apparently sped up with this change, which again points to measurement error...
Total Elapsed Time = 41.62356 Seconds
User+System Time = 41.09356 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
51.3 21.09 20.905 130000 0.0002 0.0002 FTS::printto
Changing print $fh to $fh->print seemed to slow things down significantly. Using IO::Handle::print changes whom the print calls get charged to, so the ExclSec for printto goes down, but that's more than made up for by all the calls to IO::Handle::print, and the CumulS and overall time go way up...
Total Elapsed Time = 51.91479 Seconds
User+System Time = 51.32479 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
41.5 21.31 18.510 281000 0.0000 0.0000 IO::Handle::print
38.9 19.97 35.589 130000 0.0002 0.0003 FTS::printto
foreach my $section ( @_ ) { ... } doesn't copy the value from @_ into $section. Inside the loop, $section is merely an alias to the currently selected element of the @_ array. You can even modify @_ by assigning a new value to $section. Example:
# adds 1 to all elements of array
foreach my $elem (@arr) {
$elem ++;
}
--
Ilya Martynov, ilya@iponweb.net
CTO IPonWEB (UK) Ltd
Quality Perl Programming and Unix Support
UK managed @ offshore prices - http://www.iponweb.net
Personal website - http://martynov.org
Re: Optimizing the bejeezus out of a sub
by hossman (Prior) on Jun 24, 2003 at 07:35 UTC
dprof says this sub is running 50% of the time for my application, so it looks like my best candidate for optimization.
Well, your application has to spend 50% of its time somewhere.
If you've got 50 methods, and you're making 50 calls to each, and only one of those methods accounts for 50% of your execution time, then you've got something to look into. But if your program consists of nothing more than some initialization, some config-file parsing, a quick select from a DB, and then a single call to this method ... 50% is kicking some ass.
Numbers are meaningless without units, and percentages are meaningless without context.
My point is that if I'm going to optimize anything, this is the sub to spend my time on. The other two candidates I've already tweaked, and even if I doubled their performance I'd still get only a modest gain. A clever insight on this sub could make a much bigger difference.
%Time ExclSec CumulS #Calls sec/call Csec/c Name
48.8 20.83 20.710 130000 0.0002 0.0002 FTS::printto
17.5 7.499 7.490 10000 0.0007 0.0007 FTS::addstring
15.3 6.569 6.450 120000 0.0001 0.0001 FTS::assign
6.77 2.892 42.220 10000 0.0003 0.0042 main::__ANON__
2.55 1.090 43.310 10002 0.0001 0.0043 Benchmark::__ANON__
2.44 1.040 1.030 10000 0.0001 0.0001 main::cleverlinks
2.20 0.940 8.410 10000 0.0001 0.0008 FTS::parsehandle
1.45 0.620 0.610 10000 0.0001 0.0001 IO::File::open
1.38 0.590 1.650 10000 0.0001 0.0002 IO::File::new
1.38 0.590 10.620 10000 0.0001 0.0011 FTS::parsefile
1.03 0.440 11.410 10000 0.0000 0.0011 FTS::new_fromfile
0.94 0.400 0.390 10000 0.0000 0.0000 FTS::new
0.77 0.330 0.320 10000 0.0000 0.0000 Symbol::gensym
0.42 0.180 0.480 10000 0.0000 0.0000 IO::Handle::new
0.07 0.030 0.010 20000 0.0000 0.0000 IO::Handle::DESTROY
Here's another thought: how about not calling your printto() and assign() methods quite so often?
There's overhead in calling the subs in the first place. Doubly so for method calls. Cut down on the number of calls.
Examine your algorithms to determine why it's necessary to call FTS::printto()/assign() an order of magnitude more times than anything else in your program. Can the work be put off, done in batches? Can you leave yourself helpful references in the data so that less looping/searching is necessary? (Fewer arrays, more hashes). Is there an opportunity to memoize functions somewhere? Can you prepare the data better during FTS::parsefile (or new_fromfile or something) so that it's more easily dealt with later on?
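One of those ideas, memoization, can be sketched with the core Memoize module (expensive_lookup here is a hypothetical stand-in, not an FTS method):

```perl
use strict;
use warnings;
use Memoize;

# A stand-in for any pure function that is called repeatedly
# with the same arguments (hypothetical, for illustration only).
sub expensive_lookup {
    my ($key) = @_;
    # ... imagine real work here ...
    return length($key) * 2;
}

# Subsequent calls with the same arguments are served from a cache.
memoize('expensive_lookup');

print expensive_lookup("header"), "\n";    # computed: prints 12
print expensive_lookup("header"), "\n";    # cached:   prints 12
```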
Beware of premature optimization
Re: Optimizing the bejeezus out of a sub
by ant9000 (Monk) on Jun 24, 2003 at 07:16 UTC
It's already bare-bones as far as I can tell. The only idea that comes to mind is to delay the print calls by building up a message string and printing it just once at the end of the sub. Depending on the amount of memory available (and the output expected), that might get you a tiny bit of speed if your $fh is opened on disk.
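The suggestion amounts to something like this (a sketch with made-up fragments, not the actual FTS internals):

```perl
use strict;
use warnings;

my $fh        = \*STDOUT;                        # any output filehandle
my @fragments = ( 'Hello, ', 'world', "!\n" );   # hypothetical template pieces

# Accumulate the output in a scalar and print once, instead of
# calling print for every fragment.
my $buf = '';
$buf .= $_ for @fragments;
print {$fh} $buf;    # one print call instead of three
```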
Thanks, that did speed it up a little bit!
Before:
%Time ExclSec CumulS #Calls sec/call Csec/c Name
49.6 21.58 21.525 130000 0.0002 0.0002 FTS::printto
After:
%Time ExclSec CumulS #Calls sec/call Csec/c Name
48.8 20.83 20.710 130000 0.0002 0.0002 FTS::printto
Does anybody know why this seems to make a difference? The print calls are of fairly short strings, and the filehandle I'm printing to is buffered (strace shows that each iteration is generating only one system call). I thought that printing a small string to a buffered filehandle would just append the string to the stdio buffer, which would be as fast as or faster than me buffering myself.
Re: Optimizing the bejeezus out of a sub
by BrowserUk (Patriarch) on Jun 24, 2003 at 13:10 UTC
The only thing left inside the routine that might be trimmable is the inner foreach loop. If the list of aliases generated by @$sec is of any size, then you may be paying a time penalty for building that list, and might realise some saving by using indexes to iterate over the array:
for (0..$#{$sec}) { $sec->[$_][_type]... }
Also, if _type, _sec, and _contents aren't already defined as constants (which are usually UPPERCASE by convention), then making them constants may yield another minor improvement.
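A sketch of that idea using the core constant pragma (the field positions here are invented for illustration):

```perl
use strict;
use warnings;

# Hypothetical field indexes. Constants are resolved at compile
# time, so $v->[_TYPE] compiles down to a literal array subscript.
use constant {
    _TYPE     => 0,
    _CONTENTS => 1,
};

my $v = [ 'text', 'Hello' ];
print $v->[_TYPE], " => ", $v->[_CONTENTS], "\n";    # prints "text => Hello"
```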
However, given that each call to the routine takes an average of only 0.000165 seconds, there isn't exactly a lot of fat to be trimmed within the routine. The fact that you are spending 50% of your program's time, a little over 20 seconds, in there comes down to the fact that you are calling it 130,000 times.
There are a few possibilities that might improve things.
If there is no possibility of reducing the 130,000 calls to the routine, then you might save some by not calling the routine! That is to say, if you inlined the sub at the point(s) of call, you would save the cost of building the stack frame and making the call.
Whether this is
- possible, given the architecture of the application, and
- desirable, given the probable redundancy that inlining might create and all the caveats that go with it,
are both questions you must decide given your requirements. How much benefit is realisable will depend on the nature of the code that calls this routine.
As an attempt to give some indication, the following (unrepresentative and possibly misrepresentative) benchmark shows that under some circumstances, inlining and/or deferring the printing, can yield some possibly significant benefits.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
Re: Optimizing the bejeezus out of a sub
by xdg (Monsignor) on Jun 24, 2003 at 14:38 UTC
The one thing that jumps out at me quickly is that the object structure has you doing a lot of dereferencing, array lookups, and hash lookups to get to your actual content. Consider whether there are places you could benefit from creating temporary references deeper into your structure before the loop, avoiding dereferencing through $self->[xxxx] each time through.
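For instance, hoisting a reference to an inner hash out of the loop might look like this (the data layout is hypothetical, merely shaped like the one in the thread):

```perl
use strict;
use warnings;

# Hypothetical object layout: an array-based object holding a hash of vars.
my $self = [ { vars => { name => 'World' } } ];

# Instead of dereferencing $self->[0]{vars}{...} on every iteration,
# grab a reference to the inner hash once, before the loop:
my $assign = $self->[0]{vars};

for my $key (qw(name name name)) {
    print $assign->{$key}, "\n";    # one hash lookup per iteration
}
```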
xdg
Re: Optimizing the bejeezus out of a sub
by Elian (Parson) on Jun 24, 2003 at 18:59 UTC
I dunno--you're clocking in at 200 microseconds an invocation. That's really not bad...
It's not going to make a huge difference, but you might want to make sure that autoflush is off for the filehandle in $fh. It won't save much, but everything counts, I suppose.
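Checking that might look like this (a sketch; File::Temp stands in for the real output file):

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempfile);

my ( $fh, $fname ) = tempfile();

$fh->autoflush(0);    # keep buffering on (the default for regular files)
print {$fh} "buffered output\n";
close $fh;            # the buffer is flushed here, in one write
```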
Re: Optimizing the bejeezus out of a sub
by rdfield (Priest) on Jun 24, 2003 at 12:46 UTC
|
I remember BrowserUk doing some performance work on parameter passing, and the conclusion (IIRC) was to refer to the parameter array directly, rather than shifting. rdfield
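The idea is to index @_ directly instead of shifting (a sketch; whether it actually wins depends on the workload):

```perl
use strict;
use warnings;

# shift copies each argument out of @_:
sub with_shift {
    my $self = shift;
    my $fh   = shift;
    return "$self/$fh";
}

# Indexing @_ avoids the copies. Beware: the elements of @_ are
# aliases to the caller's variables, so don't modify them casually.
sub no_shift {
    return "$_[0]/$_[1]";
}

print with_shift( 'a', 'b' ), "\n";    # prints "a/b"
print no_shift( 'a', 'b' ), "\n";      # prints "a/b"
```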
Re: Optimizing the bejeezus out of a sub
by tall_man (Parson) on Jun 24, 2003 at 15:42 UTC
Is this really what you want to do? If $val is defined, print it. Otherwise, pass undef and a string concatenated with undef to a subroutine. I suspect your code has never branched to that case, since it is almost certainly a bug.
print $fh defined($val) ? $val
: $self->_nosuchvar($val,"\$".$val);
print $fh defined($val) ? $val
: $self->_nosuchvar($v->[_contents],
"\$".$v->[_contents]);
Thanks!
Re: Optimizing the bejeezus out of a sub
by tilly (Archbishop) on Jun 24, 2003 at 17:54 UTC
A number of comments, most of which shouldn't affect performance, but a couple of which might.
- Why are you predeclaring $val, $sec, and $v? Move the declarations inline. I doubt there is any performance difference, but it keeps someone (like me) from wondering whether, for instance, $v is a global.
- Try assigning scalar(@_) to $ret, then decrementing $ret on skipped sections. Theoretically faster, though probably imperceptibly so.
- You could select $fh temporarily, rather than naming it in every print.
- Why are you writing your own templating engine? I would be inclined to use one of the many ones out there already, several of which have been optimized far more than I would feel like doing it.
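The select() idea from the list above can be sketched like this (using an in-memory filehandle for illustration):

```perl
use strict;
use warnings;

# In-memory filehandle so the demo is self-contained (perl 5.8+).
open my $fh, '>', \my $out or die "open: $!";

my $old = select($fh);    # make $fh the default output handle
print "no explicit filehandle needed\n";    # goes to $fh
print "another line\n";                     # so does this
select($old);             # restore the previous default (usually STDOUT)

close $fh;
```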
1: I did this to see if it helped performance. It didn't, but I didn't change it back before posting.
2-3: I'll try these and see if they do anything interesting.
4: Yeah, I know. I've got a seriously hacked up copy of CGI::FastTemplate that I've been using for years, so it has a style that I've gotten used to and, more importantly, that the Web page designers I work with have gotten used to. It's also a PITA to maintain, so I rewrote it to be more legible (and I hoped faster, but it turns out not). I'm trying to decide if I should keep my hacked up copy of CGI::FastTemplate, use my rewritten replacement, or just use another engine. That got me started benchmarking different templating engines, which led to optimizing this sub. Maybe when I've done a little more research I'll Seek some Wisdom on the merits of the different templating engines...
Before:
%Time ExclSec CumulS #Calls sec/call Csec/c Name
48.8 20.83 20.710 130000 0.0002 0.0002 FTS::printto
After 2
%Time ExclSec CumulS #Calls sec/call Csec/c Name
51.2 23.74 23.620 130000 0.0002 0.0002 FTS::printto
3: Ditto.
After 3
%Time ExclSec CumulS #Calls sec/call Csec/c Name
49.5 21.00 20.880 130000 0.0002 0.0002 FTS::printto
Re: Optimizing the bejeezus out of a sub
by particle (Vicar) on Jun 24, 2003 at 12:55 UTC
once you're done with the suggestions above, don't forget to add sub bejeezus () {} to get that bejeezus right out of there!
~Particle *accelerates*