Performance Question

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Performance Question by Rhandom (Curate) on Feb 28, 2007 at 16:03 UTC
The first gut reaction is to say - use a template system such as Template::Toolkit, CGI::Ex::Template, or HTML::Template. However, that sometimes is overkill and it seems that it would be in your case. The big thing that would help in your case is to simplify your system into something similar to the following: `my %swap = ( name_1 => $name_1, name_2 => $name_2, name_3 => $name_3, name_4 => $name_4, name_5 => $name_5, name_6 => $name_6, ); WHILE (<MYTEMPLATE>) { s{<\?--(name_\d+)-->}{ defined($swap{$1}) ? $swap{$1} : '' }eg; }` [download] The key thing that makes this update faster is that you are limiting yourself to a single regex pass through. Doing multiple regex passes is going to give you a speed hit. my @a=qw(random brilliant braindead); print $a[rand(@a)];	[reply] [d/l]
A reply falls below the community's threshold of quality. You may see it by logging in.
Re: Performance Question by kyle (Abbot) on Feb 28, 2007 at 16:02 UTC
This is going to leave any `<?--name_x-->` item in your template unchanged if you don't have a defined value for it. Is that what you want? You might look at using something like HTML::Template or Template Toolkit 2.	[reply] [d/l]
Re^2: Performance Question by Anonymous Monk on Feb 28, 2007 at 16:09 UTC
Yes, that's what I want, and definitely not any warning messages saying things like "Use of uninitialized value..."	[reply]
Re^3: Performance Question by Rhandom (Curate) on Feb 28, 2007 at 17:22 UTC
If you want to only swap in values that exist then use the following: `s{(<\?--(\w+)-->)}{ defined($swap{$2}) ? $swap{$2} : $1 }eg;` [download] That will only replace values that match have a corresponding entry in %swap. If you are brave you can use something like this: `s/<\?--(\w+)-->(??{ defined($swap{$^N}) ? '(?=)' : '(?!)' })/$swap{$1} +/g;` [download] It also does what you are looking for - but without the e modifier on the swap. You will probably need to read perlre to follow it fully - but essentially you are using the (??{}) block to check your regex as you go and then using ?= and ?! to report success or failure. my @a=qw(random brilliant braindead); print $a[rand(@a)];	[reply] [d/l] [select]
Re: Performance Question by davorg (Chancellor) on Feb 28, 2007 at 16:59 UTC
If you have a large number of variables that you want to access in the same way, then it makes sense to store them in some kind of aggregate variable. In this case I'd go for a hash. `use CGI ':cgi'; # list of parameter names my @params = qw(city zip lastname country other obs); # extract all parameters into a hash my %param; foreach (@params) { $param{$_} = param($_); } # time passes... while (<MYTEMPLATE>) { foreach my $p (@params) { s/<\?--$p-->/$param{$p}/g if defined $param{$p}; # you'll probably want to print it too } }` [download] -- <http://dave.org.uk> "The first rule of Perl club is you do not talk about Perl club." -- Chip Salzenberg	[reply] [d/l]
Re: Performance Question by agianni (Hermit) on Feb 28, 2007 at 16:03 UTC
My understanding is that this won't make a difference. You won't get warnings if $name_x isn't defined because, despite the fact that the if statement comes after the substitution statement, it's executed first. I don't think there will be a significant enough performance difference (someone else might be able to answer this from an internal perspective?), but you could Benchmark it.	[reply]
Re: Performance Question by agianni (Hermit) on Feb 28, 2007 at 17:34 UTC
Also, if you want to talk about performance, the way in which you're looking for variables to change is going to be much more of a problem. Since you're going through each field on each line, this is roughly O(n^2) in complexity. That could get really expensive if you're running those regexes on a large number of fields/lines. Rather than checking for all of the fields on each iteration of the while loop, you might do something like this: `while(<MYTEMPLATE>){ # get the names of the fields on this line my @variables = $_ =~ m/<\?--(\S+?)-->/g; for my $variable ( @variables ){ # replace the tag with the value of the variable # if there is one. s/<\?--$variable-->/$$variable/g if $$variable; } }` [download] That should yield something more like O(n), which should be noticeably faster in most cases (remember that O(n^2) smaller than O(n) for small numbers), less error prone and since it's dynamic, you won't have to update it when there's a new field to check.	[reply] [d/l]
Re^2: Performance Question by Rhandom (Curate) on Feb 28, 2007 at 17:46 UTC
++ to you for more tries at solutions. The thing that isn't stated by the OP is how large the templates are. If the string is relatively short then doing the multiple regex passes is fine. The longer the template becomes - if he wants it to be fast - he really needs to try and get the number of passes as low as he can. The final question is - is this swapping bit the bottle neck of the process. If the OP is running these as CGI processes - then it probably doesn't matter what algorithm he is using because the startup time of the CGI is going to be costly -- that is unless the templates being used are mega bytes in size. my @a=qw(random brilliant braindead); print $a[rand(@a)];	[reply]
Re^3: Performance Question by agianni (Hermit) on Feb 28, 2007 at 17:56 UTC
Right, I actually just updated my post (not sure if you read it before or after) but the point is that this alternate algorithm won't save you much with a small number of variables and a small file and could actually cost you more for very small numbers, like any sized file with no variables defined.	[reply]
Re^4: Performance Question by blazar (Canon) on Mar 01, 2007 at 10:34 UTC
Re^2: Performance Question by blazar (Canon) on Mar 01, 2007 at 10:24 UTC
`my @variables = $_ =~ m/<\?--(\S+?)-->/g;` [download] One really minor nitpick: `$_` is the topicalizer and as such it is implicit in many operations, wich makes for terse syntax. So one either wants a fully qualified variable name like `$line` and thus `my @variables = $line =~ /<\?--(\S+?)-->/g;` [download] or `my @variables = /<\?--(\S+?)-->/g;` [download] `for my $variable ( @variables ){ # replace the tag with the value of the variable # if there is one. s/<\?--$variable-->/$$variable/g if $$variable; }` [download] One major nitpick: I would've ++'ed your node for mentioning another WTDI, complete of interesting considerations, but I --'ed it since you're unnecessarily using symrefs whereas we warn newbies all the time about the risk of doing so. It just amounts to using the symbol table as a generic hash, which is a good reason to use a specific one instead, e.g. %value. Failing to do so you should be prepared to cope with some corner cases: what if he as a `<?--0-->` or `<?--$-->` tag? Well, unprobable enough, granted, but even without that putting all values together in a separate structure is saner. The OP is clearly a newbie, so he may read your example as advocating symrefs and take them as a programming habit, that first or later may bite him in the neck. Also, I would throw a defined in there for one may well want to replace say `<?--cost-->` with `0`.	[reply] [d/l] [select]
Re: Performance Question by wfsp (Abbot) on Feb 28, 2007 at 18:28 UTC
I think there's some heavy weather being made of this. So here's my go. :-) There's not a regex or any html to be seen. Neither is there any 'hard coded' field names. All the data is taken straight from the CGI.pm `param` method. This is, imho, the most effiecient way to go. :-) #!/usr/local/bin/perl use strict; use warnings; use Data::Dumper; use CGI; use HTML::Template; # create query object with some params # (straight from the docs) :-) my $q = new CGI( { 'dinosaur' => 'barney', 'song' => 'I love you', 'friend' => 'george', } ); # create data structure for HTML::Template my @param_loop = map { {name => $_, value =>$q->param($_), } } $q->param; # get the template from DATA (this would normally be in a file) my @tmpl = <DATA>; # blast off! my $t = HTML::Template->new(arrayref => \@tmpl); $t->param(param_loop => \@param_loop); print $t->output; __DATA__ <html><head><title>template</title></head> <body> <TMPL_LOOP NAME=param_loop> <p><TMPL_VAR NAME=name> -> <TMPL_VAR NAME=value></p> </TMPL_LOOP> </body> </html> [download] output: `<html><head><title>template</title></head> <body> <p>friend -> george</p> <p>song -> I love you</p> <p>dinosaur -> barney</p> </body>` [download] update: From the docs: If a value is not given in the query string, as in the queries "name1=&name2=" or "name1&name2", it will be returned as an empty string. So you won't get any pesky undefs either.	[reply] [d/l] [select]