GSF has asked for the wisdom of the Perl Monks concerning the following question:

I have profiled a program and found that 95% of it's time is spent in this sub.

First the code, then my questions.

sub evalTree { my $intree = shift; my $param = shift; local $_ = $intree->{contents}; # A shortcut for the current node return 0 unless defined; if (/^sub /) { return eval->( evalTree($intree->{left},$param), evalTree($intree->{right},$param) ); } elsif (/^[x]$/) { return $param->{$_}; } else { return $_; } }

Quesitons:
1: when using default variable $_, I assume doing regex match of the beginning is quite silly but I don't know the equivelant of iterating through elements such as....

instead of

if (/^sub /) {

Perhaps something like

if (left($_(0),3) = "sub ")

?

Question 2: I guess thats it. I realize this is a recursion monster but.... Eesh I might end up posting the whole code.

Let me know if you guys want the whole code.

Otherwise your tomatoes, rotten cabbage, and half eaten apples are appreciated. GSF

Replies are listed 'Best First'.
Re: optimization help
by ikegami (Patriarch) on Oct 15, 2007 at 23:00 UTC
    • substr, and eq (lexical comparison) instead of = (assignment).

      if (substr($_, 0, 4) eq "sub ")
    • local $_ = $intree->{contents}; will copy the string. Creating an alias would be better.

      for ($intree->{contents})

      However, I don't really see the use of $_. You're simply obfuscating your code. Perhaps you could alias the expression to something shorter yet meaninful.

      for my $contents ($intree->{contents})
    • As for the question, eval EXPR parses and generates an opcode tree from the supplied source code. That's inherently slow.

      If the subs already exist, you'd be better off passing a reference to the sub rather than the source of the sub.

      if (ref($contents) eq 'CODE') { return $contents->( evalTree($intree->{left}, $param), evalTree($intree->{right}, $param), ); }

      You can get a reference to an existing sub as follows:

      sub func { ... } my $ref = \&func;

      Or you can create a reference to an anonymous sub as follows:

      my $ref = sub { ... };
      Thank you very much. It makes my head hurt reading it but... I will keep cracking at at. I don't feel so directionless anymore :)
Re: optimization help
by samtregar (Abbot) on Oct 15, 2007 at 22:42 UTC
    #1: I really doubt this is your problem. Perl's regex engine contains very smart optimizations for doing this kind of match.

    #2: How big is "the whole code"? I'm curious about what eval() and evalTree() do. When you say this sub has 95% of the runtime is that inclusive of these sub-calls or exclusive? If it's inclusive then your problem could very well actually be due to what happens in them...

    UPDATE: Looking closer, I'm really curious about that eval() call. As it's written it doesn't make sense - Perl won't let you call a builtin with that syntax:

    $ perl -e 'eval->(print "foo\n")' foo Undefined subroutine &main:: called at -e line 1.

    That made me think you must not be using the real Perl eval() builtin. But now I wonder if you made a typo and your code actually is using the eval() builtin. If that's the case then you can stop searching, I know exactly why your code is so slow! String eval() is very slow in Perl - it's actually compiling your code each time it's run!

    UPDATE 2: My mistake, Perl is apparently ok with that syntax for calling string eval():

    $ perl -e 'eval->(qq{print "foo\n"})' foo

    -sam

      It is about this much longer:
      #!/usr/local/bin/perl -w # Based on ideas and code by bjm@paralynx.com # (Brad Murray), posted to perl-ai list June 3 1999. use strict; use Time::HiRes 'time'; require 'dumpvar.pl'; no strict "refs"; my @functions = ( 'sub { # sin(x) my $in = shift; return sin($in); }', 'sub { # add(x,y) my ($left,$right) = @_; return ($left + $right); }', 'sub { # mul(x,y) my ($left,$right) = @_; return ($left * $right); }', 'sub { # log(x) my $in = shift; return ($in == 0) ? 0 : log(abs($in)); }', ); my @terminals = ( \&identity, \&constant ); my $x; my %tree; my $maxdepth = 5; my $pop = 200; my $mates = 10; my $gen = 3; # syntax tree is a set of hashes: # # %node{left} = $nodereference or undef # %node{right} = $nodereference or undef # %node{contents} = $funcref or 'x' or constant # # sin(0.231 * log($x)) # # sin(multiply(0.231,log($x,0)),0) # # undef # sin---multiply----0.231< # | | undef undef # 0 +--------log----------$x< # | undef # 0 # # # get target data set my $inline = <DATA>; my @target = split /,/, $inline; # build a starting population of $pop my @world = map {{organism => new Organism}} 0..$pop; my $count = 0; for (1..$gen) { print "\nGENERATION ", ++$count, "\n"; foreach (@world) { $_->{fitness} = &fitness($_->{organism}{tree},@target); $_->{as_string} = $_->{organism}->as_string; } @world = sort { $a->{fitness} <=> $b->{fitness} or length $a->{as_string} <=> length $b->{as_string} } @world; foreach (@world) { print "=== Fitness: $_->{fitness} ===\n"; print $_->{as_string}, "\n"; } # Kill off some organisms, mate some others to take their spots for (0..($mates-1)) { pop @world; my ($org1, $org2) = ($world[$_]{organism}, $world[rand @world]{organism}); my $child = $org1->mate($org2); unshift @world, {organism => $child}; print "Created '@{[$child->as_string]}' from " . "'@{[$org1->as_string]}' and '@{[$org2->as_string]}'\n"; } # mutate one organism at random my $to_mutate = $world[rand @world]{organism}; print "Mutated '@{[$to_mutate->as_string]}' into "; $to_mutate->mutate; print "'@{[$to_mutate->as_string]}'\n"; } # ------------ subs sub fitness { my ($org, @target) = @_; my $sumdiff = 0; my %p; my $starttime; my $endtime; foreach (0..$#target) { $starttime = time(); $p{'x'} = $_; my $value = &evalTree($org,\%p); $sumdiff += abs($value - $target[$_]); $endtime = time(); $sumdiff += ($endtime - $starttime); } return $sumdiff/(scalar @target); } sub evalTree { my $intree = shift; my $param = shift; local $_ = $intree->{contents}; # A shortcut for the current node return 0 unless defined; if (/^sub /) { return eval->( evalTree($intree->{left},$param), evalTree($intree->{right},$param) ); } elsif (/^[A-Za-z]$/) { return $param->{$_}; } else { return $_; } } # ------------ terminal subs sub identity { return 'x'; } sub constant { return int(rand(20) - 10); } ###################################################### package Organism; use Storable qw(dclone); sub new { my $package = shift; my $self = {tree => $package->_buildTree() }; return bless $self, $package; } sub _buildTree { my $self = shift; my $depth = shift || 0; my %tree; if ($depth > $maxdepth) { $tree{contents} = &{$terminals[rand @terminals]}; } else { $tree{contents} = (int rand(3) ? $functions[rand @functions] : &{$terminals[rand @terminals]} ); } if ($tree{contents} =~ /^sub /) { $tree{left} = $self->_buildTree($depth + 1); $tree{right} = $self->_buildTree($depth + 1); } return \%tree; } sub mate { my ($self, $partner) = @_; my $self_clone = dclone($self); my $partner_clone = dclone($partner); # Get a node from $partner and stick it somewhere in $self_clone my @clone_index = $self_clone->_treeIndex; my @partner_index = $partner_clone->_treeIndex; %{ $clone_index[rand @clone_index] } = %{ $partner_index[rand @partner_index]}; return $self_clone; } sub mutate { my $self = shift; my @index = $self->_treeIndex; %{ $index[rand @index] } = %{ $self->_buildTree($maxdepth-1) }; } sub _treeIndex { # Generates a list of all the nodes in this tree. These are # references to the stuff in the object itself, so changes to the # elements of this list will change the object. my $self = shift; my $tree = shift || $self->{tree}; my @sofar = @_; # Dump the content nodes into a list if ($tree->{contents} =~ /^sub /) { return(@sofar, $tree, $self->_treeIndex($tree->{left}, @sofar), $self->_treeIndex($tree->{right}, @sofar) ); } else { return(@sofar, $tree); } } sub as_string { my $self = shift; my $tree = shift || $self->{tree}; if ($tree->{contents} =~ /^sub \{ \# (\w+)/) { return "$1(" . $self->as_string($tree->{left}) . ',' . $self->as_string($tree->{right}) . ')'; } else { return $tree->{contents}; } } #Total Elapsed Time = 5.147911 Seconds # User+System Time = 4.886911 Seconds #Exclusive Times #%Time ExclSec CumulS #Calls sec/call Csec/c Name # 97.4 4.760 4.973 153323 0.0000 0.0000 main::evalTree # 11.4 0.560 5.580 603 0.0009 0.0093 main::fitness # 4.36 0.213 0.213 71536 0.0000 0.0000 main::__ANON__ # 1.64 0.080 0.080 9733 0.0000 0.0000 Organism::as_string # 0.96 0.047 0.047 20502 0.0000 0.0000 Time::HiRes::time # 0.63 0.031 0.031 7 0.0044 0.0044 IO::File::BEGIN # 0.33 0.016 0.016 6 0.0027 0.0026 ActiveState::Path::BEGIN # 0.33 0.016 0.047 4 0.0040 0.0116 main::BEGIN # 0.31 0.015 0.015 60 0.0002 0.0002 Storable::dclone # 0.31 0.015 0.015 843 0.0000 0.0000 main::constant # 0.29 0.014 0.014 858 0.0000 0.0000 main::identity # 0.08 0.004 0.032 3198 0.0000 0.0000 Organism::_buildTree # 0.00 0.000 0.000 1 0.0000 0.0000 Config::launcher # 0.00 0.000 0.000 1 0.0000 0.0000 Config::fetch_string # 0.00 0.000 0.000 1 0.0000 0.0000 Exporter::Heavy::heavy_e +xport_to_l # evel __END__ -1407,-931,-577,-327,-163,-67,-21,-7,-7,-3,23,89,213,413,707,1113,16 +49

        Applying what I said in Re: optimization help,

        my @functions = ( sub { # sin(x) my $in = shift; return sin($in); }, sub { # add(x,y) my ($left,$right) = @_; return ($left + $right); }, sub { # mul(x,y) my ($left,$right) = @_; return ($left * $right); }, sub { # log(x) my $in = shift; return ($in == 0) ? 0 : log(abs($in)); }, ); ... sub evalTree { my $intree = shift; my $param = shift; # A shortcut for the current node. for my $contents ($intree->{contents}) { return 0 unless defined $contents; if (ref($contents) eq 'CODE') { return $contents->( evalTree($intree->{left}, $param), evalTree($intree->{right}, $param), ); } if ($contents =~ /^[A-Za-z]$/) { return $param->{$_}; } return $contents; } }

        If you need both the string and the compiled sub, save both instead of recompiling the strings over and over again.

      Thanks for looking closer. If you want to see what kinda amazing stuff this does turn generations up to 100, or 1000 even.

      I thought Eval was computational complexity 0 - which is a unique feature of Perl. At least that what I remember from a Perl calls in Boulder where I met Tom.

      It thinks. :)

        Perl's eval() definitely is not a constant-time operation. It definitely varies with the length and complexity of its input. I'd imagine it's at least O(n) with n being the length of the input string, but it's probably O(n*m*j*k*l) with the definition of m, j, k and l left as an exercise for the reader.

        -sam

        Oh yeah, finally and perhaps most importantly, its not my code.
Re: optimization help
by naikonta (Curate) on Oct 16, 2007 at 01:01 UTC
    if (/^sub /) { return eval->( evalTree($intree->{left},$param), evalTree($intree->{right},$param) );
    This is tricky, but its trickiness doesn't help in any means. I needed to lookup perldoc -f eval again to make myself sure that eval is one of them that takes $_. I still wonder, though, experienced monk like samtregar got bitten too (sorry, Sam) :-) I still thank him for giving me a little hint (thank you, Sam).

    This reminds me of a friend that made use of eval in a "unique" way, the first time I saw that time.

    #!/usr/bin/perl # no -w # no strict eval { ... the rest of the program... }; die $@ if $@;
    There was nothing wrong with this program, syntatically, and it worked. I coudn't explain then, but I knew this was not right so I rejected his code and told him to code in the usual manner. Now I can say that unnecessary double evaluation is evil.

    These two examples prove a) TMTOWTDI principle, b) not all W's are equal.

    Otherwise your tomatoes, rotten cabbage, and half eaten apples are appreciated
    No, I won't do that, especially when I know that this is not originally your own code. Even if it was, there was still no reason to do so. You asked in the proper manner. ikegami has said what was wrong and the correction as well.

    Open source softwares? Share and enjoy. Make profit from them if you can. Yet, share and enjoy!

      eval BLOCK and eval STRING are completely different. It's unfortunate that they share the same name. There's (almost) nothing wrong with the code your friend wrote.

        Agreed. It was an eval BLOCK so it happened at compiled-time. I just didn't (and still don't) see any advantage other than aggregating fatal errors checking, such as
        my $dbh = get_db(); # with RaiseError => 1 eval { my $some_sql = get_some_sql_code(); $dbh->do($some_sql); my $another_sql = get_another_sql_code(); $dbh->do($another_sql); ... more query execution }; die $@ if $@;

        Open source softwares? Share and enjoy. Make profit from them if you can. Yet, share and enjoy!