in reply to Re: how can I speed up this perl??
in thread how can I speed up this perl??

Yeah, I also like the idea of using strings instead of arrays. When I read the problem, my first thought was to try something like...
$genome=~s/(.)(?=(.))/++$count{$1.$2} and undef/eg;
...does anyone else tend to (ab)use the replacement operator as sort of a map which works on strings?

Replies are listed 'Best First'.
Re: using s/// as map?
by dragonchild (Archbishop) on Nov 24, 2003 at 18:45 UTC
    Why do the lookahead? Why not just
    $genome =~ s/(..)/++$count{$1} and undef/eg;

    ------
    We are the carpenters and bricklayers of the Information Age.

    The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

    ... strings and arrays will suffice. As they are easily available as native data types in any sane language, ... - blokhead, speaking on evolutionary algorithms

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

      I'm assuming that the original poster wanted the frequency of all pairs, so we don't want the regex engine to consume more than one character per iteration. With...
      $genome="acgt";
      Then the look ahead version will finish with...
      $count{ac} == 1 $count{cg} == 1 #Note this match $count{gt} == 1
      ...while the non-lookahead version will have...
      $count{ac} == 1 $count{gt} == 1
      ... i.e. it misses the "cg" case because it jumps through the string two characters at a time.
Re: using s/// as map?
by thospel (Hermit) on Nov 24, 2003 at 19:24 UTC
    I use it a lot in perlgolf, but not in real code since it tends to destroy the string (notice that your code does, though you tried to avoid that I think). In real code using a while on a regex in scalar context is slightly faster and not so dangerous. When programming an inner loop and trying to be blazingly fast, you also should avoid things like $1.$2 since constructing a new value is a relatively expensive operation in perl. So the regex variant I'd use is:
    $count{$1}++ while $genome =~ /(?=(..))./g
    But this is still twice as slow as the substr() variant.
      This should be a little better.
      ++$count{$1} while $genome =~ /\G(..)/g;
      The PerlMonks advocate for tr///
        You're only doing half the work now :-)

        You can make it work like this though:

        ++$count{$1},pos($genome)-- while $genome =~ /\G(..)/g
        It's interesting that this is slightly faster than dropping the \G.

        This however turns out to be a lot slower (seems to do a complete linear search for the proper string start every time):

        $genome =~ /./g; $count{$1}++ while $genome =~ /(.\G.)/g

        update This however means that the lookahead version can be speeded up by about 5% too:

        $count{$1}++ while $genome =~ /\G(?=(..))./g
Re: using s/// as map?
by BrowserUk (Patriarch) on Nov 24, 2003 at 21:17 UTC

    You could also use unpack

    print unpack '(A2X)*', 'abcdefghijklmnopqrstuvwxyz'; ab bc cd de ef fg gh hi ij jk kl lm mn no op pq qr rs st tu uv vw wx x +y yz z

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    Hooray!
    Wanted!

Re: using s/// as map?
by jryan (Vicar) on Nov 24, 2003 at 23:10 UTC

    No, and for good reason - its very slow compared to a normal solution. That "e" stands for "eval string", which is pretty sluggish, and should only be used for good reason. Compare:

    use Benchmark; use strict; our $dummy = 0; our %count; my $gen = ""; foreach my $l (qw|a t c g|) { $gen .= "$l$_" for qw|a t c g|; } my @gen = split '', $gen; my %tests; foreach my $test ( qw|chain subst hash_array hash_string| ) { no strict refs; $tests{$test} = sub { *{$test}{CODE}->($gen,\@gen) } } timethese(100000, \%tests); sub chain { my ($gen,$genome) = @_; for my $i (0..$#$genome) { if (($genome->[$i] eq 'a') && ($genome->[$i+1] eq 'a')) { + ++$dummy; } elsif (($genome->[$i] eq 'a') && ($genome->[$i+1] eq 'g')) { + ++$dummy; } elsif (($genome->[$i] eq 'a') && ($genome->[$i+1] eq 'c')) { + ++$dummy; } elsif (($genome->[$i] eq 'a') && ($genome->[$i+1] eq 't')) { + ++$dummy; } elsif (($genome->[$i] eq 't') && ($genome->[$i+1] eq 'a')) { + ++$dummy; } elsif (($genome->[$i] eq 't') && ($genome->[$i+1] eq 'g')) { + ++$dummy; } elsif (($genome->[$i] eq 't') && ($genome->[$i+1] eq 'c')) { + ++$dummy; } elsif (($genome->[$i] eq 't') && ($genome->[$i+1] eq 't')) { + ++$dummy; } elsif (($genome->[$i] eq 'c') && ($genome->[$i+1] eq 'a')) { + ++$dummy; } elsif (($genome->[$i] eq 'c') && ($genome->[$i+1] eq 'g')) { + ++$dummy; } elsif (($genome->[$i] eq 'c') && ($genome->[$i+1] eq 'c')) { + ++$dummy; } elsif (($genome->[$i] eq 'c') && ($genome->[$i+1] eq 't')) { + ++$dummy; } elsif (($genome->[$i] eq 'g') && ($genome->[$i+1] eq 'a')) { + ++$dummy; } elsif (($genome->[$i] eq 'g') && ($genome->[$i+1] eq 'g')) { + ++$dummy; } elsif (($genome->[$i] eq 'g') && ($genome->[$i+1] eq 'c')) { + ++$dummy; } elsif (($genome->[$i] eq 'g') && ($genome->[$i+1] eq 't')) { + ++$dummy; } } } sub subst { my ($gen,$genome) = @_; $gen=~s/(.)(?=(.))/++$count{$1.$2} and undef/eg; } sub hash_array { my ($gen,$genome) = @_; $count { $genome -> [$_] . $genome -> [$_ + 1] } ++ for (0..$#$gen +ome); } sub hash_string { my ($gen,$genome) = @_; $count { substr($gen, $_, 2) }++ for (0..length($gen)-2); } __DATA__ Benchmark: timing 100000 iterations of chain, hash_array, hash_string, + subst... chain: 64 wallclock secs (41.09 usr + 0.00 sys = 41.09 CPU) @ 2 +433.68/s (n=100000) hash_array: 14 wallclock secs (11.24 usr + 0.00 sys = 11.24 CPU) @ 8 +896.80/s (n=100000) hash_string: 8 wallclock secs ( 6.26 usr + 0.00 sys = 6.26 CPU) @ 1 +5974.44/s (n=100000) subst: 17 wallclock secs (16.60 usr + 0.00 sys = 16.60 CPU) @ 6 +024.10/s (n=100000)

      Erm. Did you post the wrong version of your code?


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      Hooray!
      Wanted!

        No? Would you care to elaborate? I just retested, and it works fine.

Re: using s/// as map?
by etcshadow (Priest) on Nov 25, 2003 at 06:38 UTC
    I tend to love to do it! (And I do not consider it abuse... I once wrote a brainf*ck interpretter that was just one big s/.../.../eg. Now who could call that abuse?!? ;-D)

    ------------
    :Wq
    Not an editor command: Wq