in reply to Re: Regex, capturing variables vs. speed
in thread Regex, capturing variables vs. speed

Thanks to all for the feedback. I got much more than a 2x speed increase going from greedy to non-greedy, because the line I'm matching is quite long. Taking what I've learned from the thread, I did some comparisons.
    use Benchmark qw/cmpthese/;

    my $line = 'rs11502186 C/G Chr11 170472 + ncbi_b34 perlegen '
             . 'urn:lsid:perlegen.hapmap.org:Protocol:Genotyping_1.0.0:2 '
             . 'urn:lsid:perlegen.hapmap.org:Assay:25763.7541533:1 '
             . 'urn:lsid:dcc.hapmap.org:Panel:CEPH-30-trios:1 QC+ '
             . 'GG GG GG NN GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG '
             . 'GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG NN '
             . 'GG GG GG NN GG GG GG GG GG GG GG GG GG GG GG GG GG NN GG GG GG GG NN '
             . 'GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG '
             . 'GG GG GG GG GG GG';

    cmpthese(-1, {
        'Greedy' => sub {
            $line =~ /(chr.*?)\s.*urn:lsid:(.*?)\s.*panel:(.*?):/i
        },
        'Non' => sub {
            $line =~ /(chr.*?)\s.*?urn:lsid:(.*?)\s.*?panel:(.*?):/i
        },
        'Sep' => sub {
            $line =~ /(chr.*?)\s/i;
            $line =~ /urn:lsid:(.*?)\s/i;
            $line =~ /panel:(.*?):/i;
        },
        'Death_star' => sub {
            $line =~ /(chr[^\s]+)/i;
            $line =~ /urn:lsid:([^\s]+)/i;
            $line =~ /panel:([^:]+)/i;
        },
    });
Giving the following results:
                   Rate Greedy    Sep    Non Death_star
    Greedy       8650/s     --   -95%   -95%       -97%
    Sep        157827/s  1725%     --    -6%       -41%
    Non        167020/s  1831%     6%     --       -38%
    Death_star 267963/s  2998%    70%    60%         --
Killing the star is clearly the way to go. Thanks to the Monks who helped me learn something.

-albert

Re^3: Regex, capturing variables vs. speed
by robin (Chaplain) on Oct 30, 2005 at 20:31 UTC
    Even faster is to use a single match, but to be explicit about what you're looking for, i.e.
    'Better' => sub { $line =~ /(chr\S*).*?urn:lsid:(\S*).*?panel:([^:]*)/i }
    It's always better to write (\S*)\s than (.*?)\s, because you're making it clear to the matching engine exactly what you're looking for (non-space characters in this case).
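To make the comparison concrete, here is a minimal sketch that runs the lazy pattern and the explicit pattern against the same input and prints what each captures. The line is a shortened, hypothetical stand-in for the one in the original post, not the real data. On this input both patterns should capture the same three fields; the difference is only in how the engine gets there.

```perl
use strict;
use warnings;

# Shortened, hypothetical stand-in for the HapMap line in the original post.
my $line = 'rs11502186 C/G Chr11 170472 + ncbi_b34 perlegen '
         . 'urn:lsid:perlegen.hapmap.org:Protocol:Genotyping_1.0.0:2 '
         . 'urn:lsid:dcc.hapmap.org:Panel:CEPH-30-trios:1 QC+ GG GG NN GG';

# Lazy version: each (.*?) extends a character at a time until what follows matches.
my @lazy = $line =~ /(chr.*?)\s.*?urn:lsid:(.*?)\s.*?panel:(.*?):/i;

# Explicit version: \S* and [^:]* name the characters that belong to each field.
my @explicit = $line =~ /(chr\S*).*?urn:lsid:(\S*).*?panel:([^:]*)/i;

print "lazy:     @lazy\n";
print "explicit: @explicit\n";
```

Checking that the captures agree before benchmarking is worthwhile, since a faster pattern that captures something different is not a fair comparison.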
      Thanks. As you say, that is faster still:

                     Rate   Sep   Non Death_star Better
      Sep        158510/s    --  -10%       -23%   -65%
      Non        175363/s   11%    --       -15%   -61%
      Death_star 206769/s   30%   18%         --   -54%
      Better     449757/s  184%  156%       118%     --

      -albert
Re^3: Regex, capturing variables vs. speed
by GrandFather (Saint) on Oct 30, 2005 at 23:42 UTC

    Interesting, there must be differences between Perl builds too. The spread is not as great with ActiveState Perl v5.8.7. In particular, Better is not as much better.

                   Rate Greedy   Sep Death_star   Non Better
    Greedy       9965/s     --  -90%       -94%  -94%   -97%
    Sep        102700/s   931%    --       -34%  -37%   -66%
    Death_star 155342/s  1459%   51%         --   -5%   -48%
    Non        164099/s  1547%   60%         6%    --   -46%
    Better     301485/s  2925%  194%        94%   84%     --

    The results above used OP's benchmark code (with the addition of Better).
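Since the numbers depend on the Perl build, anyone comparing results may want to print the interpreter version next to the table. The following is a rough sketch along those lines, not the exact code used for the figures in this thread: the test line is a hypothetical reconstruction with the same field layout as the OP's data, and only three of the variants are included.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test line: same field layout as the OP's data, followed by
# 90 genotype calls so the greedy pattern has a long tail to scan.
my $line = 'rs11502186 C/G Chr11 170472 + ncbi_b34 perlegen '
         . 'urn:lsid:perlegen.hapmap.org:Protocol:Genotyping_1.0.0:2 '
         . 'urn:lsid:dcc.hapmap.org:Panel:CEPH-30-trios:1 QC+'
         . ' GG' x 90;

my %variants = (
    Greedy => sub { $line =~ /(chr.*?)\s.*urn:lsid:(.*?)\s.*panel:(.*?):/i },
    Non    => sub { $line =~ /(chr.*?)\s.*?urn:lsid:(.*?)\s.*?panel:(.*?):/i },
    Better => sub { $line =~ /(chr\S*).*?urn:lsid:(\S*).*?panel:([^:]*)/i },
);

# Report the interpreter and platform so tables from different builds can be compared.
printf "perl %vd on %s\n", $^V, $^O;

# A negative count means: run each variant for at least that many CPU seconds.
cmpthese(-1, \%variants);
```

The absolute rates will differ from machine to machine, so the relative columns are the part worth comparing.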


    Perl is Huffman encoded by design.