Re: Regex, capturing variables vs. speed

Following on from the previous replies, here is a benchmark demonstrating the performance difference. Note, though, that with this test string the non-greedy match is only about 2.8 times faster, not the 10 times described by the OP.

use warnings;
use strict;
use Benchmark qw(cmpthese);

my $target = 'This is a string used to test the time required for a greedy match compared to a non-greedy match.';
my $greedy = qr/(\ba\b.*\bstring\b)/;
my $non = qr/(\ba\b.*?\bstring\b)/;

my ($matchG) = $target =~ $greedy;
my ($matchN) = $target =~ $non;

die "Matches generate different results\n" if $matchG ne $matchN;

cmpthese
  (
  -1,
    {
    'Greedy' => sub {$target =~ $greedy;},
    'Non' => sub {$target =~ $non;}
    }
  );

Prints:

           Rate Greedy    Non
Greedy 162689/s     --   -64%
Non    456847/s   181%     --

Perl is Huffman encoded by design.

Re^2: Regex, capturing variables vs. speed by albert (Monk) on Oct 30, 2005 at 17:57 UTC
Thanks to all for the feedback. I did get much more than a 2x speed increase for non-greedy vs. greedy because the line I am matching is quite long. Taking what I've learned from the thread, I did some comparisons.

use Benchmark qw/cmpthese/;

my $line = 'rs11502186 C/G Chr11 170472 + ncbi_b34 perlegen urn:lsid:perlegen.hapmap.org:Protocol:Genotyping_1.0.0:2 urn:lsid:perlegen.hapmap.org:Assay:25763.7541533:1 urn:lsid:dcc.hapmap.org:Panel:CEPH-30-trios:1 QC+ GG GG GG NN GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG NN GG GG GG NN GG GG GG GG GG GG GG GG GG GG GG GG GG NN GG GG GG GG NN GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG';

cmpthese(-1, {
    'Greedy' => sub {$line =~ /(chr.*?)\s.*urn:lsid:(.*?)\s.*panel:(.*?):/i},
    'Non' => sub {$line =~ /(chr.*?)\s.*?urn:lsid:(.*?)\s.*?panel:(.*?):/i},
    'Sep' => sub {$line =~ /(chr.*?)\s/i;
                  $line =~ /urn:lsid:(.*?)\s/i;
                  $line =~ /panel:(.*?):/i;
                 },
    'Death_star' => sub {$line =~ /(chr[^\s]+)/i;
                         $line =~ /urn:lsid:([^\s]+)/i;
                         $line =~ /panel:([^:]+)/i;
                        }
    }
);

Giving the following results:

               Rate Greedy    Sep    Non Death_star
Greedy       8650/s     --   -95%   -95%       -97%
Sep        157827/s  1725%     --    -6%       -41%
Non        167020/s  1831%     6%     --       -38%
Death_star 267963/s  2998%    70%    60%         --

Killing the star is clearly the way to go. Thanks to the Monks who helped me learn something.

-albert
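One thing worth checking before comparing timings, in the spirit of the `die` check in the first benchmark: the Greedy and Non patterns need not capture the same fields when a line holds several `urn:lsid:` entries. A minimal sketch, assuming a shortened, made-up stand-in for the line above (the names `%pattern` and `%got` are only for illustration):

```perl
use strict;
use warnings;

# Shortened stand-in for the HapMap line (an assumption, not the real data).
my $line = 'rs1 C/G Chr11 170472 + ncbi_b34 perlegen '
    . 'urn:lsid:perlegen.hapmap.org:Protocol:Genotyping_1.0.0:2 '
    . 'urn:lsid:perlegen.hapmap.org:Assay:25763.7541533:1 '
    . 'urn:lsid:dcc.hapmap.org:Panel:CEPH-30-trios:1 QC+ GG GG NN GG';

# Same shape as the Greedy and Non subs: only the gaps between fields differ.
my %pattern = (
    Greedy => qr/(chr.*?)\s.*urn:lsid:(.*?)\s.*panel:(.*?):/i,
    Non    => qr/(chr.*?)\s.*?urn:lsid:(.*?)\s.*?panel:(.*?):/i,
);

# Collect the three captures per pattern and print them side by side.
my %got;
for my $name (sort keys %pattern) {
    my @fields = $line =~ $pattern{$name};
    $got{$name} = \@fields;
    print "$name: @fields\n";
}
```

With this stand-in line, Non captures the first URN (the Protocol one), while Greedy's `.*` runs to the end of the line and backtracks until it settles on the second URN (the Assay one). So part of Greedy's slowness comes from backtracking over the long tail of the line, and it is not returning quite the same data as the other variants.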
Re^3: Regex, capturing variables vs. speed by robin (Chaplain) on Oct 30, 2005 at 20:31 UTC
Even faster is to use a single match, but to be explicit about what you're looking for, i.e.

'Better' => sub { $line =~ /(chr\S*).*?urn:lsid:(\S*).*?panel:([^:]*)/i }

It's always better to write `(\S*)\s` than `(.*?)\s`, because you're making it clear to the matching engine exactly what you're looking for (non-space characters in this case).
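To illustrate what being explicit buys, here is a small made-up example (the string and variable names are not from the thread). A lazy `.*?` capture is allowed to grow past whitespace whenever the rest of the pattern does not fit yet, while `\S*` cannot, so it either captures one whitespace-free field or fails quickly:

```perl
use strict;
use warnings;

my $text = 'id:foo bar ;';    # hypothetical sample input

# Lazy dot: keeps extending the capture until "\s;" finally fits.
my ($lazy) = $text =~ /id:(.*?)\s;/;

# Explicit class: the capture can never contain whitespace.
my ($explicit) = $text =~ /id:(\S*)\s;/;

print 'lazy:     ', defined $lazy     ? "'$lazy'"     : 'no match', "\n";
print 'explicit: ', defined $explicit ? "'$explicit'" : 'no match', "\n";
```

Here the lazy version quietly captures `foo bar`, while the explicit version does not match at all. The two forms are therefore not always interchangeable, but when the field really is whitespace-free the engine has far fewer alternatives to try with `\S*`.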
Re^4: Regex, capturing variables vs. speed by albert (Monk) on Oct 30, 2005 at 21:06 UTC
Thanks. It is faster still, as you say:

               Rate    Sep    Non Death_star Better
Sep        158510/s     --   -10%       -23%   -65%
Non        175363/s    11%     --       -15%   -61%
Death_star 206769/s    30%    18%         --   -54%
Better     449757/s   184%   156%       118%     --

-albert
Re^3: Regex, capturing variables vs. speed by GrandFather (Saint) on Oct 30, 2005 at 23:42 UTC
Interesting, there must be differences between Perl builds too. The spread is not as great with ActiveState Perl v5.8.7. In particular, Better is not as much better.

               Rate Greedy    Sep Death_star    Non Better
Greedy       9965/s     --   -90%       -94%   -94%   -97%
Sep        102700/s   931%     --       -34%   -37%   -66%
Death_star 155342/s  1459%    51%         --    -5%   -48%
Non        164099/s  1547%    60%         6%     --   -46%
Better     301485/s  2925%   194%        94%    84%     --

The results above used the OP's benchmark code (with the addition of Better).

Perl is Huffman encoded by design.