in reply to better (faster) way of writing regexp

I'd say use whichever is more readable — that's "better", IMHO.  I don't think there is much of a performance difference (if any), but if you really want to know, Benchmark it.

Re^2: better (faster) way of writing regexp
by keszler (Priest) on Dec 02, 2009 at 13:18 UTC
    Benchmark confirms:
    use strict;
    use Benchmark;

    my $results = timethese( 1e6, {
        repeat => sub{
            my $t1 = '20090123';
            $t1 =~ /(\d\d\d\d)(\d\d)(\d\d)/;
            my ($y1,$m1,$d1) = ($1,$2,$3);
        },
        range => sub{
            my $t2 = '20090123';
            $t2 =~ /(\d{4})(\d{2})(\d{2})/;
            my ($y2,$m2,$d2) = ($1,$2,$3);
        },
    } );

    my $results2 = timethese( 1e6, {
        range2 => sub{
            my $t3 = '20090123';
            $t3 =~ /(\d{4})(\d{2})(\d{2})/;
            my ($y3,$m3,$d3) = ($1,$2,$3);
        },
        repeat2 => sub{
            my $t4 = '20090123';
            $t4 =~ /(\d\d\d\d)(\d\d)(\d\d)/;
            my ($y4,$m4,$d4) = ($1,$2,$3);
        },
    } );

    __END__
    Benchmark: timing 1000000 iterations of range, repeat...
    range: 2 wallclock secs ( 1.69 usr + 0.00 sys = 1.69 CPU) @ 592417.06/s (n=1000000)
    repeat: 2 wallclock secs ( 1.53 usr + 0.00 sys = 1.53 CPU) @ 653167.86/s (n=1000000)
    Benchmark: timing 1000000 iterations of range2, repeat2...
    range2: 2 wallclock secs ( 1.70 usr + 0.00 sys = 1.70 CPU) @ 586854.46/s (n=1000000)
    repeat2: 0 wallclock secs ( 1.53 usr + 0.00 sys = 1.53 CPU) @ 653167.86/s (n=1000000)

      At those rates the differences shown are pretty much meaningless. And I simply don't believe repeat2 - how can wallclock be 0, but the other results identical to repeat?


      True laziness is hard work
        I don't know what happened with repeat2; the results I posted were cut-n-pasted unchanged. I plan to dig into it tonight to see if I can reproduce and diagnose the glitch.
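One possible starting point for that digging, offered as a sketch and not as a diagnosis of the repeat2 result: by default Benchmark takes its wallclock figure from time(), which only has whole-second resolution, while the usr/sys figures come from times(). Importing :hireswallclock (which needs Time::HiRes) gives fractional wallclock values, so repeated runs of the same sub are easier to compare. The count of 200_000 and the three rounds are arbitrary choices for this example, and timeit/timestr stand in for the timethese call used in the post.

```perl
use strict;
use warnings;

# :hireswallclock switches Benchmark's wallclock from whole seconds
# (time()) to Time::HiRes, when that module is available.
use Benchmark qw(:hireswallclock timeit timestr);

# Same body as repeat/repeat2 in the posted benchmark.
my $repeat = sub {
    my $t = '20090123';
    $t =~ /(\d\d\d\d)(\d\d)(\d\d)/;
    my ($y, $m, $d) = ($1, $2, $3);
};

# Time the identical code several times. The iteration count and the
# number of rounds are assumptions made for this sketch.
my @runs = map { timeit(200_000, $repeat) } 1 .. 3;

# One line per round, in the same format timethese prints.
print timestr($_), "\n" for @runs;
```

If the wallclock column still jumps around between rounds while the CPU column stays put, that would point at the clock source and away from the regexp.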
Re^2: better (faster) way of writing regexp
by TomDLux (Vicar) on Dec 03, 2009 at 03:17 UTC

    Appended qw(:all) to the use Benchmark line.

    Changed 1e6 to -5 to run for 5 seconds, rather than a particular count ... not that it makes much difference.

    Changed timethese to cmpthese to generate the following table.

               Rate  range repeat
    range  483495/s     --    -5%
    repeat 507038/s     5%     --
                Rate  range2 repeat2
    range2  485834/s      --     -4%
    repeat2 506114/s      4%      --
    
    

    I wouldn't get excited about a 4% or 5% difference ... bet you're making 25% inefficiencies elsewhere, assuming you aren't using algorithms that are costing you hundreds of %. I don't mean just you; that goes for anybody's code, including mine.

    --
    TTTATCGGTCGTTATATAGATGTTTGCA