Re: Regular Expression Question (show me the can^H^H^Hbenchmark)

Update: with regard to "It's probably faster to use 2 regexps too": Yes, a quick Benchmarking shows that, with anchoring, the double-regex style runs about 50% faster than the single-regex solution I posted. (Perhaps one of the resident RegEx gurus can explain why this is?)

I'd be interested to see your benchmark (code + data), as I don't come to the same conclusion. The benchmark below shows the single-regex solution to be somewhat faster, though the data sample is tiny.

#!/usr/bin/perl

use strict;
use warnings;

use Benchmark qw /timethese cmpthese/;

chomp (our @lines = <DATA>);

our (@r1, @r2);

cmpthese -10 => {
    one      =>  '@r1 = map {/^\w+(?:,\w+)*$/          ? 1 : 0} @lines',
    two      =>  '@r2 = map {/^[\w,]+$/ && !/^,|,,|,$/ ? 1 : 0} @lines',
};

die "Unequal" unless "@r1" eq "@r2";


__DATA__
one,two,three,four,five
,one,two,three,four,five
one,two,three,four,five,
one,two,three,,four,five
one,two,three four,five




       Rate  two  one
two 25436/s   -- -26%
one 34417/s  35%   --
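
For reference, here is a minimal sketch of what the two approaches return for each of the five sample lines. It reuses the patterns and data from the benchmark; the array names @one and @two are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The same five sample lines as in the benchmark's __DATA__ section.
my @lines = (
    'one,two,three,four,five',
    ',one,two,three,four,five',
    'one,two,three,four,five,',
    'one,two,three,,four,five',
    'one,two,three four,five',
);

# Single-regex validation: words separated by single commas.
my @one = map { /^\w+(?:,\w+)*$/ ? 1 : 0 } @lines;

# Double-regex validation: only word characters and commas,
# and no leading, doubled, or trailing comma.
my @two = map { /^[\w,]+$/ && !/^,|,,|,$/ ? 1 : 0 } @lines;

# Print one row per line so the two verdicts can be compared.
printf "%-26s one=%d two=%d\n", $lines[$_], $one[$_], $two[$_]
    for 0 .. $#lines;
```

Only the first line should be accepted by either approach, so four of the five inputs in this sample exercise the failure path.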

Abigail


Replies are listed 'Best First'.

Re: Re: Regular Expression Question (show me the can^H^H^Hbenchmark)
by simonm (Vicar) on Dec 05, 2003 at 21:56 UTC

I'd be interested to see your benchmark (code + data), as I don't come to the same conclusion.

Test and output attached below. Looks like it is dependent on your data set...

use strict;
use Benchmark 'cmpthese';

my @data = <DATA>;
my @long = map { join '', $_ x 100 } @data;

my %cases = (
  'Single' => sub { for ( @long ) { /^\w+(?:,\w+)*$/ } },
  'Double' => sub { for ( @long ) { /^[\w,]+$/ && ! /^,|,,|,$/ } },
);

cmpthese( 0, \%cases);

__DATA__
!@#$as3dfa
,sdfas3df,
asd3fsa,,a3sdf
as3df,asdf3,3asdf,asd3f
sad3fasdjasdfkasdfklas3jf
3sad3fasdjasdfkasdfklas3jf
3sad3fasdjasdfkasdfklas3jf3

          Rate Single Double
Single  6158/s     --   -83%
Double 35319/s   474%     --
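
One thing that may be worth checking in the test above: @data is read without chomp, so each element of @long would carry a newline after every repetition. Without the /m modifier, $ only matches at the very end of the string or before a final newline, so the patterns may be failing on every input. The sketch below is a rough check of that, using three of the data lines; @clean and @long_clean are hypothetical names for a chomped variant and are not part of the original test.

```perl
use strict;
use warnings;

# Three of the data lines above, each with the trailing newline
# that an unchomped <DATA> read leaves in place.
my @data = (
    "as3df,asdf3,3asdf,asd3f\n",
    "sad3fasdjasdfkasdfklas3jf\n",
    ",sdfas3df,\n",
);

# As in the benchmark: repeat each unchomped line 100 times.
my @long = map { $_ x 100 } @data;

# Hypothetical variant: chomp first, so no newlines are embedded.
chomp(my @clean = @data);
my @long_clean = map { $_ x 100 } @clean;

my $single = qr/^\w+(?:,\w+)*$/;   # the single-regex test
my $first  = qr/^[\w,]+$/;         # first half of the double-regex test

# grep in scalar context returns the number of matching strings.
my $raw_single   = grep { $_ =~ $single } @long;
my $raw_first    = grep { $_ =~ $first  } @long;
my $clean_single = grep { $_ =~ $single } @long_clean;
my $clean_first  = grep { $_ =~ $first  } @long_clean;

print "unchomped: single=$raw_single first=$raw_first\n";
print "chomped:   single=$clean_single first=$clean_first\n";
```

If the unchomped counts come out as zero, the second pattern in the Double case is never reached because && short-circuits, so that run would be timing one failing regex against another rather than one regex against two.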
