You should test your regexp on ",foo,", which will match but shouldn't, as will "!!foo!!,".

It depends on what you think "should work". The OP's original regex was not anchored, and seemed intended to extract matching substrings rather than confirm that an entire string matched.

The /(\w+(?:\,\w+)*)/ regex will successfully extract the matching "foo" substring from both of your sample cases into $1. If you want to check the entire string, then yes: leave out the capturing parentheses and use ^...$ anchors.
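A small sketch of the difference, using the two sample strings from the quoted reply plus one valid list. The sub names `extract_list` and `is_list` are made up for illustration; the regexes are the ones discussed above.

```perl
use strict;
use warnings;

# Hypothetical helper: unanchored, capturing version. It pulls the first
# comma-separated run of word characters out of a larger string into $1.
sub extract_list {
    my ($s) = @_;
    return $s =~ /(\w+(?:,\w+)*)/ ? $1 : undef;
}

# Hypothetical helper: anchored, non-capturing version. It succeeds only
# if the entire string is such a list.
sub is_list {
    my ($s) = @_;
    return $s =~ /^\w+(?:,\w+)*$/ ? 1 : 0;
}

# ",foo," and "!!foo!!," both yield "foo" from extract_list but fail is_list.
for my $s (',foo,', '!!foo!!,', 'foo,bar') {
    my $found = extract_list($s);
    printf "%-10s extracted=%-8s whole=%d\n",
        $s, defined $found ? $found : '(none)', is_list($s);
}
```

So whether ",foo," "works" depends on which of the two questions you are asking.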

Update: with regard to "It's probably faster to use 2 regexps too": yes, a quick benchmark shows that, with anchoring, the double-regex style runs about 50% faster than the single-regex solution I posted. (Perhaps one of the resident regex gurus can explain why?)
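For reference, here are the two styles being compared, written as stand-alone checks. The regexes are the ones used in the benchmarks further down the thread; the sub names `one_regex` and `two_regex` are illustrative only.

```perl
use strict;
use warnings;

# Single anchored regex: one or more word characters, then any number of
# ",word" groups, and nothing else.
sub one_regex {
    my ($s) = @_;
    return $s =~ /^\w+(?:,\w+)*$/ ? 1 : 0;
}

# Double-regex style: first allow only word characters and commas, then
# reject a leading comma, a doubled comma, or a trailing comma.
sub two_regex {
    my ($s) = @_;
    return ($s =~ /^[\w,]+$/ && $s !~ /^,|,,|,$/) ? 1 : 0;
}

# Both checks should accept exactly the same strings.
for my $s ('one,two,three', ',one,two', 'one,two,', 'one,,two', 'one two') {
    printf "%-14s one=%d two=%d\n", $s, one_regex($s), two_regex($s);
}
```

Note that the unanchored alternation /^,|,,|,$/ is intended: it means "starts with a comma, or contains two commas in a row, or ends with a comma".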

However, if you want to extract matching substrings, I think the single regex is a sensible approach.

Re: Re: Re: Re: Regular Expression Question
by nevyn (Monk) on Dec 04, 2003 at 21:45 UTC

    Generally, anything that looks like "(AB*)*" is bad for backtracking.
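To illustrate the point: a common way to stop the engine from backtracking into a repeated group is an atomic group (?>...) or possessive quantifiers (++, *+, available from Perl 5.10). This is a swapped-in technique, not something measured in this thread. It is safe here only because \w can never match the "," separator, so giving characters back could never produce a match anyway; the sub names are hypothetical.

```perl
use strict;
use warnings;
use 5.010;    # possessive quantifiers need Perl 5.10 or later

# Original form: \w+ and the (?:,\w+)* group may give characters back
# when the rest of the pattern fails.
sub plain_check {
    my ($s) = @_;
    return $s =~ /^\w+(?:,\w+)*$/ ? 1 : 0;
}

# Possessive form: once \w++ or the *+ group has matched, the engine does
# not backtrack into it. Because \w cannot match ',', this accepts exactly
# the same strings as plain_check.
sub possessive_check {
    my ($s) = @_;
    return $s =~ /^\w++(?:,\w++)*+$/ ? 1 : 0;
}

for my $s ('one,two,three', 'one,two,three four', 'one,,two') {
    printf "%-20s plain=%d possessive=%d\n",
        $s, plain_check($s), possessive_check($s);
}
```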

    --
    James Antill
Re: Regular Expression Question (show me the can^H^H^Hbenchmark)
by Abigail-II (Bishop) on Dec 05, 2003 at 15:09 UTC
    I'd be interested to see your benchmark (code + data), as I don't come to the same conclusion. The benchmark below shows the one-regex solution to be somewhat faster, though the data sample is tiny.
    #!/usr/bin/perl

    use strict;
    use warnings;

    use Benchmark qw /timethese cmpthese/;

    chomp (our @lines = <DATA>);

    our (@r1, @r2);

    cmpthese -10 => {
        one => '@r1 = map {/^\w+(?:,\w+)*$/ ? 1 : 0} @lines',
        two => '@r2 = map {/^[\w,]+$/ && !/^,|,,|,$/ ? 1 : 0} @lines',
    };

    die "Unequal" unless "@r1" eq "@r2";

    __DATA__
    one,two,three,four,five
    ,one,two,three,four,five
    one,two,three,four,five,
    one,two,three,,four,five
    one,two,three four,five

    Output:

           Rate  two  one
    two 25436/s   -- -26%
    one 34417/s  35%   --

    Abigail

      Test and output are attached below. It looks like the result depends on your data set...

      use strict;
      use Benchmark 'cmpthese';

      # Lines are not chomped, so each 100x string keeps its embedded newlines.
      my @data = <DATA>;
      my @long = map { join '', $_ x 100 } @data;

      my %cases = (
          'Single' => sub { for ( @long ) { /^\w+(?:,\w+)*$/ } },
          'Double' => sub { for ( @long ) { /^[\w,]+$/ && ! /^,|,,|,$/ } },
      );

      # A count of 0 means: run each case for at least 3 CPU seconds.
      cmpthese( 0, \%cases );

      __DATA__
      !@#$as3dfa
      ,sdfas3df,
      asd3fsa,,a3sdf
      as3df,asdf3,3asdf,asd3f
      sad3fasdjasdfkasdfklas3jf
      3sad3fasdjasdfkasdfklas3jf
      3sad3fasdjasdfkasdfklas3jf3
                Rate Single Double
      Single  6158/s     --   -83%
      Double 35319/s   474%     --
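Since the outcome seems to hinge on the data, one way to see this is to time accepted and rejected inputs separately. A rough sketch, reusing Abigail's five sample lines with the two regexes unchanged; the split into @pass and @fail is illustrative, and no particular timings are implied.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Abigail's sample lines: one valid list and four invalid ones.
my @samples = (
    'one,two,three,four,five',     # valid
    ',one,two,three,four,five',    # leading comma
    'one,two,three,four,five,',    # trailing comma
    'one,two,three,,four,five',    # doubled comma
    'one,two,three four,five',     # embedded space
);

# Split by outcome so each group can be timed on its own.
my @pass = grep {  /^\w+(?:,\w+)*$/ } @samples;
my @fail = grep { !/^\w+(?:,\w+)*$/ } @samples;

for my $group ([ accepted => \@pass ], [ rejected => \@fail ]) {
    my ($label, $data) = @$group;
    print "-- $label inputs --\n";
    # A count of -1: run each case for at least 1 CPU second.
    cmpthese(-1, {
        one => sub { my $n = grep { /^\w+(?:,\w+)*$/ } @$data },
        two => sub { my $n = grep { /^[\w,]+$/ && !/^,|,,|,$/ } @$data },
    });
}
```

Comparing the two tables shows whether one style wins on rejections and the other on acceptances, which would explain the conflicting results above.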