in reply to Regular Expression Question

I'm not sure why people are using more than a single regular expression for this. The OP is asking for a regex that begins with word characters, ends with word characters, and internally may have one or more sequences of word-comma-word.

This expression should do it:

/(\w+(?:\,\w+)*)/

Update: the OP adds above "if I wanted to check for more than 2 consecutive commas", which requires a minor change:

/(\w+(?:\,{1,2}\w+)*)/
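
As a rough sketch of how the two patterns differ when used for extraction (the helper name `extract_first` and the sample string are made up for illustration, not taken from the OP's data):

```perl
use strict;
use warnings;

# Return the first substring captured by $re in $text, or undef if no match.
sub extract_first {
    my ($re, $text) = @_;
    return $text =~ $re ? $1 : undef;
}

my $single_comma = qr/(\w+(?:,\w+)*)/;       # exactly one comma between words
my $one_or_two   = qr/(\w+(?:,{1,2}\w+)*)/;  # one or two commas between words

my $input = '<red,green,,blue>';

my $strict_hit = extract_first($single_comma, $input);
my $loose_hit  = extract_first($one_or_two,   $input);

print "single comma: $strict_hit\n";
print "one or two:   $loose_hit\n";
```

The first pattern stops at the doubled comma, while the second carries on through it. Neither pattern is anchored, so both skip the leading `<`.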

Re: Re: Regular Expression Question
by nevyn (Monk) on Dec 04, 2003 at 20:48 UTC

    you should test your regexp on ",foo," which will work, and shouldn't, as will "!!foo!!,".

    So for a working single regexp you want (assuming \w is good enough)...

    /^\w+          # Starts with a "word"
     (?:,\w+)*$/x; # Followed by many "comma and word" atoms

    Which IMO is uglier than...

    /^[\w,]+$/   &&  # Comma word atoms
    ! /^,|,,|,$/;    # With constraints on commas

    It's probably faster to use 2 regexps too
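
    A small sketch comparing the two styles as whole-string checks. The sub names `valid_single` and `valid_double` and the sample strings are only for illustration:

```perl
use strict;
use warnings;

# Single anchored regex: a word, then zero or more ",word" groups.
sub valid_single {
    my ($s) = @_;
    return $s =~ /^\w+(?:,\w+)*$/ ? 1 : 0;
}

# Two regexes: only word characters and commas overall, and no
# leading, doubled, or trailing comma.
sub valid_double {
    my ($s) = @_;
    return ($s =~ /^[\w,]+$/ && $s !~ /^,|,,|,$/) ? 1 : 0;
}

for my $s ('foo', 'foo,bar', ',foo,', '!!foo!!,', 'foo,,bar') {
    printf "%-10s single=%d double=%d\n",
        $s, valid_single($s), valid_double($s);
}
```

    Both checks should agree on every sample here: the first two strings pass and the other three fail.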

    --
    James Antill
      you should test your regexp on ",foo," which will work, and shouldn't, as will "!!foo!!,".

      It depends on what you think "should work". The OP's original regex was not anchored, and seemed intended to extract matching substrings rather than confirm that an entire string matched.

      The /(\w+(?:\,\w+)*)/ regex will successfully extract the matching "foo" substring from your two sample cases into $1. If you want to check the entire string, then yes, leave out the parentheses and use ^...$ anchors.

      Update: with regard to It's probably faster to use 2 regexps too: Yes, a quick Benchmarking shows that, with anchoring, the double-regex style runs about 50% faster than the single-regex solution I posted. (Perhaps one of the resident RegEx gurus can explain why this is?)

      However, if you want to extract matching substrings, I think the single regex is a sensible approach.
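
      To make the distinction concrete, here is a minimal sketch that runs both forms over the two sample strings from the reply above (the variable names are illustrative):

```perl
use strict;
use warnings;

my @samples = (',foo,', '!!foo!!,');

for my $s (@samples) {
    # Unanchored with a capture: finds the first matching substring.
    my ($found) = $s =~ /(\w+(?:,\w+)*)/;

    # Anchored: succeeds only if the whole string has the required shape.
    my $whole = $s =~ /^\w+(?:,\w+)*$/ ? 'yes' : 'no';

    print "$s: extracted '$found', whole string valid: $whole\n";
}
```

      Both samples yield "foo" from the unanchored form and are rejected by the anchored one.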

        Update: with regard to It's probably faster to use 2 regexps too: Yes, a quick Benchmarking shows that, with anchoring, the double-regex style runs about 50% faster than the single-regex solution I posted. (Perhaps one of the resident RegEx gurus can explain why this is?)

        Generally anything that looks like "(AB*)*" is bad for the backtracking.

        --
        James Antill
        Update: with regard to It's probably faster to use 2 regexps too: Yes, a quick Benchmarking shows that, with anchoring, the double-regex style runs about 50% faster than the single-regex solution I posted. (Perhaps one of the resident RegEx gurus can explain why this is?)
        I'd be interested to see your benchmark (code + data), as I don't come to the same conclusion. The benchmark below shows the one regex solution to be somewhat faster - the data sample is tiny though.
        #!/usr/bin/perl

        use strict;
        use warnings;

        use Benchmark qw /timethese cmpthese/;

        chomp (our @lines = <DATA>);

        our (@r1, @r2);
        cmpthese -10 => {
            one  =>  '@r1 = map {/^\w+(?:,\w+)*$/ ? 1 : 0} @lines',
            two  =>  '@r2 = map {/^[\w,]+$/ && !/^,|,,|,$/ ? 1 : 0} @lines',
        };

        die "Unequal" unless "@r1" eq "@r2";

        __DATA__
        one,two,three,four,five
        ,one,two,three,four,five
        one,two,three,four,five,
        one,two,three,,four,five
        one,two,three four,five

               Rate  two  one
        two 25436/s   -- -26%
        one 34417/s  35%   --

        Abigail

Re: Re: Regular Expression Question
by hardburn (Abbot) on Dec 04, 2003 at 21:40 UTC

    \w will also match the underscore, which the original poster does not specifically require, and so should not be included.
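
    A quick sketch of the difference. The stricter character class below is only one possible replacement and assumes the OP wants ASCII letters and digits, which the original question does not state:

```perl
use strict;
use warnings;

# \w matches letters, digits, and the underscore.
my $with_w = qr/^\w+(?:,\w+)*$/;

# Assumed stricter variant: ASCII letters and digits only.
my $alnum_only = qr/^[A-Za-z0-9]+(?:,[A-Za-z0-9]+)*$/;

for my $s ('foo,bar', 'foo_1,bar') {
    printf "%-10s \\w: %-8s alnum: %s\n", $s,
        ($s =~ $with_w     ? 'match' : 'no match'),
        ($s =~ $alnum_only ? 'match' : 'no match');
}
```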

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    : () { :|:& };:

    Note: All code is untested, unless otherwise stated

Re: Re: Regular Expression Question
by TASdvlper (Monk) on Dec 05, 2003 at 14:56 UTC
    Maybe I'm missing something here... is the "?:" actually necessary? I read about it in the Camel book, and from their example I can't see its relevance for this regexp.

    Can someone explain this to me?
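
    For what it's worth, the parentheses themselves are needed so that the * applies to the whole ",word" sequence; the "?:" only tells Perl not to capture what that group matched. The pattern matches the same strings either way. A short sketch of what changes (the sample string and array names are illustrative):

```perl
use strict;
use warnings;

my $s = 'one,two,three';

# With (?:...), the inner group only groups; the match returns one
# capture, the whole list.
my @non_capturing = $s =~ /(\w+(?:,\w+)*)/;

# With plain (...), the inner group also captures, so a second value
# comes back: the last ",word" the repeated group matched.
my @capturing = $s =~ /(\w+(,\w+)*)/;

print scalar(@non_capturing), " capture(s): @non_capturing\n";
print scalar(@capturing),     " capture(s): @capturing\n";
```

    So for this regexp the "?:" mostly keeps $2 from being filled with a value nobody asked for.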