TASdvlper has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I'm a little stuck with a regular expression (probably not the 1st time you've heard that). Anyway, I want to allow a string to have only alphanumeric characters and commas. But, and here is my problem, I don't want to allow two consecutive commas, and I don't want the string to begin or end with a comma.

Here is the RE that I have "sort of" working:
/([a-zA-Z0-9,{1}]+)/
Any help would be greatly appreciated.

edited: Thu Dec 4 19:30:18 2003 by jeffa - code tags

Re: Regular Expression Question
by !1 (Hermit) on Dec 04, 2003 at 19:33 UTC

    It would probably be far easier to first verify that the string contains legitimate characters and then check whether or not any of the conditions that would make it illegal exist.

    #!/usr/bin/perl -wl
    use strict;

    for (<DATA>) {
        chomp;
        if (! /[^a-zA-Z0-9,]/ and ! /^,|,{2}|,$/) {
            print "Yep: $_";
        }
        else {
            print "Nope: $_";
        }
    }
    __DATA__
    this,should,work,11
    ,this,should,not,work
    this,,too,should,not,work
    nor,this,one,

    Please please please read perldoc perlretut and perldoc perlre for a deeper understanding of regular expressions.

      Thanks for your help. Why is there a "!" in front of the 1st regexp ? I understand the 2nd.
        The character class is negated, note where the ^ is...
        ! /[^a-zA-Z0-9,]/
        ...however I'm not sure the double negative is enough faster to warrant not just doing the more obvious...
        /^[a-zA-Z0-9,]+$/
        ...it's also not stated whether the empty string (/^$/) is valid or not, and the two differ there: the negated test accepts it, while the + version rejects it.
        --
        James Antill
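To make the empty-string difference concrete, here is a minimal sketch. The sub names ok_negated and ok_positive are made up for illustration; the two patterns are the ones quoted above.

```perl
use strict;
use warnings;

# Negated-class test: true when no illegal character is present.
# An empty string contains no illegal character, so it passes.
sub ok_negated {
    my ($s) = @_;
    return $s !~ /[^a-zA-Z0-9,]/ ? 1 : 0;
}

# Positive test: the + quantifier demands at least one legal character.
sub ok_positive {
    my ($s) = @_;
    return $s =~ /^[a-zA-Z0-9,]+$/ ? 1 : 0;
}

for my $s ('', 'abc', 'a,b', 'a!b') {
    printf "%-6s negated=%d positive=%d\n", "'$s'", ok_negated($s), ok_positive($s);
}
```

If the empty string should be rejected, the positive form does that for free; with the negated form you would need an extra length check.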
Re: Regular Expression Question
by BrowserUk (Patriarch) on Dec 04, 2003 at 19:35 UTC

    It's probably simpler and clearer (as well as more efficient) to use two regexes for this type of thing.

    for ('fred,bill', 'fred,,bill', ',fred', 'bill,') {
        print "Bad:'$_'"
            unless $_ =~ m[^[a-zA-Z0-9,]+$] and $_ !~ m[,{2}|,$|^,];
    }

    Bad:'fred,,bill'
    Bad:',fred'
    Bad:'bill,'

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail

Re: Regular Expression Question
by kesterkester (Hermit) on Dec 04, 2003 at 19:36 UTC

    If multiple regexes are acceptable, this'll get you a little closer to what you want, I think. The first regex in the if statement matches the alphanumerics and commas, but not leading/trailing commas; the second regex excludes consecutive commas.

    use warnings;
    use strict;

    while ( <DATA> ) {
        print "$1\n" if /^(\w+[\w,]+\w+)$/ && !/,,/;
    }
    __DATA__
    !@#$as3dfa
    ,sdfas3df,
    asd3fsa,,a3sdf
    as3df,asdf3,3asdf,asd3f
    sad3fasdjasdfkasdfklas3jf
    3sad3fasdjasdfkasdfklas3jf
    3sad3fasdjasdfkasdfklas3jf3

    Output is:

    as3dfa
    sdfas3df
    as3df,asdf3,3asdf,asd3f
    sad3fasdjasdfkasdfklas3jf
    3sad3fasdjasdfkasdfklas3jf
    3sad3fasdjasdfkasdfklas3jf3
      print "$1\n" if /^(\w+[\w,]+\w+)$/ && !/,,/;

      So, if I wanted to check for more than 2 consecutive commas (I should have mentioned that in the original post) I would do the following:

      print "$1\n" if /^(\w+[\w,]+\w+)$/ && !/,{2,}/;
        Either would work-- !/,,/ excludes strings containing 2 consecutive commas, and !/,{2,}/ excludes strings containing 2-or-more consecutive commas, which amounts to pretty much the same thing for what you want, if I've understood you correctly.

      Problem is, this doesn't work for very short strings (2 or fewer characters). Add these:

      __DATA__
      a
      bc

      dave
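A small sketch of why the short strings fail: \w+[\w,]+\w+ needs at least three characters, one for each piece. The sub names are hypothetical, and word_list is just one assumed alternative (a word followed by zero or more ",word" groups), not the only possible fix. Note that \w also accepts the underscore.

```perl
use strict;
use warnings;

# Pattern from the post above: each of the three pieces must match
# at least one character, so one- and two-character strings fail.
sub three_part {
    my ($s) = @_;
    return 0 unless $s =~ /^(\w+[\w,]+\w+)$/;
    return $s !~ /,,/ ? 1 : 0;
}

# Assumed alternative: one word, then zero or more ",word" groups.
sub word_list {
    my ($s) = @_;
    return $s =~ /^\w+(?:,\w+)*$/ ? 1 : 0;
}

for my $s ('a', 'bc', 'a,b', 'a,,b', ',a') {
    printf "%-5s three_part=%d word_list=%d\n", $s, three_part($s), word_list($s);
}
```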

Re: Regular Expression Question
by simonm (Vicar) on Dec 04, 2003 at 20:20 UTC
    I'm not sure why people are using more than a single regular expression for this. The OP is asking for a regex that begins with word characters, ends with word characters, and internally may have one or more sequences of word-comma-word.

    This expression should do it:

    /(\w+(?:\,\w+)*)/

    Update: the OP adds above "if I wanted to check for more than 2 consecutive commas", which requires a minor change:

    /(\w+(?:\,{1,2}\w+)*)/

      you should test your regexp on ",foo," which will work, and shouldn't, as will "!!foo!!,".

      So for a working single regexp you want (assuming \w is good enough)...

      /^\w+            # Starts with a "word"
       (?:,\w+)*$/x;   # Followed by many "comma and word" atoms

      Which IMO is uglier than...

      /^[\w,]+$/ &&    # Comma word atoms
      ! /^,|,,|,$/;    # With constraints on commas

      It's probably faster to use 2 regexps too.

      --
      James Antill
        you should test your regexp on ",foo," which will work, and shouldn't, as will "!!foo!!,".

        It depends on what you think "should work". The OP's original regex was not anchored, and seemed intended to extract matching substrings rather than confirm that an entire string matched.

        The /(\w+(?:\,\w+)*)/ regex will successfully extract the matching "foo" substring from your two sample cases into $1. If you want to check the entire string, then yes, leave out the parentheses and use ^...$ anchors.

        Update: with regard to "It's probably faster to use 2 regexps too": Yes, a quick benchmark shows that, with anchoring, the double-regex style runs about 50% faster than the single-regex solution I posted. (Perhaps one of the resident RegEx gurus can explain why this is?)

        However, if you want to extract matching substrings, I think the single regex is a sensible approach.
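For anyone who wants to repeat the timing comparison, here is a rough sketch using the core Benchmark module. It is not the benchmark that was actually run for the update above; the sample strings and sub names are invented, and the numbers will vary with the Perl version and the input data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample data: a mix of valid and invalid strings.
my @samples = ('fred,bill', 'fred,,bill', ',fred', 'bill,', 'a,b,c,d,e,f,g,h');

# Single anchored regex: a word followed by zero or more ",word" groups.
sub single_regex {
    my ($s) = @_;
    return $s =~ /^\w+(?:,\w+)*$/ ? 1 : 0;
}

# Two regexes: legal characters only, then no misplaced commas.
sub double_regex {
    my ($s) = @_;
    return 0 unless $s =~ /^[\w,]+$/;
    return $s !~ /^,|,,|,$/ ? 1 : 0;
}

# Run each candidate for roughly one CPU second and print a comparison table.
cmpthese(-1, {
    single => sub { single_regex($_) for @samples },
    double => sub { double_regex($_) for @samples },
});
```

Both subs should agree on every input, so the comparison is only about speed.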

      \w will also match the underscore, which the original poster does not specifically require, and so should not be included.

      ----
      I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
      -- Schemer

      : () { :|:& };:

      Note: All code is untested, unless otherwise stated

      Maybe I'm missing something here ... is the "?:" actually necessary ? I read it in the Camel book and from their example I can't see the relevance of it for this regexp.

      Can someone explain this to me ?
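A short sketch of what the "?:" changes, using the extraction regex from above and a made-up input string. The match itself is the same either way; the only difference is whether the inner group gets its own capture variable.

```perl
use strict;
use warnings;

my $text = 'foo,bar,baz';

# Capturing inner group: it becomes $2 and holds its last repetition.
my @with_capture = $text =~ /(\w+(,\w+)*)/;

# Non-capturing inner group: only the outer parentheses capture.
my @without_capture = $text =~ /(\w+(?:,\w+)*)/;

print "capturing:     ", scalar(@with_capture), " value(s): @with_capture\n";
print "non-capturing: ", scalar(@without_capture), " value(s): @without_capture\n";
```

So the "?:" is not strictly necessary here, but without it a list-context match returns an extra element and $2 gets set to something you probably don't care about.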

Re: Regular Expression Question
by Abigail-II (Bishop) on Dec 04, 2003 at 23:03 UTC
    use Regexp::Common;
    /^$RE{list}{-sep => ','}{-pat => '\w+'}$/

    Abigail

      I didn't realize there was a Regexp module. For someone just learning how to use regexes, would you recommend understanding the basics first, prior to using the module?
        I don't have an answer to that. It's like asking "I have a car, and I want to hire a driver. Should I first learn to drive myself?" It all depends. If you want to drive yourself as well, then it's worthwhile to learn to drive. If you're not going to drive, there's no need to learn. But if you are going to drive, I don't know whether you should put off hiring a driver until you've mastered driving yourself. I don't even know whether it matters which you do first.

        Abigail

Re: Regular Expression Question
by ysth (Canon) on Dec 05, 2003 at 00:03 UTC
    Just for fun, a different way to approach the problem: /^(?:[A-Za-z0-9],(?=[A-Za-z0-9])|[A-Za-z0-9])+\z/
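One thing worth checking with an alternation like this is single-character fields such as a,b,c: the character after a comma must not be consumed by the same alternative, or the next comma has nothing in front of it to pair with. A minimal sketch that uses a lookahead for that; the sub name valid_list and the test strings are made up.

```perl
use strict;
use warnings;

# Each step consumes either "alnum, comma" (only when another alnum
# follows, checked by the lookahead) or a single alnum.
sub valid_list {
    my ($s) = @_;
    return $s =~ /^(?:[A-Za-z0-9],(?=[A-Za-z0-9])|[A-Za-z0-9])+\z/ ? 1 : 0;
}

for my $s ('a,b,c', 'this,should,work,11', ',lead', 'trail,', 'two,,commas', 'under_score') {
    printf "%-22s %s\n", $s, valid_list($s) ? 'ok' : 'rejected';
}
```

Because it uses an explicit character class rather than \w, the underscore is rejected, and \z (unlike $) does not tolerate a trailing newline.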