kerrya has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks

Trying to determine the best way to match data from a field defined as having up to 3 digits.

Provided that the field contains 3 digits I am able to match using \d{3}

This doesn't pick up data if there are fewer than 3 digits or white space in the field.

I have tried /\d{0,3}|\W{3}/ and /.{3}/ with no success.

Would be grateful for your advice.

Re: Pattern matching fields with variable or NULL data
by ikegami (Patriarch) on Oct 26, 2004 at 23:24 UTC

    Here's a couple of guesses at what you want:

    /\d{0,3}/
    matches:
    "123"
    "  123"
    "123  "
    "  123  "
    "foo123bar"
    also matches (only the leading "1", since the pattern is unanchored):
    "1 2 3"

    /^\s*\d{0,3}\s*$/
    matches:
    "123"
    "  123"
    "123  "
    "  123  "
    doesn't match:
    "foo123bar"
    "1 2 3"

    /^\s*(?:\d\s*){0,3}$/
    matches:
    "123"
    "  123"
    "123  "
    "  123  "
    " 1 2 3 "
    doesn't match:
    "foo123bar"

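The lists above can be checked by running each pattern over the same sample strings. The following is only a minimal sketch: the helper name `match_table`, the labels `loose`, `anchored` and `spaced`, and the sample list are illustrative and not part of the original reply. Keep in mind that \d{0,3} is allowed to match zero digits, so the unanchored first pattern reports a match for every input.

```perl
use strict;
use warnings;

# The three candidate patterns from the reply, under made-up labels.
my %patterns = (
    loose    => qr/\d{0,3}/,
    anchored => qr/^\s*\d{0,3}\s*$/,
    spaced   => qr/^\s*(?:\d\s*){0,3}$/,
);

# Hypothetical helper: returns a hash ref of { label => 1 or 0 } for one string.
sub match_table {
    my ($string) = @_;
    my %result;
    for my $name (keys %patterns) {
        $result{$name} = $string =~ $patterns{$name} ? 1 : 0;
    }
    return \%result;
}

for my $sample ("123", "  123", "123  ", "  123  ", "foo123bar", "1 2 3", " 1 2 3 ") {
    my $result = match_table($sample);
    printf "%-12s loose=%d anchored=%d spaced=%d\n",
        qq{"$sample"}, @{$result}{qw(loose anchored spaced)};
}
```

Running the patterns side by side like this makes it easier to decide which behaviour is actually wanted before tuning the regex further.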
      Thanks ikegami. /\d{0,3}/ works ok if there are three digits somewhere in the field.

      It doesn't match:
      "1 "
      "12 "
      " 1"
      " "

      To attempt to grab any digits in the field, or only white space, I have tried /\d{0,3}|\W{3}/.

      Haven't found any examples on the Web or the Perl reference that might assist as yet. Any suggestions?
        Hi ikegami,

        Figured it out.

        /\d{3}|\d\s\s|\d\d\s|\W{3}/

        Thanks for your help.

        Hmm, it does match all of those:

        print( "1 "  =~ /\d{0,3}/ ? "match" : "no match", $/);
        print( "12 " =~ /\d{0,3}/ ? "match" : "no match", $/);
        print( " 1"  =~ /\d{0,3}/ ? "match" : "no match", $/);
        print( " "   =~ /\d{0,3}/ ? "match" : "no match", $/);
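Every one of these tests succeeds because \d{0,3} may match zero digits and the pattern is unanchored, so the regex engine is satisfied at the very first position, possibly with an empty match. A small sketch that reports what was matched and where; the helper name `what_matched` is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the matched text and its start offset,
# or an empty list if the pattern did not match at all.
sub what_matched {
    my ($string) = @_;
    return unless $string =~ /(\d{0,3})/;
    return ($1, $-[0]);    # $-[0] is the offset where the whole match starts
}

for my $field ("1 ", "12 ", " 1", " ") {
    my ($text, $offset) = what_matched($field);
    print qq{"$field": matched "$text" at offset $offset\n};
}
```

For " 1" and " " the matched text is the empty string at offset 0, which is why a plain match/no-match test says little about whether any digits were found.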

        Now, if you want the regexp to completely match exactly three characters, consisting of 0+ digits padded with spaces on either side, that's much more complicated.

        We can write it out the long way:
        /^(?:   |  \d| \d\d|\d\d\d|\d  |\d\d | \d )$/
        This doesn't compress well.

        /^[ \d]{3}$/
        will match them all, but it will also match "1 2".

        /^(?: [ \d]{2}|[ \d]{2} |\d{3})$/ will work.
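One way to gain confidence in the compact pattern is to compare it with a plain-language definition over every possible 3-character field. The sketch below assumes the intent is "exactly three characters: optional spaces, then optional digits, then optional spaces"; the helper names `is_padded_field` and `disagreements` are invented for this check.

```perl
use strict;
use warnings;

my $compact = qr/^(?: [ \d]{2}|[ \d]{2} |\d{3})$/;

# Reference definition (an assumption about the intent): exactly three
# characters, made of optional spaces, optional digits, optional spaces.
sub is_padded_field {
    my ($field) = @_;
    return (length($field) == 3 && $field =~ /^ *\d* *\z/) ? 1 : 0;
}

# Hypothetical helper: list every 3-character string over space and 0-9
# on which the compact regex and the reference definition disagree.
sub disagreements {
    my @alphabet = (' ', 0 .. 9);
    my @bad;
    for my $x (@alphabet) {
        for my $y (@alphabet) {
            for my $z (@alphabet) {
                my $field    = "$x$y$z";
                my $by_regex = $field =~ $compact ? 1 : 0;
                push @bad, $field if $by_regex != is_padded_field($field);
            }
        }
    }
    return @bad;
}

my @bad = disagreements();
print @bad
    ? "disagree on: @bad\n"
    : "compact regex agrees with the reference definition\n";
```

The check only covers fields built from spaces and digits; other characters are rejected by both the regex and the reference because neither allows them.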

        Anything else I can think of involves potentially matching more than three characters (/^\s*\d{0,3}\s*$/) or involves more than a regexp.

        If the spaces can only be leading spaces, then the long form would be:
        /^(?:   |  \d| \d\d|\d\d\d)$/
        which doesn't really simplify.

        btw, you should be using \D if you mean non-digit, or \S if you mean non-whitespace. \W doesn't appear to be appropriate here.
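The difference between the three negated classes is easiest to see on single characters. This is a small illustrative sketch (the helper name `negated_classes` is made up): \W means "not a word character", so it matches a space and punctuation but not a letter or underscore, while \D and \S exclude only digits and whitespace respectively.

```perl
use strict;
use warnings;

# Hypothetical helper: names of the negated classes that match one character.
sub negated_classes {
    my ($char) = @_;
    my @hits;
    push @hits, '\W' if $char =~ /^\W\z/;   # not a word character
    push @hits, '\D' if $char =~ /^\D\z/;   # not a digit
    push @hits, '\S' if $char =~ /^\S\z/;   # not whitespace
    return join ' ', @hits;
}

for my $char (' ', '7', 'a', '_', '-') {
    printf "%-3s => %s\n", "'$char'", negated_classes($char);
}
```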

Re: Pattern matching fields with variable or NULL data
by Eimi Metamorphoumai (Deacon) on Oct 27, 2004 at 13:38 UTC
    Reading all your comments, it still seems unclear exactly what you want. One thing in particular that stands out is that I can't tell if you're trying to validate data, or extract it. That is, if you're passed four digits, should it just extract three of them, or should it detect the problem? What if the string is only 2 characters long? If you can clarify exactly what you want, there are certainly ways to encode it.
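To make the validate-versus-extract distinction concrete, here is a minimal sketch of both readings. The function names `extract_digits` and `validate_field`, and the exact rules they apply, are assumptions chosen for illustration, not something the original question specifies.

```perl
use strict;
use warnings;

# Extraction (assumed meaning): take the first run of 1 to 3 digits
# and ignore whatever else is in the field.
sub extract_digits {
    my ($field) = @_;
    return $field =~ /(\d{1,3})/ ? $1 : undef;
}

# Validation (assumed meaning): the whole field must be at most 3 digits
# with optional surrounding whitespace; anything else is rejected.
sub validate_field {
    my ($field) = @_;
    return $field =~ /^\s*(\d{0,3})\s*\z/ ? $1 : undef;
}

for my $field ("1234", "12", " 7 ", "   ") {
    my $extracted = extract_digits($field);
    my $validated = validate_field($field);
    printf "%-7s extract=%-6s validate=%s\n",
        qq{"$field"},
        defined $extracted ? qq{"$extracted"} : 'undef',
        defined $validated ? qq{"$validated"} : 'undef';
}
```

With four digits the two readings diverge: extraction quietly keeps the first three, while validation rejects the field. A blank field goes the other way, since validation accepts it as an empty number and extraction finds nothing.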

    It sounds like you might want /^[\d\s]{3}$/, which matches if the field is exactly 3 characters long, each a whitespace character or a digit. Another approach might be to use more than just a pattern: if (length == 3 && /^\d*\s*$/), for instance, if you want the blanks to follow all the digits. Or if you want to extract the digits and validate at the same time, you could do something like

    if (length != 3 || !/^(\d*)\s*$/) {
        die "invalid data";
    } else {
        print "your number without spaces is '$1'\n";
    }
    But again, the first step is figuring out exactly what should and shouldn't match, and what you want to do when it doesn't. After that it's easy.
Re: Pattern matching fields with variable or NULL data
by TedPride (Priest) on Oct 27, 2004 at 11:32 UTC
    while (<DATA>) {
        print $1."\n" if /^\D*(\d{0,3})\D*$/;
    }
    __DATA__
    123
     123
    123 
      123  
    1 2 3
    1
    12
    1