heman has asked for the wisdom of the Perl Monks concerning the following question:

Hi Guys, I'm trying to write a small beginner-level script which should read a string followed by a 2-digit number that appears only once. If a 2-digit number appears more than once (separated by a space), the input should be ignored.

However, my regular expression doesn't seem to work :(

Can somebody help me correct it?

Here is my regular expression:

/\b(\s*([a-zA-Z ]+)\s((-?[0-9]){1,2})\b)/g

Thanks

Re: Help with regular expression
by AnomalousMonk (Archbishop) on Nov 12, 2014 at 03:53 UTC

    It's not clear to me what you are trying to achieve. Can you please post some short example strings that should be accepted and ignored, a few of each, indicating which is which?

    Update: Your regex is basic enough (i.e., no regex features added after Perl version 5.6) that YAPE::Regex::Explain may be enlightening (update: fixed the regex by adding the [] that were missing, per the OP's code-tagged version):

    c:\@Work\Perl\monks>perl -wMstrict -le "use YAPE::Regex::Explain; ;; print YAPE::Regex::Explain->new(qr/\b(\s*([a-zA-Z ]+)\s((-?[0-9]){1,2})\b)/)->explain; "
    The regular expression:

    (?-imsx:\b(\s*([a-zA-Z ]+)\s((-?[0-9]){1,2})\b))

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      \b                       the boundary between a word char (\w) and
                               something that is not a word char
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                                 more times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
        (                        group and capture to \2:
    ----------------------------------------------------------------------
          [a-zA-Z ]+               any character of: 'a' to 'z', 'A' to
                                   'Z', ' ' (1 or more times (matching the
                                   most amount possible))
    ----------------------------------------------------------------------
        )                        end of \2
    ----------------------------------------------------------------------
        \s                       whitespace (\n, \r, \t, \f, and " ")
    ----------------------------------------------------------------------
        (                        group and capture to \3:
    ----------------------------------------------------------------------
          (                        group and capture to \4 (between 1 and
                                   2 times (matching the most amount
                                   possible)):
    ----------------------------------------------------------------------
            -?                       '-' (optional (matching the most
                                     amount possible))
    ----------------------------------------------------------------------
            [0-9]                    any character of: '0' to '9'
    ----------------------------------------------------------------------
          ){1,2}                   end of \4 (NOTE: because you are using a
                                   quantifier on this capture, only the
                                   LAST repetition of the captured pattern
                                   will be stored in \4)
    ----------------------------------------------------------------------
        )                        end of \3
    ----------------------------------------------------------------------
        \b                       the boundary between a word char (\w) and
                                 something that is not a word char
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
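To see the NOTE about group \4 in practice, here is a minimal Perl sketch that prints all four captures for the input 'foobar 42'. The helper name groups_for is hypothetical, and /g is dropped because only the first match is needed. $3 holds both digits ('42'), while $4 keeps only the last repetition ('2').

```perl
use strict;
use warnings;

# The OP's regex, unchanged except that /g is dropped: we only want the
# first match, returned as a list of captures.
my $re = qr/\b(\s*([a-zA-Z ]+)\s((-?[0-9]){1,2})\b)/;

# Hypothetical helper: return ($1, $2, $3, $4) for a string, or an
# empty list when the regex does not match.
sub groups_for {
    my ($str) = @_;
    my @caps = $str =~ $re;    # list context: the capture groups
    return @caps;
}

my @g = groups_for('foobar 42');
printf "\$%d = '%s'\n", $_, $g[$_ - 1] for 1 .. 4;
```

If you only care about the two digits, $3 is the group to use; the inner group \4 could be written as a non-capturing (?:...) group instead.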

Re: Help with regular expression
by Anonymous Monk on Nov 12, 2014 at 02:30 UTC
    Put the regular expression in code tags:
    <c>
    $_ = 'the string of your input';
    /theregexofyourprogram/g;
    </c>
      regular expression: $_=~/\b(\s*([a-zA-Z ]+)\s((-?[0-9]){1,2})\b)/g

        Please note that you could (and IMHO should) have added code tags in the OP above rather than in a separate node.

Re: Help with regular expression
by plutomaster (Initiate) on Nov 12, 2014 at 05:55 UTC
    It looks like there might not always be exactly one space between the string and the digits. If that is the case, you can try this:

    /\b(\s*([a-zA-Z ]+)\s*((-?[0-9]){1,2})\b)/g

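A minimal sketch comparing the OP's pattern (\s) with the \s* variant above on a few inputs; the inputs and the helper name first_match are illustrative assumptions. Because the character class [a-zA-Z ] already contains a space, the original pattern copes with several spaces; the \s* version mainly adds acceptance of 'xyz12' with no space at all. Neither pattern rejects 'xyz 12 45'.

```perl
use strict;
use warnings;

# The OP's regex: exactly one whitespace char (\s) before the digits.
my $orig = qr/\b(\s*([a-zA-Z ]+)\s((-?[0-9]){1,2})\b)/;
# The suggested variant: zero or more whitespace chars (\s*).
my $star = qr/\b(\s*([a-zA-Z ]+)\s*((-?[0-9]){1,2})\b)/;

# Hypothetical helper: return what group 1 captured, or undef if the
# regex did not match.
sub first_match {
    my ($re, $input) = @_;
    return $input =~ $re ? $1 : undef;
}

for my $input ('xyz 12', 'xyz   12', 'xyz12', 'xyz 12 45') {
    for my $pair (['\s ', $orig], ['\s*', $star]) {
        my ($label, $re) = @$pair;
        my $got = first_match($re, $input);
        printf "%-12s %s  %s\n", "'$input'", $label,
            defined $got ? "matched '$got'" : 'no match';
    }
}
```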
Re: Help with regular expression
by Laurent_R (Canon) on Nov 12, 2014 at 18:49 UTC
    Actually, your regular expression seems correct for the description you gave (although it could possibly be somewhat simpler, and I am not sure what you really want to capture), and it works, as shown in this session under the Perl debugger:
    DB<4> $_ = 'foobar 42';
    DB<5> print $1 if /\b(\s*([a-zA-Z ]+)\s((-?[0-9]){1,2})\b)/g;
    foobar 42
    DB<6> $_ = 'foobar 666';
    DB<7> print $1 if /\b(\s*([a-zA-Z ]+)\s((-?[0-9]){1,2})\b)/g;
    DB<8>
    As you can see, your regex matched foobar 42 but did not match foobar 666, which appears to be what you want.

    If that is not what you want, then please explain exactly what you want and, as already suggested, show some examples of what should match (and what should be captured) and what should not match.

      Guys, here is what I want my regular expression to do: for $_ = 'xyz 12 45' my regular expression should fail. Also, for $_ = 'xyz 123' it should fail. But for $_ = 'xyz 12' it should pass. However, I see that the 2nd and 3rd scenarios are working, but not the 1st one. My regular expression seems to treat the 1st one, i.e. xyz 12 45, as xyz 12 and match it, which is not what I want :(
        use warnings;
        use strict;

        while (<DATA>) {
            if (/^[a-z]+\s+\d{2}$/i) {
                print "PASS $_";
            }
            else {
                print "FAIL $_";
            }
        }

        __DATA__
        xyz 12 45
        xyz 123
        xyz 12

        outputs:

        FAIL xyz 12 45
        FAIL xyz 123
        PASS xyz 12
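The difference between the original attempt and the anchored answer comes down to anchoring. A minimal sketch (the helper names unanchored_grab and anchored_pass are hypothetical) that runs both patterns over the three sample inputs: the unanchored pattern is satisfied by the substring 'xyz 12' inside 'xyz 12 45', while ^ and $ force the whole line to be letters, whitespace, and exactly two digits.

```perl
use strict;
use warnings;

# The OP's unanchored regex (without /g) and the anchored one above.
my $unanchored = qr/\b(\s*([a-zA-Z ]+)\s((-?[0-9]){1,2})\b)/;
my $anchored   = qr/^[a-z]+\s+\d{2}$/i;

# Hypothetical helpers: what group 1 of the unanchored regex grabs
# (undef if nothing), and whether the anchored regex accepts the line.
sub unanchored_grab { my ($s) = @_; return $s =~ $unanchored ? $1 : undef }
sub anchored_pass   { my ($s) = @_; return $s =~ $anchored ? 1 : 0 }

for my $input ('xyz 12 45', 'xyz 123', 'xyz 12') {
    my $grab = unanchored_grab($input);
    printf "%-12s unanchored: %-16s anchored: %s\n",
        "'$input'",
        defined $grab ? "matched '$grab'" : 'no match',
        anchored_pass($input) ? 'PASS' : 'FAIL';
}
```

Note that $ without /m also matches just before a trailing newline, which is why the anchored pattern works on lines read from <DATA> without a chomp.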
        toolic's solution is the right answer for the few examples you gave, but since you haven't said anything about the larger context of strings you might be getting as input, how should the following be treated by your intended regex?
        (foo 12)
        foo 12 bar
        foo 12, bar 34
        Four score and 12 years ago there were 34 bars in this town.
        All the above would fail, given toolic's very tight use of anchors (^ and $). If that's what you want, then the problem is solved.
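A minimal sketch checking that claim: it runs the anchored pattern over the four example lines and, for contrast, shows what the OP's unanchored pattern would grab from each. The example strings are copied from the question above; everything else is illustrative. Every line prints FAIL, while the unanchored pattern finds a match in all four (for instance 'foo 12' inside '(foo 12)').

```perl
use strict;
use warnings;

# The anchored pattern from the answer above and, for contrast, the
# OP's unanchored one.
my $anchored   = qr/^[a-z]+\s+\d{2}$/i;
my $unanchored = qr/\b(\s*([a-zA-Z ]+)\s((-?[0-9]){1,2})\b)/;

my @examples = (
    '(foo 12)',
    'foo 12 bar',
    'foo 12, bar 34',
    'Four score and 12 years ago there were 34 bars in this town.',
);

for my $line (@examples) {
    my $verdict = $line =~ $anchored ? 'PASS' : 'FAIL';
    # Show what the unanchored pattern would grab from the same line.
    my $grabbed = $line =~ $unanchored ? "'$1'" : '(no match)';
    print "$verdict  unanchored grabs $grabbed  <- $line\n";
}
```

Whether any of these should be accepted depends on the larger input context, which is exactly the open question.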