Help with regular expression

heman has asked for the wisdom of the Perl Monks concerning the following question:

Re: Help with regular expression by AnomalousMonk (Archbishop) on Nov 12, 2014 at 03:53 UTC
It's not clear to me what you are trying to achieve. Can you please post some short example strings that should be accepted and ignored, a few of each, indicating which is which?

Update: Your regex is basic enough (i.e., no regex features added after Perl version 5.6) that YAPE::Regex::Explain may be enlightening (update: fixed the regex: added [] as needed per this):

```
c:\@Work\Perl\monks>perl -wMstrict -le
"use YAPE::Regex::Explain;
;;
print YAPE::Regex::Explain->new(qr/\b(\s*([a-zA-Z ]+)\s((-?[0-9]){1,2})\b)/)->explain;
"
The regular expression:

(?-imsx:\b(\s*([a-zA-Z ]+)\s((-?[0-9]){1,2})\b))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    (                        group and capture to \2:
----------------------------------------------------------------------
      [a-zA-Z ]+               any character of: 'a' to 'z', 'A' to
                               'Z', ' ' (1 or more times (matching the
                               most amount possible))
----------------------------------------------------------------------
    )                        end of \2
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    (                        group and capture to \3:
----------------------------------------------------------------------
      (                        group and capture to \4 (between 1 and
                               2 times (matching the most amount
                               possible)):
----------------------------------------------------------------------
        -?                       '-' (optional (matching the most
                                 amount possible))
----------------------------------------------------------------------
        [0-9]                    any character of: '0' to '9'
----------------------------------------------------------------------
      ){1,2}                   end of \4 (NOTE: because you are using a
                               quantifier on this capture, only the
                               LAST repetition of the captured pattern
                               will be stored in \4)
----------------------------------------------------------------------
    )                        end of \3
----------------------------------------------------------------------
    \b                       the boundary between a word char (\w) and
                             something that is not a word char
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
```
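To see the NOTE about `\4` in action, here is a minimal sketch that runs the OP's regex (with its leading `\s*`) against the sample string `xyz 12` and prints each capture group. The helper name `captures_for` is made up for illustration.

```perl
use strict;
use warnings;

# The OP's regex, including the leading \s* (qr// takes no /g modifier).
my $re = qr/\b(\s*([a-zA-Z ]+)\s((-?[0-9]){1,2})\b)/;

# Hypothetical helper: returns ($1, $2, $3, $4), or an empty list on failure.
sub captures_for {
    my ($str) = @_;
    return $str =~ $re ? ($1, $2, $3, $4) : ();
}

my @caps = captures_for('xyz 12');

# Print \1 .. \4; \4 keeps only the LAST repetition of (-?[0-9]){1,2}.
printf "\\%d = '%s'\n", $_ + 1, $caps[$_] for 0 .. $#caps;
```

For `xyz 12`, `\3` holds the whole number `12`, while `\4` holds only `2`, the last repetition of the quantified group. If only the number matters, the inner group could be made non-capturing, e.g. `(?:-?[0-9]){1,2}`.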
Re: Help with regular expression by Anonymous Monk on Nov 12, 2014 at 02:30 UTC
Put the regular expression in code tags: `<c> $_ = 'the string of your input'; /theregexofyourprogram/g; </c>`
Re^2: Help with regular expression by heman (Novice) on Nov 12, 2014 at 03:04 UTC
regular expression: `$_=~/\b(\s*([a-zA-Z ]+)\s((-?[0-9]){1,2})\b)/g`
Re^3: Help with regular expression by AnomalousMonk (Archbishop) on Nov 12, 2014 at 03:47 UTC
Please note that you could (and IMHO should) have added code tags in the OP above rather than in a separate node.
Re: Help with regular expression by plutomaster (Initiate) on Nov 12, 2014 at 05:55 UTC
It looks like there may be only one space between the string and the digits; if that is the case, you can try this: `/\b(\s([a-zA-Z ]+)\s((-?[0-9]){1,2})\b)/g`
Re: Help with regular expression by Laurent_R (Canon) on Nov 12, 2014 at 18:49 UTC
Actually, your regular expression seems correct (although it could possibly be somewhat simpler) for the description you made (except that I am not sure what you really want to capture), and it works, as shown in this session under the Perl debugger:

```
  DB<4> $_ = 'foobar 42';
  DB<5> print $1 if /\b(\s*([a-zA-Z ]+)\s((-?[0-9]){1,2})\b)/g;
foobar 42
  DB<6> $_ = 'foobar 666';
  DB<7> print $1 if /\b(\s*([a-zA-Z ]+)\s((-?[0-9]){1,2})\b)/g;
  DB<8>
```

As you can see, your regex matched `foobar 42` but did not match `foobar 666`, which appears to be what you want. If that is not what you want, then you should explain exactly what you want and, preferably, as already suggested, also show some examples of what should match (and what should be captured) and what should not match.
Re^2: Help with regular expression by heman (Novice) on Nov 13, 2014 at 01:20 UTC
Guys, here is what I want my regular expression to do: if `$_ = 'xyz 12 45'`, my regular expression should fail. Also, if `$_ = 'xyz 123'`, it should fail. But if `$_ = 'xyz 12'`, it should pass. I see that the 2nd and 3rd scenarios work, but not the 1st one: my regular expression seems to take `xyz 12 45` as `xyz 12` and process the expression, which is not what I want :(
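A quick sketch of why the first case slips through: the regex is not anchored, so it only has to find *some* substring that fits, and `xyz 12` followed by a word boundary is such a substring of `xyz 12 45`. The helper `matched_part` is hypothetical, just for the demo.

```perl
use strict;
use warnings;

# The OP's regex as posted (qr// takes no /g; it isn't needed for one match).
my $re = qr/\b(\s*([a-zA-Z ]+)\s((-?[0-9]){1,2})\b)/;

# Hypothetical helper: returns what \1 captured, or undef if there's no match.
sub matched_part {
    my ($str) = @_;
    return $str =~ $re ? $1 : undef;
}

for my $str ('xyz 12 45', 'xyz 123', 'xyz 12') {
    my $part = matched_part($str);
    my $result = defined $part ? "matched '$part'" : 'no match';
    print "'$str' => $result\n";
}
```

The trailing ` 45` is simply never examined: after `12`, the final `\b` is satisfied by the following space, and nothing in the pattern says the string has to end there. `xyz 123` fails only because `\b` cannot sit between `2` and `3`.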
Re^3: Help with regular expression by toolic (Bishop) on Nov 13, 2014 at 01:45 UTC
```perl
use warnings;
use strict;

while (<DATA>) {
    if (/^[a-z]+\s+\d{2}$/i) {
        print "PASS $_";
    }
    else {
        print "FAIL $_";
    }
}

__DATA__
xyz 12 45
xyz 123
xyz 12
```

outputs:

```
FAIL xyz 12 45
FAIL xyz 123
PASS xyz 12
```
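If you want to keep two allowances of the original regex that this tighter pattern drops (names containing spaces, via `[a-zA-Z ]`, and one- or two-digit numbers with an optional minus sign), a relaxed anchored version might look like the sketch below. It assumes those allowances were intentional; the helper `check` and the sample strings beyond the OP's three are made up.

```perl
use strict;
use warnings;

# Anchored like toolic's version, but (assumption) still allowing:
#   * spaces inside the name, as [a-zA-Z ] did in the original, and
#   * a one- or two-digit number with an optional leading minus.
my $re = qr/^\s*([a-zA-Z][a-zA-Z ]*?)\s+(-?[0-9]{1,2})\s*$/;

# Hypothetical helper: reports PASS with the captures, or FAIL.
sub check {
    my ($str) = @_;
    return $str =~ $re ? "PASS (name='$1', number='$2')" : 'FAIL';
}

for my $str ('xyz 12 45', 'xyz 123', 'xyz 12', 'foo bar 7', 'xyz -12') {
    print check($str), " : $str\n";
}
```

The lazy `*?` keeps a trailing space out of the captured name, and `[0-9]` is used rather than `\d` because `\d` can also match non-ASCII digits.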
Re^4: Help with regular expression by heman (Novice) on Nov 13, 2014 at 02:42 UTC
Re^3: Help with regular expression by graff (Chancellor) on Nov 13, 2014 at 02:26 UTC
toolic's solution is the right answer for the few examples you gave, but since you haven't said anything about the larger context of strings you might be getting as input, how should the following be treated by your intended regex?

```
(foo 12)
foo 12 bar
foo 12, bar 34
Foour score and 12 years ago there were 34 bars in this town.
```

All of the above would fail, given toolic's very tight use of anchors (`^` and `$`). If that's what you want, then the problem is solved.
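If, on the other hand, such pairs should be *found* inside longer text, one possible sketch swaps the anchors for a word boundary plus a negative lookahead. It assumes the rule is "a word, then a number of at most two digits that is not followed by another number"; `find_pair` is a made-up helper.

```perl
use strict;
use warnings;

# Assumption: find "word number" pairs inside longer text. Instead of ^ and $,
# the number must end at a word boundary (so "123" can't match as "12") and
# must not be followed by another number (so "12 45" is rejected).
my $re = qr/\b([a-zA-Z]+)\s+(-?[0-9]{1,2})\b(?!\s*-?[0-9])/;

# Hypothetical helper: returns "word number" for the first pair, or undef.
sub find_pair {
    my ($str) = @_;
    return $str =~ $re ? "$1 $2" : undef;
}

for my $str ('xyz 12 45', 'xyz 123', 'xyz 12', '(foo 12)', 'foo 12, bar 34') {
    my $pair = find_pair($str);
    my $msg  = defined $pair ? "found '$pair'" : 'no match';
    print "$msg in: $str\n";
}
```

With this pattern, `(foo 12)` and `foo 12, bar 34` both yield `foo 12`, and the Foour-score sentence yields `and 12`, while `xyz 12 45` and `xyz 123` are still rejected. Whether that is right depends entirely on the answer to the question above.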