Re: Regex help pls
by luis.roca (Deacon) on Jan 22, 2011 at 19:58 UTC
As your question currently stands, there are many regular expressions that would match this but may be of no use to you. You will need to clarify your question and its context to get a good answer. A couple of thoughts:
So you need to match three separate strings?
- A12345
- 12
- FB123
Will they appear consecutively, e.g.: A12345 12 FB123
OR
Something like this: Employee ID: A12345, Number of sick days: 12, Date of hire: 1.23.2011, Desk Location: FB123
Those two scenarios call for very different answers. To improve how you present your question, here are a few things you might consider:
- Describe the data containing your strings.
  - Are they in an Excel, text or HTML document?
  - Are there multiple files you will be searching?
- Be as specific as possible about the strings you're trying to match:
  - Is the number of characters in each string fixed, or will it vary?
  - Will the strings appear consecutively and in the same order, or more randomly?
  - Is there data that will consistently precede or follow these strings?
Also, please include your attempts so far at writing a regular expression for the problem you're having.
Hope this helps. Good luck.
"...the adversities born of well-placed thoughts should be considered mercies rather than misfortunes." — Don Quixote
Re: Regex help pls
by toolic (Bishop) on Jan 22, 2011 at 20:42 UTC
I cannot think of a way to do this all with a single regular expression. However, one approach that would work is to:
- Use a regular expression to extract all numbers from your string.
- For each number, use index to determine whether its digits are consecutive.
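A minimal Perl sketch of those two steps. The sample string is made up (the OP never showed real data), and treating a single digit as "not consecutive" (the length check) is my own assumption:

```perl
use strict;
use warnings;

# Hypothetical sample input built from the strings mentioned in the thread.
my $str = 'A12345 12 FB123 9129';

# Step 1: a regex pulls out every run of digits.
my @numbers = $str =~ /(\d+)/g;

# Step 2: index() keeps a run only if it appears verbatim in '0123456789',
# i.e. its digits ascend by one. Runs shorter than 2 digits are dropped (assumption).
my @consecutive = grep { length($_) >= 2 && index('0123456789', $_) != -1 } @numbers;

print "@consecutive\n";    # 12345 12 123
```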
Re: Regex help pls
by AnomalousMonk (Archbishop) on Jan 22, 2011 at 19:47 UTC
What Anonymonk asked, plus a clarification: does "strings that have numerals in them which are consecutive integers" mean integers composed only of consecutive decimal digits, or would an integer string like '991299' be acceptable because it contains the consecutive digit sequence '12'?
Update: Also: Does 'consecutive' mean 'consecutive increasing', or should a (sub-)string like '543' also match?
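To make the difference between these readings concrete, here is a small Perl sketch. Both helper names are hypothetical, and the $allow_descending flag is just one assumed way to cover the '543' case:

```perl
use strict;
use warnings;

# Reading A: the whole digit string must be consecutive.
# $allow_descending (assumed option) also accepts runs like '543'.
sub is_consecutive {
    my ($digits, $allow_descending) = @_;
    return 0 if length($digits) < 2;
    return 1 if index('0123456789', $digits) != -1;
    return ($allow_descending && index('9876543210', $digits) != -1) ? 1 : 0;
}

# Reading B: it is enough that some adjacent pair of digits is consecutive.
sub has_consecutive_pair {
    my ($digits, $allow_descending) = @_;
    for my $i (0 .. length($digits) - 2) {
        return 1 if is_consecutive(substr($digits, $i, 2), $allow_descending);
    }
    return 0;
}

for my $s (qw(123 543 991299)) {
    printf "%-6s whole-asc=%d whole-either=%d pair-asc=%d\n",
        $s, is_consecutive($s), is_consecutive($s, 1), has_consecutive_pair($s);
}
```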
Re: Regex help pls
by JavaFan (Canon) on Jan 22, 2011 at 21:08 UTC
/([0-9]{2,})(?(?{index("0123456789", $1) == -1})(*FAIL))/
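A quick way to try the regex above, with a made-up test string. The (?{ ... }) block acts as the condition of (?(...)...): when index() returns -1 the captured run is not a slice of '0123456789', so (*FAIL) forces the engine to backtrack and try a shorter run or a later start position:

```perl
use strict;
use warnings;

my $str = '91239 A12345 9876';    # hypothetical input

# In list context with /g, each successful match returns its $1.
my @found = $str =~ /([0-9]{2,})(?(?{index("0123456789", $1) == -1})(*FAIL))/g;

print "@found\n";    # 123 12345  ('123' comes from inside '91239'; '9876' never qualifies)
```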
/([0-9]{2,})(?(?{index("0123456789", $1) == -1})(*FAIL))/
It is not clear (at least, not to me) from the OP whether the consecutive sub-string in a string like '91239' should be matched, and asif has yet to clarify this point. The regex above will match '123' in '91239'.
Not having much experience with the newfangled backtracking control verbs, I wanted, as an exercise, to come up with a version of JavaFan's regex that would only accept 'strictly' consecutive digit strings. Using non-digit look-arounds before and after the regex did the trick, but was not very enlightening about backtracking verbs.
I spent some time trying to use possessive matching and capturing in conjunction with (*SKIP) and (*PRUNE) and (*FAIL) combinations, but without success. It slowly dawned on me that the possessiveness of possessive matching does not affect the start-point of a match, but only the potential end-point and backtracking therefrom. If an otherwise-successful possessive match is forced to fail by (*FAIL), all that happens is that the match start point advances one character and the regex tries again. What I wanted to do was to skip (hint, hint) entirely over a sequence of digits if they failed the test of consecutiveness.
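A small Perl sketch of that observation, using the same test string as the one-liner later in this post but with a possessive \d{2,}+ and no (*SKIP). The possessive run cannot shrink from the right, yet the start point still advances one character after each failure, so '12' is still found inside '912' and '112':

```perl
use strict;
use warnings;

my $str = 'a1a11a9129a912a129a112a122a34a345a';

# \d{2,}+ never gives digits back, so '9129' cannot shrink to '912' or '91';
# but after (*FAIL) the engine simply retries one character further along.
my @possessive = $str =~ m{ (\d{2,}+) (?(?{ index('0123456789', $^N) == -1 }) (*FAIL)) }xg;

print join(' ', map { "'$_'" } @possessive), "\n";    # '12' '12' '34' '345'
```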
After considerable staring at Special Backtracking Control Verbs in the FM, I finally realized that (*SKIP) did indeed control the start-point of a match just as the documentation and the specific example promised.
Here's my (very simple) modification to add 'strictness' to the matching. Take out the (*SKIP) verb from $skip_if_not_consecutive and a bunch of '12's will be produced.
>perl -wMstrict -le
"my $skip_if_not_consecutive = qr{
(?(?{index('0123456789', $^N) == -1}) (*SKIP) (*FAIL))
}xms;
;;
my $digits = qr{ \d{2,} }xms;
;;
my $str = 'a1a11a9129a912a129a112a122a34a345a';
my @cons = $str =~
m{ ($digits) $skip_if_not_consecutive }xmsg
;
;;
my $q_cons = join ' ', map { qq{'$_'} } @cons;
print qq{'$str'};
print qq{ $q_cons};
"
'a1a11a9129a912a129a112a122a34a345a'
'34' '345'
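The one-liner above appears to be quoted for the Windows cmd shell. A sketch of the same demonstration as a standalone script, which also runs the variant without (*SKIP) to check the claim about the '12's:

```perl
use strict;
use warnings;

my $str = 'a1a11a9129a912a129a112a122a34a345a';

# With (*SKIP): a failing digit run makes the next attempt start after the whole run.
my $with_skip    = qr{ (?(?{ index('0123456789', $^N) == -1 }) (*SKIP) (*FAIL)) }x;
# Without (*SKIP): ordinary backtracking may shrink the run or move the start right.
my $without_skip = qr{ (?(?{ index('0123456789', $^N) == -1 }) (*FAIL)) }x;

my @strict = $str =~ m{ (\d{2,}) $with_skip }xg;
my @loose  = $str =~ m{ (\d{2,}) $without_skip }xg;

print join(' ', map { "'$_'" } @strict), "\n";    # '34' '345'
print join(' ', map { "'$_'" } @loose),  "\n";    # five '12's, then '34' '345'
```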
Learned something today.
I'd just add a negative look-behind and a negative look-ahead to get the "strictness" you're looking for, as in:
/(?<![0-9])PATTERN_TO_MATCH_CONSECUTIVE_DIGITS(?![0-9])/
The disadvantage of *SKIP is that it isn't "contained". You cannot easily take a pattern with a *SKIP and interpolate it into a larger pattern (it's like having subroutines with 'exits' in them: they're lousy for code reuse).
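A sketch of the look-around version with JavaFan's condition filled in for the placeholder. One change is my own, not JavaFan's: $^N (the most recently closed group) replaces $1, so the check still targets the digit run when the qr// is interpolated after other capturing groups:

```perl
use strict;
use warnings;

# Digit look-arounds stop a failing run from matching a shorter piece of itself.
my $consecutive = qr/(?<![0-9])([0-9]{2,})(?(?{index('0123456789', $^N) == -1})(*FAIL))(?![0-9])/;

my $str  = 'a1a11a9129a912a129a112a122a34a345a';
my @hits = $str =~ /$consecutive/g;
print "@hits\n";    # 34 345

# Reuse inside a larger pattern: ([A-Z]+) is group 1, the digit run is group 2.
# With $1 in the code block this would test 'FB' instead and never match.
my ($prefix, $run) = 'Desk Location: FB123' =~ /([A-Z]+)$consecutive/;
print "$prefix $run\n";    # FB 123
```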
Re: Regex help pls
by Anonymous Monk on Jan 22, 2011 at 19:06 UTC
Re: Regex help pls
by elef (Friar) on Jan 23, 2011 at 10:54 UTC
Depending on what exactly you want your regex to do, you could simply cheat:
/01|12|23|34|45|56|67|78|89/
It's not elegant, but it's the simplest solution. Even if you need to match runs of 3 or more digits, very little typing is needed.
Obviously, it's pretty easy to expand this to only match strings that contain no other numbers, or strings that contain no letters etc. You gave no information as to what you actually want the regex to do, so no specific solution can be given.
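If even that typing is too much, the alternation can be generated. A sketch with a hypothetical helper that builds the "exactly $n ascending digits" alternation from substrings of '0123456789'. Note that, like the hand-written version, it matches anywhere inside a longer number:

```perl
use strict;
use warnings;

# Hypothetical helper: join every length-$n slice of '0123456789' with '|'.
sub consecutive_alternation {
    my ($n) = @_;
    my $alt = join '|', map { substr('0123456789', $_, $n) } 0 .. 10 - $n;
    return qr/$alt/;
}

my $pairs   = consecutive_alternation(2);    # 01|12|23|...|89, as above
my $triples = consecutive_alternation(3);    # 012|123|...|789

my @m = 'x9129 a345 b7' =~ /($triples)/g;
print "@m\n";                                            # 345
print '9129' =~ $pairs ? "match\n" : "no match\n";       # match (contains '12')
```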
Re: Regex help pls
by locked_user sundialsvc4 (Abbot) on Jan 23, 2011 at 22:45 UTC
If a regex does exist that will do this ... I would not want to have to read it. And, so, I would not want to have to encounter code that used it. Far better, methinks, to write a regex that grabs all of the “strings of consecutive digits,” then parse that list with procedural code.
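A sketch of that split of responsibilities: the regex only grabs digit runs, and a plain loop decides. Here the helper (hypothetical name) checks that each digit is exactly one more than the previous one, an arithmetic alternative to the index() trick:

```perl
use strict;
use warnings;

# Hypothetical helper: true if every digit is one more than the one before it.
sub digits_ascend_by_one {
    my ($run) = @_;
    my @d = split //, $run;
    return 0 if @d < 2;
    for my $i (1 .. $#d) {
        return 0 unless $d[$i] == $d[$i - 1] + 1;
    }
    return 1;
}

my $str  = 'a1a11a9129a912a129a112a122a34a345a';
my @runs = $str =~ /(\d+)/g;                            # regex: grab the runs
my @cons = grep { digits_ascend_by_one($_) } @runs;     # procedural code: filter them

print "@cons\n";    # 34 345
```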