
/([0-9]{2,})(?(?{index("0123456789", $1) == -1})(*FAIL))/

It is not clear (at least, not to me) from the OP whether a consecutive sub-string embedded in a longer run of digits, as in '91239', should be matched or not, and asif has yet to clarify this point. The regex above will match '123' in '91239'.

Not having much experience with the newfangled backtracking control verbs, I wanted, as an exercise, to come up with a version of JavaFan's regex that would only accept 'strictly' consecutive digit strings. Using non-digit look-arounds before and after the regex did the trick, but was not very enlightening about backtracking verbs.
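As a rough sketch of that look-around approach (the sample string and variable names here are only for illustration, not from the thread), the following compares the original pattern with a version wrapped in non-digit look-arounds:

```perl
use strict;
use warnings;

# Illustrative sample string: '91239' holds the consecutive run '123'
# embedded in a longer run of digits; '345' stands alone.
my $str = 'x91239x345x';

# Original pattern: fail unless the captured digits occur as a
# substring of '0123456789'.
my @plain = $str =~ m{
    ([0-9]{2,})
    (?(?{ index('0123456789', $1) == -1 }) (*FAIL))
}xg;

# Same pattern between a negative look-behind and a negative
# look-ahead, so the digit run must be bounded by non-digits.
my @strict = $str =~ m{
    (?<![0-9])
    ([0-9]{2,})
    (?(?{ index('0123456789', $1) == -1 }) (*FAIL))
    (?![0-9])
}xg;

print "plain:  @plain\n";     # plain:  123 345
print "strict: @strict\n";    # strict: 345
```

The look-arounds do the job without any backtracking control verb beyond (*FAIL), which is why they say little about how the other verbs behave.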

I spent some time trying to use possessive matching and capturing in conjunction with various combinations of (*SKIP), (*PRUNE) and (*FAIL), but without success. It slowly dawned on me that the possessiveness of possessive matching does not affect the start-point of a match, but only the potential end-point and backtracking therefrom. If an otherwise-successful possessive match is forced to fail by (*FAIL), all that happens is that the match start-point advances one character and the regex tries again. What I wanted to do was to skip (hint, hint) entirely over a sequence of digits if they failed the test of consecutiveness.
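A small sketch of that observation, assuming a made-up sample string: the possessive quantifier stops the digit run from being shortened at its end, but after (*FAIL) the engine still retries from the next start position, so a trailing '12' can still be picked up.

```perl
use strict;
use warnings;

# Illustrative sample string (not from the thread).
my $str = 'a9129a912a345a';

# Greedy capture: on failure the run is shortened from the right,
# then the start-point advances by one character.
my @greedy = $str =~ m{
    (\d{2,})
    (?(?{ index('0123456789', $1) == -1 }) (*FAIL))
}xg;

# Possessive capture: no shortening from the right, but the
# start-point still advances one character after each failure.
my @possessive = $str =~ m{
    (\d{2,}+)
    (?(?{ index('0123456789', $1) == -1 }) (*FAIL))
}xg;

print "greedy:     @greedy\n";        # greedy:     12 12 345
print "possessive: @possessive\n";    # possessive: 12 345
```

In '9129' the possessive version finds nothing because every suffix ('129', '29') fails the test, but in '912' the suffix '12' passes once the start-point has moved past the '9'.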

After considerable staring at Special Backtracking Control Verbs in the FM (perlre), I finally realized that (*SKIP) did indeed control the start-point of a match, just as the documentation and the specific example promised.

Here's my (very simple) modification to add 'strictness' to the matching. Take out the (*SKIP) verb from $skip_if_not_consecutive and a bunch of '12's will be produced.

>perl -wMstrict -le
"my $skip_if_not_consecutive =
   qr{ (?(?{index('0123456789', $^N) == -1}) (*SKIP) (*FAIL)) }xms;
 ;;
 my $digits = qr{ \d{2,} }xms;
 ;;
 my $str = 'a1a11a9129a912a129a112a122a34a345a';
 my @cons = $str =~ m{ ($digits) $skip_if_not_consecutive }xmsg ;
 ;;
 my $q_cons = join ' ', map { qq{'$_'} } @cons;
 print qq{'$str'};
 print qq{ $q_cons};
"
'a1a11a9129a912a129a112a122a34a345a'
 '34' '345'
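For comparison, here is a sketch of the same test as a plain script, run once without and once with the (*SKIP) verb. The patterns are written inline with $1 rather than as interpolated qr objects with $^N, which is a substitution of mine and should behave the same way here.

```perl
use strict;
use warnings;

# Test string taken from the one-liner above.
my $str = 'a1a11a9129a912a129a112a122a34a345a';

# Without (*SKIP): a failed run is retried from every later
# start-point inside it, so embedded '12's get through.
my @no_skip = $str =~ m{
    (\d{2,})
    (?(?{ index('0123456789', $1) == -1 }) (*FAIL))
}xg;

# With (*SKIP): after a failure the next attempt starts at the
# position where (*SKIP) was reached, i.e. after the whole digit run.
my @with_skip = $str =~ m{
    (\d{2,})
    (?(?{ index('0123456789', $1) == -1 }) (*SKIP) (*FAIL))
}xg;

print "no skip:   @no_skip\n";      # no skip:   12 12 12 12 12 34 345
print "with skip: @with_skip\n";    # with skip: 34 345
```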

Learned something today.

Re^3: Regex help pls
by JavaFan (Canon) on Jan 23, 2011 at 12:54 UTC
    I'd just add a negative look-behind, and a negative look-ahead to get the "strictness" you're looking for. As in:
    /(?<![0-9])PATTERN_TO_MATCH_CONSECUTIVE_DIGITS(?![0-9])/
    The disadvantage of *SKIP is that it isn't "contained". You cannot easily take a pattern with a *SKIP, and interpolate it in a larger pattern (it's like having subroutines that have 'exits' in them - they're lousy for code reuse).