Thanks for the reply. Can you please explain how the pattern matching is being done?

Re^7: Regex Modification
by AnomalousMonk (Archbishop) on Apr 14, 2013 at 07:13 UTC

    Here is a further simplified (and tested) version of the regex. The $digits and $diffs package variables are no longer needed, so I'm a little happier with this version, but it still uses absolute capture group numbering and embedded code. I could perhaps use named captures to get around the numbering problem, but I don't see what I can do about the code.

    There are a few more comments that may be helpful, and davido's nice Perl Regular Expression Tester may be enlightening. I may get around to posting a more detailed commentary on the regex in the next couple of days.

    my $ndn = qr{
        # cannot begin after digit or any differentiator char
        (?<! \d) (?<! $diff)
        # begin potential main pattern capture to group 1
        ($d_min       # begin group 1 with minimum digits
          ($diff)?    # group 2: possible differentiator char
          # match to max number of digit(s)/single-diff groups
          (?: \d+ \g{-1} (?= \d)){0,9}
        # end group 1 (main pattern) capture with minimum digits
        $d_min)       # end group 1
        # main pattern cannot be followed by a digit...
        (?! \d)
        # ...or by the diff char, or by any diff char if none present
        (?(2) (?! \g{-1}) | (?! $diff))
        # qualify potential main pattern for min/max digits
        (?(?{ $1 =~ tr/0-9// > 15 || $1 =~ tr/0-9// < 9 }) (*FAIL))
        }xms;
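The numbering problem mentioned above can be seen in a much smaller case. This is only a sketch: the three patterns and the extra `(x)` group are hypothetical and are not part of the regex in this post. Each pattern means "a digit, a hyphen, the same digit again", and each is then embedded after one additional capture group.

```perl
use strict;
use warnings;

# Three hypothetical ways to write "a digit, a hyphen, the same digit again".
my %pair = (
    absolute => qr{ (\d) - \1 }xms,
    relative => qr{ (\d) - \g{-1} }xms,
    named    => qr{ (?<digit> \d) - \k<digit> }xms,
);

my %result;
for my $style (sort keys %pair) {
    my $inner = $pair{$style};
    # The extra leading group shifts absolute group numbers by one,
    # so \1 now refers to (x) instead of (\d).
    my $combined = qr{ (x) $inner }xms;
    $result{$style} = 'x1-1' =~ $combined ? 'match' : 'no match';
    print "$style: $result{$style}\n";
}
# absolute: no match
# named: match
# relative: match
```

The relative and named forms keep working because they do not depend on how many groups precede them in the combined pattern.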
Re^7: Regex Modification
by AnomalousMonk (Archbishop) on Apr 16, 2013 at 10:02 UTC

    Update: I finally realized that $1 in the
        (?(?{ $1 =~ tr/0-9// > 15 || $1 =~ tr/0-9// < 9 }) (*FAIL))
    sub-pattern above can be replaced by $^N to eliminate one absolute back-reference. Using a named capture group does the trick for the remaining absolute capture, giving the regex below. (However, there may be a speed penalty associated with named captures – but I haven't Benchmark-ed this.)

    my $ndn = qr{
        # cannot begin after digit or any differentiator char.
        (?<! \d) (?<! $diff)
        # begin potential main pattern capture.
        ($d_min               # begin main pattern group with minimum digits
          (?<DIFF> $diff)?    # group DIFF: possible differentiator char
          # match to max number of digit(s)/single-diff groups.
          (?: \d+ \k{DIFF} (?= \d)){0,9}
        # end main pattern group capture with minimum digits.
        $d_min)               # end main group
        # main pattern cannot be followed by a digit or...
        (?! \d)
        # ... by the diff char if any, else by any diff char.
        (?(<DIFF>) (?! \k{DIFF}) | (?! $diff))
        # qualify potential main pattern for min/max digits.
        (?(?{ $^N =~ tr/0-9// < 9 || $^N =~ tr/0-9// > 15 }) (*FAIL))
        }xms;
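To see the digit-count qualification in isolation, here is a reduced sketch. The pattern, the 3-to-5 digit limits, and the test strings are made up for illustration, and the anchors replace the look-arounds used in the real regex. It assumes Perl 5.10 or later for (*FAIL). Inside the code condition, $^N holds the text of the most recently closed capture group, and tr/0-9// counts the digits in it without modifying it.

```perl
use strict;
use warnings;

# Hypothetical pattern: digits and hyphens spanning the whole string,
# accepted only if the captured text holds 3 to 5 digits in total.
my $three_to_five = qr{
    \A ( \d [\d-]* \d ) \z
    (?(?{ $^N =~ tr/0-9// < 3 || $^N =~ tr/0-9// > 5 }) (*FAIL))
}xms;

my %verdict;
for my $candidate ('12-34', '1-2', '12-34-56') {
    $verdict{$candidate} = $candidate =~ $three_to_five ? 'accepted' : 'rejected';
    print "$candidate: $verdict{$candidate}\n";
}
```

Only '12-34' (four digits) should be accepted; the other two strings fall outside the assumed limits.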
Re^7: Regex Modification
by AnomalousMonk (Archbishop) on Apr 13, 2013 at 20:07 UTC

    Below is an updated version of the regex. It is simplified a little, and an error is corrected. (Update: And it now matches something like 'x123-456789123-456x'.) I am still less than happy with it: it is over-complicated (Update: and it uses package variables), and it is not standalone because of its use of embedded capture groups that make it sensitive to the presence of other capture groups if it is used in combination with other regexes.

    In any event, it works. Please see the embedded comments for a brief explanation of how the regex works, and see perlre and perlretut for more detailed info. The m1() test function returns the number of matches in a string if called in scalar context, and a list of all the matching sub-strings if called in list context. If you have more questions, please let me know. As before, HTH.
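For readers unfamiliar with context-sensitive return values, the following is a minimal sketch of the general idea using wantarray. It is not the m1() function itself: count_or_list, the three-digit pattern, and the sample string are hypothetical stand-ins.

```perl
use strict;
use warnings;

# Hypothetical helper (not the m1() from this post): returns a count of
# matches in scalar context and the matched substrings in list context.
sub count_or_list {
    my ($string, $regex) = @_;
    my @matches = $string =~ m{ ($regex) }xmsg;
    return wantarray ? @matches : scalar @matches;
}

my $re    = qr{ \d{3} }xms;
my $text  = 'a123b456c78';
my $count = count_or_list($text, $re);    # scalar context
my @found = count_or_list($text, $re);    # list context

print "count: $count\n";    # count: 2
print "found: @found\n";    # found: 123 456
```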

    Code:

    Output: