tj_thompson has asked for the wisdom of the Perl Monks concerning the following question:
Hello again Monks!
In addition to my long question below, I've come up with a shorter, more theoretical question that should be easier to jump into.
Given a string $s, and a regex $r anchored at both front and end of the string with '^' and '$', is it possible to write a function could_match( $r, $s ) that will determine whether $r could match if additional characters were appended to $s?
One requirement: the entirety of $s must be matched, not a substring of $s. If $r were not anchored, it could always match once additional characters were appended.
It seems this should be easy enough to answer if you can answer the following question: How much of string $s was consumed in the failed match attempt against $r? If the entire string was consumed and $r remained unmatched, then it would seem $r could match if characters were appended to $s. On the other hand, if only part of $s was consumed, then adding additional characters would not help and $r could not be matched.
Perhaps "consumed" is a poor choice of word. You could also look at this as the index of the point of failure during the match attempt.
Examples:
    # $s does not match, but '12' would be consumed in the
    # match attempt. In this case, all of $s is consumed,
    # so $s should be able to match if additional characters
    # were added.
    $s = '12';
    $r = qr/^123$/;
    could_match( $r, $s );   # returns TRUE

    # $s does not match and none of $s would be consumed in the
    # attempt, as the '1' in the regex could not match. Since
    # $s was not consumed in its entirety in the match attempt,
    # there are no characters that could be added to $s to allow
    # it to match $r.
    $s = '23';
    $r = qr/^123$/;
    could_match( $r, $s );   # returns FALSE

    # $r in this case is not anchored, so it could always match
    # if more characters were added. However, since none of $s
    # was used in the match attempt, the entirety of $s could
    # not match even if more characters were added.
    $s = '123';
    $r = qr/456/;
    could_match( $r, $s );   # returns FALSE
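For illustration only, here is a minimal sketch of one way to get the behaviour in the examples above for a restricted case. It does not find the index of failure; it swaps in a different technique, rewriting the pattern so that every prefix of a full match is itself accepted. It assumes the pattern is supplied as a list of atoms rather than a compiled qr// object, and that each atom matches exactly one character (a literal or a character class). The helper name `prefix_regex` and the array-ref signature of `could_match` are hypothetical, not part of the question.

```perl
use strict;
use warnings;

# Hypothetical helper: turn atoms (a1, a2, a3) into
#   ^(?:a1(?:a2(?:a3)?)?)?\z
# which accepts exactly the strings that are prefixes of a full match.
# Assumption: each atom matches exactly one character.
sub prefix_regex {
    my @atoms = @_;
    my $src = '';
    # Build from the last atom outward so each atom wraps its successors.
    for my $atom ( reverse @atoms ) {
        $src = length($src) ? "$atom(?:$src)?" : $atom;
    }
    return qr/^(?:$src)?\z/;
}

# Returns 1 if all of $s is a prefix of some string the atoms would
# match in full, 0 otherwise. Takes an array ref of atoms, not a qr//.
sub could_match {
    my ( $atoms, $s ) = @_;
    my $re = prefix_regex(@$atoms);
    return $s =~ $re ? 1 : 0;
}

my @one_two_three = ( '1', '2', '3' );    # stands in for qr/^123$/
my @four_five_six = ( '4', '5', '6' );    # stands in for qr/456/

print "12  vs 123: ", could_match( \@one_two_three, '12' ),  "\n";
print "23  vs 123: ", could_match( \@one_two_three, '23' ),  "\n";
print "123 vs 456: ", could_match( \@four_five_six, '123' ), "\n";
```

Because the rewritten pattern has a minimum length of zero, the result does not depend on how far the engine gets before giving up. The sketch does not handle atoms that span several characters, counted quantifiers, alternation, or backreferences; those would need a real partial-matching facility.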
Replies are listed 'Best First'.
Re: Can I determine index point of failure in string during regex match attempt
by AnomalousMonk (Archbishop) on May 06, 2014 at 23:32 UTC
by tj_thompson (Monk) on May 06, 2014 at 23:52 UTC
by AnomalousMonk (Archbishop) on May 07, 2014 at 00:53 UTC

Re: Can I determine index point of failure in string during regex match attempt
by oiskuu (Hermit) on May 06, 2014 at 23:36 UTC