in reply to Can I determine index point of failure in string during regex match attempt

If $r is really just a plain, simple string of characters, it seems to me you're asking questions that can easily be answered by the index built-in. If $r is a regex of limitless complexity, it seems much more difficult, perhaps impossible, to answer the question "could there be a match if more characters were added to the right end of $s?"

    c:\@Work\Perl>perl -wMstrict -le
    "my $r = '123';
     ;;
     for my $s ('', qw(1 12 123 1234 2 23 234)) {
       my $left_same = 0 == index($r, $s);
       my $all_same = $r eq $s;
       printf qq{%-6s with more stuff could %smatch with '$r' \n},
         qq{'$s'}, ($left_same and not $all_same) ? '' : 'NOT '
         ;
       }
    "
    ''     with more stuff could match with '123'
    '1'    with more stuff could match with '123'
    '12'   with more stuff could match with '123'
    '123'  with more stuff could NOT match with '123'
    '1234' with more stuff could NOT match with '123'
    '2'    with more stuff could NOT match with '123'
    '23'   with more stuff could NOT match with '123'
    '234'  with more stuff could NOT match with '123'

Updates:
  1. Maybe the idea of Levenshtein distance (LD) could also be brought to bear. I.e., if the two strings (again, I'm assuming they're both just simple strings) are the same at the left, the LD gives an idea of how much one must change the shorter string to make it the same as the longer. (LD == 0 means both strings were exactly the same to begin with.)
  2. OTOH, if you just want to answer the question "can $s, with one or more characters added and anchored at the start of $r, match?" (again assuming $s and $r to be plain strings), the regex
        $r =~ m{ \A \Q$s\E .+ }xms
    would seem to do the trick (note reversal of $s and $r).
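
The regex in update 2 can be wrapped in a small sub and run against the same test strings as the one-liner above. This is only a sketch: the sub name could_match_with_more is made up for illustration, and both arguments are assumed to be plain strings, not regexes.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): true if $s,
# extended by at least one character, could equal $r. In other words,
# $s must be a proper prefix of $r. Both are treated as plain strings.
sub could_match_with_more {
    my ($s, $r) = @_;
    # \Q...\E quotes any regex metacharacters in $s; .+ demands at
    # least one further character of $r after the prefix.
    return $r =~ m{ \A \Q$s\E .+ }xms ? 1 : 0;
}

my $r = '123';
for my $s ('', qw(1 12 123 1234 2 23 234)) {
    printf "%-6s with more stuff could %smatch with '%s'\n",
        "'$s'", could_match_with_more($s, $r) ? '' : 'NOT ', $r;
}
```

For these inputs the sub should agree with the index-based test: only '', '1' and '12' can still grow into '123'.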

Re^2: Can I determine index point of failure in string during regex match attempt
by tj_thompson (Monk) on May 06, 2014 at 23:52 UTC

    For the case of a simple string of characters, that would be an easy way to handle it. And as Oiskuu mentioned below, full regex capability probably does essentially make this a variant of the Halting problem.

    So perhaps a simpler question. Can you get the index of the last successful matching character for a failed regex? Judging from what I've seen, I don't think so, but I'm often wrong :)

      My limited understanding of regex behavior is that the regex engine (RE) attempts a match like  'aaaa' =~ /b/ at every position in the string, so the last 'failed' match position would always be at the end of the  'aaaa' string. (I believe that ongoing regex optimization efforts have led to an RE that will abandon matching very quickly, perhaps not even start, if faced with such a simple regex as the one in the example. But as the regex becomes even a little more complex, attempts at such optimizations are soon frustrated.)

      In the case of an anchored or unanchored regex, the RE must, of course, 'know' in some sense when and where matches fail, and where the last attempt ends. But since the RE is only concerned with reporting successful matches and not unsuccessful ones, of which there may be many and many, this information is not, as far as I am aware, preserved.

      In any event, it still seems to me that questions like "what is the last offset at which 'efg' matches in 'abcdefghi'?" or "what is the last offset at which 'abc' anchored at the start of 'abcdef' matches?" can easily be answered by index.
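
      For plain strings, the two questions just quoted might be answered along these lines. This is a sketch only: rindex is used for the "last offset" reading, the sub name common_prefix_length is invented here, and the strings are assumed to hold ordinary single-byte characters. The sub goes a step further than the quoted questions and reports how many leading characters two strings share, which is the plain-string analogue of "where did the match stop?"

```perl
use strict;
use warnings;

# Last offset at which 'efg' occurs in 'abcdefghi' (-1 if absent).
my $last_offset = rindex('abcdefghi', 'efg');

# Does 'abc' match anchored at the start of 'abcdef'?
my $anchored = index('abcdef', 'abc') == 0 ? 1 : 0;

# Hypothetical helper (name invented for this sketch): number of
# leading characters that $s and $r have in common. Assumes plain
# byte strings; XOR-ing equal bytes gives "\0".
sub common_prefix_length {
    my ($s, $r) = @_;
    my $xor = "$s" ^ "$r";        # force string (not numeric) XOR
    $xor =~ m{ \A \0* }xms;       # always matches, possibly empty
    my $run = $+[0];              # end offset of the run of "\0"
    # Cap at the shorter length in case a string contains real NULs.
    my $min = length($s) < length($r) ? length($s) : length($r);
    return $run < $min ? $run : $min;
}

print "last offset: $last_offset\n";
print "anchored:    $anchored\n";
print "common:      ", common_prefix_length('abcx', 'abcdef'), "\n";
```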