in reply to Re: Checking whether a string is a prefix of any string matching a given regular expression
in thread Checking whether a string is a prefix of any string matching a given regular expression

You say that the question of whether the end of the input string was reached cannot be answered because “the Perl regular expression engine never asks it.” I believe that it, in fact, does. It obviously tracks along both the pattern and the input string. How does it matter which is the “road map” and which is the “road”?

Your example of trying to match the string "abc" against the pattern /^abc(?<=d)/ is interesting. First of all, for the purpose of my original question, I don't consider this matching process to “reach the end of the input string.” I should have made this clear.

For the engine to “reach the end” in the sense I intended, it would have to consume all input characters and then try to consume another one. That is, it has to care about a potential suffix. Maybe I should have said “bump into the end of the input” rather than the less evocative “reach the end of the input” (although I do believe I used the verb “hit” once).

In your example, the engine would give up after consuming all three characters in the input string and seeing that the lookbehind fails. It would not try to consume another character and thus not “bump into” the end of the input string. So using the conjectured implication I described in my original post (i.e., string does not match and engine never bumped into the end ⇒ string is not a prefix of any matching string), I would predict that the input string is not the prefix of any matching string, and I would be correct—because, as you said, in this case there are no matching strings!

I am not familiar with the DFA and NFA techniques and their relative merits, but frankly, I don’t see how those details are relevant. You say that Perl does not have the information necessary to answer my question, which makes me wonder what you think that information is. (Surely it must know at any given time which part of the input string it is working with, and whether it ever tried to read past the end of the input.)

My reasoning is as follows: If the engine decides not to look at some suffix of the input, that can only mean that the match attempt has already failed and that the engine will then start backtracking. Hence, if there is some character in the input string that the engine never looks at, then the string certainly does not match (but not the other way around—when the engine looks at every character, the match can clearly either succeed or fail). From this it follows that if (a) the string does not match, and (b) the engine never tries to look past the end of the string, then adding characters to the end of the string cannot make it start matching (because the engine does not even look there).

Does that make sense to you? Please let me know if you think I’m being unclear on some point.

  • Comment on Re^2: Checking whether a string is a prefix of any string matching a given regular expression
  • Select or Download Code