You say that the question of whether the end of the input string was reached cannot be answered because “the Perl regular expression engine never asks it.” I believe that it, in fact, does. It obviously tracks along both the pattern and the input string. How does it matter which is the “road map” and which is the “road”?

Your example of trying to match the string "abc" against the pattern /^abc(?<=d)/ is interesting. First of all, for the purpose of my original question, I don't consider this matching process to “reach the end of the input string.” I should have made this clear.

For the engine to “reach the end” in the sense I intended, it would have to consume all input characters and then try to consume another one. That is, it has to care about a potential suffix. Maybe I should have said “bump into the end of the input” rather than the less evocative “reach the end of the input” (although I do believe I used the verb “hit” once).

In your example, the engine would give up after consuming all three characters in the input string and seeing that the lookbehind fails. It would not try to consume another character and thus not “bump into” the end of the input string. So using the conjectured implication I described in my original post (i.e., string does not match and engine never bumped into the end ⇒ string is not a prefix of any matching string), I would predict that the input string is not the prefix of any matching string, and I would be correct—because, as you said, in this case there are no matching strings!

I am not familiar with the DFA and NFA techniques and their relative merits, but frankly, I don’t see how those details are relevant. You say that Perl does not have the information necessary to answer my question, which makes me wonder what you think that information is. (Surely it must know at any given time which part of the input string it is working with, and whether it ever tried to read past the end of the input.)

My reasoning is as follows: If the engine decides not to look at some suffix of the input, that can only mean that the match attempt has already failed and that the engine will then start backtracking. Hence, if there is some character in the input string that the engine never looks at, then the string certainly does not match (but not the other way around—when the engine looks at every character, the match can clearly either succeed or fail). From this it follows that if (a) the string does not match, and (b) the engine never tries to look past the end of the string, then adding characters to the end of the string cannot make it start matching (because the engine does not even look there).

Does that make sense to you? Please let me know if you think I’m being unclear on some point.


In reply to Re^2: Checking whether a string is a prefix of any string matching a given regular expression by dbrockman
in thread Checking whether a string is a prefix of any string matching a given regular expression by dbrockman

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.