Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl monks,

Since the KMP algorithm can determine if string S2 is a substring of S1 in O(n) time, can this algorithm be modified to test whether S2 is a subsequence of S1 in linear time?
Thanks.
  • Comment on Can KMP be used to find the longest common subsequence?

Replies are listed 'Best First'.
Re: Can KMP be used to find the longest common subsequence? (no?)
by tye (Sage) on Dec 11, 2007 at 08:36 UTC

    Well, no, not really. Not that it matters, since the naive "obvious" approach is already O(n).

    - tye        

Re: Can KMP be used to find the longest common subsequence?
by Thilosophy (Curate) on Dec 12, 2007 at 07:34 UTC
    Some background context:

    Wikipedia has an explanation of the Knuth-Morris-Pratt algorithm for substring searches.

    The difference between a substring and a subsequence is that a subsequence is not necessarily contiguous (all the characters must be present, and in the correct order, but there can be unrelated characters in between).

      The difference between a substring and a subsequence is that a subsequence is not necessarily contiguous (all the characters must be present, and in the correct order, but there can be unrelated characters in between).
      In that case, I think the easiest way to search for subsequences in Perl is to use regular expressions. For example, if the subsequence is "abcd", then the regular expression to look for a subsequence is /a.*?b.*?c.*?d/.

      And since you can easily contruct a regex out of a string, that is easy toconstruct:

      $string = 'abcd'; $regex = join '.*?', split //, $string; $sequence = 'xxxxxaxxxxxxbxxxxxcxxxxdxxxxx'; if($sequence =~ /$regex/s) { printf "I found a match, at offset %d and with total length %d\n", + $-[0], $+[0]-$-[0]; }
      Result:
      I found a match, at offset 5 and with total length 19

        If that were to fail to match, then it would fail to match very badly, as in O(N**K) badly. Better:

        /^[^a]*?a[^b]*?b[^c]*?c[^d]*?d/

        or use features that I don't use and so don't remember to prevent backtracking. The above version should proceed to match the string linearly and in O(n) and, if that fails, backtrack linearly in O(n) and quickly fail.

        - tye