Longest common substring

This sub finds the longest substring shared by two strings.

sub longest_common_substr {
  # assumes neither string contains a NUL, since NUL is used as the separator
  my $str = join "\0", @_;
  my $len = 1;
  my $match;

  while ($str =~ m{ ([^\0]{$len,}) (?= [^\0]* \0 [^\0]*? \1 ) }xg) {
    $len = length($match = $1) + 1;
  }

  return $match;
}

Replies are listed 'Best First'.
Re: Longest common substring by blakem (Monsignor) on Feb 16, 2002 at 00:29 UTC
At the risk of getting rebuked again while commenting on a japhy regex ;-P

Overlapping matches seem to cause a problem... For example, the last four characters in 'abcabc' and 'caWcabc' match, yet the function only returns the last three.

print longest_common_substr('abcabc','caWcabc');  # 'abc' not 'cabc'

I think it might be as simple as removing the /g (but that's the part I don't fully comprehend...). The code forces each match to be bigger than the previous one, with greedy matching helping us ratchet up several steps at a time. The /g might be doing something else tricky, but I don't see it.

Update: Other word pairs that fail similarly are sense/tense and onion/union.

-Blake
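The suggestion to drop the /g can be tried with a small sketch. The sub below is the original with only that modifier removed; the name longest_common_substr_no_g is hypothetical and not from the original post, and the no-NUL assumption still applies.

```perl
use strict;
use warnings;

# Hypothetical variant of the original sub: identical except that the
# /g modifier is dropped, so every pass scans from the start of $str.
sub longest_common_substr_no_g {
  # still assumes the inputs contain no NULs
  my $str = join "\0", @_;
  my $len = 1;
  my $match;

  # $len grows after each success, so the loop ends once no common
  # substring of at least $len characters is left.
  while ($str =~ m{ ([^\0]{$len,}) (?= [^\0]* \0 [^\0]*? \1 ) }x) {
    $len = length($match = $1) + 1;
  }

  return $match;
}

print longest_common_substr_no_g('abcabc', 'caWcabc'), "\n";  # cabc
print longest_common_substr_no_g('sense', 'tense'), "\n";     # ense
print longest_common_substr_no_g('onion', 'union'), "\n";     # nion
```

With /g, each new attempt starts at the position where the previous match ended, so a longer candidate that overlaps the previous match (such as 'cabc' overlapping the first 'abc') is never tried. Without /g, every attempt restarts at the beginning of the string, and the growing minimum length is what keeps the loop from repeating itself.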
