Re: Challenge: Fast Common Substrings

Here's just a naive solution using regexes. I have no idea if it's fast. On one hand, the regex engine is doing all the work; on the other hand, in the second code snippet, it's just brute-forcing the problem. The match_all_ways sub is taken from Re^3: Delimited Backtracking with Regex.

{
  my @matches;
  my $push = qr/(?{ push @matches, $1 })/;

  sub match_all_ways {
      my ($string, $regex) = @_;
      @matches = ();
      $string =~ m/($regex)$push(?!)/;
      return @matches;
  }
}

sub common_substr {
    my ($str1, $str2, $len) = @_;

    my %substr = map { $_ => 1 } match_all_ways($str1 => qr/.{$len}/);
    $substr{$_} |= 2 for match_all_ways($str2 => qr/.{$len}/);

    return grep { $substr{$_} == 3 } keys %substr;

}

print "$_\n" for common_substr("ABCDEF", "ABDEFCBDEAB", 2);
__END__
DE
EF
AB
[download]

Returns all the common substrings, since it might as well.. In scalar context returns the number.

Update: Or, if you want the regex engine to do all of the work, instead of computing the intersection of the two lists in perl:

{
  my @matches;
  my $push = qr/(?{ push @matches, $1 })/;

  sub match_all_ways {
      my ($string, $regex) = @_;
      @matches = ();
      $string =~ m/$regex$push(?!)/;
      return @matches;
  }
}

sub common_substr {
    my ($str1, $str2, $len) = @_;
    my %subs;
    @subs{  match_all_ways("$str1\0$str2" => qr/(.{$len}).*\0.*\1/)  }
+ = ();
    return keys %subs;
}
[download]

Note that the match_all_ways sub was changed slightly (to account for the different capturing). Disclaimer: If input strings contain a newline or null character, or if $len > 65536, it doesn't work.. but I think you get the idea.

These solutions are both trivial to extend to a range of lengths.. just pass "n,m" as the $len argument.

blokhead

Comment on Re: Challenge: Fast Common Substrings Select or Download Code