baxy77bax has asked for the wisdom of the Perl Monks concerning the following question:

Hi

This is a post regarding potential ideas, so no code is required. I am just trying to see if I forgot some obvious solution.

Problem:
Given two strings with the same prefix and the same length:

aaababbbababbbabababb aaababbbabaaababbabba
What would be the fastest way (the fewest computational steps) to identify the length of the shared prefix between two strings? An obvious solution is to just start matching characters pairwise until a mismatch is located. But is there a way to preprocess this particular string in order to reduce the number of pairwise comparisons? Also, given a large number of such cases, what would be a better solution than just comparing the strings pairwise? Any suggestion is more than welcome (code not required).
thnx
baxy

Re: String matching idea
by choroba (Cardinal) on Sep 18, 2015 at 12:06 UTC
    My idea: xor the strings, find the position of the first non-null character in the result.

    Update: no code was requested, so using the spoiler tag:

    If your strings only contain "a" and "b", you can use

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
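A minimal sketch of the xor idea, assuming both operands are plain strings: string `^` xors byte by byte, so every position where the characters agree becomes "\0", and the prefix length is the length of the leading run of "\0" bytes. The sub names are hypothetical. The a/b variant relies on 'a' ^ 'b' being "\x03", so `index` can jump straight to the first mismatch; it is only an illustration of that kind of shortcut, not necessarily the spoilered code.

```perl
use strict;
use warnings;

# Common prefix length via string xor: matching bytes xor to "\0".
# (Plain ^ is string xor here because both operands are strings and the
# 'bitwise' feature is not enabled; under that feature use ^. instead.)
sub prefix_len_xor {
    my ($s1, $s2) = @_;
    my $x = $s1 ^ $s2;
    $x =~ /^\0*/;        # always matches, possibly with length zero
    return $+[0];        # end offset of the match = prefix length
}

# Variant for strings of only 'a' and 'b' with equal length (as in the question):
# 'a' ^ 'b' is "\x03", so the first "\x03" marks the first mismatch.
sub prefix_len_ab {
    my ($s1, $s2) = @_;
    my $pos = index($s1 ^ $s2, "\x03");
    return $pos < 0 ? length $s1 : $pos;   # -1 means the strings are identical
}

my ($s1, $s2) = ('aaababbbababbbabababb', 'aaababbbabaaababbabba');
print prefix_len_xor($s1, $s2), "\n";   # 11
print prefix_len_ab($s1, $s2), "\n";    # 11
```

Both scans happen inside perl's C code (the xor and the regex or `index`), which is the point of the trick compared with a Perl-level loop.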
Re: String matching idea
by Tux (Canon) on Sep 18, 2015 at 12:24 UTC
      Note that the OP is asking for the common prefix, a much simpler problem than the LCSS one!

        I agree, but all those show a wealth of approaches. Finding the one that needs the fewest ops for the OP's specific case, on their system, is left to the reader. Some ops might be faster on one architecture than on another.


        Enjoy, Have FUN! H.Merijn
Re: String matching idea
by hippo (Archbishop) on Sep 18, 2015 at 12:11 UTC

    My first approach for 2 strings would be a binary search. If each string is of length L, start by comparing the substrings from 0 to L/2. Then, depending on whether they match, either compare the substrings to L/4 or 3L/4, etc. This should give the correct result in log_2(L) steps, but bear in mind that each step requires 2 substring ops which may not be particularly cheap.
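A minimal sketch of the binary search on the prefix length. The sub name is hypothetical, and one small tweak is an assumption not in the post: since the first `$lo` characters are already known to match, only the window between `$lo` and the midpoint is compared at each step.

```perl
use strict;
use warnings;

# Binary search on the prefix length. Invariant: the first $lo characters
# are known to match, and the answer is at most $hi.
sub prefix_len_bsearch {
    my ($s1, $s2) = @_;
    my $lo = 0;
    my $hi = length($s1) < length($s2) ? length($s1) : length($s2);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi + 1) / 2);   # round up so the loop always progresses
        # Only the window [$lo, $mid) is still unknown, so compare just that part.
        if (substr($s1, $lo, $mid - $lo) eq substr($s2, $lo, $mid - $lo)) {
            $lo = $mid;        # whole window matches: prefix is at least $mid
        } else {
            $hi = $mid - 1;    # mismatch inside the window: prefix is below $mid
        }
    }
    return $lo;
}

print prefix_len_bsearch('aaababbbababbbabababb', 'aaababbbabaaababbabba'), "\n";   # 11
```

This brings the number of comparisons down to about log2(L), but each `eq` still examines characters, so the total work is not necessarily below a plain scan, which matches the caveat about substring ops not being cheap.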

    For a large number of pairs of strings made up of only the 2 chars "a" and "b", I'd think about assigning them to hash bins based on their initial characters. Each set you create should halve the number of other operations.
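One possible reading of the hash-bin idea, as a minimal sketch: strings are grouped by their first `$K` characters, where `$K = 7` and the sub name are hypothetical choices. Strings in the same bin are known to share at least `$K` leading characters, so a pairwise scan inside a bin can start at offset `$K`; strings in different bins share fewer than `$K`, and that shorter prefix can be read off the two keys.

```perl
use strict;
use warnings;

# Hypothetical bin width: group strings by their first $K characters.
my $K = 7;

# Put each string into a hash bin keyed on its first $k characters.
sub bin_by_prefix {
    my ($k, @strings) = @_;
    my %bins;
    push @{ $bins{ substr($_, 0, $k) } }, $_ for @strings;
    return \%bins;
}

# The four sample strings used elsewhere in this thread.
my @strings = qw(
    aaababbbababbbabababb
    aaababbbabaaababbabba
    aaababababaaababbabba
    aaababbbabaaaaabbabba
);

my $bins = bin_by_prefix($K, @strings);

# Report how the strings were distributed over the bins.
for my $key (sort keys %$bins) {
    printf "%s: %d string(s)\n", $key, scalar @{ $bins->{$key} };
}
```

With these samples the third string lands alone in the "aaababa" bin and the other three share "aaababb".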

Re: String matching idea
by salva (Canon) on Sep 18, 2015 at 12:29 UTC
    For the 1-to-1 case, just do the obvious thing; there are no shortcuts for that problem.

    For the N-to-N case, use a trie or prefix tree.
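A minimal prefix-tree sketch for the N-to-N case, assuming the goal is, for every string, the longest prefix it shares with any other string in the set (one reading of "N-to-N"). Nodes are plain nested hashes; the sub name and the `n`/`kids` fields are hypothetical. Each node counts how many strings pass through it, so a string's shared prefix ends where that count drops below 2.

```perl
use strict;
use warnings;

sub longest_shared_prefixes {
    my @strings = @_;
    my $root = { n => 0, kids => {} };

    # Insert every string, counting how many strings pass through each node.
    for my $s (@strings) {
        my $node = $root;
        for my $c (split //, $s) {
            $node = $node->{kids}{$c} //= { n => 0, kids => {} };
            $node->{n}++;
        }
    }

    # For each string, walk down while at least one other string shares the path.
    my @len;
    for my $s (@strings) {
        my ($node, $depth) = ($root, 0);
        for my $c (split //, $s) {
            $node = $node->{kids}{$c};   # always exists: $s was inserted above
            last if $node->{n} < 2;
            $depth++;
        }
        push @len, $depth;
    }
    return @len;
}

my @len = longest_shared_prefixes(qw(
    aaababbbababbbabababb
    aaababbbabaaababbabba
    aaababababaaababbabba
    aaababbbabaaaaabbabba
));
print "@len\n";   # 11 13 6 13
```

Building the trie costs one step per character of input, and each lookup then walks only its own string instead of comparing it against every other string.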

      Please indulge my curiosity: what do you consider "the obvious thing"?

        Oh, well, I meant doing the obvious thing in C (or any other low-level enough language): comparing the characters one by one sequentially until a divergence is found.

        If you limit the solution to Perl, which doesn't provide a builtin that does that, we go into the land of tricks, such as the solution posted by choroba above.
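For comparison, the obvious sequential scan written directly in Perl might look like this (the sub name is hypothetical). Each iteration runs in the interpreter rather than in C, which is likely why Perl-level tricks such as the xor approach are worth considering.

```perl
use strict;
use warnings;

# Compare characters one by one until the first divergence or the end
# of the shorter string.
sub prefix_len_naive {
    my ($s1, $s2) = @_;
    my $n = length($s1) < length($s2) ? length($s1) : length($s2);
    my $i = 0;
    $i++ while $i < $n && substr($s1, $i, 1) eq substr($s2, $i, 1);
    return $i;
}

print prefix_len_naive('aaababbbababbbabababb', 'aaababbbabaaababbabba'), "\n";   # 11
```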

Re: String matching idea
by Anonymous Monk on Sep 18, 2015 at 17:52 UTC
    #!/usr/bin/perl
    # http://perlmonks.org/?node_id=1142400
    use strict;
    use warnings;

    # two strings
    my $s1 = 'aaababbbababbbabababb';
    my $s2 = 'aaababbbabaaababbabba';
    ($s1 ^ $s2) =~ /\0*/;
    print $+[0], "\n";

    # many strings (assumes no \n in strings)
    my @many = qw(
      aaababbbababbbabababb
      aaababbbabaaababbabba
      aaababababaaababbabba
      aaababbbabaaaaabbabba
    );
    join("\n", @many, '') =~ /^(.*).*\n(?:\1.*\n)*$/;
    print length $1, "\n";

    # also solves original problem :)
    @many = qw( aaababbbababbbabababb aaababbbabaaababbabba );
    join("\n", @many, '') =~ /^(.*).*\n(?:\1.*\n)*$/;
    print length $1, "\n";

    As far as "fastest way", see Benchmark.pm :)
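A minimal Benchmark.pm sketch comparing two of the approaches from this thread on the OP's sample pair. The sub names are hypothetical, and the timings will depend on the machine, the perl build and the string lengths, so it is worth rerunning with realistic data before drawing conclusions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my ($s1, $s2) = ('aaababbbababbbabababb', 'aaababbbabaaababbabba');

# xor approach: length of the leading run of "\0" bytes.
sub by_xor {
    my ($x, $y) = @_;
    ($x ^ $y) =~ /^\0*/;
    return $+[0];
}

# Plain character-by-character loop.
sub by_loop {
    my ($x, $y) = @_;
    my $n = length($x) < length($y) ? length($x) : length($y);
    my $i = 0;
    $i++ while $i < $n && substr($x, $i, 1) eq substr($y, $i, 1);
    return $i;
}

# Sanity check before timing: both approaches must agree.
die "approaches disagree\n" unless by_xor($s1, $s2) == by_loop($s1, $s2);

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    xor  => sub { by_xor($s1, $s2) },
    loop => sub { by_loop($s1, $s2) },
});
```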