Print left-anchored similarities between two strings

miketosh has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Print left-anchored similarities between two strings by Corion (Patriarch) on Nov 10, 2009 at 15:39 UTC
String-`xor` (well, rather `^`) to the rescue: `my @strings = qw( /home/usernames/doejohnwilson /home/usernames/doejanemary ); my $common = $strings[0] ^ $strings[1]; # every \x00 means the chars a +re identical my $common_length = 0; if ($common =~ /^(\x00+)/) { $common_length = length $1; }; print substr $strings[0], 0, $common_length;` [download]	[reply] [d/l] [select]
Re^2: Print left-anchored similarities between two strings by Anonymous Monk on Nov 10, 2009 at 17:22 UTC
You can save a couple of steps: `my @strings = qw( /home/usernames/doejohnwilson x/home/usernames/doejanemary ); my $common_length = ( $strings[0] ^ $strings[1] ) =~ /^\0*/ && $+[0]; print substr $strings[0], 0, $common_length;` [download]	[reply] [d/l]
Re^3: Print left-anchored similarities between two strings by miketosh (Acolyte) on Nov 11, 2009 at 20:27 UTC
Awesome! I think this will work for me (since I can have an unspecified number of elements in an array...) `# our $input is user inputted string # and I only want directory/ names our @matches = map { -d $_? $_ .="/" : "" } <$input>; expand_match \|\| print "\a"; sub expand_match { return $input = $matches[0] if(@matches == 1); # Get the longest matching string from all elements my $match = $matches[0]; for (@matches) { $match = substr $_, 0, ( $match ^ $_ ) =~ /^\0/ && $+[0]; } return $input = $match if length($match) > length($input); return 0; }` [download]	[reply] [d/l]
Re^2: Print left-anchored similarities between two strings by Anonymous Monk on Nov 11, 2009 at 00:27 UTC
perldoc -f rindex `my @strings = qw( /home/usernames/doejohnwilson /home/usernames/doejanemary ); my $common = $strings[0] ^ $strings[1]; print substr $strings[0], 0, rindex($common,"\x00"); __END__` [download]	[reply] [d/l]
Re^3: Print left-anchored similarities between two strings by BrowserUk (Patriarch) on Nov 11, 2009 at 00:51 UTC
Careful. There can be matching characters after the first non-matching character! Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. RIP PCW It is as I've been saying!(Audio until 20090817)	[reply]
Re: Print left-anchored similarities between two strings by almut (Canon) on Nov 10, 2009 at 17:22 UTC
The "standard" way to compute the longest left-anchored common substring would be to compare character for character of both strings until they differ. However, in Perl it's relatively cumbersome (compared to C for example) to access a single character in a string (via the `substr` builtin), so this approach is quite slow (even for rather short strings like your sample). For comparison, here's the results of a quick benchmark, so you'll actually appreciate the performance of the XOR method suggested by Corion: #!/usr/bin/perl use Benchmark qw(cmpthese); use Inline C; my $s1 = '/home/usernames/doejohnwilson'; my $s2 = '/home/usernames/doejanemary'; cmpthese(-1, { 'standard' => sub { my $pos = 0; $pos++ while substr($s1, $pos, 1) eq substr($s2, $pos, 1); my $result = substr($s1, 0, $pos); # die unless $result eq "/home/usernames/doej"; }, 'fancy-XOR' => sub { my $common = $s1 ^ $s2; my $common_length = 0; if ($common =~ /^(\x00+)/) { $common_length = length $1; }; my $result = substr($s1, 0, $common_length); # die unless $result eq "/home/usernames/doej"; }, 'standard-XS' => sub { my $common_length = left_anchored_common_substr_len($s1, $s2); my $result = substr($s1, 0, $common_length); # die unless $result eq "/home/usernames/doej"; } }); __END__ __C__ int left_anchored_common_substr_len(char* s1, char* s2) { int c = 0; while (s1++ == s2++) { c++; } return c; } [download] Results: `Rate standard fancy-XOR standard-XS standard 109227/s -- -77% -92% fancy-XOR 468114/s 329% -- -64% standard-XS 1291788/s 1083% 176% --` [download] Update: added an XS routine (via Inline::C) that implements the aforementioned "standard" approach in C.	[reply] [d/l] [select]