Re: Substring Distance Problem

This produces the required results for the testcases provided reasonably efficiently. Generalising the implementation is left as an exercise.

Basically, you replace spaces with nulls, OR the n-1 shorter strings together and then XOR the result with the longest string. You then count the number of nulls in the result.

The definition of "shorter" and "longest in that description is fuzzy :).

#! perl -slw
use strict;

my $s1 = 'GATTACGAGTGGCGCTCGTGTAACGGCA';
my $s2 = 'GATTACG    GCGCTCG   AACGGCA';
my $masked = $s1 ^ $s2;
print scalar $masked =~ tr[\0][0];

my $s3 = 'GATTACGAGTGGCGCTCGTGTAACGGCA';
my $s4 = 'GATTACG                     ';
my $s5 = '  TTACGAG       CGTGTAA     ';
tr[ ][\0] for $s3, $s4, $s5;

$masked = $s3 ^ ( $s4 | $s5 );
print scalar $masked =~ tr[\0][0];

my $s6 = '             GCTCGTG        ';
my $s7 = 'GATTACGAGTGGCGCTCGTGTAACGGCA';
my $s8 = '   TACGAGT                  ';
my $s9 = '        GTGGCGC             ';

tr[ ][\0] for $s6, $s7, $s8, $s9;

$masked = $s7 ^ ( $s6 | $s8 | $s9 );
print scalar $masked =~ tr[\0][0];

__END__
P:\test>junk2
21
16
17
[download]

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

Lingua non convalesco, consenesco et abolesco.

Rule 1 has a caveat! -- Who broke the cabal?

Comment on Re: Substring Distance Problem Download Code

Replies are listed 'Best First'.
Re^2: Substring Distance Problem by Roy Johnson (Monsignor) on Apr 08, 2005 at 12:30 UTC
I was going to include the generalized version of your solution with my vec solution, as they're functionally fairly similar, but I couldn't get it to work. Finally found my mistake, so here's a solution to the exercise: `sub score { my ($str, $array) = @_; my $vec = "\0" x length($str); for (@$array) { my $idx = index $str, $_; # Matching substrings are padded into position with nulls $vec \|= ("\0" x $idx) . $_; } # Matching characters become nulls; others non-nulls $vec ^= $str; # Count nulls $vec =~ tr/\0//; }` [download] Update: An interesting (possibly quite useful) thing for the OP to note is that the vec solution effectively builds the list of dots as its vector (ones in matched positions), and your solution gives one string with the actual matched characters in position. Caution: Contents may have been coded under pressure.	[reply] [d/l]
Re^2: Substring Distance Problem by polettix (Vicar) on Apr 08, 2005 at 12:01 UTC
In my understanding, ewijaya should automate the substring positioning as well as counting - what I think you have left as an exercise. This part would probably spoil your efficiency claim, because you're solving the problem from quite a convenient starting point :) This said, I find the final computation quite elegant and istructive - I wouldn't have thought of XORing letters even in 100 years. There's always to learn, luckly, provided I'll be able to remember it when I'll need :) Flavio (perl -e "print(scalar(reverse('ti.xittelop@oivalf')))") Don't fool yourself.	[reply]
Re^3: Substring Distance Problem by tlm (Prior) on Apr 08, 2005 at 14:39 UTC
This said, I find the final computation quite elegant and istructive - I wouldn't have thought of XORing letters even in 100 years. BrowserUk is The XOR Meister. I had the same reaction as yours when I first encountered the "infamous xor trick" (in this case to find the first position at which two strings differ, or equivalently, the length of the longest common prefix): `$a = "foobar"; $b = "foobAr"; ($a ^ $b) =~ /^(\0)/ and print length $1; __END__ 4` [download] Way cool, though it works as written only if the characters are 1 byte long. There's always to learn, luckly, provided I'll be able to remember it when I'll need :)* Yep, that's the rub. the lowliest monk	[reply] [d/l]