in reply to Measuring Substrings Problem

This produces the required results for the testcases provided reasonably efficiently. Generalising the implementation is left as an exercise.

Basically, you replace spaces with nulls, OR the n-1 shorter strings together and then XOR the result with the longest string. You then count the number of nulls in the result.

The definition of "shorter" and "longest in that description is fuzzy :).

#! perl -slw use strict; my $s1 = 'GATTACGAGTGGCGCTCGTGTAACGGCA'; my $s2 = 'GATTACG GCGCTCG AACGGCA'; my $masked = $s1 ^ $s2; print scalar $masked =~ tr[\0][0]; my $s3 = 'GATTACGAGTGGCGCTCGTGTAACGGCA'; my $s4 = 'GATTACG '; my $s5 = ' TTACGAG CGTGTAA '; tr[ ][\0] for $s3, $s4, $s5; $masked = $s3 ^ ( $s4 | $s5 ); print scalar $masked =~ tr[\0][0]; my $s6 = ' GCTCGTG '; my $s7 = 'GATTACGAGTGGCGCTCGTGTAACGGCA'; my $s8 = ' TACGAGT '; my $s9 = ' GTGGCGC '; tr[ ][\0] for $s6, $s7, $s8, $s9; $masked = $s7 ^ ( $s6 | $s8 | $s9 ); print scalar $masked =~ tr[\0][0]; __END__ P:\test>junk2 21 16 17

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco.
Rule 1 has a caveat! -- Who broke the cabal?

Replies are listed 'Best First'.
Re^2: Substring Distance Problem
by Roy Johnson (Monsignor) on Apr 08, 2005 at 12:30 UTC
    I was going to include the generalized version of your solution with my vec solution, as they're functionally fairly similar, but I couldn't get it to work. Finally found my mistake, so here's a solution to the exercise:
    sub score { my ($str, $array) = @_; my $vec = "\0" x length($str); for (@$array) { my $idx = index $str, $_; # Matching substrings are padded into position with nulls $vec |= ("\0" x $idx) . $_; } # Matching characters become nulls; others non-nulls $vec ^= $str; # Count nulls $vec =~ tr/\0//; }
    Update:
    An interesting (possibly quite useful) thing for the OP to note is that the vec solution effectively builds the list of dots as its vector (ones in matched positions), and your solution gives one string with the actual matched characters in position.

    Caution: Contents may have been coded under pressure.
Re^2: Substring Distance Problem
by polettix (Vicar) on Apr 08, 2005 at 12:01 UTC
    In my understanding, ewijaya should automate the substring positioning as well as counting - what I think you have left as an exercise. This part would probably spoil your efficiency claim, because you're solving the problem from quite a convenient starting point :)

    This said, I find the final computation quite elegant and istructive - I wouldn't have thought of XORing letters even in 100 years. There's always to learn, luckly, provided I'll be able to remember it when I'll need :)

    Flavio (perl -e "print(scalar(reverse('ti.xittelop@oivalf')))")

    Don't fool yourself.

      This said, I find the final computation quite elegant and istructive - I wouldn't have thought of XORing letters even in 100 years.

      BrowserUk is The XOR Meister.

      I had the same reaction as yours when I first encountered the "infamous xor trick" (in this case to find the first position at which two strings differ, or equivalently, the length of the longest common prefix):

      $a = "foobar"; $b = "foobAr"; ($a ^ $b) =~ /^(\0*)/ and print length $1; __END__ 4
      Way cool, though it works as written only if the characters are 1 byte long.

      There's always to learn, luckly, provided I'll be able to remember it when I'll need :)

      Yep, that's the rub.

      the lowliest monk