in reply to Comparing 2 different-sized strings

This will knock the spots off almost every other algorithm implemented in Perl, and many of them when implemented in C:

#! perl -slw
use strict;

sub fuzzyMatch {
    my( $rHay, $rNee, $misses ) = @_;
    my $lNee = length $$rNee;
    my $min = $lNee - $misses;
    map {
        ( ( substr( $$rHay, $_, $lNee ) ^ $$rNee ) =~ tr[\0][] ) >= $min ? $_ : ()
    } 0 .. length( $$rHay ) - $lNee;
}

my $hay = 'TCGAGTGGCCATGAACGTGCCAATTG';
my $nee = 'ATGATCCTG';

print substr( $hay, $_-5, length( $nee ) + 10 ) for fuzzyMatch( \$hay, \$nee, 3 );

$hay = 'aacctgacctacgtttgacgatcgtacgtcagtcctccgtgctaactgacgtaaaaaaaatacgtcccccccc';
$nee = 'acgtacgt';

print substr( $hay, $_-5, length( $nee ) + 10 ) for fuzzyMatch( \$hay, \$nee, 3 );

__END__
C:\test>1048594
TGGCCATGAACGTGCCAAT
acctgacctacgtttgac
gacctacgtttgacgatc
gtttgacgatcgtacgtc
gacgatcgtacgtcagtc
atcgtacgtcagtcctcc
gtcagtcctccgtgctaa
tgctaactgacgtaaaaa
aactgacgtaaaaaaaat
aaaaaaaatacgtccccc
aaaatacgtcccccccc

The subroutine returns the offsets at which fuzzily matching substrings are found in the haystack, one offset per match.
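For anyone puzzled by the core of the routine, here is a minimal sketch of the counting trick on its own. The helper name count_matches is hypothetical and not part of the script above; the sketch assumes both strings have the same length and that the `bitwise` feature is not enabled, so that ^ acts as string XOR.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): count the positions
# at which two equal-length strings hold the same character.
sub count_matches {
    my( $x, $y ) = @_;
    my $xored = $x ^ $y;          # string XOR: "\0" wherever the two bytes are equal
    return $xored =~ tr[\0][];    # tr with an empty replacement list only counts
}

# The 9 characters at offset 10 of the first haystack, against the needle.
print count_matches( 'ATGAACGTG', 'ATGATCCTG' ), "\n";    # prints 7
```

Seven matching positions out of nine means two mismatches, which is within the three misses allowed, so offset 10 is reported.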


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^2: Comparing 2 different-sized strings
by AdrianJ217 (Novice) on Aug 09, 2013 at 09:22 UTC
    Hi, thank you so much for the help. Can you just explain to me what the double dollar sign in front of rNee means? Thank you.
      Can you just explain to me what the double dollar sign in front of rNee means?

      It means dereference the reference.

      Genomic work often involves very large strings, and passing large strings into subroutines in the usual way causes them to be copied:

      sub something {
          my( $string ) = @_;    ## $string is a copy of the argument
      }
      my $hugeString = ........;
      something( $hugeString );

      Instead of passing the arguments directly, I pass references (kind of pointers) to them:

      fuzzyMatch( \$hay, \$nee, 3 ); ## pass references to needle and haystack

      fuzzyMatch() then receives references to the two strings:

      sub fuzzyMatch {
          my( $rHay, $rNee, $misses ) = @_; ## the 'r's are to remind that these are references

      So to get to the actual strings, I use a second $:

      my $lNee = length $$rNee; ## read as: $lengthNeedle = length of the data ($) referenced by $rNee

      So, $$rNee is shorthand for ${ $rNee }; if that clarifies things for you?
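As a small self-contained sketch of the same idea: the backslash at the call site is what creates the reference, and the extra $ inside the subroutine follows it back to the string. The helper name needle_length is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

my $nee  = 'acgtacgt';
my $rNee = \$nee;                 # the backslash creates a reference to $nee

# Hypothetical helper: receives a reference and reads the string behind it.
sub needle_length {
    my( $ref ) = @_;              # copies only the small reference, not the string
    return length $$ref;          # $$ref is shorthand for ${ $ref }
}

print ref( $rNee ), "\n";                 # prints SCALAR
print needle_length( $rNee ), "\n";       # prints 8
print needle_length( \$nee ), "\n";       # same thing, reference made in the call
```

The "r" prefix on the variable names is only a naming habit; Perl does not require any declaration beyond taking the reference with the backslash.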


        Hi, thank you so much. That makes sense. My only question: you said the r's are there to remind you that these are references, but where do you actually declare them as references? Is it with the backslash operator? Thank you!
        Hi, thank you so much, and I'm sorry to bother you one last time, but could you explain what's going on inside the map function, please? I'm new to Perl and I'm googling every component of the script that I don't understand, to make sure I understand what's going on at every line.
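One way to read the map block is to unroll it into an explicit loop. The following is a sketch only, under the assumption that it should return the same offsets as fuzzyMatch() above; the name fuzzyMatchLoop and the intermediate variables are hypothetical and introduced just for illustration.

```perl
use strict;
use warnings;

# Hypothetical long-hand version of the map block, for illustration only.
sub fuzzyMatchLoop {
    my( $rHay, $rNee, $misses ) = @_;
    my $lNee = length $$rNee;
    my $min  = $lNee - $misses;                 # fewest matching characters allowed
    my @offsets;
    for my $offset ( 0 .. length( $$rHay ) - $lNee ) {
        my $window  = substr( $$rHay, $offset, $lNee );   # needle-sized slice of the haystack
        my $xored   = $window ^ $$rNee;                   # "\0" wherever the characters agree
        my $matches = ( $xored =~ tr[\0][] );             # count the agreeing positions
        push @offsets, $offset if $matches >= $min;       # plays the role of "? $_ : ()"
    }
    return @offsets;
}

my $hay = 'TCGAGTGGCCATGAACGTGCCAATTG';
my $nee = 'ATGATCCTG';
print join( ' ', fuzzyMatchLoop( \$hay, \$nee, 3 ) ), "\n";    # prints 10
```

In the original, map runs its block once per offset with $_ set to that offset; the block yields $_ when enough characters match and the empty list () otherwise, so only the matching offsets end up in the returned list.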
Re^2: Comparing 2 different-sized strings
by AdrianJ217 (Novice) on Aug 14, 2013 at 10:08 UTC
    Hi, thank you so much for your help. When I execute the script with $misses set to 3, I also get the matches that have 2 mismatches, which makes sense since the subroutine tests for >= $min. However, I want only the matches with exactly 3 mismatches, NOT the ones with 2. I tried changing it to =$min without the greater-than sign, but then it gave me an error message:
    can't modify bitwise xor (^) in list assignment at rRNA_target.pl line 92, near ") }"
    Any ideas what I can do?
      I tried changing it to =$min without the greater than sign, but then it gave me an error message:

      A single = is assignment, not comparison. You need to change >= to ==; it should then work as you require.
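A sketch of that change, assuming the rest of the subroutine stays as posted. The name fuzzyMatchExact and the variable $want are hypothetical, chosen so the variant can sit alongside the original.

```perl
use strict;
use warnings;

# Hypothetical variant of fuzzyMatch(): keep only offsets with exactly
# $misses mismatches, by comparing with == instead of >=.
sub fuzzyMatchExact {
    my( $rHay, $rNee, $misses ) = @_;
    my $lNee = length $$rNee;
    my $want = $lNee - $misses;     # exact number of matching characters required
    map {
        ( ( substr( $$rHay, $_, $lNee ) ^ $$rNee ) =~ tr[\0][] ) == $want ? $_ : ()
    } 0 .. length( $$rHay ) - $lNee;
}

my $hay = 'TCGAGTGGCCATGAACGTGCCAATTG';
my $nee = 'ATGATCCTG';
print join( ' ', fuzzyMatchExact( \$hay, \$nee, 2 ) ), "\n";    # prints 10
print scalar( () = fuzzyMatchExact( \$hay, \$nee, 3 ) ), "\n";  # prints 0
```

With this haystack the only hit from the original run has 2 mismatches, so asking for exactly 3 returns nothing while asking for exactly 2 returns offset 10.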

