in reply to Re^5: Comparing 2 different-sized strings
in thread Comparing 2 different-sized strings

Hi, thank you again for all your patience and help. Can you tell me why the map counter runs from 0 to length(hay)-length(nee), and why not just from 0 to length(nee)?

Re^7: Comparing 2 different-sized strings
by BrowserUk (Patriarch) on Aug 11, 2013 at 09:12 UTC
    why not just from 0 to length(nee)?

    Because if you compare at position length( hay ), you aren't comparing anything.

    Take the case of a 20-byte haystack, acgtacgtacgtacgtacgt, and a 4-byte needle, acct. At position 20:

        000000000011111111112
        012345678901234567890
        acgtacgtacgtacgtacgt
                            acct

    The last position at which you can get a full match is 20 - 4 = position 16:

        000000000011111111112
        012345678901234567890
        acgtacgtacgtacgtacgt
                        acct
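A minimal sketch of the same point in code, assuming the haystack and needle from the example above. The variable names ($last, @windows, $short) are only illustrative and are not taken from the original script. It lists every offset where a full-length window fits, then shows that one step further the window comes up short:

```perl
use strict;
use warnings;

my $hay = 'acgtacgtacgtacgtacgt';   # 20-byte haystack from the example
my $nee = 'acct';                   # 4-byte needle

# Last offset at which a full-length window still fits: 20 - 4 = 16.
my $last = length( $hay ) - length( $nee );

# Pair each valid start offset with the window that begins there.
my @windows = map { [ $_, substr( $hay, $_, length $nee ) ] } 0 .. $last;

printf "%2d: %s\n", @$_ for @windows;

# One step past $last, substr can no longer return a full needle's worth of bytes.
my $short = substr( $hay, $last + 1, length $nee );
printf "offset %d yields only %d bytes\n", $last + 1, length $short;
```

Running the counter all the way to length( $hay ) would only add windows that are too short to hold the needle.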

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Hi, Thank you so much for your help. Could you just tell me what the "for" is when you call the subroutine in the main program? I have seen "for" only in the context of a for loop where you also supply the 3 parameters like initial index, final, and increment. By the way, everything else you explained to me I completely understood and my script now works perfectly. Thank you so much!!
        Could you just tell me what the "for" is when you call the subroutine in the main program? I have seen "for" only in the context of a for loop where you also supply the 3 parameters like initial index, final, and increment.

        Sure.

        If there are multiple matches in the haystack, the subroutine will return a list of start positions, one for each match.

        By giving that list to for, it will execute the print substr statement for each position returned; with $_ taking on each of those start positions one after the other.

        Hence, this

            $hay = 'aacctgacctacgtttgacgatcgtacgtcagtcctccgtgctaactgacgtaaaaaaaatacgtcccccccc';
            $nee = 'acgtacgt';
            print substr( $hay, $_-5, length( $nee ) + 10 ) for fuzzyMatch( \$hay, \$nee, 3 );

        prints the 10 matches (+the 5 bytes before and after):

            acctgacctacgtttgac
            gacctacgtttgacgatc
            gtttgacgatcgtacgtc
            gacgatcgtacgtcagtc
            atcgtacgtcagtcctcc
            gtcagtcctccgtgctaa
            tgctaactgacgtaaaaa
            aactgacgtaaaaaaaat
            aaaaaaaatacgtccccc
            aaaatacgtcccccccc
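To make the statement-modifier form of for concrete, here is a small sketch. positions() is a hypothetical stand-in that just returns a fixed list of start positions; it is not the fuzzyMatch routine from this thread. All three loops below visit the same list in the same order:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sub that returns a list of start positions.
sub positions { return ( 3, 11, 27 ) }

my @out;

# Statement-modifier form: the statement runs once per element,
# with $_ set to each element in turn.
push @out, "modifier: $_" for positions();

# Equivalent block form, still using $_.
for ( positions() ) {
    push @out, "block: $_";
}

# Block form again, with a named loop variable instead of $_.
for my $pos ( positions() ) {
    push @out, "named: $pos";
}

print "$_\n" for @out;
```

The modifier form is just a compact way of writing the block form when the body is a single statement.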

        Ok, thank you. Now I understand a lot more about using bitwise approaches in Perl. I also just noticed your post from a while ago regarding Hamming distance:

            my $s1 = 'AAAAA';
            my $s2 = 'ATCAA';
            my $s3 = 'AAAAA';
            print "$s1:$s2 hd:", hd( $s1, $s2 ); # will give value 2
            print "$s1:$s3 hd:", hd( $s1, $s3 ); # will give value 0
            sub hd{ length( $_[0] ) - ( ( $_[0] ^ $_[1] ) =~ tr[\0][\0] ) }

        I just didn't understand the line above defining the subroutine. How do you know which part refers to which sequence ($s1 vs $s2, for example)? Thank you so much! I can't believe how helpful and patient you are.
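For reference, a sketch of how the arguments line up, assuming hd() reads its arguments as $_[0] and $_[1]. Perl passes a sub's arguments in the array @_, in call order, so in hd( $s1, $s2 ) the first element $_[0] is $s1 and the second element $_[1] is $s2. The version below does the same computation with the arguments copied into named variables; hd_named and its variable names are illustrative only, and it assumes both strings have the same length:

```perl
use strict;
use warnings;

# Same computation as hd(), with the arguments given names.
sub hd_named {
    my( $first, $second ) = @_;          # $first is $_[0], $second is $_[1]
    my $xored = $first ^ $second;        # string XOR: "\0" wherever the bytes agree
    my $same  = ( $xored =~ tr/\0// );   # tr returns the number of "\0" bytes
    return length( $first ) - $same;     # positions that differ
}

my $s1 = 'AAAAA';
my $s2 = 'ATCAA';
my $s3 = 'AAAAA';

print "$s1:$s2 hd:", hd_named( $s1, $s2 ), "\n";
print "$s1:$s3 hd:", hd_named( $s1, $s3 ), "\n";
```

Because XOR is symmetric, swapping the two arguments gives the same distance; the order only decides which string's length is used, which is why the strings are assumed to be the same length.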