
Hi, Thank you so much and I'm so sorry to bother you one last time, but could you just explain what's going on inside the map function please? I'm new to perl and I'm trying to google all of the components of the script that I don't understand so I make sure that I understand what's going on at every line.

Re^5: Comparing 2 different-sized strings
by BrowserUk (Patriarch) on Aug 10, 2013 at 21:27 UTC
    could you just explain what's going on inside the map function please?

    Sure.

    sub fuzzyMatch {
        my( $rHay, $rNee, $misses ) = @_;
        my $lNee = length $$rNee;
        my $min  = $lNee - $misses;
        map {
            ( ( substr( $$rHay, $_, $lNee ) ^ $$rNee ) =~ tr[\0][] ) >= $min
                ? $_
                : ()
        } 0 .. length( $$rHay ) - $lNee;
    }
    • We need to compare the needle against the haystack at each position.

      Hence the map counter runs from 0 to length( haystack) - length( needle ).

    • We need to compare the same number of characters from the haystack as there are in the needle.

      Hence, the substr presents a needle length substring of haystack at each of those counter positions.

    • We don't just want a yes/no comparison; we need a count of the differences.

      So we bit-wise xor (^) the substring and the needle.

      The result is a string that has a 0 (null) byte wherever the two strings match; and some other byte value where they do not.

    • We need to count the zero bytes.

      tr[\0][] does that efficiently.

    • If the count of matched bytes is at least the minimum required (length( needle ) - misses),

      return the position where the match occurred ($_); otherwise return nothing (()).
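The steps above can be traced on a single window. This is only a minimal sketch: the sample strings, the variable names $window, $xored, $matches and @hits, and the choice of one allowed miss are assumptions made for illustration, while fuzzyMatch is copied from the post.

```perl
use strict;
use warnings;

# fuzzyMatch as posted above.
sub fuzzyMatch {
    my( $rHay, $rNee, $misses ) = @_;
    my $lNee = length $$rNee;
    my $min  = $lNee - $misses;
    map {
        ( ( substr( $$rHay, $_, $lNee ) ^ $$rNee ) =~ tr[\0][] ) >= $min
            ? $_
            : ()
    } 0 .. length( $$rHay ) - $lNee;
}

# One window, step by step (sample data, assumed for illustration).
my $window = 'acgt';    # a needle-length slice of the haystack
my $needle = 'acct';

# String xor: equal characters give "\0", differing ones a non-zero byte.
my $xored = $window ^ $needle;

# tr with an empty replacement list and no /d changes nothing;
# it only returns how many "\0" bytes it found.
my $matches = ( $xored =~ tr[\0][] );

print "xor bytes: ", join( ' ', unpack '(H2)*', $xored ), "\n";    # 00 00 04 00
print "matching characters: $matches\n";                           # 3

# The whole function, allowing one miss.
my $hay  = 'acgtacgtacgtacgtacgt';
my @hits = fuzzyMatch( \$hay, \$needle, 1 );
print "hits at: @hits\n";                                          # 0 4 8 12 16
```

Only the third character differs (g versus c), so three of the four xor bytes are null, which meets the minimum of 4 - 1 = 3.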

    Hope that clarifies things a little. Continue to ask about anything that isn't clear.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Hi, Thank you again for all your patience and helping me. Can you tell me why the map counter is from 0 to length(hay)-length(nee) and why not just from 0 to length(nee)?
        why not just from 0 to length(nee)?

        Because if you compare at position length( hay ), you aren't comparing anything.

        Take the case of a 20-byte haystack, acgtacgtacgtacgtacgt, and a 4-byte needle, acct. At position 20 the needle lies entirely beyond the end of the haystack:

        000000000011111111112
        012345678901234567890
        acgtacgtacgtacgtacgt
                            acct

        The last position at which the whole needle still overlaps the haystack is 20 - 4 = 16:

        000000000011111111112
        012345678901234567890
        acgtacgtacgtacgtacgt
                        acct
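To see the same thing from Perl's side, here is a rough sketch that prints what substr returns around the end of the haystack. The names $hay, $nee, $last and @lengths are assumptions for illustration, not part of the original script.

```perl
use strict;
use warnings;

my $hay = 'acgtacgtacgtacgtacgt';    # 20 bytes
my $nee = 'acct';                    #  4 bytes

# Last start position that still yields a full needle-length window.
my $last = length( $hay ) - length( $nee );    # 16

my @lengths;
for my $pos ( $last - 1 .. length( $hay ) ) {
    # Past $last, substr silently returns fewer bytes than requested.
    my $window = substr( $hay, $pos, length $nee );
    push @lengths, length $window;
    printf "pos %2d: window '%s' (%d bytes)\n", $pos, $window, length $window;
}
```

Positions 15 and 16 give full 4-byte windows; from 17 onwards the window shrinks, and at position 20 it is empty, so there is nothing left to compare.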
