
Can you just explain to me what the double dollar sign in front of rNee means?

It means dereference the reference.

Why use references at all? Because genomic work often involves very large strings, and the usual way of unpacking subroutine arguments copies them:

sub something {
    my( $string ) = @_;   ## $string is a copy of the argument
}

my $hugeString = ........;
something( $hugeString );
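To see what "copy" means in practice, here is a minimal sketch. The sub `shout` and the variable `$dna` are hypothetical and not part of the original code. Because `my( $string ) = @_;` copies the argument, changing `$string` inside the sub leaves the caller's variable untouched:

```perl
use strict;
use warnings;

# Hypothetical sub: unpacks its argument into a lexical, like something() above.
sub shout {
    my( $string ) = @_;     # this assignment copies the whole string
    $string = uc $string;   # modifies only the copy
    return length $string;
}

my $dna = 'acgtacgt';       # hypothetical data
my $len = shout( $dna );
print "$dna $len\n";        # prints "acgtacgt 8": the caller's string is unchanged
```

With an 8-byte string the copy costs nothing worth mentioning; with a multi-megabyte chromosome, every such copy adds time and memory.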

Instead of passing the strings directly, I pass references to them (a kind of pointer):

fuzzyMatch( \$hay, \$nee, 3 );   ## pass references to haystack and needle

So fuzzyMatch() receives references to the two strings:

sub fuzzyMatch {
    my( $rHay, $rNee, $misses ) = @_;   ## the 'r's are a reminder that these are references

So, to get at the actual strings, I use a second $:

my $lNee = length $$rNee;   ## read as: $lengthNeedle = length of the data ($) referenced by $rNee

So, $$rNee is shorthand for ${ $rNee }. Does that clarify things for you?
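A small sketch showing that the two spellings are the same dereference. The needle value 'GATTACA' is made up for illustration; `ref()` is a Perl builtin that reports what kind of thing a reference points to:

```perl
use strict;
use warnings;

my $nee  = 'GATTACA';           # hypothetical needle
my $rNee = \$nee;               # take a reference to it with '\'

my $short = length $$rNee;      # shorthand form
my $long  = length ${ $rNee };  # explicit block form, same thing

print "$short $long\n";         # prints "7 7"
print ref( $rNee ), "\n";       # prints "SCALAR"
```

The block form ${ ... } is only required when the thing inside is more than a simple scalar variable, e.g. ${ $refs[0] }.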


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^4: Comparing 2 different-sized strings
by AdrianJ217 (Novice) on Aug 09, 2013 at 11:46 UTC
    Hi, thank you so much. That makes sense. My only question: you put the r's there to remind you they are references, but where do you actually declare them as references? Using the slash operator? Thank you!
      but where do you actually declare them as references, using the slash operator?

      You don't "declare" references -- they are just scalars with 'special content' -- you 'take references' when you need them.

      In this code, the references are taken when the subroutine is called:

      ... for fuzzyMatch( \$hay, \$nee, 3 );
      #...................^......^

      I.e., $hay and $nee are ordinary strings in the main program.

      When I call fuzzyMatch( \$hay, \$nee, 3 ), I am taking references to those two strings (using '\') and passing them into the subroutine.

      Inside the subroutine, those references are assigned to the lexical (my) variables $rHay and $rNee respectively:

      sub fuzzyMatch {
          my( $rHay, $rNee, $misses ) = @_;
          ...
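A minimal sketch of "taking a reference at the call". The sub `upcaseInPlace` is hypothetical, not part of fuzzyMatch(). Nothing is declared as a reference: $hay is an ordinary string, and '\' produces a reference only at the point of the call. Writing through the reference changes the caller's variable, which shows that the sub is working on the original string, not on a copy:

```perl
use strict;
use warnings;

# Hypothetical sub that, like fuzzyMatch(), receives a reference to a string.
sub upcaseInPlace {
    my( $rStr ) = @_;           # $rStr holds a reference, not a copy of the string
    $$rStr = uc $$rStr;         # dereferencing reaches the caller's own variable
    return;
}

my $hay = 'acgtacgt';           # an ordinary string; nothing "declared" as a reference
upcaseInPlace( \$hay );         # the reference is taken here, at the call
print "$hay\n";                 # prints "ACGTACGT"
```

fuzzyMatch() only reads through its references, so the caller's strings are never modified; the point is simply that no copy is made.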

Re^4: Comparing 2 different-sized strings
by AdrianJ217 (Novice) on Aug 10, 2013 at 19:27 UTC
    Hi, thank you so much, and I'm sorry to bother you one last time, but could you explain what's going on inside the map function, please? I'm new to Perl and I'm googling every part of the script that I don't understand, so that I'm sure I know what's going on at every line.
      could you just explain what's going on inside the map function please?

      Sure.

      sub fuzzyMatch {
          my( $rHay, $rNee, $misses ) = @_;
          my $lNee = length $$rNee;
          my $min  = $lNee - $misses;
          map {
              ( ( substr( $$rHay, $_, $lNee ) ^ $$rNee ) =~ tr[\0][] ) >= $min ? $_ : ()
          } 0 .. length( $$rHay ) - $lNee;
      }
      • We need to compare the needle against the haystack at each position.

        Hence the map counter runs from 0 to length( haystack ) - length( needle ).

      • We need to compare the same number of characters from the haystack as there are in the needle.

        Hence, substr extracts a needle-length substring of the haystack at each of those positions.

      • We don't just want a yes/no comparison; we need a count of the differences.

        So we bitwise-xor (^) the substring and the needle.

        The result is a string that has a 0 (null) byte wherever the two strings match; and some other byte value where they do not.

      • We need to count the zero bytes.

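        tr[\0][] does that efficiently: with an empty replacement list, tr returns the number of matching characters without modifying the string.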

      • If the count of matching bytes is greater than or equal to the minimum required (length( needle ) - misses),

        return the position where the match occurred ($_); otherwise return nothing (()).
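The steps above can be tried out with a small sketch. fuzzyMatch() is the sub from this reply; the haystack, needle, traced position and `$misses` value of 1 are made-up test data. With a 10-byte haystack and a 4-byte needle, the map counter runs over 0 .. 6, and with one miss allowed a window needs at least 3 of its 4 bytes to match:

```perl
use strict;
use warnings;

# fuzzyMatch() as posted above.
sub fuzzyMatch {
    my( $rHay, $rNee, $misses ) = @_;
    my $lNee = length $$rNee;
    my $min  = $lNee - $misses;
    map {
        ( ( substr( $$rHay, $_, $lNee ) ^ $$rNee ) =~ tr[\0][] ) >= $min ? $_ : ()
    } 0 .. length( $$rHay ) - $lNee;
}

# Hypothetical test data.
my $hay = 'ACGTTACGAT';
my $nee = 'ACGA';

# Trace one position by hand, step by step.
my $pos     = 5;
my $window  = substr( $hay, $pos, length $nee );   # needle-length window: 'ACGA'
my $xored   = $window ^ $nee;                      # a NUL byte wherever window and needle agree
my $matches = ( $xored =~ tr[\0][] );              # count the NUL bytes

print "position $pos: $matches of ", length( $nee ), " bytes match\n";

# Run the whole search, allowing at most 1 mismatch.
my @hits = fuzzyMatch( \$hay, \$nee, 1 );
print "hits: @hits\n";   # prints "hits: 0 5"
```

Position 0 ('ACGT') matches 3 of 4 bytes and position 5 ('ACGA') matches all 4; every other window matches at most 1, so only 0 and 5 are returned.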

      Hope that clarifies things a little. Continue to ask about anything that isn't clear.


        Hi, thank you again for all your patience and help. Can you tell me why the map counter runs from 0 to length(hay)-length(nee), and not just from 0 to length(nee)?