in thread The sum of absolute differences in the counts of chars in two strings.

The strings could contain anything and be of any length, but the example I'm working with is genomic data, with strings of fewer than 100 chars.

A worked example:

aaaaacaacaaagcc :: a=>10 c=>4 g=>1 t=>0
acaggtgacaaaaaa :: a=>9  c=>2 g=>3 t=>1
absolute diffs  ::    1     2    2    1
sum of diffs    :: 6

It would be wrong to assume an alphabet of 4 even for genomic data.
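
A minimal sketch of the calculation in the worked example, assuming a hash-based count so that no fixed alphabet is required. The helper names char_counts and sum_abs_diffs are hypothetical, and the defined-or operator (//) assumes Perl 5.10 or later:

```perl
use strict;
use warnings;

# Hypothetical helper: count every distinct character in a string.
sub char_counts {
    my ($str) = @_;
    my %count;
    $count{$_}++ for split //, $str;
    return \%count;
}

# Hypothetical helper: sum |count1 - count2| over every character
# that appears in either string.
sub sum_abs_diffs {
    my ($s1, $s2) = @_;
    my ($c1, $c2) = (char_counts($s1), char_counts($s2));
    my %seen = (%$c1, %$c2);    # union of the characters seen
    my $sum  = 0;
    $sum += abs(($c1->{$_} // 0) - ($c2->{$_} // 0)) for keys %seen;
    return $sum;
}

# The two strings from the worked example.
print sum_abs_diffs('aaaaacaacaaagcc', 'acaggtgacaaaaaa'), "\n";    # prints 6
```

A character missing from one string is treated as a count of zero, which is how t contributes 1 to the total in the example.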


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re^3: The sum of absolute differences in the counts of chars in two strings.
by Limbic~Region (Chancellor) on Nov 20, 2011 at 01:23 UTC
    BrowserUk,
    I don't understand the example. Should that t => 0 and t => 1 be 1 not 2?

    Update: I am not going to have a chance to play with any of my ideas, so I will just share them here in case they are of any help. I was hoping that there would be a way to "cancel out terms" so that there was less work to do. I had two ideas for that. The first was to perform bitwise operations on the strings to find out which characters they have in common, and count only the remaining ones. The second was to process the strings in chunks rather than character by character. If I were writing a generic solution, though, I would likely go with Inline::C (array vs hash): incrementing values for the first string, decrementing values for the second string, and summing the absolute values of the 256 indices for the result at the end.
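
    A rough sketch of the increment/decrement idea, written in plain Perl rather than Inline::C so it stays self-contained. It assumes both strings contain only single-byte characters, and the name sum_abs_diffs_bytes is hypothetical:

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper; assumes single-byte characters only.
    sub sum_abs_diffs_bytes {
        my ($s1, $s2) = @_;
        my @delta = (0) x 256;                 # one slot per byte value
        $delta[$_]++ for unpack 'C*', $s1;     # increment for the first string
        $delta[$_]-- for unpack 'C*', $s2;     # decrement for the second string
        my $sum = 0;
        $sum += abs($_) for @delta;            # leftover imbalance per byte value
        return $sum;
    }

    print sum_abs_diffs_bytes('aaaaacaacaaagcc', 'acaggtgacaaaaaa'), "\n";    # prints 6
    ```

    Characters common to both strings cancel in the array, so only the imbalance for each byte value reaches the final sum.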

    Cheers - L~R

      Should that t => 0 and t => 1 be 1 not 2?

      Yes. Now corrected.


Re^3: The sum of absolute differences in the counts of chars in two strings.
by remiah (Hermit) on Nov 20, 2011 at 00:35 UTC
    How about counting the others (anything not in agct) as an exception?

    my $aa = "AGCTAAABBBCCC";
    my %k  = (a => "a", g => "g", c => "c", t => "t", else => "[^agct]");
    my %all;
    while (my ($k, $pattern) = each %k) {
        # count the case-insensitive matches of each pattern
        $all{$k}++ while $aa =~ m/$pattern/gi;
    }