I have two arrays of rational numbers, e.g.
my @spectrum = qw/174.0 102.99 71.62 70.53 82.49 69.05 18.94 102.99 71 +.62 70.53 82.49 69.05 18.94 102.99 71.62 70.53 82.49 69.05 18.94 102. +99 71.62 70.53 82.49 69.05 18.94/; my @subsp = qw/102.0 18.94 71.61 82.0 173.0 68.7/;

For practical reasons, the bigger is called spectrum, and the smaller is called subspectrum. Every number in the arrays is called signal.

The difference between spectrum and subspectrum is calculated as sum of absolute values of differences between the signals in subspectrum and the closest signals in big spectrum. The 'similarity' is simply the number of signals in subspectrum divided by the difference. If the difference is 0 (impossible in most cases), the similarity is considered 1000.

I need to write a function which calculates the similarity between given spectrum and subspectrum and returns it with set of 'best' signals from big spectrum. For the given spectrum and subspectrum the answer is

REQUEST = 18.94 68.7 71.61 82.0 102.0 173.0 BEST_SUBSP = 18.94 69.05 71.62 82.49 102.99 174.0 SIMILARITY = 2.11

The main problem about the function is that it needs to be really fast, as it is used very intensively.

The best way i have found as yet is implementing the alpha-beta pruning search algorithm. The parameter $limit accepted by the function appears from practice too - it limites maximum the difference between a pair of signals in spectrum and subspectrum. If the difference exceeds the limit, the similarity is considered to be 0.

#!/usr/bin/perl use warnings; use strict; sub similarity { my ($request, $spectrum, $accuracy, $limit) = @_[0..3]; my $fmt = "%.${accuracy}f"; my @spectrum = @$spectrum; my @request = sort {$a <=> $b} @$request; return 0, [] unless @request; return 0, [] if @spectrum < @request; my %good; foreach my $signal (@request) { my @good; return 0 unless (@good = grep { abs($_-$signal) <= $limit } @s +pectrum); %good = (%good, map { ($_ => 1) } @good); } @spectrum = sort {$a <=> $b} grep { $good{$_} } @spectrum; my $alpha; my $best_subsp; my $alphabeta; $alphabeta = sub { my ($beta, $spectrum, $subsp, $found) = @_; unless (@$subsp) { $alpha = $beta; $best_subsp = $found; return; } my $compared = $subsp->[0]; foreach my $sign_id (0..$#{$spectrum}) { my $diff = abs($spectrum->[$sign_id] - $compared); next if $diff > $limit; next if defined($alpha) && ($diff + $beta) >= $alpha; $alphabeta->($diff + $beta, [@{$spectrum}[0..$sign_id-1, $sign_id+1..$#{$spectrum} +]], [@{$subsp}[1..$#{$subsp}]], [@$found, $spectrum->[$sign_id]], ); } return; }; $alphabeta->(0, \@spectrum, \@request, []); return 0, [] unless defined $alpha; my $max_similarity = sprintf $fmt, $alpha ? @request/$alpha : 1000 +; return $max_similarity, $best_subsp; } my $limit = 3; my @spectrum = qw/174.0 102.99 71.62 70.53 82.49 69.05 18.94 102.99 71 +.62 70.53 82.49 69.05 18.94 102.99 71.62 70.53 82.49 69.05 18.94 102. +99 71.62 70.53 82.49 69.05 18.94/; my @subsp = qw/102.0 18.94 71.61 82.0 173.0 68.7/; my ($sim, $best) = similarity(\@subsp, \@spectrum, 2, $limit); @subsp = sort {$a <=> $b} @subsp; print <<DEBUG; REQUEST = @subsp BEST_SUBSP = @{$best} SIMILARITY = $sim DEBUG

The question is: does anybody know a better (eq 'faster') solution for this case?


     s;;Just-me-not-h-Ni-m-P-Ni-lm-I-ar-O-Ni;;tr?IerONim-?HAcker ?d;print

In reply to Calculate the similarity of two arrays of numbers by Ieronim

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.