my @spectrum = qw/174.0 102.99 71.62 70.53 82.49 69.05 18.94 102.99 71 +.62 70.53 82.49 69.05 18.94 102.99 71.62 70.53 82.49 69.05 18.94 102. +99 71.62 70.53 82.49 69.05 18.94/; my @subsp = qw/102.0 18.94 71.61 82.0 173.0 68.7/;
For practical reasons, the bigger is called spectrum, and the smaller is called subspectrum. Every number in the arrays is called signal.
The difference between spectrum and subspectrum is calculated as sum of absolute values of differences between the signals in subspectrum and the closest signals in big spectrum. The 'similarity' is simply the number of signals in subspectrum divided by the difference. If the difference is 0 (impossible in most cases), the similarity is considered 1000.
I need to write a function which calculates the similarity between given spectrum and subspectrum and returns it with set of 'best' signals from big spectrum. For the given spectrum and subspectrum the answer is
REQUEST = 18.94 68.7 71.61 82.0 102.0 173.0 BEST_SUBSP = 18.94 69.05 71.62 82.49 102.99 174.0 SIMILARITY = 2.11
The main problem about the function is that it needs to be really fast, as it is used very intensively.
The best way i have found as yet is implementing the alpha-beta pruning search algorithm. The parameter $limit accepted by the function appears from practice too - it limites maximum the difference between a pair of signals in spectrum and subspectrum. If the difference exceeds the limit, the similarity is considered to be 0.
#!/usr/bin/perl use warnings; use strict; sub similarity { my ($request, $spectrum, $accuracy, $limit) = @_[0..3]; my $fmt = "%.${accuracy}f"; my @spectrum = @$spectrum; my @request = sort {$a <=> $b} @$request; return 0, [] unless @request; return 0, [] if @spectrum < @request; my %good; foreach my $signal (@request) { my @good; return 0 unless (@good = grep { abs($_-$signal) <= $limit } @s +pectrum); %good = (%good, map { ($_ => 1) } @good); } @spectrum = sort {$a <=> $b} grep { $good{$_} } @spectrum; my $alpha; my $best_subsp; my $alphabeta; $alphabeta = sub { my ($beta, $spectrum, $subsp, $found) = @_; unless (@$subsp) { $alpha = $beta; $best_subsp = $found; return; } my $compared = $subsp->[0]; foreach my $sign_id (0..$#{$spectrum}) { my $diff = abs($spectrum->[$sign_id] - $compared); next if $diff > $limit; next if defined($alpha) && ($diff + $beta) >= $alpha; $alphabeta->($diff + $beta, [@{$spectrum}[0..$sign_id-1, $sign_id+1..$#{$spectrum} +]], [@{$subsp}[1..$#{$subsp}]], [@$found, $spectrum->[$sign_id]], ); } return; }; $alphabeta->(0, \@spectrum, \@request, []); return 0, [] unless defined $alpha; my $max_similarity = sprintf $fmt, $alpha ? @request/$alpha : 1000 +; return $max_similarity, $best_subsp; } my $limit = 3; my @spectrum = qw/174.0 102.99 71.62 70.53 82.49 69.05 18.94 102.99 71 +.62 70.53 82.49 69.05 18.94 102.99 71.62 70.53 82.49 69.05 18.94 102. +99 71.62 70.53 82.49 69.05 18.94/; my @subsp = qw/102.0 18.94 71.61 82.0 173.0 68.7/; my ($sim, $best) = similarity(\@subsp, \@spectrum, 2, $limit); @subsp = sort {$a <=> $b} @subsp; print <<DEBUG; REQUEST = @subsp BEST_SUBSP = @{$best} SIMILARITY = $sim DEBUG
The question is: does anybody know a better (eq 'faster') solution for this case?
In reply to Calculate the similarity of two arrays of numbers by Ieronim
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |