Dear Monks,

I have to write a small application that visualizes 2D datasets, finds local maxima and minima in them and lets the user manage those maxima and minima interactively. The interactive GUI part comes later, this question concerns the maxima/minima finding algorithm itself.

Certain assumptions can be made about the data: it is somewhat periodic, but not in the strict sense of the word: the amplitude and the frequency of the pulses do vary. Also, the data is quite noisy. Think of an ECG curve or brain wave pattern to get the picture.

What I need is the following: I have to find only one local maximum and one local minimum in every pulse, that is, the highest and lowest points of every pulse. Because of the noisiness of the data, this is not that trivial: simply scanning through the data and marking every local peak/valley does not suffice, I have to filter them somehow.

I thought that I might exploit the periodic property of the data: First I'd find all peaks, then sort them according to their heights, then weed out those that are too close to a legitimate peak.

Here is what I have so far:

#!/usr/bin/perl use strict; use warnings; my (@x, @y); while (<>) { unless (/[^\d\.\s-]/) { my @f = split; push @x, $f[0]; foreach my $i (1..$#f) { push @{ $y[$i-1] }, $f[$i]; } } } my ($maxima, $minima) = peak_find($y[0]); foreach my $index (@$maxima) { print "$x[$index]\t$y[0][$index]\n"; } print "\n"; print "\n"; foreach my $index (@$minima) { print "$x[$index]\t$y[0][$index]\n"; } print "\n"; # Function that first finds all local maxima and minima in the data # (the algorithm comes from http://www.perlmonks.org/index.pl?node_id= +448913) # then weeds them out so that only one maximum and minimum remains for + each pulse. sub peak_find { #my $xa = shift; my $ya = shift; my $period = autocorr_firstpeak($ya); my (@maxima, @minima); push @maxima, undef; push @minima, undef; foreach my $i (1..$#$ya) { if ( $ya->[$i] > $ya->[$i-1] ) { $maxima[ -1 ] = $i; } elsif ( $maxima[ -1 ] ) { $maxima[ @maxima ] = undef; } if ( $ya->[$i] < $ya->[$i-1] ) { $minima[ -1 ] = $i; } elsif ( $minima[ -1 ] ) { $minima[ @minima ] = undef; } } pop @maxima unless $maxima[ -1 ]; pop @minima unless $minima[ -1 ]; # At this point, @maxima and @minima contain # all local maxima and minima of the data, respectively my %max_hash = map {$_ => $ya->[$_]} @maxima; my %min_hash = map {$_ => $ya->[$_]} @minima; my @sorted_max = sort { $max_hash{$b} <=> $max_hash{$a} } keys %ma +x_hash; my @sorted_min = sort { $min_hash{$a} <=> $min_hash{$b} } keys %mi +n_hash; my %filtered_max_hash; my %filtered_min_hash; my $factor = 0.7; foreach my $index (@sorted_max) { if (exists $max_hash{$index}) { $filtered_max_hash{$index} = $max_hash{$index}; foreach my $shift (0..int($period*$factor)) { delete $max_hash{$index + $shift} unless exists $filtered_max_hash{$index + $shift}; delete $max_hash{$index - $shift} unless exists $filtered_max_hash{$index - $shift}; } } } foreach my $index (@sorted_min) { if (exists $min_hash{$index}) { $filtered_min_hash{$index} = $min_hash{$index}; foreach my $shift (0..int($period*$factor)) { delete $min_hash{$index + $shift} unless exists $filtered_min_hash{$index + $shift}; delete $min_hash{$index - $shift} unless exists $filtered_min_hash{$index - $shift}; } } } my @filtered_maxima = sort { $a <=> $b } keys %max_hash; my @filtered_minima = sort { $a <=> $b } keys %min_hash; return \@filtered_maxima, \@filtered_minima; } # Returns the first peak of the autocorrelation function - # this corresponds to the highest frequency of periodicity in the data +. sub autocorr_firstpeak { my $xa = shift; my @a; my ($min, $max, $index); my $which = 1; # 1=min, 0=max; $a[0] = 0; for my $x (@$xa) { $a[0] += $x**2; } for my $i (1..$#$xa) { $a[$i] = 0; for my $j (0..$#$xa) { $a[$i] += $xa->[$j]*$xa->[$j-$i]; } if ($which) { if ($a[$i] < $a[$i-1]) { $min = $a[$i]; } else { $which = 0; } } else { if ($a[$i] > $a[$i-1]) { $max = $a[$i]; } else { $index = $i-1; last; } } } return $index; }


This seems to work.

However, I feel that it is quite inefficient - is there a way (are there ways) to make it better?

You can generate test data that sort of looks like the real thing with the following command:
perl -le'print $_,"\t",sin($_/10)+sin(($_/30))**2*0.2+sin($_/500)*0.4+ +4+(rand()+rand()+rand()+rand())/20 for 0..6000' > testdata.txt

In reply to Finding local maxima/minima in noisy, pediodic data by kikuchiyo

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.