in reply to Compare fields in a file

When dealing with comma-separated data, one of the Text::CSV or Text::xSV modules should be your first stop.

When dealing with uniqueness ('only the largest magnitude within each second'), hashes should spring to mind. Combining those ideas, and adding a little error checking (the various sample data you provided were inconsistent), the following should point you in the right direction:

use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ();
my @expectedFields = qw(X Y Z Time Amplitude);

# Validate the header line
$csv->parse (scalar <DATA>);
my @fieldNames = $csv->fields ();

die "Unexpected field list: @fieldNames\nExpected: @expectedFields\n"
    unless @expectedFields == @fieldNames;

for my $fieldIndex (0 .. $#fieldNames) {
    next if $fieldNames[$fieldIndex] eq $expectedFields[$fieldIndex];
    die "Got field name $fieldNames[$fieldIndex]. Expected $expectedFields[$fieldIndex]\n";
}

# Find maximums in each 1 second slot
my %maximums;    # Keyed by date/time

while (defined (my $line = <DATA>)) {
    $csv->parse ($line);
    my ($x, $y, $z, $time, $amplitude) = $csv->fields ();

    $time =~ s/\.\d{3}//;    # Strip fractional seconds
    next if exists $maximums{$time} && $maximums{$time}{amp} >= $amplitude;

    $maximums{$time}{amp}  = $amplitude;
    $maximums{$time}{line} = $line;
}

# Output results ordered by time assuming the same date
print $maximums{$_}{line} for sort keys %maximums;

__DATA__
X,Y,Z,Time,Amplitude
2550,531,66,10-12-2007 07:03:08.069,2
2549,529,62,10-12-2007 07:03:08.151,1
2550,531,66,10-12-2007 07:03:09.069,1
2549,529,62,10-12-2007 07:03:09.151,2

Prints:

2550,531,66,10-12-2007 07:03:08.069,2
2549,529,62,10-12-2007 07:03:09.151,2

Note: always use strictures (use strict; use warnings;).


Perl's payment curve coincides with its learning curve.

Replies are listed 'Best First'.
Re^2: Compare fields in a file
by honyok (Sexton) on Feb 10, 2009 at 22:08 UTC
    From the responses, I see that I have not explained correctly. Let me clarify: I need to sort by descending amplitude, save the largest, remove any entries within +/- 1 second, then repeat on the next largest left in the list, ...
    - honyok
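
The pruning honyok describes can be sketched directly. The following is an illustration only, not honyok's data format: events here are [seconds, amplitude] pairs and $window is an assumed +/- tolerance. Repeatedly take the largest remaining amplitude, keep it, and drop everything within the window of a kept peak:

```perl
use strict;
use warnings;

my $window = 1;    # seconds either side of a kept peak

# Illustrative events only: [time_in_seconds, amplitude]
my @events = ([0, 2], [0.082, 1], [1.0, 1], [1.082, 2], [5, 3]);

# Work from the largest amplitude down
my @by_amp = sort { $b->[1] <=> $a->[1] } @events;

my @kept;
EVENT: for my $event (@by_amp) {
    for my $peak (@kept) {
        # Discard anything within the window of an already-kept peak
        next EVENT if abs ($event->[0] - $peak->[0]) <= $window;
    }
    push @kept, $event;
}

# Report surviving peaks in time order
print "$_->[0] => $_->[1]\n" for sort { $a->[0] <=> $b->[0] } @kept;
```

Each candidate is checked against every kept peak, so this is O(n^2) in the worst case, which should still be comfortable for files with thousands of points.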

      How about you take one of the plethora of solutions you have been provided that solve the problem "I'd like to keep only the largest magnitude within each second". Alter it to solve your actual problem, then show us the output you get and the output you want if you can't make it work?

      For future reference, providing a little sample data, your best attempt at coding the solution, your attempt's output, and the output you desire in your initial node actually saves everyone (especially you) a lot of time. An indication of why you want to perform a particular trick often helps us provide a better answer too.


      Perl's payment curve coincides with its learning curve.
      Easy, in my script, add
      sort { $b->[2] <=> $a->[2] }
      between the first map and the grep.

      The output will now be sorted by descending amplitude and you will have only one entry per second.
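
CountZero's script itself isn't reproduced in this thread, so the pipeline below is an assumed reconstruction purely to show where that sort line slots in: between the first map (which here builds [line, second, amplitude] triples) and the grep (which keeps the first, and therefore largest, entry per second):

```perl
use strict;
use warnings;

# Sample lines in the thread's CSV layout: X,Y,Z,Time,Amplitude
my @lines = (
    "2550,531,66,10-12-2007 07:03:08.069,2\n",
    "2549,529,62,10-12-2007 07:03:08.151,1\n",
    "2549,529,62,10-12-2007 07:03:09.151,2\n",
);

my %seen;
my @filtered =
    grep { !$seen{ $_->[1] }++ }         # first hit per second wins
    sort { $b->[2] <=> $a->[2] }         # the added line: largest amplitude first
    map  {
        my @f = split /,/;               # crude split; Text::CSV is safer
        chomp $f[4];                     # amplitude without trailing newline
        (my $sec = $f[3]) =~ s/\.\d+//;  # strip fractional seconds
        [$_, $sec, $f[4]];               # [original line, second, amplitude]
    } @lines;

print $_->[0] for @filtered;
```

Because the sort puts the largest amplitude first, the grep's "first seen per second" test is exactly the "keep the largest per second" rule.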

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        Thanks gentlefolk. Great ideas!
        Elegant. I think I see how your code tests for each second,
        $date =~ m/(\d{2})-(\d{2})-(\d{4}) (\d{2}:\d{2}:\d{2})/;
        but my goal is to filter based on any arbitrary time increment (+/- 1 s, +/- 5 s, +/- 300 ms, ...).

        Re GrandFather's comments: The ultimate point of this exercise is to avoid MS Excel monkeying with my time stamps. I have files with thousands of data points, each with many numerical attributes (x, y, z, date, time, magnitude, ...). I would like to adapt a solution to sort and/or filter based on any or all of the attributes - time being the most difficult. Anything done outside of Excel saves time and aggravation.
        - honyok
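
One way to handle an arbitrary increment is to convert each stamp to a single number of seconds (fractional part included), after which any +/- window test is plain subtraction. A hedged sketch follows; note the sample stamps are ambiguous between dd-mm and mm-dd, so dd-mm-yyyy is assumed here, and timegm is used to sidestep time-zone issues:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Convert "10-12-2007 07:03:08.069" to epoch seconds plus milliseconds.
# Assumes dd-mm-yyyy; swap $day and $month if the data is mm-dd-yyyy.
sub stamp_to_seconds {
    my ($stamp) = @_;
    my ($day, $month, $year, $h, $m, $s, $frac) =
        $stamp =~ /(\d{2})-(\d{2})-(\d{4}) (\d{2}):(\d{2}):(\d{2})\.(\d{3})/
        or die "Bad time stamp: $stamp\n";
    return timegm ($s, $m, $h, $day, $month - 1, $year) + $frac / 1000;
}

my $window = 0.3;    # any increment works: 1 s, 5 s, 300 ms, ...

my $t1 = stamp_to_seconds ('10-12-2007 07:03:08.069');
my $t2 = stamp_to_seconds ('10-12-2007 07:03:08.151');

printf "within window? %s\n", abs ($t1 - $t2) <= $window ? "yes" : "no";
```

With stamps reduced to numbers, the greedy "keep the largest, drop neighbours within the window" pruning works unchanged for any window size, down to milliseconds.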