in reply to Eliminating Duplicate Lines From A CSV File

use strict;
use warnings;

my %usage;

my @sfull1ns = split /\n/, <<'FILE';
Server,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
WSOMQAVPRA05,93.75,95.87,66.67,68.13
wsomdavpra03,90.39,94,65.77,68.51
wsomddvfxa01,39.22,92.19,82.59,88.25
wsomddvfxa01,35.45,89.23,79.89,83.24
FILE

my @sfull2ns = split /\n/, <<'FILE';
Server,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
WSOMQAVPRA05,34.78,100,55.1,67.6
wsomdavpra03,69.04,98.55,84.07,89.73
wsomddvfxa01,92.44,97.54,67.72,71.69
wsompapgtw05,48.77,96.9,92.1,93.55
FILE

my @sfull3ns = split /\n/, <<'FILE';
Server,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
WSOMQAVPRA05,93.13,98.11,68.95,73.47
wsomdavpra03,68.85,97.56,76.35,98.23
wsomddvfxa01,46.97,96.29,88.23,94.02
wsompapgtw05,30.66,93.74,39.89,71.35
FILE

for my $fileData (['File A', \@sfull1ns], ['File B', \@sfull2ns], ['File C', \@sfull3ns]) {
    my ($filename, $data) = @$fileData;
    shift @$data;    # skip the header line
    for my $line (@$data) {
        chomp $line;
        my ($server, @data) = split /,/, $line;
        $usage{$server}{$filename}{value} ||= $data[0];    # first entry for a server wins
    }
}

# Pad with 0 for any server missing from a file
for my $file ('File A', 'File B', 'File C') {
    $usage{$_}{$file}{value} ||= 0 for keys %usage;
}

for my $server (sort keys %usage) {
    print "$server,",
        join(',', map {$usage{$server}{$_}{value}} sort keys %{$usage{$server}}),
        "\n";
}

Prints:

WSOMQAVPRA05,93.75,34.78,93.13
wsomdavpra03,90.39,69.04,68.85
wsomddvfxa01,39.22,92.44,46.97
wsompapgtw05,0,48.77,30.66

Reverting to file based code should be fairly straightforward. Keeping the sample lines in arrays makes the code stand alone.

Update: Removed the averaging code for multiple file entries for a server; the first entry's value is used instead.
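
For anyone unsure how the first-entry behaviour falls out of ||=, here is a minimal sketch using the duplicate rows from the first file above. Note that ||= treats 0 as false, so a genuine 0 reading would be replaced by a later duplicate's value; that isn't an issue for these CPU figures.

use strict;
use warnings;

my %first;
for my $line ('wsomddvfxa01,39.22', 'wsomddvfxa01,35.45') {
    my ($server, $avg) = split /,/, $line;
    $first{$server} ||= $avg;    # assigns only when nothing (true) is stored yet
}
print "$_ => $first{$_}\n" for keys %first;    # wsomddvfxa01 => 39.22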


DWIM is Perl's answer to Gödel

Re^2: Eliminating Duplicate Lines From A CSV File
by b4swine (Pilgrim) on Jul 24, 2007 at 22:57 UTC
    You actually computed the average in case there were multiple entries for a server in a file, like I was tempted to do. But if you read carefully, the original poster wanted the first number in case there were multiple entries for the same server in one file.
Re^2: Eliminating Duplicate Lines From A CSV File
by country1 (Acolyte) on Jul 25, 2007 at 17:10 UTC

    GrandFather,


    The CSV files I am using for input are actually much larger (1600+ records each). How can I modify your code to read the 3 input files ("sfull1ns.dat", "sfull2ns.dat", and "sfull3ns.dat") from disk rather than from inline data?

      Is this what you want?

      use strict;
      use warnings;

      my %usage;
      my @files = qw(sfull1ns.dat sfull2ns.dat sfull3ns.dat);

      for my $file (@files) {
          open(my $fh, '<', $file) or die "Can't open file $file: $!";
          <$fh>;    # Skip header line
          while (my $line = <$fh>) {
              chomp $line;
              my ($server, @data) = split /,/, $line;
              $usage{$server}{$file}{value} ||= $data[0];
          }
      }

      for my $file (@files) {
          $usage{$_}{$file}{value} ||= 0 for keys %usage;
      }

      for my $server (sort keys %usage) {
          print "$server,",
              join(',', map {$usage{$server}{$_}{value}} sort keys %{$usage{$server}}),
              "\n";
      }
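
      The output columns follow sort keys %{$usage{$server}}, which gives the intended order here because the file names sort lexically (sfull1ns.dat before sfull2ns.dat before sfull3ns.dat). Assuming the script is saved as merge_usage.pl (a hypothetical name), run it from the directory holding the .dat files:

      perl merge_usage.pl > combined.csv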

      DWIM is Perl's answer to Gödel