in reply to Eliminating Duplicate Lines From A CSV File

use strict;
use warnings;

my %usage;

my @sfull1ns = split /\n/, <<'FILE';
Server,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
WSOMQAVPRA05,93.75,95.87,66.67,68.13
wsomdavpra03,90.39,94,65.77,68.51
wsomddvfxa01,39.22,92.19,82.59,88.25
wsomddvfxa01,35.45,89.23,79.89,83.24
FILE

my @sfull2ns = split /\n/, <<'FILE';
Server,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
WSOMQAVPRA05,34.78,100,55.1,67.6
wsomdavpra03,69.04,98.55,84.07,89.73
wsomddvfxa01,92.44,97.54,67.72,71.69
wsompapgtw05,48.77,96.9,92.1,93.55
FILE

my @sfull3ns = split /\n/, <<'FILE';
Server,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
WSOMQAVPRA05,93.13,98.11,68.95,73.47
wsomdavpra03,68.85,97.56,76.35,98.23
wsomddvfxa01,46.97,96.29,88.23,94.02
wsompapgtw05,30.66,93.74,39.89,71.35
FILE

for my $fileData (['File A', \@sfull1ns], ['File B', \@sfull2ns], ['File C', \@sfull3ns]) {
    my ($filename, $data) = @$fileData;
    shift @$data;    # skip the header line
    for my $line (@$data) {
        chomp $line;
        my ($server, @data) = split /,/, $line;
        $usage{$server}{$filename}{value} ||= $data[0];    # first entry for a server wins
    }
}

# Pad with 0 for any server missing from a file
for my $file ('File A', 'File B', 'File C') {
    $usage{$_}{$file}{value} ||= 0 for keys %usage;
}

for my $server (sort keys %usage) {
    print "$server,",
        join(',', map {$usage{$server}{$_}{value}} sort keys %{$usage{$server}}),
        "\n";
}

Prints:

WSOMQAVPRA05,93.75,34.78,93.13
wsomdavpra03,90.39,69.04,68.85
wsomddvfxa01,39.22,92.44,46.97
wsompapgtw05,0,48.77,30.66

Reverting to file based code should be fairly straightforward. Keeping the sample lines in arrays makes the code stand alone.

Update: Removed the averaging code for multiple file entries for a server; the first entry's value is used instead.
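
For anyone unsure how the first-entry behaviour falls out of ||=, here is a minimal sketch using the duplicate rows from the first file above. Note that ||= treats 0 as false, so a genuine 0 reading would be replaced by a later duplicate's value; that isn't an issue for these CPU figures.

use strict;
use warnings;

my %first;
for my $line ('wsomddvfxa01,39.22', 'wsomddvfxa01,35.45') {
    my ($server, $avg) = split /,/, $line;
    $first{$server} ||= $avg;    # assigns only when nothing (true) is stored yet
}
print "$_ => $first{$_}\n" for keys %first;    # wsomddvfxa01 => 39.22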


DWIM is Perl's answer to Gödel

Re^2: Eliminating Duplicate Lines From A CSV File
by b4swine (Pilgrim) on Jul 24, 2007 at 22:57 UTC
    You actually computed the average in case there were multiple entries for a server in a file, like I was tempted to do. But if you read carefully, the original poster wanted the first number in case there were multiple entries for the same server in one file.
Re^2: Eliminating Duplicate Lines From A CSV File
by country1 (Acolyte) on Jul 25, 2007 at 17:10 UTC

    GrandFather,


    The CSV files I am using for input are actually much larger (1600+ records each). How can I modify your code to read the 3 input files ("sfull1ns.dat", "sfull2ns.dat", and "sfull3ns.dat") from disk rather than from inline data?

      Is this what you want?

      use strict;
      use warnings;

      my %usage;
      my @files = qw(sfull1ns.dat sfull2ns.dat sfull3ns.dat);

      for my $file (@files) {
          open(my $fh, '<', $file) or die "Can't open file $file: $!";
          <$fh>;    # Skip header line
          while (my $line = <$fh>) {
              chomp $line;
              my ($server, @data) = split /,/, $line;
              $usage{$server}{$file}{value} ||= $data[0];
          }
      }

      for my $file (@files) {
          $usage{$_}{$file}{value} ||= 0 for keys %usage;
      }

      for my $server (sort keys %usage) {
          print "$server,",
              join(',', map {$usage{$server}{$_}{value}} sort keys %{$usage{$server}}),
              "\n";
      }
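
      The output columns follow sort keys %{$usage{$server}}, which gives the intended order here because the file names sort lexically (sfull1ns.dat before sfull2ns.dat before sfull3ns.dat). Assuming the script is saved as merge_usage.pl (a hypothetical name), run it from the directory holding the .dat files:

      perl merge_usage.pl > combined.csv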

      DWIM is Perl's answer to Gödel