Re: Eliminating Duplicate Lines From A CSV File

Taking ideas from Grandfather's code and adding some of my own, I have made some changes to your code that should work. I commented out some of your lines and replaced them with new code. The biggest change was postponing the check for a missing average (and the assignment of 0) until the printout part of the code, at the end. :-)
Chris

#!/usr/bin/perl
use strict;
use warnings;

my %usage;
my $files = 0;

for my $file (qw/sfull1ns.dat sfull2ns.dat sfull3ns.dat/)  {

    open (my $fh,'<',$file) or die "Can't open file $file: $!";
    
    while (my $line = <$fh>) {
        
        next if $line =~ /^Server/;
    
        #chomp($line);
        #my ($server, @data) = (split(",",$line));
        
        # Assigns the first 2 values in the CSV
        # and discards the rest
        my ($server, $avg) = (split(",",$line));
        
        
        #if ($data[0] lt "!" )  {
        #  $data[0] = 0;
        #}
        
        #next if grep /[^0-9.]/, @data;
        
        #$usage{$server} = [] unless exists $usage{$server};
        #push @{$usage{$server}}, 0 while @{$usage{$server}} < $files;
        #push @{$usage{$server}}, $data[0];
        
        # if this is a second occurrence of a server in the file,
        # its avg won't be assigned because the first one is already
        # stored there
        $usage{$server}[$files] ||= $avg;
    }
    continue {
      $files++ if eof;
    }
    
    close $fh or die "Can't close file $file: $!";
}

for my $server (sort keys %usage) {
    
    #Either this server has an average (for each element of the array)
    # or assign 0 to the ones that don't have a value
    my @avgs = map $usage{$server}[$_] || 0, 0..$#{$usage{$server}};
    print join(",", $server, @avgs), "\n";
}
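
A side note on the `||=` line above: it keeps the first value only while that value is true, so a first-seen average of exactly 0 would be replaced by a later duplicate row. The following is a minimal sketch of the difference, not part of the posted script; the rows and the hash names `%with_or` and `%with_dor` are made up for illustration, and `//=` assumes Perl 5.10 or later.

```perl
use strict;
use warnings;
use 5.010;   # assumption: the defined-or operator //= needs Perl 5.10+

# Hypothetical rows: the same server appears twice in one file,
# and its first average happens to be 0.
my @rows = (["alpha", 0], ["alpha", 7.5]);

my (%with_or, %with_dor);
for my $row (@rows) {
    my ($server, $avg) = @$row;
    $with_or{$server}[0]  ||= $avg;   # 0 is false, so the duplicate overwrites it
    $with_dor{$server}[0] //= $avg;   # 0 is defined, so the first value is kept
}

print "||= kept $with_or{alpha}[0]\n";    # prints: ||= kept 7.5
print "//= kept $with_dor{alpha}[0]\n";   # prints: //= kept 0
```

Whether this matters depends on whether a genuine average of 0 can occur in the data.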

Re^2: Eliminating Duplicate Lines From A CSV File by country1 (Acolyte) on Jul 25, 2007 at 18:26 UTC
Cristoforo, I tested your Perl code. It eliminates rows with duplicate server names in the same file. It also replaces missing values with '0', as long as non-missing values for the server are found in one of the later files. In other words, if the server does not have a row in the 1st and 2nd files but does have a row in the 3rd file, the code will produce

Server,0,0,12.45

But if the server has a row in the 1st input file but not in the 2nd and 3rd, the code will produce

Server,2.85,,,

or, if the server has a row in the 1st and 2nd input files but not in the 3rd, the code will produce

Server,2.85,3.56,,

Do you have any ideas why the missing values are not being replaced with '0' when they occur at the end of the hash?
Re^3: Eliminating Duplicate Lines From A CSV File by Cristoforo (Curate) on Jul 25, 2007 at 19:03 UTC
Yes, that was an error. I also see now that Grandfather's solution was about the same as mine: he used a hash of hashes, where I used a hash of arrays. If no records were found in the last file(s), no array entry was created for them, so the array was shorter than the number of files. The correct line is `my @avgs = map $usage{$server}[$_] || 0, 0..$files-1;` instead of `my @avgs = map $usage{$server}[$_] || 0, 0..$#{$usage{$server}};` Update: changed 0..$i-1 to 0..$files-1
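
The difference between the two ranges can be seen in isolation. This is a small sketch with made-up data (the servers `alpha` and `beta` and the helper names `row_by_last_index` and `row_by_file_count` are hypothetical, not from the script above): iterating up to the array's last index stops early for a server missing from the trailing files, while iterating up to `$files - 1` always yields one column per file.

```perl
use strict;
use warnings;

# Hypothetical data: three input files were read.
my $files = 3;
my %usage;
$usage{alpha}[0] = 2.85;    # only seen in the first file: array has 1 element
$usage{beta}[2]  = 12.45;   # only seen in the third file: array grows to 3 elements

# Original approach: stop at the last index that happens to exist.
sub row_by_last_index {
    my ($server) = @_;
    return join ",", $server,
        map { $usage{$server}[$_] || 0 } 0 .. $#{ $usage{$server} };
}

# Corrected approach: always produce one column per input file.
sub row_by_file_count {
    my ($server) = @_;
    return join ",", $server,
        map { $usage{$server}[$_] || 0 } 0 .. $files - 1;
}

print row_by_last_index($_), "\n" for sort keys %usage;
# alpha,2.85
# beta,0,0,12.45

print row_by_file_count($_), "\n" for sort keys %usage;
# alpha,2.85,0,0
# beta,0,0,12.45
```

Leading gaps are filled either way because assigning to index 2 extends the array, but nothing extends it past the last file in which the server appeared.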
Re^4: Eliminating Duplicate Lines From A CSV File by country1 (Acolyte) on Jul 26, 2007 at 17:50 UTC
Cristoforo, one final question. My input CSV files actually have two header lines. The files look like this:

File A

Server Performance on Jun 9 2007,,,,
Server,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
wsomdavpra03,95.33,98.75,68.68,70.23
WTEADAPOMS02,35.13,88.15,37.9,57.57

I would have thought the line of code `next if $line =~ /^Server/;` would have eliminated the first two header lines. I also tried using two `<$fh>;` lines after the open statement. Both approaches produced the following as the first line of output:

,0,0,0,0,0,0

Do you have any idea why this is happening?
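
One possible cause, offered only as a guess since the full input files are not shown: a blank line, or a row whose first field is empty, would produce an empty or whitespace-only `$server` key, and that key sorts ahead of every real server name. The sketch below assumes that situation. The helper `parse_row` is hypothetical, and the last two sample lines are invented to illustrate the case; the first three come from the file excerpt above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (server, avg) for a data row, or an empty
# list for header rows, blank lines, and rows whose first field is empty.
sub parse_row {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                  # strip an LF or CRLF line ending
    return if $line =~ /^Server/;          # both header lines start with "Server"
    my ($server, $avg) = split /,/, $line;
    return unless defined $server && length $server;
    return ($server, $avg);
}

my @sample = (
    "Server Performance on Jun 9 2007,,,,\n",
    "Server,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util\n",
    "wsomdavpra03,95.33,98.75,68.68,70.23\n",
    ",,,,\n",    # assumed: a row of empty fields
    "\n",        # assumed: a blank line
);

for my $line (@sample) {
    # The list assignment yields the number of values returned, so an
    # empty list from parse_row skips the line.
    my ($server, $avg) = parse_row($line) or next;
    print "$server => $avg\n";
}
```

If the real files contain no such rows, the cause lies elsewhere; printing each `$server` in brackets while reading would show what the stray key actually contains.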
Re^4: Eliminating Duplicate Lines From A CSV File by country1 (Acolyte) on Jul 25, 2007 at 19:22 UTC
Cristoforo, the replaced line is causing a compilation error: Global symbol "$i" requires explicit package name at test1.pl line 54. I verified the line number of the changed line; it was line 54.