in reply to Eliminating Duplicate Lines From A CSV File

Taking ideas from Grandfather's code, along with some of my own, I have made some changes to your code that should work. I commented out some of your lines and replaced them with new code. The biggest change was postponing the check for a missing average (and the assignment of 0) until the printout at the end of the code :-)
Chris
#!/usr/bin/perl
use strict;
use warnings;

my %usage;
my $files = 0;

for my $file (qw/sfull1ns.dat sfull2ns.dat sfull3ns.dat/) {
    open (my $fh,'<',$file) or die "Can't open file $file: $!";

    while (my $line = <$fh>) {
        next if $line =~ /^Server/;
        #chomp($line);
        #my ($server, @data) = (split(",",$line));

        # Assigns the first 2 values in the CSV
        # and discards the rest
        my ($server, $avg) = (split(",",$line));

        #if ($data[0] lt "!" ) {
        #    $data[0] = 0;
        #}
        #next if grep /[^0-9.]/, @data;
        #$usage{$server} = [] unless exists $usage{$server};
        #push @{$usage{$server}}, 0 while @{$usage{$server}} < $files;
        #push @{$usage{$server}}, $data[0];

        # if this is a second occurrence of a server in the file,
        # its avg won't be assigned because the first one is already
        # stored there
        $usage{$server}[$files] ||= $avg;
    }
    continue {
        $files++ if eof;
    }
    close $fh or die "Can't close file $file: $!";
}

for my $server (sort keys %usage) {
    # Either this server has an average (for each element of the array)
    # or assign 0 to the ones that don't have a value
    my @avgs = map $usage{$server}[$_] || 0, 0..$#{$usage{$server}};
    print join(",", $server, @avgs), "\n";
}
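
To see the duplicate handling in isolation, here is a minimal sketch that runs the same `||=` assignment over a few in-memory rows instead of a real file. The server names and numbers are made up for illustration and are not taken from the sfull*.dat files.

```perl
use strict;
use warnings;

# Hypothetical rows standing in for one input file; "web01" appears twice.
my @rows = ("web01,10.5,99", "db01,20.1,98", "web01,77.7,97");

my %usage;
my $files = 0;    # index of the file currently being read

for my $line (@rows) {
    # Keep the first two fields, discard the rest
    my ($server, $avg) = split /,/, $line;

    # ||= assigns only while the slot is still empty (or false),
    # so the first average seen for a server is the one that is kept
    $usage{$server}[$files] ||= $avg;
}

print "$_,$usage{$_}[0]\n" for sort keys %usage;
```

One thing to watch: `||=` tests for truth, not definedness, so a genuine average of 0 in the first row would be overwritten by a later duplicate. If that matters for your data, `//=` (Perl 5.10 or later) tests definedness instead.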

Re^2: Eliminating Duplicate Lines From A CSV File
by country1 (Acolyte) on Jul 25, 2007 at 18:26 UTC

    Cristoforo,


    I tested your Perl code. It handles the elimination
    of rows with duplicate server names in the same file.


    It handles the replacement of missing values with '0',
    as long as non-missing values for a server are found
    in one of the later files.


    In other words, if the Server does not have a row in
    the 1st and 2nd file, but does have a row in the 3rd
    file, the code will produce


    Server,0,0,12.45


    But if the Server has a row in the 1st input file, but
    not in the 2nd and 3rd, the code will produce


    Server,2.85,,,


    or


    If the server has a row in the 1st and 2nd input file,
    but not in the 3rd, the code will produce


    Server,2.85,3.56,,


    Do you have any idea why the missing values are not
    being replaced with '0' when they occur at the end of
    the hash?

      Yes, that was an error. Also, I see now that Grandfather's solution was about the same as mine. He used a hash of hashes, where I used a hash of arrays.
      If no records were found in the last file(s), no array entry was created for them, so $#{$usage{$server}} stopped short of the last file.

      The correct line is
      my @avgs = map $usage{$server}[$_] || 0, 0..$files-1;
      instead of
      my @avgs = map $usage{$server}[$_] || 0, 0..$#{$usage{$server}};

      Update: changed 0..$i-1 to 0..$files-1
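
The difference between the two ranges can be checked with a small sketch. The data here is made up (two servers, three files assumed to have been read), and the padded() helper is only for this illustration; it is not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical state after reading three files: "alpha" appeared only in
# the first file, "beta" only in the third.
my $files = 3;
my %usage = (
    alpha => [2.85],
    beta  => [undef, undef, 12.45],
);

# Build one output row, filling gaps with 0 up to the given last index.
sub padded {
    my ($server, $last) = @_;
    return join ",", $server, map $usage{$server}[$_] || 0, 0 .. $last;
}

for my $server (sort keys %usage) {
    # Old range: stops at the last index actually stored for this server
    print "old: ", padded($server, $#{ $usage{$server} }), "\n";
    # New range: always covers every file that was read
    print "new: ", padded($server, $files - 1), "\n";
}
```

With the old range the row for "alpha" has only one value column, because its array never grew past index 0. With 0..$files-1 every row gets one column per file regardless of where the server last appeared.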


        Cristoforo,


        One final question. My input CSV files actually have
        two Header lines. The files look like this:


        File A
        Server Performance on Jun 9 2007,,,,
        Server,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
        wsomdavpra03,95.33,98.75,68.68,70.23
        WTEADAPOMS02,35.13,88.15,37.9,57.57


        I would have thought the line of code


        next if $line =~ /^Server/;


        would have eliminated the 1st two header lines.


        I also tried using two "<$fh>;" lines after the open
        statement. Both approaches produced the following as
        the first line of output.


        ,0,0,0,0,0,0


        Do you have any idea why this is happening?


        Cristoforo,


        The replaced line is causing a compilation error:


        Global symbol "$i" requires explicit package name
        at test1.pl line 54.


        I verified the line number of the changed line - it
        was line 54.