country1 has asked for the wisdom of the Perl Monks concerning the following question:


I have multiple CSV files (3 for this example) with
identical record layouts.


File A
Server,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
WSOMQAVPRA05,93.75,95.87,66.67,68.13
wsomdavpra03,90.39,94,65.77,68.51
wsomddvfxa01,39.22,92.19,82.59,88.25
wsompapgtw05,49.6,87.07,91.6,93.78


File B
Server,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
WSOMQAVPRA05,34.78,100,55.1,67.6
wsomdavpra03,69.04,98.55,84.07,89.73
wsomddvfxa01,92.44,97.54,67.72,71.69
wsompapgtw05,48.77,96.9,92.1,93.55


File C
Server,Avg CPU,P95 CPU,Avg Mem Util,P95 Mem Util
WSOMQAVPRA05,93.13,98.11,68.95,73.47
wsomdavpra03,68.85,97.56,76.35,98.23
wsomddvfxa01,46.97,96.29,88.23,94.02
wsompapgtw05,30.66,93.74,39.89,71.35


What I am trying to do is, for each Server (in Column
1), get the Avg CPU (in Column 2) from each of the
3 files. For the files above, this would produce:


File OUT
WSOMQAVPRA05,93.75,34.78,93.13
wsomdavpra03,90.39,69.04,68.85
wsomddvfxa01,39.22,92.44,46.97
wsompapgtw05,49.6,48.77,30.66


Let's assume for the moment that this is the way the
3 files will exist (a 1-to-1-to-1 match on Server).
From my experience with Perl, doing a match-merge
on a key field (Server) is not an easy process.


Can anyone give me some ideas, thoughts or pearls of
wisdom on how to go about accomplishing this task?


Replies are listed 'Best First'.
Re: Combining Records From Multiple Files based on Common Key Values
by tirwhan (Abbot) on Jul 13, 2007 at 17:42 UTC

    The subject line of your question contains the words "key values", which could have given you a hint that you can solve this problem using a hash :-).

    Normally I recommend using Text::CSV_XS on CSV files, but this example data looks simple enough to handle without an extra module (reconsider using the module if your real data files contain things like quoted fields or escaped commas):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %result;
    for my $file ("File_A","File_B","File_C") {
        open (my $fh,"<",$file) or die "Can't open file $file: $!";
        <$fh>;    # skip header line
        while (my $line = <$fh>) {
            my ($server,$cpua) = (split(",",$line))[0,1];
            push @{$result{$server}},$cpua;
        }
        close $fh or die "Can't close file $file: $!";
    }
    open (my $nfh,">","Result_File") or die "Can't open result file: $!";
    for my $server (sort keys %result) {
        print $nfh $server.",".join(",",@{$result{$server}})."\n";
    }
    close $nfh or die "Can't close result file: $!";

    This assumes that you want the result file to contain the servers listed in alphabetical order. If you want a different ordering, simply construct an array holding the server names in the order you want them and use that for the final loop. For example

    my @order = qw(wsompapgtw05 wsomddvfxa01 wsomdavpra03 WSOMQAVPRA05);
    for my $server (@order) {
        ...

    All dogma is stupid.
      tirwhan, thanks for your help. Your solution assumes that the 3 input files have the server names in the same order in each of the 3 files. This is not always the case. How do you get the for and while loops to match the keys (i.e., server name) regardless of where they are in the 3 input files?
        Your solution assumes that the 3 input files have the server names in the same order in each of the 3 files

        Umm, no it doesn't. Each CPU value gets associated with its server name, regardless of which line it appears on. The output order is likewise independent of the order the servers appear in within the input files. Try it out and you'll see (this is a hash, not an array!).
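        A minimal, runnable sketch of the same hash-of-arrays technique may make this concrete. Inline data stands in for two "files" whose rows are deliberately in different orders (the keys and numbers here are invented for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two "files" with the same keys in different row orders.
my @files = (
    [ "b,2\n",  "a,1\n"  ],    # file 1
    [ "a,10\n", "b,20\n" ],    # file 2
);

my %result;
for my $file (@files) {
    for my $line (@$file) {
        my ( $server, $cpu ) = ( split /,/, $line )[ 0, 1 ];
        chomp $cpu;
        # The hash key, not the line position, decides where the value lands.
        push @{ $result{$server} }, $cpu;
    }
}

print join( ",", $_, @{ $result{$_} } ), "\n" for sort keys %result;
# a,1,10
# b,2,20
```

        Because %result is keyed on the server name, each value ends up grouped with its server no matter which line (or which file) it came from.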


        All dogma is stupid.
Re: Combining Records From Multiple Files based on Common Key Values
by dsheroh (Monsignor) on Jul 13, 2007 at 20:27 UTC
    Assuming you're using a *nix system (which seems like a safe assumption given that you're talking about load averages), this can be done at a shell prompt without needing Perl, just a little plumbing:
    $ join -t, file1.txt file2.txt | join -t, - file3.txt | cut -f 1,2,6,10 -d,
    WSOMQAVPRA05,93.75,34.78,93.13
    wsomdavpra03,90.39,69.04,68.85
    wsomddvfxa01,39.22,92.44,46.97
    wsompapgtw05,49.6,48.77,30.66
    Each join combines two files based on their first field; the -t, tells it to use commas as the field separator (whitespace is the default). This is then sent to cut, which extracts the 1st, 2nd, 6th, and 10th field from each line, again using comma for the field separator (as specified by -d,).

    Note that join is dependent on the sort field being in the same order in both input files, so you'll need to sort into temp files if this isn't known to be the case. If you are making temp files, though, you can do a sort <filename> | cut -f1,2 -d, > <tempfile> and eliminate the need for the final cut after joining the files, which makes it a bit easier to generalize to different numbers of files.
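    The sort-into-temp-files variant described above can be sketched end to end like this (the file names and toy data are invented; headers are assumed already stripped, and LC_ALL=C pins down the sort order that join relies on):

```shell
export LC_ALL=C
printf 'b,2\na,1\n'     > f1.csv
printf 'a,10\nb,20\n'   > f2.csv
printf 'b,200\na,100\n' > f3.csv

# Sort each file on its key and keep only key + Avg CPU.
for f in f1.csv f2.csv f3.csv; do
    sort "$f" | cut -f1,2 -d, > "$f.tmp"
done

# With the cut done up front, no trailing cut is needed after the joins.
join -t, f1.csv.tmp f2.csv.tmp | join -t, - f3.csv.tmp
# a,1,10,100
# b,2,20,200
```

    Doing the cut before the join is what makes this easy to generalize: each additional file just adds one more sort/cut and one more join.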

Re: Combining Records From Multiple Files based on Common Key Values
by Limbic~Region (Chancellor) on Jul 15, 2007 at 01:35 UTC