Merging Files Conundrum: A Better Explanation(?)

Limo has asked for the wisdom of the Perl Monks concerning the following question:

Let me try explaining my problem another way:

File A contains the following headers:

C1    C2    C3    C4

File B contains the following headers:

CA    CB    CC    CD


File A, $C1, contains "src/dst" strings

File A, $C2 contains IP addresses relative to each "src/dst" string

File B, $CA  contains "src/dst" strings

NOTE: File A, Column C1 has a DIFFERENT label than File B, Column CA, although they contain the same data!

File B, $CB contains link speed

the command:

merge.pl [ File A C1,C2 : File B CA,CB ]

should return:

Link Name    IP    ifSpeed

where the above are header labels, with the corresponding data listed in columns below. Also note that the program must be smart enough to generate new header names. Thanks for your replies thus far.


Replies are listed 'Best First'.
Re (tilly) 1: Merging Files Conundrum: A Better Explanation(?) by tilly (Archbishop) on Sep 15, 2000 at 12:32 UTC
This is a problem that simply screams for a relational database. (Certainly, if your data sets grow, performance will become a problem without one.) Please look at DBI. If you really want to leave your data in flat files, you can always use DBI with DBD::CSV (and switch to a real database later if need be).
RE: Re (tilly) 1: Merging Files Conundrum: A Better Explanation(?) by Limo (Scribe) on Sep 15, 2000 at 18:55 UTC
From reading up on DBI and DBD::CSV, this would be my solution, EXCEPT our brain-dead Solaris box has a broken compiler! And getting this box's owner to fix it is going to be nearly impossible! Thanks for the info.
Re: Merging Files Conundrum: A Better Explanation(?) by little (Curate) on Sep 15, 2000 at 11:47 UTC
OK, hopefully nobody will shoot me for this huge piece of code, but the following is something we were using for this kind of stuff.

    #!/usr/bin/perl -w
    use strict;
    $|++;

    my (@DBFile, %DATA, $file, $name);
    $DBFile[0] = "../path/file_A.txt";
    $DBFile[1] = "../path/file_B.txt";
    my $output = "../path/file_C.txt";

    ReadAllData();

    print "\n".$DATA{'A'}{'C1'}{0};
    print "\n".$DATA{'A'}{'C1'}{1};
    print "\n".$DATA{'A'}{'C1'}{2};
    print "\n".$DATA{'B'}{'CA'}{0};
    print "\n".$DATA{'B'}{'CA'}{1};
    print "\n".$DATA{'B'}{'CA'}{2};

    ######
    ## SUB ReadAllData shall
    ## read all files specified in the array @DBFile
    ##
    sub ReadAllData {
        my ($i, $file);
        $i = 0;
        foreach $file (@DBFile) {
            &me_read_CSVDB($file, $i);
            $i++;
        }
        return $i;
    }
    ##
    ## END SUB ReadAllData
    ######

    ######
    ## SUB me_read_CSVDB shall
    ## READ specified FILE_DATA_BASE
    ## into a large hash: $DATA{file}{column}{row}
    ##
    sub me_read_CSVDB {
        my ($tmp, $name, $num, $i, $j, $line, %keyfields);
        $name = $_[0];
        $num  = $_[1];
        my $FILE_DATA_BASE = $name;
        $name =~ s/(\.txt$)//i;     ## "../path/file_A.txt" -> "../path/file_A"
        $name =~ s/^\..*?\_//i;     ## "../path/file_A"     -> "A"
        $i = 0;
        $line = "";
        open CURRENT_DB, "<$FILE_DATA_BASE"
            or die "Couldn't open file $FILE_DATA_BASE: $!";
        my @keyfields = split(/[;]/, <CURRENT_DB>); ## EXTRACT KEYFIELDS ##
        while ($tmp = $keyfields[$i]) {
            $keyfields{"$name"}{$i} = $tmp;
            $i++;
        }
        my $numofkeys = @keyfields; ## EXTRACT VALUEFIELDS FOR EACH RECORDSET ##
        ## the last header still carries the line ending: remove it
        $tmp = $keyfields{"$name"}{($numofkeys-1)};
        chomp $tmp;
        $keyfields{"$name"}{($numofkeys-1)} = $tmp;
        $i = 0;
        while ($line = <CURRENT_DB>) {
            chomp $line; ## chomp, not chop: a last line without "\n" stays intact
            my @valuefields = split(/[;]/, $line);
            if (length($line) != 0) {
                for ($j = 0; $j < $numofkeys; $j++) {
                    $DATA{"$name"}{($keyfields{$name}{$j})}{$i} = $valuefields[$j];
                }
            }
            $i++;
        }
        close CURRENT_DB; ## END OF READING DATA_BASE ##
        return $i;
    }
    ##
    ## END OF SUB me_read_CSVDB
    ##
    ##############################

So what does it do? It reads the specified files and lets you access your data by file label, column header, and row number. The files currently look as follows:

    "../path/file_A.txt"
    ID;C1;C2;C3;C4
    1;one;clown;funny;children
    2;two;hero;truly;saga
    3;three;monk;honest;perl

    "../path/file_B.txt"
    ID;CA;CB;CC;CD
    1;one;baby;scream;teddy
    2;two;mom;work;book
    3;three;daddy;football;PC

Play around with that and see for yourself. Anyway, all suggestions for improvement are very welcome :-)

Have a nice try :-)

Update: just orthography
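Once the data sits in a nested hash of that shape, the merge the original question asks for could be sketched roughly as below. This is only an illustration, not part of the script above: `merge_columns` is a hypothetical helper name, the hash literal copies just two columns per file from the sample files, and the output labels are the ones from the question.

```perl
use strict;
use warnings;

# Same shape as %DATA in the post: file label -> column header -> row index.
# Only two columns per file are copied here, from the sample files.
my %DATA = (
    A => {
        C1 => { 0 => 'one',   1 => 'two',  2 => 'three' },
        C2 => { 0 => 'clown', 1 => 'hero', 2 => 'monk'  },
    },
    B => {
        CA => { 0 => 'one',  1 => 'two', 2 => 'three' },
        CB => { 0 => 'baby', 1 => 'mom', 2 => 'daddy' },
    },
);

# Hypothetical helper: join A and B on one key column each and
# return a list of [key, value from A, value from B] rows.
sub merge_columns {
    my ($data, $a_key, $a_val, $b_key, $b_val) = @_;

    # Index file B once: key value -> row number.
    my %b_row;
    my $b_keys = $data->{B}{$b_key};
    $b_row{ $b_keys->{$_} } = $_ for keys %$b_keys;

    my @rows;
    my $a_keys = $data->{A}{$a_key};
    for my $i (sort { $a <=> $b } keys %$a_keys) {
        my $key = $a_keys->{$i};
        next unless exists $b_row{$key};    # no partner row in B
        push @rows, [ $key,
                      $data->{A}{$a_val}{$i},
                      $data->{B}{$b_val}{ $b_row{$key} } ];
    }
    return @rows;
}

my @merged = merge_columns(\%DATA, 'C1', 'C2', 'CA', 'CB');
print join("\t", 'Link Name', 'IP', 'ifSpeed'), "\n";
print join("\t", @$_), "\n" for @merged;
```

Indexing B by key value first means each row of A needs only one hash lookup, instead of a scan through all of B.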
Re: Merging Files Conundrum: A Better Explanation(?) by turnstep (Parson) on Sep 15, 2000 at 20:27 UTC
Here is my take. Since we have not been told how the program knows the header labels, I will assume that they are tab-delimited on the first line of the file. :) To use it, just enter the first file name, the fields to be used, the second file, and the fields to be used. The first field in each list is the one that "matches" with the other. For example,

    merge.pl AA 1,2,5,6 BB 3,4,1,5

would print any records where the third field in the file "BB" matched the first field in the file "AA", and then would print out fields one, two, five, and six from A, followed by four, one, and five from B. Formatting is tab-delimited. A final assumption is that the input files are whitespace-delimited, but this could easily be changed to tab-delimited.

    ## merge.pl
    use strict;

    my $AFile = shift;
    my @Acols = split(/,/ => shift);
    my $BFile = shift;
    my @Bcols = split(/,/ => shift);

    ## Read B into memory first
    open(B, "$BFile") or die "Could not open $BFile: $!\n";

    ## Grab the header labels from the first line and store them for later:
    my @HeaderB = split(/\t/, <B>);
    chomp @HeaderB;

    ## Now go through and save each line into a hash, where the key
    ## is the field to be matched, and the value is a reference to
    ## an array that holds all the fields
    my %B;
    while (<B>) {
        my @bar = split(/\s+/ => $_); ## Change to tab if needed
        $bar[1] or next; ## Skip blank lines: add other validation if needed
        $B{$bar[$Bcols[0]-1]} = \@bar;
    }
    close(B);
    shift @Bcols; ## Remove B's first header: we will use A's

    open(A, "$AFile") or die "Could not open $AFile: $!\n";

    ## Print all the headers now:
    my @HeaderA = split(/\t/, <A>);
    chomp @HeaderA;
    for (@Acols) { print "$HeaderA[$_-1]\t"; }
    for (@Bcols) { print "$HeaderB[$_-1]\t"; } ## Remember that shift?
    print "\n";

    ## Save the offset of the "matching field" into a variable
    ## Mainly makes things easier to read below
    my $A = $Acols[0]-1;

    while (<A>) {
        my @bar = split(/\s+/ => $_); ## Change to tab if needed
        $bar[1] or next;
        if ($B{$bar[$A]}) { ## We have a match from %B!
            ## Print all the A fields we want:
            for (@Acols) { print "$bar[$_-1]\t"; }
            ## Print all the B fields we want (field numbers are 1-based):
            for (@Bcols) { print "$B{$bar[$A]}[$_-1]\t"; }
            print "\n";
        }
    }
    close(A);
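The original question selects columns by header name (C1,C2 and CA,CB) and wants new header labels in the output, which the field-number interface above does not cover. A minimal sketch of that variation follows, assuming the files have already been split into rows. The helper names `column_offsets` and `merge_by_name` are hypothetical, and the link names, addresses, and speeds are made-up sample values, not data from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: translate header names into 0-based column offsets.
sub column_offsets {
    my ($header, @names) = @_;
    my %pos;
    @pos{@$header} = (0 .. $#$header);
    my @offsets;
    for my $name (@names) {
        die "No column named '$name'\n" unless exists $pos{$name};
        push @offsets, $pos{$name};
    }
    return @offsets;
}

# Hypothetical helper: each table is an array ref of rows, header row first.
# The first name in each column list is the join key; $labels holds the
# new header names for the output.
sub merge_by_name {
    my ($table_a, $cols_a, $table_b, $cols_b, $labels) = @_;
    my ($head_a, @rows_a) = @$table_a;
    my ($head_b, @rows_b) = @$table_b;
    my ($key_a, @want_a) = column_offsets($head_a, @$cols_a);
    my ($key_b, @want_b) = column_offsets($head_b, @$cols_b);

    # Index B by its key column so every A row needs one lookup.
    my %b_by_key;
    $b_by_key{ $_->[$key_b] } = $_ for @rows_b;

    my @out = ([@$labels]);
    for my $row (@rows_a) {
        my $match = $b_by_key{ $row->[$key_a] } or next;
        push @out, [ @{$row}[$key_a, @want_a], @{$match}[@want_b] ];
    }
    return \@out;
}

# Made-up sample rows, for illustration only.
my $file_a = [
    [qw(C1 C2 C3 C4)],
    [qw(nyc/bos 10.0.0.1 x x)],
    [qw(nyc/chi 10.0.0.5 x x)],
];
my $file_b = [
    [qw(CA CB CC CD)],
    [qw(nyc/chi 45000000 y y)],
    [qw(nyc/bos 1544000 y y)],
];

my $merged = merge_by_name($file_a, [qw(C1 C2)],
                           $file_b, [qw(CA CB)],
                           ['Link Name', 'IP', 'ifSpeed']);
print join("\t", @$_), "\n" for @$merged;
```

Passing the output labels in explicitly sidesteps the question of how the program should invent new header names; any naming rule could be layered on top of that argument.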