in reply to Process and combine two CSV files into one

Depending on what easier means and what you actually want, the following trivial assignment will feed the entries from %a into %c and from %b into %c, replacing any duplicates from %a with the entry having the same key in %b. This seemed to fit your example data.
%c = ( %a, %b);
Update: or from the shell (unsophisticated sorting though, else it lengthens a bit):
cat file1 file2 | perl -e 'while(<>){ /^((\d+\.){3}\d+)(.*)$/ and $c{ +$1 } = $1 . $2; } print values %c; ' | sort > file3;

One world, one people

Replies are listed 'Best First'.
Re^2: Process and combine two CSV files into one
by DrAxeman (Scribe) on Aug 09, 2005 at 17:13 UTC
    That seems easy enough! I'll keep this one in mind.
Re^2: Process and combine two CSV files into one
by DrAxeman (Scribe) on Aug 09, 2005 at 20:34 UTC
    I've got the two files combined in an array. If I use Dumper I can see the contents of the array and they are correct. My problem now is how to print them out in a CSV format. This is what I tried:

    my @row = $dbh->selectall_arrayref("SELECT IP, Domain, ServerName, Day +sUptime, OS, RAM, OSSP, InstallDate, CPUSpeed, CPUCount, CPUType FROM + hosts INNER JOIN info ON hosts.IP = info.IP"); #print Dumper @row; foreach my $data ( @row ) { print("$data"); };
    How can I get each row of the array to print out on a seperate line each?

      hmmm if you are just wondering on how to print to a csv file you can just use the IO::File and Text::CSV_XS. If you go check out the documentation of Text::CSV_XS and IO::File what happens is that you scan in your csv file 1 line at a time and each line is put into an array. Anyways, you can do something like

      use Text::CSV_XS; use IO::File; use strict; my $csv = Text::CSV_XS->new({sep_char => "\t"}); my %hashed; open my $in, '<', $input1 or die "Read Failed: $!\n"; while(<$in>) { $csv->parse( $_ ) or warn "Bad data: $_\n", next; my @row = $csv->fields(); $hashed{ $row[0] } = \@row; } close $in; my $fh = new IO::File "> $output"; foreach my $hash ( %hashed) { if($hashed{$hash}) { $csv->combine(@{$hashed{$hash}}); print "$hash has data $hashed{ $hash }->[0]\n"; $fh->print($csv->string, "\n"); } } $fh->close;

      This is just a basic read in of a csv file that is tab separated. All of the input and csv manipulation can be found in Text::CSV_XS documentation. But to output the file in csv format the easiest way I found was with IO::File.

        Great suggestion, but I don't think it's what I'm looking for. Then again, maybe I don't understand what you've given me. I'm using SQL to combine my CSV files based upon whats in them. The array that's created is what I need to print out. how can I use your example to print out the table in my array?
Re^2: Process and combine two CSV files into one
by DrAxeman (Scribe) on Aug 09, 2005 at 21:51 UTC
    I'm just not getting this join right. If I do a
    my @row = $dbh->selectall_arrayref("SELECT IP, ServerName, Domain, Day +sUptime, OS, RAM, OSSP, InstallDate, CPUSpeed, CPUCount, CPUType FROM + hosts LEFT JOIN info ON hosts.IP = info.IP");
    My 2 tables get joined. The only issue is that I get a warning:
    Execution ERROR: Ambiguous column name 'IP' called from /usr/lib/perl5 +/vendor_perl/5.8.6/i586-linux-thread-multi/DBI.pm at 1557.

    But if I try and change a column name, every field returned contains the "IP" value, regardless of what the true value should be.

    Also, when I try to join a 3rd table it takes a long time. The 2 tables are done in less than 15 seconds. When I add the 3rd table it's just over 5 minutes.
    my @row = $dbh->selectall_arrayref("SELECT IP, ServerName, Domain, Day +sUptime, OS, RAM, OSSP, InstallDate, CPUSpeed, CPUCount, CPUType, Par +titionFree FROM hosts, disks LEFT JOIN info ON hosts.IP = info.IP AND + hosts.IP = disks.IP");
    In the end, I want ALL data from hosts, and the relevant data from the other tables.
    Once I have the these 3 tables combined, I'd like to print them out in a CSV type format. How do I do that?
      The ambiguity error is because the IP just after the SELECT needs to be qualified by its table name (or alternatively could be by an alias name had you used aliases) - just put the table name follwed by a dot in front of that IP.

      Changing the column name (all other things being equal) simply forced it to interpret the same as a literal - you want to undo that change.

      It is normal for three tables to take much longer than two to join. The performance strategy for joining tables 1..n where n>2 is as follows:-

      - Join the first two tables placing the required columns in a temporary table.

      - Then join the third table with the temporary table, putting the results in a second temporary table and drop the first temporary table.

      - Continue this iterative process of joining a results temporary table with the next real table until joining the last real table with the last temporary table at which point the final results can be obtained directly instead of storing in a temporary table - this way no more than two tables are physically joined at once, whereas any number of tables have been logically joined.

      - If this query is intended to be re-used it should be placed in a stored procedure, not inside perl code, to prevent unnecessary communications overheads during execution, especially now that it has been split up. For this reason, ideally in terms of performance as well as other considerations, any re-used process of more than one SQL statement should be placed inside a stored procedure.

      Hope this helps!

      -S

      One world, one people