meredib has asked for the wisdom of the Perl Monks concerning the following question:

Ok Monks, I'm finished beating my head against the desk and I need some guidance. I have one main file(main.csv), and I have a few other files(20090101.csv, 20090102.csv, and 20090103.csv). The main file looks something like this:

Number:
10 4 8
The other files look something like this:
4, 523 10, 999 8, 0

So I guess step one I want to match the main.csv to the first file say 20090101.csv, and where the first number matches I want to add the second number to the main file. That part is easy, however when I compare the updated main file with the next file that's where my inexperience shows.

So long story short I want the main file to look something like:
Number,20090101,20090102 ,20090103 10 ,999 ,"file 2's numbers","file 3's numbers" 4 ,523 ,"file 2's numbers","file 3's numbers" 8 ,0 ,"file 2's numbers","file 3's numbers"
All I'm really looking for is guidance but if you have something written already I will gladly take a look.

Replies are listed 'Best First'.
Re: How to go about this?
by kyle (Abbot) on Feb 04, 2009 at 17:44 UTC

    Here's some untested skeleton code.

    my @out; open my $main_fh, '<', 'main.csv' or die "Can't read main.csv: $!"; while (<$main_fh>) { chomp; push @out, [ $_ ]; } foreach my $file ( glob '200*.csv' ) { # you'll want to store this filename for your header open my $fh, '<', $file or die "Can't read '$file': $!"; /^(\d+),(\d+)$/ && push @{ $out[$1] }, $2 for <$fh>; } open my $out_fh, '>', 'out.csv' or die "Can't write 'out.csv': $!"; print {$out_fh} $header_line_here; print {$out_fh} join( q{,}, @{$_} ), "\n" for grep defined, @out;

    Some notes:

    • If your main.csv numbers are very large, you should use a hash instead of an array.
    • For more general CSV-related code, use Text::CSV_XS instead.
    • My use of glob might assume too much about your filenames.
    • I haven't bothered to close any of my open filehandles. Don't let that stop you from doing it.
    • This code assumes that every number is present in every file. Things will get very messed up for any main.csv number that appears in some files and not others.
    • If you find the use of an array reference confusing, see perlreftut, perlref, and References quick reference.

    I haven't tested it, but hopefully it gives you some ideas.

Re: How to go about this?
by kennethk (Abbot) on Feb 04, 2009 at 17:43 UTC
    I note from your previous node Comparing 2 csv files that other monks have pointed Text::CSV out to you, so hopefully you are using it to parse and generate the files. Assuming you have a fixed number of source ("other") files, you could create a hash for each one (keyed on your index, i.e. 10, 4, 8...) and then output the key followed by the three hash values. If the number of source files is arbitrary, you'll likely want an array of hashes - perllol has details on working with those.
Re: How to go about this?
by jethro (Monsignor) on Feb 04, 2009 at 17:48 UTC

    Please see Markup in the Monastery. Your posting is nearly unintelligible without some structure. At them mininum edit your posting to use code-tags for your data examples.

    Generally if your main data fits into memory, use a hash to store the main file contents with the number as key and the rest as string (or an array if you feel up to a HashOfArrays structure) and then update this data with the contents of the newer csv-files. That is if my guess about what you want to do is right

    Further question: if 10 is one of the numbers, is it in every csv file or could there be files without the 10?

Re: How to go about this?
by apl (Monsignor) on Feb 04, 2009 at 18:23 UTC
    I have a few other files(20090101.csv, 20090102.csv, and 20090102.csv).
    What's the difference between 20090102.csv and 20090102.csv ? [That is, why is the same file specified twice, as confirmed by your header Number,20090101,20090102 ,20090102?

    Showing us the code you wrote is more likely to help you (by possibly fixing the problem) than by simply giving you the answer.

      Ok, I've come a lot further since I last followed up on the replies I've gotten. Below is the script that I have. Here's the low down. I have a master file and it has some lines in that I'm trying to match to another file; the other file has a field in it that I would like to place in the master file. Both files have a key that I'm using to compare the rows and that's actually an IP Address.
      My Problem is that there are some records that are in the master file that are not in the other file and when I check my output those records don't show. So that's one thing I'm looking for and the other works the other way around in that there are some records in the other file that are not in the master file. Pretty much I want every line from both files in the output file.
      If there is anyone that can look at the code below and modify it to where it will do just that?
      #!perl #perl "copy of findandreplaceemail2.pl" -d . -i "Reporting Devices_01- +03-2009-09-00-00.csv" -s "Master Sched.csv" -o "Master Sched(new1).cs +v" use Getopt::Long; use Text::CSV; &GetOptions('d=s' => \$baseDir ,'i=s' => \$inFlNm ,'s=s' => \$subFlNm ,'o=s' => \$outFlNm ,'h' => \$hlp ,'hlp' => \$hlp ,'help' => \$hlp); $emailList = "test.csv"; $csv = Text::CSV->new(); # Save substitution data in memory indexed by manager email address open IN1, "<"."./$baseDir/$subFlNm" or die "Can't open input file >$su +bFlNm<\n"; open OUT, ">"."$baseDir/$outFlNm" or die "Can't open output file >$out +FlNm<\n"; while(<IN1>){ chomp; $_ =~ s/\,\,$//; $csv->parse($_); @subIn = $csv->fields(); $subRcd{$subIn[2]} = $_; #print "@subIn\n"; } close IN1; #There's got to be a way to take all of the Master file and use it in +the while loop open IN1, "<"."./$baseDir/$subFlNm" or die "Can't open input file >$su +bFlNm<\n"; while ($in1 = <IN1>){ chomp($in1); @mainFlds = split(/,/, $in1); $mainFlds[2] =~ s/^\s+//; $mainFlds[2] =~ s/\s+$//; if($in1 =~ /\d+\.\d+\.\d+\.\d+/){ open IN2, "<"."$baseDir/$inFlNm" or die "Can't open input file + >$inFlNm<\n"; while ($in2 = <IN2>){ chomp($in2); @subFlds = split(/,/, $in2); if($in2 =~ /\d+\.\d+\.\d+\.\d+/){ $subFlds[1] =~ s/^\s+//; $subFlds[1] =~ s/\s+$//; if("$mainFlds[2]" eq "$subFlds[1]" && "$mainFlds[1]" e +q "$subFlds[0]"){ if($in1 =~ /,,$/ || $mainFlds[7] ne '' || $mainFld +s[8] ne ''){ print OUT "$in1,$subFlds[5]\n"; } elsif ($mainFlds[6] ne ''){ print OUT "$in1,,$subFlds[5]\n"; } else { print OUT "$in1,,,$subFlds[5]\n"; } } } } close IN2; } elsif(($mainFlds[2] eq '' || $mainFlds[3] eq '') && $mainFlds[5] n +e '') { open IN2, "<"."$baseDir/$inFlNm" or die "Can't open input file + >$inFlNm<\n"; while ($in2 = <IN2>){ chomp($in2); @subFlds = split(/,/, $in2); if($subFlds[1] eq ''){ #print ">@subFlds\n"; $subFlds[1] =~ s/^\s+//; $subFlds[1] =~ s/\s+$//; if("$mainFlds[3]" eq "$subFlds[2]" && "$mainFlds[4]" e +q "$subFlds[3]" && "$subFlds[4]" ne ''){ if($in1 =~ /,,$/ || $mainFlds[7] ne '' || $mainFld +s[8] ne ''){ print OUT "$in1,$subFlds[5]\n"; } elsif ($mainFlds[6] ne ''){ print OUT "$in1,,$subFlds[5]\n"; } else { print OUT "$in1,,,$subFlds[5]\n"; } } } } close IN2; } else{ if($in1 =~ /Date/){ @date = split(/_/, $inFlNm); $date[1] =~ s/\-\d\d\-\d\d\-\d\d\.csv//; print OUT "$in1,$date[1]\n"; } else{ print OUT "$in1\n"; } } next; }
        I am big fan of SQLite. If I were you I'd load each of your CVS files into SQLite relation database and deal with the data via SQL.
        You seem to be loading your first csv file into a hash called %subRcd, but then you are never using that hash anywhere else in the code (it's "write-only").

        Maybe you should read your second file into that same hash (assuming that the contents of both files are structured the same way, or at least that it's easy to tell them apart in terms of their contents).

        When the second file has a key field that matches something in the first file, you'll need to update the hash value for that key, in order to combine the info from the two files into the one record. Apart from that, keys that occur only in the "master" will remain unmodified in the hash, and keys that occur only in the second file will be loaded and kept as-is in the hash.

        Then just print out all the hash elements to get the complete union of contents from the two files.

        (But I really like Plankton's idea of using SQLite -- definitely worthwhile. Also, you really should start with use strict; and use warnings; -- it will make you a better programmer, and you'll grow to appreciate it.)