PerlMonks  

How to combine multiple files together

by xspikx (Acolyte)
on Oct 17, 2005 at 15:46 UTC ( [id://500754]=perlquestion )

xspikx has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have about 10 cvs files that I need to combine into one file. The cvs files use a comma as the delimiter. I have to combine them using the first file's first field as the search criterion: I put the first line of the first file into the big cvs file, followed by any lines in the other files that contain the first column of that line, then the second line, and so on. Does anyone have any ideas on how this could be done?

Replies are listed 'Best First'.
Re: How to combine multiple files together
by marto (Cardinal) on Oct 17, 2005 at 15:52 UTC
    Hi,

    Take a look at the File::Sort module.
    "File::Sort - Sort a file or merge sort multiple files"
    It looks like it does what you need. Take a look at the documentation for examples.

    Hope this helps.

    Martin
      Thanks, It looks like it's exactly what I'm looking for.
Re: How to combine multiple files together
by kirbyk (Friar) on Oct 17, 2005 at 15:57 UTC
    (First, nitpick: it's 'csv' files. 'comma-separated-values'. cvs is a version control system.)

    It's probably a good idea to use the Text::CSV_XS module to parse and recombine your csv files. This easily lets you get them into convenient array form.

    Then, I'd store the arrays in a hash based on the first key. If the hash already has an array assigned to it, you can append to the existing array. (Or whatever you need to do. It's not exceedingly clear from the question. But this should be a good way to group them together.)

    At the end, you can do a foreach on the keys of the hash, and combine the fields using the module, and write that out to a new csv file.

    Hope this points you in the right direction!

    -- Kirby, WhitePages.com

      Thanks, Now I got some good ideas on how to get it done properly.
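
A minimal sketch of the parse-and-recombine step described above, assuming Text::CSV_XS is installed from CPAN. The sample line and variable names are made up for illustration. The point is that a quoted field containing a comma survives the round trip, which a plain split(/,/) would not handle.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# binary => 1 lets the parser accept any bytes inside fields.
my $csv = Text::CSV_XS->new({ binary => 1 })
    or die "Cannot create parser: " . Text::CSV_XS->error_diag();

# Hypothetical sample line: the second field contains a comma.
my $line = 'perl book,"Wall, Larry",publisher:xxxx';

$csv->parse($line) or die "Parse failed: " . $csv->error_diag();
my @fields = $csv->fields();    # three fields, surrounding quotes removed

# Recombine the fields into one CSV line; quoting is re-applied where needed.
$csv->combine(@fields) or die "Combine failed";
my $rebuilt = $csv->string();
print "$rebuilt\n";
```

From here, each @fields array can be pushed onto a hash entry keyed by $fields[0], as suggested in the reply.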
Re: How to combine multiple files together
by ikegami (Patriarch) on Oct 17, 2005 at 16:03 UTC
    How big are the other files? If you don't mind loading them into memory:
    use 5.006;   # Uses "open" syntax introduced in Perl 5.6.0.
    use strict;
    use warnings;

    my $output_file_name = shift(@ARGV);
    my $main_file_name   = shift(@ARGV);
    my @other_file_names = @ARGV;

    my @data;

    foreach (@other_file_names) {
        open(my $fh_in, '<', $_)
            or die("Unable to open input file $_: $!\n");

        while (my $line = <$fh_in>) {
            # Keep the \n on $line.
            chomp(my $chomped_line = $line);
            my @fields = split(/,/, $chomped_line);
            foreach my $idx (0..$#fields) {
                # Saves a reference instead of a copy to save memory.
                push(@{$data[$idx]{$fields[$idx]}}, \$line);
            }
        }
    }

    {
        open(my $fh_in, '<', $main_file_name)
            or die("Unable to open input file $main_file_name: $!\n");

        open(my $fh_out, '>', $output_file_name)
            or die("Unable to open output file $output_file_name: $!\n");

        while (my $line = <$fh_in>) {
            print $fh_out ($line);

            chomp($line);
            my @fields = split(/,/, $line);
            foreach my $idx (0..$#fields) {
                if ($data[$idx]{$fields[$idx]}) {
                    foreach (@{$data[$idx]{$fields[$idx]}}) {
                        print $fh_out ($$_);
                    }
                }
            }
        }
    }

    Untested. You didn't specify what to do And you should probably use Text::CSV_XS instead of splitting yourself.

    Update: Fixed compilation errors.

      Well, the main file can be anywhere from 50 to 2000 lines.
        That's tiny. But I don't load the main file into memory, only the other ones. I asked how big are the *other* files. If they're in the same range, you should have no problem.
Re: How to combine multiple files together
by xorl (Deacon) on Oct 17, 2005 at 15:51 UTC
    If only one file has the field list, then you could just cat them all together. You don't even need Perl.

    If they all do have a field list, there are a number of modules you can use that deal with CSV files. I'd recommend Text::xSV.

      Yes, each file has the first column of the first file, however the other files can have multiple instances of it. eg:

      1st file: perl book,writer:xxxx,publisher:xxxx
      2nd file: perl book,first edition,$29.99
      2nd file: perl book,second edition,$25.99
      3rd file: perl book,third edition,$x
      big file: perl book,writer:xxxx,publisher:xxxx||first edition,$29.99||second edition,$25.99||third edition,$x
Re: How to combine multiple files together
by Anonymous Monk on Oct 17, 2005 at 16:41 UTC
    I would check out the 'join' utility. If you have two comma-delimited files, A and B, that are -sorted- (see the sort utility), and you want all lines that have the first column in common, then:
    join -j 1 -t ',' A B
    This ought to work fine! Then just repeat this across all of the files, in whatever way suits you best.
    Mark
      This is almost good; however, I do not want repeated lines. If file B has multiple instances of column 1 of file A, I want them simply printed continuously on one line, with no line breaks. I will still have to stick to Perl, though, because this will be an add-on to another script.
Re: How to combine multiple files together
by CountZero (Bishop) on Oct 17, 2005 at 16:08 UTC
    To start I would concatenate all the CSV-files together (which can only be done if they all have the same fields in the same order), then with the help of a module like Text::CSV_XS extract the value of the key field for every record from this CSV-file and save these in a hash (so you have only unique keys and no duplicates).

    Next open the big CSV-file again with DBD::CSV and using an SQL statement such as "SELECT * FROM big.csv WHERE keyfield = key_value" replacing key_value by the keys in your hash, you extract the records one by one based upon the value of the keys.

    As soon as you extract a record you write it to disk with Text::CSV_XS in another file.

    If your csv-files are not too big, you could also try to read each record in each file with Text::CSV_XS and build a hash of arrays keyed by the value of your key-field and then empty the HoA back into a final CSV-file. This is probably faster but less intuitive.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

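The hash-of-arrays approach suggested in several replies can be sketched roughly as follows, producing the `||`-joined format from the asker's example. This is only a sketch under stated assumptions: the helper name merge_by_first_field is hypothetical, a plain split stands in for Text::CSV_XS (so quoted fields containing commas are not handled), and the other files are assumed small enough to hold in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: merge_by_first_field($main_lines, @other_files_lines)
# Each argument is a reference to an array of chomped CSV lines.
# Returns one merged line per non-empty main-file line, in main-file order.
sub merge_by_first_field {
    my ($main, @others) = @_;

    # Hash of arrays: first field => list of "rest of line" strings.
    my %extra;
    for my $lines (@others) {
        for my $line (@$lines) {
            # Split into the key and everything after the first comma.
            my ($key, $rest) = split /,/, $line, 2;
            next unless defined $key && defined $rest;
            push @{ $extra{$key} }, $rest;
        }
    }

    my @merged;
    for my $line (@$main) {
        next unless length $line;    # skip blank lines
        my ($key) = split /,/, $line, 2;
        # Append each matching remainder to the main line, separated by "||".
        push @merged, join '||', $line, @{ $extra{$key} || [] };
    }
    return @merged;
}

# Sample data taken from the example in the thread.
my @file1 = ('perl book,writer:xxxx,publisher:xxxx');
my @file2 = ('perl book,first edition,$29.99',
             'perl book,second edition,$25.99');
my @file3 = ('perl book,third edition,$x');

my @result = merge_by_first_field(\@file1, \@file2, \@file3);
print "$_\n" for @result;
```

With the sample data this prints the same "big file" line shown in the example earlier in the thread. In a real script, each array would be filled by reading a file and chomping every line, and the merged lines would be printed to the output file handle instead of STDOUT.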