j1n3l0 has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

Today at $work I came across a little problem that I assumed was trivial but seems to be not so trivial after all.

I have two files with the same number of corresponding lines (rows) and differing columns. What I wanted to do was take a column from one file and add it as a column to the second file. For example:

% cat file_one chr coord array value 1 100 5730_1 0.123 1 200 5730_1 4.567 ... 21 400 5785_3 2.345 % cat file_two chr coord new_annotation 1 100 1 1 200 1 ... 21 400 60 % cat out_file chr coord new_annotation value 1 100 1 0.123 1 200 1 4.567 ... 21 400 60 2.345

I had intended to use a unix cut command but that only takes one file as an argument. I wanted something like this post or a module that does something similar. I've looked but as yet have not found something straight forward enough.

Does this look familiar to anyone or do I have to write my own application that goes:

perl -MMy::App -e 'merge( file_one=>\@cols_one, file_two=>\@cols_two ) +;

and produces a file that combines all the selected columns from file_one and file_two?

Thanks all in advance


Smoothie, smoothie, hundre prosent naturlig!

Replies are listed 'Best First'.
Re: merge files by column (revisted)
by pc88mxer (Vicar) on Mar 10, 2008 at 23:18 UTC
    The Unix way to do it would be to use join. The commands cut and paste might also be helpful here. Note that the 'merge' operation you are performing is equivalent to a relational join.

    Another possibility it to try to use the DBD::CSV module (via DBI). Then you can perform the join using a SQL select statement ala

    SELECT file1.*, file2.array, file2.value FROM file1, file2 WHERE file1.chr = file2.chr and file1.coord = file2.coord;

    With either the Unix join approach or the SQL approach you'll have to decide what happens when file2 doesn't contain a row corresponding to file1 and vice-versa.

      "you'll have to decide what happens when file2 doesn't contain a row corresponding to file1 and vice-versa. " That has pointed out the very common issue when you merge two or more files. Do you have any clue?
Re: merge files by column (revisted)
by FunkyMonk (Bishop) on Mar 10, 2008 at 23:13 UTC
    You haven't specified how the lines from the two files should match. I've presumed the first two columns should be the same:
    my @file_one = ( '1 100 5730_1 0.123', '1 200 5730_1 4.567', '21 400 5785_3 2.345', ); my @file_two = ( '1 100 1', '1 200 1', '21 400 60', ); my %file2; for my $line ( @file_two ) { my @f = split /\s+/, $line; $file2{$f[0]}{$f[1]} = $f[2]; } for my $line ( @file_one ) { my @f = split /\s+/, $line; die "can't find match for $line" unless exists $file2{$f[0]}{$f[1]}; print "$f[0] $f[1] $file2{$f[0]}{$f[1]} $f[3]\n"; }

    Output:

    1 100 1 0.123 1 200 1 4.567 21 400 60 2.345

Re: merge files by column (revisted)
by j1n3l0 (Friar) on Mar 10, 2008 at 23:32 UTC
    Thanks a lot,

    The join looks like what I'm looking for and I will look into it. As for which columns to select by, it doesn't really matter as the two files are already sorted (by the first two columns as you noted).

    Thanks all. I just hope the speed will not be a problem =]


    Smoothie, smoothie, hundre prosent naturlig!