merge files by column (revisted)

j1n3l0 has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

Today at $work I came across a little problem that I assumed was trivial but seems to be not so trivial after all.

I have two files with the same number of corresponding lines (rows) and differing columns. What I wanted to do was take a column from one file and add it as a column to the second file. For example:

% cat file_one
chr coord array value
1   100   5730_1 0.123
1   200   5730_1 4.567

...

21  400   5785_3 2.345

% cat file_two
chr coord new_annotation 
1   100   1   
1   200   1   

...

21  400   60  

% cat out_file
chr coord new_annotation value
1   100   1   0.123
1   200   1   4.567

...

21  400   60  2.345
[download]

I had intended to use a unix cut command but that only takes one file as an argument. I wanted something like this post or a module that does something similar. I've looked but as yet have not found something straight forward enough.

Does this look familiar to anyone or do I have to write my own application that goes:

perl -MMy::App -e 'merge( file_one=>\@cols_one, file_two=>\@cols_two )
+;
[download]

and produces a file that combines all the selected columns from file_one and file_two?

Thanks all in advance

Smoothie, smoothie, hundre prosent naturlig!

Comment on merge files by column (revisted) Select or Download Code

Replies are listed 'Best First'.
Re: merge files by column (revisted) by pc88mxer (Vicar) on Mar 10, 2008 at 23:18 UTC
The Unix way to do it would be to use `join`. The commands `cut` and `paste` might also be helpful here. Note that the 'merge' operation you are performing is equivalent to a relational join. Another possibility it to try to use the `DBD::CSV` module (via `DBI`). Then you can perform the join using a SQL `select` statement ala `SELECT file1.*, file2.array, file2.value FROM file1, file2 WHERE file1.chr = file2.chr and file1.coord = file2.coord;` [download] With either the Unix `join` approach or the SQL approach you'll have to decide what happens when `file2` doesn't contain a row corresponding to `file1` and vice-versa.	[reply] [d/l] [select]
Re^2: merge files by column (revisted) by Anonymous Monk on Oct 05, 2011 at 22:24 UTC
"you'll have to decide what happens when file2 doesn't contain a row corresponding to file1 and vice-versa. " That has pointed out the very common issue when you merge two or more files. Do you have any clue?	[reply]
Re: merge files by column (revisted) by FunkyMonk (Bishop) on Mar 10, 2008 at 23:13 UTC
You haven't specified how the lines from the two files should match. I've presumed the first two columns should be the same: `my @file_one = ( '1 100 5730_1 0.123', '1 200 5730_1 4.567', '21 400 5785_3 2.345', ); my @file_two = ( '1 100 1', '1 200 1', '21 400 60', ); my %file2; for my $line ( @file_two ) { my @f = split /\s+/, $line; $file2{$f[0]}{$f[1]} = $f[2]; } for my $line ( @file_one ) { my @f = split /\s+/, $line; die "can't find match for $line" unless exists $file2{$f[0]}{$f[1]}; print "$f[0] $f[1] $file2{$f[0]}{$f[1]} $f[3]\n"; }` [download] Output: `1 100 1 0.123 1 200 1 4.567 21 400 60 2.345` [download]	[reply] [d/l] [select]
Re: merge files by column (revisted) by j1n3l0 (Friar) on Mar 10, 2008 at 23:32 UTC
Thanks a lot, The join looks like what I'm looking for and I will look into it. As for which columns to select by, it doesn't really matter as the two files are already sorted (by the first two columns as you noted). Thanks all. I just hope the speed will not be a problem =] Smoothie, smoothie, hundre prosent naturlig!	[reply]