Combining tab delimited tables

gogoglou has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perl Monks. I have an interesting problem, I need to combine two tab-delimited tables, that are of different dimensions. The first one has 5 columns while the second 10, so in principle I need a new table with 15 columns.Underneath are the first lines of each file

miR    BC1    BC2    BC3    BC4    BC5
hsa-mir-223 3arm    47,99932662    10,24365339    13,55247228    10,45
+37212    19,12230677
hsa-mir-21 5arm    9,382432686    14,37697968    16,24729954    11,061
+31971    14 begin_of_the_skype_highlighting              06131971 14 
+     end_of_the_skype_highlighting begin_of_the_skype_highlighting   
+           06131971 14      end_of_the_skype_highlighting begin_of_th
+e_skype_highlighting              06131971 14      end_of_the_skype_h
+ighlighting,07563561
hsa-mir-24 3arm    5,268913507    17,15846054    13,48409367    14,532
+89899    10,16886185
[download]

the second file

miR    C14C    C14P    CD4C    CD4P    CD8C    CD8P    T0H    T48H    
+TH1    TH2
hsa-mir-150 5arm    2,908940033    0,438105498    31,86213    19,53177
+981    43,48177377    59,26980003    0,23934135    0,40506398    16,2
+8900742    14,75863711
hsa-mir-223 3arm    26,49304598    15,18021615    4,083714993    9,840
+496003    1,934720532    0,568416266    30,72565058    14,62196623   
+ 0,069010679    0,039852851
[download]

What I need is to create a combined table. The first part of my program just reads the first line of each file, and then splits it and prints a new file where the columns are the combined columns of each file. so far so good. The second step is to read each valuse in the miR column from file 1 and compare it with each value in the miR column of file 2. That also works. I am though stuck in the next step because I cannot think of a way to do it. I need to print if there is no corresponding values zero's and if there are corresponding values, a combined line that has the results from both, so as in the first row there are no duplicate entries.

I don't know if I made it clear what it is I need to do, but any suggestions and ideas on how to do it would be highly appreciated

Thank you in advance for your help!

Comment on Combining tab delimited tables Select or Download Code

Replies are listed 'Best First'.
Re: Combining tab delimited tables by jethro (Monsignor) on Dec 13, 2010 at 09:57 UTC
Maybe you could post the code you already have. Along with the information how big your files are If your files are relatively small, you just read in the first file and put it into a hash. Use the first column as key and the rest of the line as value (use split() to split the line). Then read in the second file and look for the first column in the hash. Voila. Don't forget to check if the value exists (Use the exists() function). If your files are too big to fit in memory, you have two possibilities: 1) Use a hash that is stored in a file, for example with DBM::Deep 2) Sort both files on disk (if you use linux or some other unix dialect, just use the sort program). Then you can compare them line by line. Again it is useful to check if a line is missing in one file, also you have to remove the table header before you sort the files.	[reply]
Re: Combining tab delimited tables by ambrus (Abbot) on Dec 13, 2010 at 09:47 UTC
Update: blanked this node intentionally.	[reply]