gogoglou has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perl Monks. I have an interesting problem, I need to combine two tab-delimited tables, that are of different dimensions. The first one has 5 columns while the second 10, so in principle I need a new table with 15 columns.Underneath are the first lines of each file

miR BC1 BC2 BC3 BC4 BC5 hsa-mir-223 3arm 47,99932662 10,24365339 13,55247228 10,45 +37212 19,12230677 hsa-mir-21 5arm 9,382432686 14,37697968 16,24729954 11,061 +31971 14 begin_of_the_skype_highlighting              06131971 14  +     end_of_the_skype_highlighting begin_of_the_skype_highlighting    +           06131971 14      end_of_the_skype_highlighting begin_of_th +e_skype_highlighting              06131971 14      end_of_the_skype_h +ighlighting,07563561 hsa-mir-24 3arm 5,268913507 17,15846054 13,48409367 14,532 +89899 10,16886185

the second file

miR C14C C14P CD4C CD4P CD8C CD8P T0H T48H +TH1 TH2 hsa-mir-150 5arm 2,908940033 0,438105498 31,86213 19,53177 +981 43,48177377 59,26980003 0,23934135 0,40506398 16,2 +8900742 14,75863711 hsa-mir-223 3arm 26,49304598 15,18021615 4,083714993 9,840 +496003 1,934720532 0,568416266 30,72565058 14,62196623 + 0,069010679 0,039852851

What I need is to create a combined table. The first part of my program just reads the first line of each file, and then splits it and prints a new file where the columns are the combined columns of each file. so far so good. The second step is to read each valuse in the miR column from file 1 and compare it with each value in the miR column of file 2. That also works. I am though stuck in the next step because I cannot think of a way to do it. I need to print if there is no corresponding values zero's and if there are corresponding values, a combined line that has the results from both, so as in the first row there are no duplicate entries.

I don't know if I made it clear what it is I need to do, but any suggestions and ideas on how to do it would be highly appreciated

Thank you in advance for your help!

Replies are listed 'Best First'.
Re: Combining tab delimited tables
by jethro (Monsignor) on Dec 13, 2010 at 09:57 UTC

    Maybe you could post the code you already have. Along with the information how big your files are

    If your files are relatively small, you just read in the first file and put it into a hash. Use the first column as key and the rest of the line as value (use split() to split the line). Then read in the second file and look for the first column in the hash. Voila. Don't forget to check if the value exists (Use the exists() function).

    If your files are too big to fit in memory, you have two possibilities:

    1) Use a hash that is stored in a file, for example with DBM::Deep

    2) Sort both files on disk (if you use linux or some other unix dialect, just use the sort program). Then you can compare them line by line. Again it is useful to check if a line is missing in one file, also you have to remove the table header before you sort the files.

Re: Combining tab delimited tables
by ambrus (Abbot) on Dec 13, 2010 at 09:47 UTC

    Update: blanked this node intentionally.