FIJI42 has asked for the wisdom of the Perl Monks concerning the following question:
New to Perl and was wondering if anyone could provide suggestions, relevant examples or resources regarding a coding problem I'm having below. So I have two data files with tab-delineated columns, similar to the example below.
File#1: GeneID ColA ColB Gene01 5 15 Gene02 4 8 Gene03 25 5 File#2: GeneID ColA ColC Gene01 12 3 Gene03 5 20 Gene05 22 40 Gene06 88 2
The actual files I'm using have >50 columns and rows, but are similar to what's above. First, I want to input the files, establish variables holding the column names for each file, and establish hashes using the column 1 genes as keys and the concatenated values of the other 2 columns per key. This way there is one key per one value in each row of the hash. My trouble is the third hash %commongenes. I need to find the keys that are the same in both hashes and use just those keys, and their associated values in both files, in the third hash. In the above example, this would be the following key value pairs:
File1: File2: Gene01 5 15 Gene01 12 3 Gene03 25 5 Gene03 5 20
I know the following if loop is incorrect, yet concatenation of columns from both files (similar to below) is similar in form to what I'd like to have.
if ($tmpArray1[0] eq $tmpArray2[0]){ $commongenes{$tmpArray2[0]} = $tmpArray1[1].':'.$tmpArray1[2].':'.$tmpArray2[1].':'.$tmpArray2[2 +]; }
Here is the main body of the code below:
#!/usr/bin/perl -w use strict; my $file1=$ARGV[0]; my $file2=$ARGV[1]; open (FILE1, "<$file1") or die "Cannot open $file1 for processing!\n" +; open (FILE2, "<$file2") or die "Cannot opent $file2 for processing!\n +"; my @fileLine1=<FILE1>; my @fileLine2=<FILE2>; my %file1_allgenes=(); my %file2_allgenes=(); my %commongenes =(); my ($file1_group0name, $file1_group1name, $file1_group2name)=('','',' +',''); my ($file2_group0name, $file2_group1name, $file2_group2name)=('','',' +',''); for (my $i=0; $i<=$#fileLine1 && $i<=$#fileLine2; $i++) { chomp($fileLine1[$i]); chomp($fileLine2[$i]); my @tmpArray1=split('\t',$fileLine1[$i]); my @tmpArray2=split('\t',$fileLine2[$i]); if ($i==0) { ## Column Names and/or Letters $file1_group0name=substr($tmpArray1[0],0,6); $file1_group1name=substr($tmpArray1[1],0,4); $file1_group2name=substr($tmpArray1[2],0,4); $file2_group0name=substr($tmpArray2[0],0,6); $file2_group1name=substr($tmpArray2[1],0,4); $file2_group2name=substr($tmpArray2[2],0,4); } if ($i!=0) { ## Concatenated values in 3 separate hashes + if (! defined $file1_allgenes{$tmpArray1[0]}) { $file1_allgenes{$tmpArray1[0]}=$tmpArray1[1].':'.$tmpArray1[2] +; } if (! defined $file2_allgenes{$tmpArray2[0]}) { $file2_allgenes{$tmpArray2[0]}=$tmpArray2[1].':'.$tmpArray2[2] +; } if ($tmpArray1[0] eq $tmpArray2[0]){ $commongenes{$tmpArray2[0]} = $tmpArray1[1].':'.$tmpArray1[2].':'.$tmpArray2[1].':'.$tmpArray2[2 +]; } } my @commongenes = %commongenes; print "@commongenes\n\n"; }
|
|---|