I want to compare two files so that, for each cluster in file2, the code prints all the matches from file1 for each individual factor in that cluster.
For example, ABC is one cluster, and A, B and C are its individual factors.
file1
A seq1 20
B seq2 25
B seq2 80
B seq1 40
C seq1 25
D seq2 30
E seq2 45
file2
A B C
B D E
Output
A seq1 20 B seq1 40 C seq1 25
B seq2 25 D seq2 30 E seq2 45
B seq2 80 D seq2 30 E seq2 45
So far I have tried the following code, but it takes a very long time because my input files are huge.
#file opening
open(AB,"try_fimo.txt")||die("cannot open");
open(BC,"try_fimo2.txt")||die("cannot open");

#storing file in an array
@data=<AB>;
chomp(@data);
@data2=<BC>;
chomp(@data2);

#reading file line by line
foreach $line(@data)
{
    foreach $line2(@data2)
    {
        if($line2=~/(.*?)\s+(.*?)\s+(.*)/)
        {
            $t1=$1; #eg. in first row from file2 i.e. ABC, it will first take A followed by B & C
            $t2=$2;
            $t3=$3;
        }
        if($line=~/(.*?)\s+(.*?)\s+(.*)/)
        {
            if($1 eq $t1)
            {
                #storing each column in separate array based on match
                push(@tf1,$1);
                push(@seq1,$2);
                push(@dis1,$3);
                # print $1,"\t",$2,"\t",$3,"\t";
            }
            if($1 eq $t2)
            {
                push(@tf2,$1);
                push(@seq2,$2);
                push(@dis2,$3);
            }
            if($1 eq $t3)
            {
                push(@tf3,$1);
                push(@seq3,$2);
                push(@dis3,$3);
            }
        }
    }
}

#comparison using loops
for($i=0;$i<@tf1;$i++)
{
    for($j=0;$j<@tf2;$j++)
    {
        for($k=0;$k<@tf3;$k++)
        {
            if(($seq1[$i] eq $seq2[$j]) && ($seq1[$i] eq $seq3[$k]))
            {
                if(($tf1[$i] ne $tf2[$j]) && ($tf1[$i] ne $tf3[$k]))
                {
                    print $tf1[$i],"\t",$seq1[$i],"\t",$dis1[$i],"\t",$tf2[$j],"\t",$seq2[$j],"\t",$dis2[$j],"\t",$tf3[$k],"\t",$seq3[$k],"\t",$dis3[$k],"\n";
                }
            }
        }
    }
}
Can anyone please suggest a faster solution?
Thanks
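One possible direction, offered as a rough sketch rather than a tested solution for the real data: index file1 once in a hash keyed by factor and then by sequence, so each cluster from file2 becomes a few hash lookups instead of a rescan of every line. The sub names index_records and cluster_matches are made up for this illustration. The sketch assumes whitespace-separated columns and that a "match" means all factors of a cluster occur on the same sequence, which is inferred from the example output and the seq comparisons in the code above. Unlike that code, it does not check that the factors within a cluster differ from each other, and it accepts clusters of any size.

```perl
use strict;
use warnings;

# Build $index{factor}{sequence} = [distances] in one pass over file1's lines.
sub index_records {
    my ($lines) = @_;
    my %index;
    for my $line (@$lines) {
        my ($factor, $seq, $dist) = split ' ', $line;
        next unless defined $dist;    # skip blank or short lines
        push @{ $index{$factor}{$seq} }, $dist;
    }
    return \%index;
}

# For one cluster line from file2, return every combination of records
# (one per factor) in which all factors occur on the same sequence.
sub cluster_matches {
    my ($index, $cluster_line) = @_;
    my @factors = split ' ', $cluster_line;
    return () unless @factors;
    my $first = $index->{ $factors[0] } or return ();
    my @out;
    for my $seq (sort keys %$first) {
        # Skip sequences on which any factor of the cluster is missing.
        next if grep { !exists $index->{$_} || !exists $index->{$_}{$seq} } @factors;
        # Expand all combinations of distances, one factor at a time.
        my @rows = ([]);
        for my $factor (@factors) {
            my @next;
            for my $row (@rows) {
                push @next, [ @$row, $factor, $seq, $_ ] for @{ $index->{$factor}{$seq} };
            }
            @rows = @next;
        }
        push @out, map { join "\t", @$_ } @rows;
    }
    return @out;
}

# Sample data copied from the post; real input would be read from the two files.
my @file1 = ('A seq1 20', 'B seq2 25', 'B seq2 80', 'B seq1 40',
             'C seq1 25', 'D seq2 30', 'E seq2 45');
my @file2 = ('A B C', 'B D E');

my $index = index_records(\@file1);
print "$_\n" for map { cluster_matches($index, $_) } @file2;
```

With the sample data this prints three tab-separated rows: one for the A B C cluster on seq1 and two for the B D E cluster on seq2 (one per B record on seq2). Only file1 has to be held in memory as a hash; file2 could be read line by line and passed to cluster_matches as it comes.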
In reply to how to speed up comparison between two files by greeknlatin