I want to compare two files so that, for each cluster listed in file2, the code prints all matches from file1 for every individual factor in that cluster. As the sample output shows, the factors of a cluster are matched on the same sequence.

For example, ABC is one cluster, and A, B and C are its individual factors.

file1

A seq1 20
B seq2 25
B seq2 80
B seq1 40
C seq1 25
D seq2 30
E seq2 45

file2

A B C
B D E

Output

A seq1 20 B seq1 40 C seq1 25
B seq2 25 D seq2 30 E seq2 45
B seq2 80 D seq2 30 E seq2 45

So far I have tried the following code, but it takes far too long because my input files are huge.

#file opening
open(AB,"try_fimo.txt")||die("cannot open");
open(BC,"try_fimo2.txt")||die("cannot open");

#storing file in an array
@data=<AB>;
chomp(@data);
@data2=<BC>;
chomp(@data2);

#reading file line by line
foreach $line(@data)
{
    foreach $line2(@data2)
    {
        if($line2=~/(.*?)\s+(.*?)\s+(.*)/)
        {
            $t1=$1; #e.g. for the first row of file2 (A B C), this takes A, followed by B and C
            $t2=$2;
            $t3=$3;
        }
        if($line=~/(.*?)\s+(.*?)\s+(.*)/)
        {
            if($1 eq $t1)
            {
                #storing each column in a separate array based on match
                push(@tf1,$1);
                push(@seq1,$2);
                push(@dis1,$3);
                # print $1,"\t",$2,"\t",$3,"\t";
            }
            if($1 eq $t2)
            {
                push(@tf2,$1);
                push(@seq2,$2);
                push(@dis2,$3);
            }
            if($1 eq $t3)
            {
                push(@tf3,$1);
                push(@seq3,$2);
                push(@dis3,$3);
            }
        }
    }
}

#comparison using loops
for($i=0;$i<@tf1;$i++)
{
    for($j=0;$j<@tf2;$j++)
    {
        for($k=0;$k<@tf3;$k++)
        {
            if(($seq1[$i] eq $seq2[$j]) && ($seq1[$i] eq $seq3[$k]))
            {
                if(($tf1[$i] ne $tf2[$j]) && ($tf1[$i] ne $tf3[$k]))
                {
                    print $tf1[$i],"\t",$seq1[$i],"\t",$dis1[$i],"\t",$tf2[$j],"\t",$seq2[$j],"\t",$dis2[$j],"\t",$tf3[$k],"\t",$seq3[$k],"\t",$dis3[$k],"\n";
                }
            }
        }
    }
}

Can anyone please suggest a faster solution?

Thanks
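
One direction that may be faster, offered as a minimal sketch rather than a tested fix for the real data: read file1 once into a hash keyed by factor and then by sequence, so that each cluster line from file2 becomes a handful of hash lookups instead of another pass over all of file1. The sketch assumes whitespace-separated columns, that all factors of a cluster must match on the same sequence (as the sample output suggests), and that file1 fits in memory. The names index_hits and cluster_matches are made up for illustration, and the sample data is inlined so the block runs on its own.

```perl
use strict;
use warnings;

# Build $hits->{factor}{sequence} = [ distance, ... ] from file1-style rows.
sub index_hits {
    my ($fh) = @_;
    my %hits;
    while ( my $line = <$fh> ) {
        my ( $factor, $seq, $dist ) = split ' ', $line;
        next unless defined $dist;    # skip blank or short lines
        push @{ $hits{$factor}{$seq} }, $dist;
    }
    return \%hits;
}

# For one cluster (a list of factor names), return every combination of hits
# that share a sequence, as tab-separated strings.
sub cluster_matches {
    my ( $hits, @factors ) = @_;
    return unless @factors;
    my $first = $hits->{ $factors[0] } or return;
    my @rows;
    SEQ: for my $seq ( sort keys %$first ) {
        my @combos = ( [] );
        for my $factor (@factors) {
            # A factor with no hit on this sequence rules the sequence out.
            my $by_seq = $hits->{$factor} or next SEQ;
            my $dists  = $by_seq->{$seq}  or next SEQ;
            my @next;
            for my $combo (@combos) {
                for my $dist (@$dists) {
                    push @next, [ @$combo, "$factor\t$seq\t$dist" ];
                }
            }
            @combos = @next;
        }
        push @rows, map { join "\t", @$_ } @combos;
    }
    return @rows;
}

# Sample data from the post, inlined (assumption: real input has this shape).
my $file1 = <<'END';
A seq1 20
B seq2 25
B seq2 80
B seq1 40
C seq1 25
D seq2 30
E seq2 45
END

my $file2 = <<'END';
A B C
B D E
END

open my $fh1, '<', \$file1 or die "cannot read file1 data: $!";
my $hits = index_hits($fh1);
close $fh1;

open my $fh2, '<', \$file2 or die "cannot read file2 data: $!";
while ( my $line = <$fh2> ) {
    my @factors = split ' ', $line;
    print "$_\n" for cluster_matches( $hits, @factors );
}
close $fh2;
```

On the sample data this prints the same three rows as the Output listing, separated by tabs. For the real files, the two in-memory handles would be replaced with something like open(my $fh, '<', 'try_fimo.txt') or die "cannot open: $!". The sketch handles clusters of any size, not only three factors, and the work per cluster depends on the number of matching combinations rather than on the size of file1.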

In reply to how to speed up comparison between two files by greeknlatin
