perllearner007 has asked for the wisdom of the Perl Monks concerning the following question:


Hi perlmonks,

I have two gene lists, one with 300 genes and one with 150. I want to find out whether the 150 genes appear in the 300-gene list: all of them, some of them, or none. In other words, I want the intersection (the common genes) of the two files.

I have tried this, but it reports an intersection of 0, even though I know that around 75 of the genes are present in the 300-gene list. Why am I getting 0?
    #!/usr/bin/perl -w
    use strict;

    #Find intersection that are the common genes in both the list
    open (FIRST, "C:/Users/ABC/Desktop/list2.txt") or die;
    open (SECOND, "C:/Users/ABC/Desktop/list1.txt") or die;

    my @first = (<FIRST>);
    chomp (@first);
    my @second = (<SECOND>);
    chomp (@second);

    my @union = my @isect = my @sym_diff = ();
    my %union = my %isect = my %count = ();

    foreach my $e (@first, @second) {
        $count{$e}++;
    }

    foreach my $e (keys %count) {
        push(@union, $e);
        if ($count{$e} == 2) {
            push @isect, $e;
        } else {
            push @sym_diff, $e;
        }
    }

    my %seen;
    my @first_only;
    @seen{@second} = ();
    foreach my $item (@first) {
        push (@first_only, $item) unless exists $seen{$item};
    }

    @isect = sort (@isect);
    print "Intersection: " . scalar(@isect) . " " . join (" ", @isect) . "\n";

Also, is there any way to write the output to a text file instead of printing it to the terminal (where it currently shows 0)?

Thank you

Replies are listed 'Best First'.
Re: Common between two lists
by choroba (Cardinal) on Dec 09, 2011 at 00:57 UTC
    For me, your program works correctly - if no repeated genes are present in either list. Also, make sure the whitespace in both the lists is the same.
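To make the two caveats above concrete, here is a minimal sketch, assuming one gene name per line. It trims stray whitespace (including a leftover "\r" from Windows line endings) and removes duplicates within each list before counting, so that a count of 2 can only mean "present in both lists". The sub names `clean_list`, `intersection` and `write_list`, the sample gene names, and the output file name are all hypothetical and not part of the original program. `write_list` covers the second part of the question, writing the result to a text file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Trim surrounding whitespace (this also removes a stray "\r"), skip
# empty lines, and keep only the first copy of each gene.
sub clean_list {
    my @lines = @_;
    my (%seen, @clean);
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;
        next if $line eq '';
        push @clean, $line unless $seen{$line}++;
    }
    return @clean;
}

# Genes present in both lists, sorted. Each list is de-duplicated first,
# so a count of 2 means "seen once in each list".
sub intersection {
    my ($first, $second) = @_;
    my %count;
    $count{$_}++ for clean_list(@$first), clean_list(@$second);
    my @common = grep { $count{$_} == 2 } keys %count;
    return sort @common;
}

# Write one gene per line to a text file instead of the terminal.
sub write_list {
    my ($path, @genes) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} "$_\n" for @genes;
    close $out or die "Cannot close $path: $!";
}

# Made-up data: note the trailing space, the "\r" and the repeated gene.
my @first  = ("BRCA1", "TP53 ", "EGFR", "TP53");
my @second = ("TP53", "MYC\r", "BRCA1");

my @common = intersection(\@first, \@second);
print "Intersection: ", scalar(@common), " (@common)\n";

# write_list("common_genes.txt", @common);   # hypothetical output file
```

With the raw lists, "TP53 " and "TP53" would be counted as different keys, and the repeated "TP53" in the first list would push a count to 2 without the gene necessarily being in the second list. Cleaning first avoids both problems.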
Re: Common between two lists
by umasuresh (Hermit) on Dec 09, 2011 at 15:34 UTC
    A non-Perl solution:
    Intersection:
    grep -w -f gene_list1 gene_list2
    Unique to the second list, e.g.:
    grep -v -f gene_list1 gene_list2
    NOTE: If the files are large, this may not work!

      That may work in this case, assuming the genes are all the same length. But it would also match if a line from file1 appeared as a substring of a line in file2, so a more general non-Perl solution to "I want the lines two files have in common" is to use comm. My guess is that using comm on two sorted files is also probably less resource-intensive than asking grep to turn an entire file into search strings.

      sort file1 >file1.sorted
      sort file2 >file2.sorted
      comm -12 file1.sorted file2.sorted >common.lines
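For anyone who wants to see the substring pitfall in Perl terms, here is a small sketch. It contrasts "the pattern appears somewhere in the line" (roughly what pattern-based grep does when neither -w nor -x is given) with "the whole line is identical", which is the kind of comparison comm makes. The sub names and gene names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Substring matching: a line counts as found if any pattern appears
# anywhere inside it.
sub substring_matches {
    my ($patterns, $lines) = @_;
    return grep {
        my $line = $_;
        scalar(grep { index($line, $_) >= 0 } @$patterns);
    } @$lines;
}

# Whole-line matching: a line counts only if it is identical to some
# entry in the other list.
sub exact_matches {
    my ($patterns, $lines) = @_;
    my %wanted = map { $_ => 1 } @$patterns;
    return grep { $wanted{$_} } @$lines;
}

# Made-up gene names for illustration only.
my @list1 = ("TP53", "MYC");
my @list2 = ("TP53BP1", "MYC", "EGFR");

my @loose  = substring_matches(\@list1, \@list2);
my @strict = exact_matches(\@list1, \@list2);

print "substring: @loose\n";    # TP53BP1 MYC
print "exact:     @strict\n";   # MYC
```

The hash lookup in `exact_matches` also avoids building one search per pattern, which is part of why an exact comparison tends to scale better than pattern matching on larger lists.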

      Aaron B.
      My Woefully Neglected Blog, where I occasionally mention Perl.