Common between two lists

perllearner007 has asked for the wisdom of the Perl Monks concerning the following question:

Hi perlmonks,

I have two gene lists: the first has 300 genes and the second has 150. I want to find out whether those 150 genes are in the 300-gene list, and if only some of them are, which ones. In other words, I want the intersection: the genes common to both files.

I have tried the code below, but it reports an intersection of 0, even though I know that around 75 of the genes are present in the 300-gene list. Why am I getting 0?

#!/usr/bin/perl -w
use strict;

# Find the intersection, i.e. the genes common to both lists


open (FIRST, '<', "C:/Users/ABC/Desktop/list2.txt") or die "Cannot open list2.txt: $!";
open (SECOND, '<', "C:/Users/ABC/Desktop/list1.txt") or die "Cannot open list1.txt: $!";

my @first = (<FIRST>);
chomp (@first);

my @second = (<SECOND>);
chomp (@second);

my @union = my @isect = my @sym_diff = ();
my %union = my %isect = my %count = ();

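# Count how often each gene occurs across both lists
foreach my $e (@first, @second) {
    $count{$e}++;
}

# A count of exactly 2 means "in both lists" only if neither list repeats a gene
foreach my $e (keys %count) {
    push(@union, $e);
    if ($count{$e} == 2) {
        push @isect, $e;
    } else {
        push @sym_diff, $e;
    }
}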


# Genes in @first that do not appear in @second
my %seen;
my @first_only;
@seen{@second} = ();

foreach my $item (@first) {
    push (@first_only, $item) unless exists $seen{$item};
}


@isect = sort (@isect);


print "Intersection: " . scalar(@isect) . " " . join (" ", @isect) . "\n";

Also, is there any way to write the result to a text file instead of printing it to the terminal, where it currently shows 0?

Thank you
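For the second question, a minimal sketch: open an output filehandle in write mode and print to it instead of STDOUT. The output name `intersection.txt` and the sample contents of `@isect` are hypothetical placeholders. Alternatively, plain shell redirection of STDOUT also works without changing the script, e.g. `perl script.pl > intersection.txt` (where `script.pl` stands for whatever the script is called).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical example data standing in for the computed intersection.
my @isect = sort qw(BRCA1 TP53 EGFR);

# Hypothetical output path; change it to wherever the result should go.
my $outfile = 'intersection.txt';

# Three-argument open in write mode ('>') creates or truncates the file.
open(my $out, '>', $outfile) or die "Cannot open $outfile for writing: $!";

# Printing to the filehandle writes to the file instead of the terminal.
print {$out} "Intersection: " . scalar(@isect) . "\n";
print {$out} "$_\n" for @isect;

close($out) or die "Cannot close $outfile: $!";
```

Writing one gene per line, rather than joining them with spaces, keeps the file easy to feed back into other tools such as `sort` or `comm`.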

Re: Common between two lists by choroba (Cardinal) on Dec 09, 2011 at 00:57 UTC
For me, your program works correctly, as long as no gene is repeated within either list. Also, make sure the whitespace in both lists is the same.
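To illustrate both caveats, here is a minimal sketch (not the original poster's code) that trims leading and trailing whitespace from every line, including a stray carriage return left over from Windows-style line endings, and then builds the intersection with a hash lookup so that a gene repeated within one list no longer distorts the result. The helper `read_genes` and the sample data are hypothetical; the sample lists are read through in-memory filehandles (`open` on a scalar reference), and in practice you would pass real file paths instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read one gene name per line, trimming whitespace
# (including a trailing "\r" from Windows-style files) and skipping blanks.
sub read_genes {
    my ($source) = @_;    # a file path, or a reference to a string
    open(my $fh, '<', $source) or die "Cannot open $source: $!";
    my @genes;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;    # strip leading/trailing whitespace, incl. \r\n
        push @genes, $line if length $line;
    }
    close($fh);
    return @genes;
}

# Hypothetical sample data: the duplicate TP53, the trailing space and the
# CRLF line endings would all trip up the count == 2 approach.
my $list_a = "TP53\r\nBRCA1 \r\nEGFR\r\nTP53\r\n";
my $list_b = "EGFR\nMYC\nTP53\n";

my @first  = read_genes(\$list_a);
my @second = read_genes(\$list_b);

# Lookup set built from the second list; hash keys are unique.
my %in_second = map { $_ => 1 } @second;

# Keep each gene of the first list once, if it also occurs in the second.
my %seen;
my @isect = sort { $a cmp $b } grep { $in_second{$_} && !$seen{$_}++ } @first;

print "Intersection: " . scalar(@isect) . " @isect\n";
```

With this sample data the script reports 2 common genes, EGFR and TP53, even though TP53 occurs twice in the first list.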
Re: Common between two lists by umasuresh (Hermit) on Dec 09, 2011 at 15:34 UTC
A non-Perl solution:
intersection: `grep -w -f gene_list1 gene_list2`
unique to the second list, e.g.: `grep -v -f gene_list1 gene_list2`
NOTE: If the files are large, this may not work!
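Because `grep -f` treats every line of the first file as a pattern, the "unique to the second list" idea can also be sketched in Perl with exact whole-line comparison instead of pattern matching. This is a minimal sketch; the list contents are hypothetical and would normally be read from `gene_list1` and `gene_list2`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample lists; in practice read them from gene_list1 / gene_list2.
my @list1 = qw(TP53 EGFR);
my @list2 = qw(TP53 TP53X MYC EGFR);

# Every gene of list1 becomes a hash key, so membership is an exact
# string lookup rather than a substring or regex match.
my %in_list1 = map { $_ => 1 } @list1;

# Genes of list2 that are absent from list1 (like grep -v -f, but exact).
my @only_in_list2 = grep { !$in_list1{$_} } @list2;

print "$_\n" for @only_in_list2;
```

Here `TP53X` is reported as unique to the second list, whereas `grep -v -f` would drop it because the pattern `TP53` matches inside it.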
Re^2: Common between two lists by aaron_baugher (Curate) on Dec 09, 2011 at 16:35 UTC
That may work in this case, assuming the genes are all the same length. But it would also match if a line from file1 appeared as a substring of a line in file2, so a more general non-Perl solution to "I want the lines two files have in common" is to use `comm`. My guess is that using comm on two sorted files is also less resource-intensive than asking grep to turn an entire file into search strings.
`sort file1 >file1.sorted`
`sort file2 >file2.sorted`
`comm -12 file1.sorted file2.sorted >common.lines`
Aaron B.
My Woefully Neglected Blog, where I occasionally mention Perl.
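The idea behind `comm -12` (walk two sorted inputs in step and emit the lines present in both) can be sketched in Perl as a merge-style intersection. This illustrates the technique only and is not how `comm` itself is implemented; it assumes both lists are sorted with the same ordering (here Perl's default string comparison, `cmp`), which is why the shell version sorts both files first. The sub name `common_sorted` and the sample data are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the elements common to two lists that are both sorted with cmp.
# A value repeated in both inputs is reported once per matching pair.
sub common_sorted {
    my ($x, $y) = @_;    # array refs, each sorted ascending with cmp
    my ($i, $j) = (0, 0);
    my @common;
    while ($i < @$x && $j < @$y) {
        my $order = $x->[$i] cmp $y->[$j];
        if ($order < 0) {
            $i++;        # x's value is smaller, so it cannot appear later in y
        } elsif ($order > 0) {
            $j++;        # y's value is smaller, so it cannot appear later in x
        } else {
            push @common, $x->[$i];
            $i++;
            $j++;
        }
    }
    return @common;
}

# Hypothetical sample data, sorted first just as the shell version does.
my @file1 = sort qw(TP53 EGFR MYC BRCA1);
my @file2 = sort qw(KRAS MYC TP53 PTEN);

my @common = common_sorted(\@file1, \@file2);
print "$_\n" for @common;
```

Each list is traversed only once, so after sorting the comparison is linear in the total number of lines, and no lookup table of search strings has to be held in memory.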