biomonk has asked for the wisdom of the Perl Monks concerning the following question:
Hello Monks, I need a little help with searching and matching the contents of huge files. I have two files: the first (the snp file) contains ids and a score and is about 200,000 lines long; the second (the map file) contains ids, genes and a score and is 3 to 4 times larger than the first. I need to look up each id from the first file in the second and, if there is a match, print that line of the second file to a new file. I wrote a program to do this, but it takes days to complete, so I need your help optimizing it to make it reasonably fast.
My files are (both files are tab-delimited):

    #snp file
    snp_rs        log_1_pval
    rs3749375     11.7268615355335
    rs10499549    10.4656064706897
    rs7837688     9.85374546064131
    rs4794737     9.41576680248523
    rs10033399    9.36407447191822
    rs4242382     9.22809709356544
    rs4242384     8.91767075801336
    rs9656816     8.61480602028324
    rs982354      8.40833878650415
    rs31226       8.38047936810042
    .........     .........
    #Map file
    rs10904494    NP_817124    17881
    rs7837688     NP_817124    39800
    rs4881551     ZMYND11      21567
    rs7909028     ZMYND11      5335
    rs10499549    ZMYND11      0
    rs12779173    ZMYND11      0
    rs2448370     ZMYND11      0
    rs2448366     ZMYND11      0
    rs2379078     ZMYND11      0
    rs3749375     ZMYND11      0
    .........     .......      .
    #Expected output
    rs3749375     ZMYND11
    rs10499549    ZMYND11
    rs7837688     NP_817124
    # This program is for getting snps and genes
    open(SNP,"D:\\gsea.chi2")or die("File cant be opened");
    <SNP>;
    while($line = <SNP>){
        @snps = split(/\t/,$line);
        pop(@snps);
        foreach $id(@snps){
            #print "$id \n";
            search($id);
        }
    }

    sub search {
        $snpid = $_[0];
        #print "$snpid \n";
        open (MAP,"D:\\gsea1.SNPGENEMAP")or die("File cant be opened");
        my @map = <MAP>;
        close (MAP);
        foreach $mapid(@map){
            if ($mapid =~ m/^$snpid/i){
                print $mapid;
                last;
            }
        }
    }
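The program above reopens and rereads the whole map file once for every snp id, so with about 200,000 ids the map file is read about 200,000 times. A common way to avoid that is to load the ids from the smaller file into a hash and then stream the larger file a single time. The following is only a minimal sketch of that idea, not a drop-in replacement. The sub names `load_snp_ids` and `print_matches` and the command-line usage are assumptions made for illustration. It also differs from the original in two ways: it compares ids exactly (the original uses a case-insensitive prefix match, so `rs31226` would also match a longer id that starts with it), and it prints every matching map line in map-file order rather than stopping at the first match per id.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the snp file and return a hash ref keyed by snp id.
# Assumes the first line is a header and the id is the first
# tab-separated field, as in the sample data.
sub load_snp_ids {
    my ($snp_fh) = @_;
    my %wanted;
    my $header = <$snp_fh>;    # skip the "snp_rs log_1_pval" header line
    while (my $line = <$snp_fh>) {
        chomp $line;
        my ($id) = split /\t/, $line;
        next unless defined $id && length $id;
        $wanted{$id} = 1;
    }
    return \%wanted;
}

# Stream the map file once and print every line whose first field
# is one of the wanted ids. Returns the number of lines printed.
sub print_matches {
    my ($map_fh, $wanted, $out_fh) = @_;
    my $count = 0;
    while (my $line = <$map_fh>) {
        my ($id) = split /\t/, $line, 2;
        next unless defined $id;
        chomp $id;             # only matters if the line has no tab at all
        if (exists $wanted->{$id}) {
            print {$out_fh} $line;
            $count++;
        }
    }
    return $count;
}

# Hypothetical usage: perl match_snps.pl SNPFILE MAPFILE > matches.txt
if (@ARGV == 2) {
    my ($snp_path, $map_path) = @ARGV;

    open my $snp_fh, '<', $snp_path or die "Cannot open $snp_path: $!";
    my $wanted = load_snp_ids($snp_fh);
    close $snp_fh;

    open my $map_fh, '<', $map_path or die "Cannot open $map_path: $!";
    print_matches($map_fh, $wanted, \*STDOUT);
    close $map_fh;
}
```

With this layout each file is read once, and each map line costs one hash lookup instead of a scan. The hash only has to hold the ids of the smaller file, so memory use should stay modest at the sizes described.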
Replies are listed 'Best First'.

Re: Searching Huge files
by GrandFather (Saint) on Jul 08, 2008 at 02:08 UTC
by biomonk (Acolyte) on Jul 08, 2008 at 04:20 UTC
by GrandFather (Saint) on Jul 08, 2008 at 04:46 UTC
by biomonk (Acolyte) on Jul 08, 2008 at 13:06 UTC
by biomonk (Acolyte) on Jul 09, 2008 at 20:36 UTC
by GrandFather (Saint) on Jul 09, 2008 at 21:27 UTC
by biomonk (Acolyte) on Jul 08, 2008 at 03:32 UTC

Re: Searching Huge files
by Tanktalus (Canon) on Jul 08, 2008 at 01:53 UTC

Re: Searching Huge files
by dragonchild (Archbishop) on Jul 08, 2008 at 13:37 UTC
by biomonk (Acolyte) on Jul 09, 2008 at 11:57 UTC

Re: Searching Huge files
by jethro (Monsignor) on Jul 08, 2008 at 02:26 UTC
by graff (Chancellor) on Jul 08, 2008 at 03:35 UTC