Hello Monks, I need a little help with searching or matching the contents of huge files. I have two files. The first (the snp file) contains ids and a score and is about 200,000 lines long. The second (the map file) contains ids, genes and a score, and is 3 to 4 times larger than the first. I need to search for each id from the first file in the second and, if there is a match, print that line of the second file to a new file. I wrote a program to do this, but it takes days to complete, so I need your help optimizing it to make it reasonably fast.
My files are (both are tab delimited):

#snp file
snp_rs        log_1_pval
rs3749375     11.7268615355335
rs10499549    10.4656064706897
rs7837688     9.85374546064131
rs4794737     9.41576680248523
rs10033399    9.36407447191822
rs4242382     9.22809709356544
rs4242384     8.91767075801336
rs9656816     8.61480602028324
rs982354      8.40833878650415
rs31226       8.38047936810042
.........
.........

#Map file
rs10904494    NP_817124    17881
rs7837688     NP_817124    39800
rs4881551     ZMYND11      21567
rs7909028     ZMYND11      5335
rs10499549    ZMYND11      0
rs12779173    ZMYND11      0
rs2448370     ZMYND11      0
rs2448366     ZMYND11      0
rs2379078     ZMYND11      0
rs3749375     ZMYND11      0
.........
.......
.
The output should look like:

rs3749375     ZMYND11
rs10499549    ZMYND11
rs7837688     NP_817124
# This program is for getting snps and genes
open(SNP,"D:\\gsea.chi2")or die("File cant be opened");
<SNP>;                              # skip the header line
while($line = <SNP>){
    @snps = split(/\t/,$line);
    pop(@snps);                     # drop the score, keep the id
    foreach $id(@snps){
        #print "$id \n";
        search($id);
    }
}

sub search {
    $snpid = $_[0];
    #print "$snpid \n";
    # the whole map file is opened and read again for every snp id
    open (MAP,"D:\\gsea1.SNPGENEMAP")or die("File cant be opened");
    my @map = <MAP>;
    close (MAP);
    foreach $mapid(@map){
        if ($mapid =~ m/^$snpid/i){
            print $mapid;
            last;
        }
    }
}
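For comparison, here is a minimal sketch of the usual hash-based way to do this kind of lookup, assuming the ids sit in the first tab-separated column of both files and the snp file has one header line. The sub name filter_map_by_snps and the command-line usage are made up for illustration and are not from the program above. It reads the snp ids into a hash once, then makes a single pass over the map file, so each file is read only one time.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): collect the ids from the
# snp file into a hash, then stream the map file once and print matching lines.
sub filter_map_by_snps {
    my ($snp_path, $map_path, $out_fh) = @_;

    open(my $snp_fh, '<', $snp_path) or die "Can't open $snp_path: $!";
    <$snp_fh>;    # skip the header line, as the original program does
    my %wanted;
    while (my $line = <$snp_fh>) {
        chomp $line;
        my ($id) = split /\t/, $line;          # first column is the id
        $wanted{$id} = 1 if defined $id && length $id;
    }
    close $snp_fh;

    open(my $map_fh, '<', $map_path) or die "Can't open $map_path: $!";
    my $matches = 0;
    while (my $line = <$map_fh>) {
        my ($id) = split /\t/, $line;
        next unless defined $id && exists $wanted{$id};
        print {$out_fh} $line;                 # whole map line, unchanged
        $matches++;
    }
    close $map_fh;
    return $matches;
}

# Assumed usage: perl filter.pl snp_file map_file > matches.txt
if (@ARGV == 2) {
    filter_map_by_snps(@ARGV, \*STDOUT);
}
```

Note that this does not behave exactly like the regex version: it matches ids exactly and case-sensitively (so rs31226 will not also match a longer id that starts with rs31226), it prints every map line whose id is wanted rather than stopping at the first, and the output comes out in map-file order.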
In reply to Searching Huge files by biomonk