Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Input file1:
col1 col2 col3 col4
ZGLP1 ICAM4 13.27 0.2425
ICAM4 ZGLP1 13.27 0.2425
RRP1B CDH24 20.8 1
ZGLP1 OOEP 18.79 0.3060
ZGLP1 RRP1B 39.62 0.2972
ZGLP1 CDH24 51.21 0.2560
BBCDI DND1 19.44 0.2833
BBCDI SOHLH2 36.61 0.2909
DND1 SOHLH2 18 0.8
Input file2:
chr8 18640000 18960000 ZGLP1 RRP1B CDH24 #gene number here is not fixed, can be 4, 5 or more
chr8 19000000 19080000 BBCDI DND1 SOHLH2 #gene number here is not fixed, can be 4, 5 or more
I have written code that compares col1 and col2 of file1 with each line of file2: if both genes of a pair occur anywhere in a line of file2, the program should print the chromosome, pos1, pos2 and the matching line of file1 with its values.
output file:
chr8 18640000 18960000 ZGLP1 RRP1B 39.62 0.2972
chr8 18640000 18960000 ZGLP1 CDH24 51.21 0.2560
chr8 18640000 18960000 RRP1B CDH24 20.8 1
chr8 19000000 19080000 BBCDI DND1 19.44 0.2833
chr8 19000000 19080000 BBCDI SOHLH2 36.61 0.2909
chr8 19000000 19080000 DND1 SOHLH2 18 0.8
So far I have tried this, but it takes far too long because my input files are huge (2 GB).
my perl code:

open( AB, "file1" ) || die("cannot open");
open( BC, "file2" ) || die("cannot open");
open( OUT, ">output.txt" );
@file = <AB>;
chomp(@file);
@data = <BC>;
chomp(@data);
foreach $fl (@file) {
    if ( $fl =~ /(.*?)\s+(.*?)\s+(.*?)\s+(.*)/ ) {
        $one = $1;
        $two = $2;
        $thr = $3;
        $for = $4;
    }
    foreach $line (@data) {
        if ( $line =~ /(.*?)\s+(.*?)\s+(.*?)\s+(.*)+/ ) {
            $chr  = $1;
            $pos1 = $2;
            $pos2 = $3;
        }
        if ( $line =~ /$one/ ) {
            if ( $line =~ /$two/ ) {
                print OUT $chr, "\t", $pos1, "\t", $pos2, "\t", $fl, "\n";
            }
        }
    }
}
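The nested loop above runs a regex over every line of file2 for every line of file1, so the work grows with the product of the two file sizes. One common way around that is a hash lookup. What follows is only a minimal sketch under several assumptions, not a drop-in replacement: it assumes columns are whitespace-separated, that anything after a `#` in file2 is a comment, that gene names should match exactly (the original regex also matches substrings), and that the index built from file2 fits in memory. The sub names `index_regions` and `match_pairs` are made up for illustration, and the sample data is read from in-memory strings so the sketch can be run as it stands.

```perl
use strict;
use warnings;

# Read the region file (file2) once and build:
#   @regions    : "chr\tpos1\tpos2" for each usable line
#   %regions_of : gene name => { index into @regions => 1 }
sub index_regions {
    my ($fh) = @_;
    my ( @regions, %regions_of );
    while ( my $line = <$fh> ) {
        chomp $line;
        $line =~ s/\s*#.*//;    # assumption: '#' starts a trailing comment
        my ( $chr, $pos1, $pos2, @genes ) = split ' ', $line;
        next unless @genes;     # skip blank or incomplete lines
        push @regions, join "\t", $chr, $pos1, $pos2;
        $regions_of{$_}{$#regions} = 1 for @genes;
    }
    return ( \@regions, \%regions_of );
}

# Stream the pair file (file1) line by line; print a row for every
# region that lists both genes of the pair.
sub match_pairs {
    my ( $in, $out, $regions, $regions_of ) = @_;
    while ( my $line = <$in> ) {
        chomp $line;
        my ( $g1, $g2 ) = split ' ', $line;
        next unless defined $g2;
        my $r1 = $regions_of->{$g1} or next;    # gene 1 in no region
        my $r2 = $regions_of->{$g2} or next;    # gene 2 in no region
        for my $i ( sort { $a <=> $b } grep { exists $r2->{$_} } keys %$r1 ) {
            print {$out} "$regions->[$i]\t$line\n";
        }
    }
}

# Demo with the sample data from the question, held in strings.
my $file2 = join '',
    "chr8 18640000 18960000 ZGLP1 RRP1B CDH24\n",
    "chr8 19000000 19080000 BBCDI DND1 SOHLH2\n";
my $file1 = join '',
    "col1 col2 col3 col4\n",
    "ZGLP1 ICAM4 13.27 0.2425\n",
    "ICAM4 ZGLP1 13.27 0.2425\n",
    "RRP1B CDH24 20.8 1\n",
    "ZGLP1 OOEP 18.79 0.3060\n",
    "ZGLP1 RRP1B 39.62 0.2972\n",
    "ZGLP1 CDH24 51.21 0.2560\n",
    "BBCDI DND1 19.44 0.2833\n",
    "BBCDI SOHLH2 36.61 0.2909\n",
    "DND1 SOHLH2 18 0.8\n";

my $buf = '';
open my $reg_fh, '<', \$file2 or die "cannot open region data: $!";
open my $in,     '<', \$file1 or die "cannot open pair data: $!";
open my $out,    '>', \$buf   or die "cannot open output buffer: $!";

my ( $regions, $regions_of ) = index_regions($reg_fh);
match_pairs( $in, $out, $regions, $regions_of );
close $out;

print $buf;
```

For the real files, the three in-memory `open` calls would be replaced by opens on "file2", "file1" and ">output.txt". Rows come out in file1 order rather than grouped by file2 line, and only the file2 index is held in memory while file1 is streamed, which may or may not be acceptable depending on how large file2 really is.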
Replies are listed 'Best First'.
Re: how to speed up pattern match between two files
by RichardK (Parson) on Sep 16, 2014 at 17:19 UTC
by gnujsa (Acolyte) on Sep 17, 2014 at 07:38 UTC
by RichardK (Parson) on Sep 17, 2014 at 09:15 UTC
by gnujsa (Acolyte) on Sep 17, 2014 at 15:58 UTC
by Anonymous Monk on Sep 16, 2014 at 19:25 UTC
by ww (Archbishop) on Sep 17, 2014 at 22:46 UTC

Re: how to speed up pattern match between two files
by ww (Archbishop) on Sep 16, 2014 at 15:45 UTC

Re: how to speed up pattern match between two files
by Lennotoecom (Pilgrim) on Sep 16, 2014 at 17:36 UTC

Re: how to speed up pattern match between two files (hash)
by tye (Sage) on Sep 17, 2014 at 23:33 UTC