FluffyBunny has asked for the wisdom of the Perl Monks concerning the following question:
Hello perlmonks,
I wrote a basic code that can handle some text manipulation. Basically my first input (Mock3_SNP.txt) contains the exact location of the text, and I am trying to find what the text is from Mock3_forSNP.bow. the latter bowfile contains start position and 36 letters (if you understand some basic Biology, it will be chromosome number and starting position and it gives 36 bases -left to right). I succeeded with my small mock files to do the work, but with 20million lines of bowfile wouldnt do this in a few hour. My theoretical running time is actually weeks or months!! Could you guys take a look at my code and help me improve it? Thank you so much for reading!
Mock3_SNP.txt (first input)
Name Chr Position rs1 1 35251 rs2 1 35248 rs3 1 37514 rs4 1 21145 rs5 2 35150 rs6 3 11412 rs7 4 7721 rs8 21 42516 rs9 4 1110 rs10 4 1145
Mock3_forSNP.bow - second file
PGX_SOLEXA1:2:1:5:804#0/1 - chr1 35245 GGGAGGTAAAAAAAAAAAA +AAAAAAAAAAAAAAAAA BB^Yaa^a\a\D\aaaaa`V`a``abbaabaabaaa 0 "24 +:A>N,34:T>N" PGX_SOLEXA1:2:4:5:804#0/1 + chr1 37510 TTTTCTTTTTTTTTTTTTT +TTTTTTTTTTTTTTTTT BB^Yaa^a\a\D\aaaaa`V`a``abbaabaabaaa 0 "24 +:A>N,34:T>N" PGX_SOLEXA1:4:4:5:804#0/1 - chr1 21140 CCCCCGCCCCCCCCCCCCC +CCCCCCCCCCCCCCCCC BB^Yaa^a\a\D\aaaaa`V`a``abbaabaabaaa 0 "24 +:A>N,34:T>N" PGX_SOLEXA1:4:4:5:774#0/1 - chr2 35128 AAAAAAAAAAAAAAAAAAA +AAATAAAAAAAAAAAAA BB^Yaa^a\a\D\aaaaa`V`a``abbaabaabaaa 0 "24 +:A>N,34:T>N" PGX_SOLEXA1:4:8:5:474#0/1 + chr3 11410 CCGCCCCCCCCCCCCCCCC +CCCCCCCCCCCCCCCCC BB^Yaa^a\a\D\aaaaa`V`a``abbaabaabaaa 21 "2 +4:A>N,34:T>N" PGX_SOLEXA1:4:8:5:314#0/1 + chr5 7720 AGAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAA BB^Yaa^a\a\D\aaaaa`V`a``abbaabaabaaa 0 "24: +A>N,34:T>N" PGX_SOLEXA1:15:8:5:314#0/1 - chr21 42510 GGGGGGTCCCCCCCCCC +CCCCCCCCCCCCCCCCCCC BB^Yaa^a\a\D\aaaaa`V`a``abbaabaabaaa 0 " +24:A>N,34:T>N" PGX_SOLEXA1:4:9:5:314#0/1 - chr1 35239 AAAAAAGGGAGGTAAAAAA +AAAAAAAAAAAAAAAAA BB^Yaa^a\a\D\aaaaa`V`a``abbaabaabaaa 0 "24 +:A>N,34:T>N" PGX_SOLEXA1:4:9:21:314#0/1 + chr4 1110 GAAAAAGGGAGGTAAAAAA +AAAAAAAAAAAAAAAAG BB^Yaa^a\a\D\aaaaa`V`a``abbaabaabaaa 0 "24 +:A>N,34:T>N"
perl script
!/usr/bin/perl # filename: Read_Bow_SNP.pl + ## Script description: This program takes input and match the chromoso +me location from Bowtie parser file, # then adds the read # to the window (also includi +ng neighboring windows) ## Perl interpreter command # use warnings; use strict; use warnings; ## Hashes initialized my %input; my %bowtieP; my $location = 0; ## Open files # file1 - input: create window # from position open (FILE1, "Mock3_SNP.txt"); my $head = <FILE1>; # If there is a header in the input #1, include th +is code while (<FILE1>) { chomp; my $input_orig = $_; my @line = split /\s+/, $_; my $name = $line[0]; my $chr = "chr$line[1]"; my $pos = $line[2]; $input{$name}[0] = $name; $input{$name}[1] = $chr; $input{$name}[2] = $pos; } close FILE1; open (OUT, "> test.txt"); ### Change if file name changes print OUT "ID\tInput chr\tInput pos\tBow ID\tBow strand\tBow chr\tBow +pos\tBow pos+35\tSeq\tSNP\n"; open (FILE2, "Mock3_forSNP.bow"); #bow file while (<FILE2>) { chomp; my @line = split /\s+/, $_; if ($line[6]==0) { my $ID = $line[0]; my $strand = $line[1]; my $chr = $line[2]; my $pos = $line[3]; my $seq = $line[4]; foreach my $name(keys %input) { if ($input{$name}[1] eq $chr) { if ($input{$name}[2] > $pos-1 && $input{$name}[2] < $pos+3 +6) { #print OUT "$input{$name}[0]\t$input{$name}[1]\t$input +{$name}[2]\t$ID\t$strand\t$chr\t$pos\t$pos+35\t$seq\n"; print OUT "$input{$name}[0]\t$input{$name}[1]\t$input{ +$name}[2]\t$ID\t$strand\t$chr\t$pos\t($pos+35)\t$seq\t"; if ($input{$name}[2] == $pos) { my $fSNP = substr($seq, 0, 1); print OUT "$fSNP\n"; } if ($input{$name}[2] == $pos+35) { my $fSNP = substr($seq, 35,1); print OUT "$fSNP\n"; } if ($input{$name}[2] != $pos+35 && $input{$name}[2] != + $pos) { my $SNP = substr($seq, 0, $input{$name}[2]-($pos-1 +)); my $fSNP = substr($SNP, $input{$name}[2]-$pos); + print OUT "$fSNP\n"; } } } } } } close FILE2; close OUT;
OMG IT'S PERL!!
~(o.o~) (~o.o)~
Perl and a blind date both require regular expression.. -_-
|
---|
Replies are listed 'Best First'. | |
---|---|
Re: How can I improve my inefficient codes?
by moritz (Cardinal) on Nov 10, 2010 at 21:24 UTC | |
Re: How can I improve my inefficient codes?
by Tanktalus (Canon) on Nov 10, 2010 at 23:49 UTC |