Hello perlmonks,

I wrote a basic code that can handle some text manipulation. Basically my first input (Mock3_SNP.txt) contains the exact location of the text, and I am trying to find what the text is from Mock3_forSNP.bow. the latter bowfile contains start position and 36 letters (if you understand some basic Biology, it will be chromosome number and starting position and it gives 36 bases -left to right). I succeeded with my small mock files to do the work, but with 20million lines of bowfile wouldnt do this in a few hour. My theoretical running time is actually weeks or months!! Could you guys take a look at my code and help me improve it? Thank you so much for reading!

Mock3_SNP.txt (first input)

Name Chr Position rs1 1 35251 rs2 1 35248 rs3 1 37514 rs4 1 21145 rs5 2 35150 rs6 3 11412 rs7 4 7721 rs8 21 42516 rs9 4 1110 rs10 4 1145

Mock3_forSNP.bow - second file

PGX_SOLEXA1:2:1:5:804#0/1 - chr1 35245 GGGAGGTAAAAAAAAAAAA +AAAAAAAAAAAAAAAAA BB^Yaa^a\a\D\aaaaa`V`a``abbaabaabaaa 0 "24 +:A>N,34:T>N" PGX_SOLEXA1:2:4:5:804#0/1 + chr1 37510 TTTTCTTTTTTTTTTTTTT +TTTTTTTTTTTTTTTTT BB^Yaa^a\a\D\aaaaa`V`a``abbaabaabaaa 0 "24 +:A>N,34:T>N" PGX_SOLEXA1:4:4:5:804#0/1 - chr1 21140 CCCCCGCCCCCCCCCCCCC +CCCCCCCCCCCCCCCCC BB^Yaa^a\a\D\aaaaa`V`a``abbaabaabaaa 0 "24 +:A>N,34:T>N" PGX_SOLEXA1:4:4:5:774#0/1 - chr2 35128 AAAAAAAAAAAAAAAAAAA +AAATAAAAAAAAAAAAA BB^Yaa^a\a\D\aaaaa`V`a``abbaabaabaaa 0 "24 +:A>N,34:T>N" PGX_SOLEXA1:4:8:5:474#0/1 + chr3 11410 CCGCCCCCCCCCCCCCCCC +CCCCCCCCCCCCCCCCC BB^Yaa^a\a\D\aaaaa`V`a``abbaabaabaaa 21 "2 +4:A>N,34:T>N" PGX_SOLEXA1:4:8:5:314#0/1 + chr5 7720 AGAAAAAAAAAAAAAAAAAA +AAAAAAAAAAAAAAAA BB^Yaa^a\a\D\aaaaa`V`a``abbaabaabaaa 0 "24: +A>N,34:T>N" PGX_SOLEXA1:15:8:5:314#0/1 - chr21 42510 GGGGGGTCCCCCCCCCC +CCCCCCCCCCCCCCCCCCC BB^Yaa^a\a\D\aaaaa`V`a``abbaabaabaaa 0 " +24:A>N,34:T>N" PGX_SOLEXA1:4:9:5:314#0/1 - chr1 35239 AAAAAAGGGAGGTAAAAAA +AAAAAAAAAAAAAAAAA BB^Yaa^a\a\D\aaaaa`V`a``abbaabaabaaa 0 "24 +:A>N,34:T>N" PGX_SOLEXA1:4:9:21:314#0/1 + chr4 1110 GAAAAAGGGAGGTAAAAAA +AAAAAAAAAAAAAAAAG BB^Yaa^a\a\D\aaaaa`V`a``abbaabaabaaa 0 "24 +:A>N,34:T>N"

perl script

!/usr/bin/perl # filename: Read_Bow_SNP.pl + ## Script description: This program takes input and match the chromoso +me location from Bowtie parser file, # then adds the read # to the window (also includi +ng neighboring windows) ## Perl interpreter command # use warnings; use strict; use warnings; ## Hashes initialized my %input; my %bowtieP; my $location = 0; ## Open files # file1 - input: create window # from position open (FILE1, "Mock3_SNP.txt"); my $head = <FILE1>; # If there is a header in the input #1, include th +is code while (<FILE1>) { chomp; my $input_orig = $_; my @line = split /\s+/, $_; my $name = $line[0]; my $chr = "chr$line[1]"; my $pos = $line[2]; $input{$name}[0] = $name; $input{$name}[1] = $chr; $input{$name}[2] = $pos; } close FILE1; open (OUT, "> test.txt"); ### Change if file name changes print OUT "ID\tInput chr\tInput pos\tBow ID\tBow strand\tBow chr\tBow +pos\tBow pos+35\tSeq\tSNP\n"; open (FILE2, "Mock3_forSNP.bow"); #bow file while (<FILE2>) { chomp; my @line = split /\s+/, $_; if ($line[6]==0) { my $ID = $line[0]; my $strand = $line[1]; my $chr = $line[2]; my $pos = $line[3]; my $seq = $line[4]; foreach my $name(keys %input) { if ($input{$name}[1] eq $chr) { if ($input{$name}[2] > $pos-1 && $input{$name}[2] < $pos+3 +6) { #print OUT "$input{$name}[0]\t$input{$name}[1]\t$input +{$name}[2]\t$ID\t$strand\t$chr\t$pos\t$pos+35\t$seq\n"; print OUT "$input{$name}[0]\t$input{$name}[1]\t$input{ +$name}[2]\t$ID\t$strand\t$chr\t$pos\t($pos+35)\t$seq\t"; if ($input{$name}[2] == $pos) { my $fSNP = substr($seq, 0, 1); print OUT "$fSNP\n"; } if ($input{$name}[2] == $pos+35) { my $fSNP = substr($seq, 35,1); print OUT "$fSNP\n"; } if ($input{$name}[2] != $pos+35 && $input{$name}[2] != + $pos) { my $SNP = substr($seq, 0, $input{$name}[2]-($pos-1 +)); my $fSNP = substr($SNP, $input{$name}[2]-$pos); + print OUT "$fSNP\n"; } } } } } } close FILE2; close OUT;

OMG IT'S PERL!!

~(o.o~) (~o.o)~

Perl and a blind date both require regular expression.. -_-


In reply to How can I improve my inefficient codes? by FluffyBunny

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.