Some of my comments will be based on code improvements with no speed gain, others may address your speed concern.
You're reading the file, then copying it to a new variable? And then doing nothing with the new variable? Using a variable other than $_ is good, but no point in duplicating everything around:while (<FILE1>) { chomp; my $input_orig = $_;
while (defined (my $input_line = <$input>)) { chomp $input_line;
my @line = split ' ', $input_line;
This can be done much simpler:$input{$name}[0] = $name; $input{$name}[1] = $chr; $input{$name}[2] = $pos;
(though why you need to duplicate $name, I don't know.) Of course, I just suggested turning it on its head and moritz pointed out collisions, so we get:$input{$name} = [ $name, $chr, $pos ];
Mind you, I generally prefer named values, so I'd go with:push @{$input[$chr]} => [ $name, $pos ];
push @{$input[$chr]} => { name => $name, pos => $pos };
Again, three-arg, lexical filehandle, check if open failed. However, the comment here really drives home the point I made with the previous open: put it in a variable at the top, which will make it easier for someone to change, but also make it easier to convert into a command line parameter later.open (OUT, "> test.txt"); ### Change if file name changes
### At the top: my $output_file = 'test.txt'; ### Change if the file name changes ### the open is then: open my $out, '>', $output_file or die "Can't open $output_file: $!";
This is now just a simple lookup. First, we have to convert $chr to just the number, and then see if there is one or more input entry that matches.foreach my $name(keys %input) { if ($input{$name}[1] eq $chr) { if ($input{$name}[2] > $pos-1 && $input{$name}[2] < $pos+3 +6) {
If you use the array of arrays instead of the array of hashes that I prefer, that'd be $i->[1] instead of $i->{pos}. (I think the AoA will be faster, but I'd suggest benchmarking it to see if it's significant. Readability is still nice.)$chr =~ s/^chr//; # remove chr if it's there ### or, if chr must be there: # substr $chr, 0, 3, ''; # removes first three characters if (defined $input[$chr]) { for my $i (@{$input[$chr]}) { if ($pos <= $i->{pos} && $i->{pos} <= $pos+35) {
A few points. First, this should be if/elsif/else since all the if blocks are mutually exclusive. That would cut down on the comparisons. Second, if you're using perl 5.10+, using given/when might be better since it's all the same item you're checking against, $input{$name}[2] (or now $i->{pos} or $i->[1]). Third, it looks like the last one handles all the cases, but somewhat poorly.if ($input{$name}[2] == $pos) { my $fSNP = substr($seq, 0, 1); print OUT "$fSNP\n"; } if ($input{$name}[2] == $pos+35) { my $fSNP = substr($seq, 35,1); print OUT "$fSNP\n"; } if ($input{$name}[2] != $pos+35 && $input{$name}[2] != + $pos) { my $SNP = substr($seq, 0, $input{$name}[2]-($pos-1 +)); my $fSNP = substr($SNP, $input{$name}[2]-$pos); + print OUT "$fSNP\n"; }
Untested, and I may be misreading this, but if I have read it properly, this should speed things up since you only need to copy one character once rather than a bunch of characters and then one character.my $fSNP = substr($seq, $i->{pos} - $pos, 1); print $out "$fSNP\n";
In reply to Re: How can I improve my inefficient codes?
by Tanktalus
in thread How can I improve my inefficient codes?
by FluffyBunny
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |