in reply to How can I improve my inefficient codes?
Some of my comments will be based on code improvements with no speed gain, others may address your speed concern.
You're reading the file, then copying it to a new variable? And then doing nothing with the new variable? Using a variable other than $_ is good, but no point in duplicating everything around:while (<FILE1>) { chomp; my $input_orig = $_;
while (defined (my $input_line = <$input>)) { chomp $input_line;
my @line = split ' ', $input_line;
This can be done much simpler:$input{$name}[0] = $name; $input{$name}[1] = $chr; $input{$name}[2] = $pos;
(though why you need to duplicate $name, I don't know.) Of course, I just suggested turning it on its head and moritz pointed out collisions, so we get:$input{$name} = [ $name, $chr, $pos ];
Mind you, I generally prefer named values, so I'd go with:push @{$input[$chr]} => [ $name, $pos ];
push @{$input[$chr]} => { name => $name, pos => $pos };
Again, three-arg, lexical filehandle, check if open failed. However, the comment here really drives home the point I made with the previous open: put it in a variable at the top, which will make it easier for someone to change, but also make it easier to convert into a command line parameter later.open (OUT, "> test.txt"); ### Change if file name changes
### At the top: my $output_file = 'test.txt'; ### Change if the file name changes ### the open is then: open my $out, '>', $output_file or die "Can't open $output_file: $!";
This is now just a simple lookup. First, we have to convert $chr to just the number, and then see if there is one or more input entry that matches.foreach my $name(keys %input) { if ($input{$name}[1] eq $chr) { if ($input{$name}[2] > $pos-1 && $input{$name}[2] < $pos+3 +6) {
If you use the array of arrays instead of the array of hashes that I prefer, that'd be $i->[1] instead of $i->{pos}. (I think the AoA will be faster, but I'd suggest benchmarking it to see if it's significant. Readability is still nice.)$chr =~ s/^chr//; # remove chr if it's there ### or, if chr must be there: # substr $chr, 0, 3, ''; # removes first three characters if (defined $input[$chr]) { for my $i (@{$input[$chr]}) { if ($pos <= $i->{pos} && $i->{pos} <= $pos+35) {
A few points. First, this should be if/elsif/else since all the if blocks are mutually exclusive. That would cut down on the comparisons. Second, if you're using perl 5.10+, using given/when might be better since it's all the same item you're checking against, $input{$name}[2] (or now $i->{pos} or $i->[1]). Third, it looks like the last one handles all the cases, but somewhat poorly.if ($input{$name}[2] == $pos) { my $fSNP = substr($seq, 0, 1); print OUT "$fSNP\n"; } if ($input{$name}[2] == $pos+35) { my $fSNP = substr($seq, 35,1); print OUT "$fSNP\n"; } if ($input{$name}[2] != $pos+35 && $input{$name}[2] != + $pos) { my $SNP = substr($seq, 0, $input{$name}[2]-($pos-1 +)); my $fSNP = substr($SNP, $input{$name}[2]-$pos); + print OUT "$fSNP\n"; }
Untested, and I may be misreading this, but if I have read it properly, this should speed things up since you only need to copy one character once rather than a bunch of characters and then one character.my $fSNP = substr($seq, $i->{pos} - $pos, 1); print $out "$fSNP\n";
|
|---|