One of the things we usually do when writing perl code, which makes it quicker and easier and less frustrating, is to create subroutines, rather than repeating lots of lines of code.
Apart from all the overly verbose comments and excessive printed messages to the user, you have about two times more lines of actual code than you really need, because a lot of it has been written twice. You should put the parts that are repeated into a subroutine, so that they only appear once, then call the subroutine from the places where you originally put each copy of identical code.
The basic idea is called "DRY": Don't Repeat Yourself. | [reply] |
Just to give you an idea of how little effort it should take to do the thing that your script does, here's a version that:
- allows the user to specify all inputs and options at the command line, so the script knows what to do based on command-line args;
- uses a hash instead of a set of separate scalar variables;
- uses loops for input and output.
#!/usr/bin/perl
use strict; # enforce variable declarations
use warnings; # enable warnings
my $Usage = "Usage: $0 [sequence.file]\n (reads stdin if no file name
+ is given)\n";
die $Usage unless ( -p STDIN or ( @ARGV and -f $ARGV[0] ));
my %counts; # tally of letter occurrences
# read lines from named file(s) or from stdin:
while (<>) {
s/\s+$//; # remove white space (line termination) from end of lin
+e
for my $ltr ( split // ) { # split into individual characters
$counts{$ltr}++;
}
}
# print results:
for my $ltr ( sort keys %counts ) {
print " Number of $ltr nucleotides: $counts{$ltr}\n";
}
Note that this version will count any character inventory you give it; if the input happens to contain something other than ACGT, you'll get the quantity of occurrence for all the letters that occur in the data. | [reply] [d/l] |