Your OPed question has been well and truly answered. Here are some general comments on your posted code.
-
#!/usr/bin/perl -w Using -w in the command line means that warnings are enabled globally (see perlrun). This is not, in general, a good idea because there are some modules, usually older, still useful ones, that violate warnings and that may cause you (temporary) difficulty and confusion. Better IMHO to use the statements
use warnings;
use strict;
or equivalent (see Modern::Perl; see also the free book by chromatic) at the beginning of each and every script and module you write. (Update: Both warnings and strict have lexical scope when use-ed in a program or module.) (In my code examples posted on PerlMonks, I use the switch combo -wMstrict to enable these modules globally (update: not quite true; see Update 1 below). I do this only as a convenience; it's not intended as an implied recommendation for general use.)
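To illustrate the lexical scope of the warnings pragma mentioned above, here is a minimal sketch. The variable names and the __WARN__ handler are my own, used only so the script can count the warnings it triggers.

```perl
use strict;
use warnings;

my @caught;    # collects warning messages (illustrative helper)
local $SIG{__WARN__} = sub { push @caught, $_[0] };

my $undef;
{
    no warnings 'uninitialized';        # disabled only inside this block
    my $quiet = "value: " . $undef;     # no warning here
}
my $noisy = "value: " . $undef;         # warns: the pragma is in effect again

print "warnings caught: ", scalar(@caught), "\n";    # prints "warnings caught: 1"
```

The same block-level on/off control applies to strict, which is why putting both pragmas at the top of each file is usually enough.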
-
my $file = "$ARGV[0]"; I see this statement occasionally, often, it seems to me, from bio-Perlish sources, and it always puzzles me. You could interpret the code as "take a string from the command line, stringize it, assign it (as a string) to a scalar, then use the scalar as a string." The extra stringization step does no harm, but it adds no value; what's the point? (This item just tweaks a random crotchet of mine.)
-
open (DAT, $file) || die("Can not open file!"); This is an outmoded structure for an open statement: a global bareword file handle with two-argument open, and an error message that does not say which file failed or why. The "official" form for this type of statement (and this is really just my own preference) is
my $filename = 'some.file';
open my $filehandle, '<', $filename or die "opening '$filename': $!";
The file handle $filehandle is lexical, whereas DAT and its ilk are global. Global scope can be particularly problematic when the name used is something as generic as DAT, IN, OUT, FILE, etc.
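A small sketch of what the lexical handle buys you, assuming a throwaway temporary file created with File::Temp. The count_lines name is hypothetical, just for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Count the lines of a file; $filehandle is visible only inside this sub.
sub count_lines {
    my ($filename) = @_;
    open my $filehandle, '<', $filename or die "opening '$filename': $!";
    my $count = 0;
    while (my $line = <$filehandle>) {
        $count++;
    }
    return $count;    # $filehandle goes out of scope here and is closed
}

# Build a small scratch file so the example is self-contained.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "ACTG\ntgac\n";
close $tmp_fh or die "closing '$tmp_name': $!";

my $line_count = count_lines($tmp_name);
print "lines: $line_count\n";    # prints "lines: 2"
```

Because the handle lives in a my variable, no other code can accidentally read from or close it, which is exactly the risk with a shared name like DAT.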
-
while (<DAT>) {
my $seq = $_;
...
}
What's the point of (implicitly) assigning each line read from DAT to $_ and then explicitly assigning $_ to $seq? My own preference would be for something more direct, like
while (my $seq = <$filehandle>) {
do_something_with($seq);
}
The my $seq = <$filehandle> expression in the
while (my $seq = <$filehandle>) { ... }
loop condition is compiled as defined(my $seq = <$filehandle>); otherwise, reading a false value such as a final line of '0' with no trailing newline would improperly terminate the loop.
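The implicit defined() can be seen in a short sketch that reads from an in-memory file (my own test data, not the OP's) whose last line is a bare '0':

```perl
use strict;
use warnings;

my $data = "1\n0";    # last line is '0' with no trailing newline
open my $filehandle, '<', \$data or die "opening in-memory file: $!";

my @lines;
while (my $seq = <$filehandle>) {    # treated as defined(my $seq = <$filehandle>)
    push @lines, $seq;
}
close $filehandle or die "closing in-memory file: $!";

print scalar(@lines), " lines read\n";    # prints "2 lines read"
```

Without the implicit defined() test, the string '0' would be false and the second line would be silently dropped.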
-
my $DNA = '';
$DNA = $seq;
This is mostly personal preference, but there seems to be no advantage to defining a variable, assigning it an initializer, then immediately assigning the final value of the variable. Further, the extra $DNA variable seems to add nothing because $seq has already been matched to and confirmed as a DNA sequence. But that's just me.
-
my $revcom = reverse($DNA);
$revcom =~ tr/ACGTacgt/TGCAtgca/;
Again personal preference. These two statements to generate the reverse complement can be written as
(my $revcom = reverse $seq) =~ tr/ACGTacgt/TGCAtgca/;
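If it helps, the same one-liner can be wrapped in a subroutine. This is only a sketch; the name reverse_complement is my own invention.

```perl
use strict;
use warnings;

# Hypothetical helper: return the reverse complement of a DNA string.
sub reverse_complement {
    my ($seq) = @_;
    # The scalar assignment puts reverse() in scalar context, so it
    # reverses the characters of the string; tr/// then complements them.
    (my $revcom = reverse $seq) =~ tr/ACGTacgt/TGCAtgca/;
    return $revcom;
}

print reverse_complement('ACTG'), "\n";    # prints "CAGT"
```

Note that the original $seq is left untouched because tr/// operates on the copy in $revcom.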
In sum, I might have written the program as follows. Again, many of the programming choices represent personal preferences; take them as you find them.
use warnings;
use strict;
my $usage = <<"EOT";
usage:
perl $0 dna_datafile.name
EOT
die $usage unless @ARGV >= 1;
my $filename = $ARGV[0];
my $rx_dna = qr{ \b [ATCGatcg]+ \b }xms;
open my $filehandle, '<', $filename or die "opening '$filename': $!";
SEQ:
while (my $seq = <$filehandle>) {
next SEQ unless my ($dna) = $seq =~ m{ \A ($rx_dna) \s* \z }xms;
(my $revcom = reverse $dna) =~ tr/ACGTacgt/TGCAtgca/;
print "dna -> reverse complement \n";
print "'$dna' -> '$revcom' \n\n";
}
close $filehandle or die "closing '$filename': $!";
exit;
# define any subroutines here #########################################
(See autodie to get rid of all the explicit ... or die "...: $!"; expressions.)
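A rough sketch of the autodie approach, assuming a file name that does not exist on your system (the path below is made up for the demonstration):

```perl
use strict;
use warnings;
use autodie;    # open, close, etc. now throw an exception on failure

my $filename = 'no_such_dir_for_demo/dna.dat';    # hypothetical, assumed missing

my $error;
eval {
    open my $filehandle, '<', $filename;    # no "or die" needed
    close $filehandle;
    1;
} or $error = $@;

print "open failed as expected: $error\n" if $error;
```

In a real script you would normally drop the eval and simply let the exception end the program with its own descriptive message.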
Data file dna.dat:
ACTG
catgataaatttccc
not dna
tgac
Output:
c:\@Work\Perl\monks\undergradguy>perl rev_comp_2.pl dna.dat
dna -> reverse complement
'ACTG' -> 'CAGT'
dna -> reverse complement
'catgataaatttccc' -> 'gggaaatttatcatg'
dna -> reverse complement
'tgac' -> 'gtca'
The next step: Put the reverse complement functions into a .pm module with an accompanying Test::More .t file. :)
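As a starting point, here is a single-file sketch of what that might look like. The package name RevComp and the function name are hypothetical; in a real layout the package would live in its own RevComp.pm file and the tests in a separate .t file that loads it with use.

```perl
use strict;
use warnings;

# Would normally be RevComp.pm (hypothetical module name).
{
    package RevComp;

    # Return the reverse complement of a DNA string.
    sub reverse_complement {
        my ($seq) = @_;
        (my $revcom = reverse $seq) =~ tr/ACGTacgt/TGCAtgca/;
        return $revcom;
    }
}

# Would normally be a separate .t file.
use Test::More;

is( RevComp::reverse_complement('ACTG'), 'CAGT', 'uppercase sequence' );
is( RevComp::reverse_complement('tgac'), 'gtca', 'lowercase sequence' );
is( RevComp::reverse_complement(''),     '',     'empty string' );

done_testing();
```

The first two test cases reuse sequences from the sample data file above, so the expected values match the output already shown.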
Update 1: It's true that -w on the command line enables warnings globally (see $^W special variable in perlvar). However, -Mstrict on the command line still only has lexical scope, in this case the scope of the code in the -e "..." switch. Given a module Unstrict.pm
# Unstrict.pm 22dec18waw
package Unstrict;
# use strict; # module will not compile with strictures enabled
$string = bareword;
sub func { return $string; }
1;
consider
c:\@Work\Perl\monks\undergradguy>perl -Mstrict -le
"use Unstrict;
print Unstrict::func();
"
bareword
c:\@Work\Perl\monks\undergradguy>perl -Mstrict -le
"use Unstrict;
print Unstrict::func(), $x;
"
Global symbol "$x" requires explicit package name at -e line 1.
Execution of -e aborted due to compilation errors.
c:\@Work\Perl\monks\undergradguy>perl -wMstrict -le
"use Unstrict;
print Unstrict::func();
"
Unquoted string "bareword" may clash with future reserved word at Unstrict.pm line 7.
bareword
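The lexical scope of strict can also be shown in a single file, without a separate module. This is just a sketch of my own: it relies on string eval compiling its code under the pragmas of the enclosing lexical scope, and the variable names are illustrative.

```perl
use strict;
use warnings;

my $code = q{ $undeclared = 1; 1 };    # uses a variable without declaring it

my $under_strict = eval $code;         # strict in effect: fails to compile
my $strict_error = $@;

my $without_strict;
{
    no strict 'vars';                  # relaxed for this block only
    $without_strict = eval $code;      # compiles and runs
}

print "under strict:   ", (defined $under_strict   ? "compiled" : "rejected"), "\n";
print "without strict: ", (defined $without_strict ? "compiled" : "rejected"), "\n";
```

The first eval is rejected with a "Global symbol ... requires explicit package name" error, the second one compiles, and strict is back in force as soon as the block ends.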