undergradguy has asked for the wisdom of the Perl Monks concerning the following question:

Hello again!

I have taken a look at the files and suggestions posted to my previous thread and here is what I made. However it returns a syntax error at line 14 and I can not see why. Does anyone have a suggestion as to why it is returning a syntax error?

#!/usr/bin/perl -w use strict; use diagnostics; my $file = "$ARGV[0]"; open (DAT, $file) || die("Can not open file!"); while (<DAT>) { my $seq = $_; if($seq =~m/^[ATCG]+$/){ my $DNA = ''; $DNA = $seq; my $revcom = reverse($DNA); $revcom =~ tr/ACGTacgt/TGCAtgca/; print $revcom; };

Replies are listed 'Best First'.
Re: Isolating DNA cont.
by toolic (Bishop) on Dec 20, 2018 at 20:41 UTC
    You are missing a closing (right) curly for the while loop, as the asymmetric indentation suggests. Here's the perltidy version (Tip #10 from the Basic debugging checklist):
    #!/usr/bin/perl -w use strict; use diagnostics; my $file = "$ARGV[0]"; open( DAT, $file ) || die("Can not open file!"); while (<DAT>) { my $seq = $_; if ( $seq =~ m/^[ATCG]+$/ ) { my $DNA = ''; $DNA = $seq; my $revcom = reverse($DNA); $revcom =~ tr/ACGTacgt/TGCAtgca/; print $revcom; } }

    I also got rid of the trailing semicolon after your right curly.

      Ah thank you very much.
Re: Isolating DNA cont.
by AnomalousMonk (Archbishop) on Dec 20, 2018 at 23:41 UTC

    Your OPed question has been well and truly answered. Here are some general comments on your posted code.

    • #!/usr/bin/perl -w Using  -w in the command line means that warnings are enabled globally (see perlrun). This is not, in general, a good idea because there are some modules, usually older, still useful ones, that violate warnings and that may cause you (temporary) difficulty and confusion. Better IMHO to use the statements
          use warnings;
          use strict;
      or equivalent (see Modern::Perl; see also the free book by chromatic) at the beginning of each and every script and module you write. (Update: Both warnings and strict have lexical scope when use-ed in a program or module.) (In my code examples posted on PerlMonks, I use the switch combo  -wMstrict to enable these modules globally (update: not quite true; see Update 1 below). I do this only as a convenience; it's not intended as an implied recommendation for general use.)
    • my $file = "$ARGV[0]"; I see this statement occasionally, often, it seems to me, from bio-Perlish sources, and it always puzzles me. You could interpret the code as "take a string from the command line, stringize it, assign it (as a string) to a scalar, then use the scalar as a string." The extra stringization step does no harm, but it adds no value; what's the point? (This item just tweaks a random crotchet of mine.)
    • open (DAT, $file) || die("Can not open file!"); This represents an officially outmoded structure for an open statement. The "official" form for this type of statement (and this is really just my own preference) is
          my $filename = 'some.file';
          open my $filehandle, '<', $filename or die "opening '$filename': $!";
      The file handle  $filehandle is lexical, whereas  DAT and its ilk are global. This | Global scope can be particularly problematic when the name used is something as generic as DAT, IN, OUT, FILE, etc.
    • while (<DAT>) {
          my $seq = $_;
          ...
          }
      What's the point of (implicitly) assigning DAT to $_ and then explicitly assigning $_ to $seq? My own preference would be for something more direct, like
          while (my $seq = <$filehandle>) {
              do_something_with($seq);
              }
      The  my $seq = <$filehandle> expression in the
          while (my $seq = <$filehandle>) { ... }
      loop condition is compiled as  defined(my $seq = <$filehandle>) because reading, e.g., a  '0' will evaluate as false and may improperly terminate the loop otherwise.
    • my $DNA = '';
      $DNA = $seq;
      This is mostly personal preference, but there seems to be no advantage to defining a variable, assigning it an initializer, then immediately assigning the final value of the variable. Further, the extra  $DNA variable seems to add nothing because  $seq has already been matched to and confirmed as a DNA sequence. But that's just me.
    • my $revcom = reverse($DNA);
      $revcom =~ tr/ACGTacgt/TGCAtgca/;
      Again personal preference. These two statements to generate the reverse complement can be written as
          (my $revcom = reverse $seq) =~ tr/ACGTacgt/TGCAtgca/;

    In sum, I might have written the program as follows. Again, many of the programming choices represent personal preferences; take them as you find them.

    use warnings; use strict; my $usage = <<"EOT"; usage: perl $0 dna_datafile.name EOT die $usage unless @ARGV >= 1; my $filename = $ARGV[0]; my $rx_dna = qr{ \b [ATCGatcg]+ \b }xms; open my $filehandle, '<', $filename or die "opening '$filename': $!"; SEQ: while (my $seq = <$filehandle>) { next SEQ unless my ($dna) = $seq =~ m{ \A ($rx_dna) \s* \z }xms; (my $revcom = reverse $dna) =~ tr/ACGTacgt/TGCAtgca/; print "dna -> reverse complement \n"; print "'$dna' -> '$revcom' \n\n"; } close $filehandle or die "closing '$filename': $!"; exit; # define any subroutines here ######################################## +#
    (See autodie to get rid of all the explicit  ... or die "...: $!"; expressions.)

    Data file dna.dat:

    ACTG catgataaatttccc not dna tgac

    Output:

    c:\@Work\Perl\monks\undergradguy>perl rev_comp_2.pl dna.dat dna -> reverse complement 'ACTG' -> 'CAGT' dna -> reverse complement 'catgataaatttccc' -> 'gggaaatttatcatg' dna -> reverse complement 'tgac' -> 'gtca'
    The next step: Put the reverse complement functions into a  .pm module with an accompanying Test::More  .t file. :)

    Update 1: It's true that  -w on the command line enables warnings globally (see  $^W special variable in perlvar). However,  -Mstrict on the command line still only has lexical scope, in this case the scope of the code in the  -e "..." switch. Given a module Unstrict.pm

    # Unstrict.pm 22dec18waw package Unstrict; # use strict; # module will not compile with strictures enabled $string = bareword; sub func { return $string; } 1;
    consider
    c:\@Work\Perl\monks\undergradguy>perl -Mstrict -le "use Unstrict; print Unstrict::func(); " bareword c:\@Work\Perl\monks\undergradguy>perl -Mstrict -le "use Unstrict; print Unstrict::func(), $x; " Global symbol "$x" requires explicit package name at -e line 1. Execution of -e aborted due to compilation errors. c:\@Work\Perl\monks\undergradguy>perl -wMstrict -le "use Unstrict; print Unstrict::func(); " Unquoted string "bareword" may clash with future reserved word at Unst +rict.pm line 7. bareword


    Give a man a fish:  <%-{-{-{-<

      Thank you for taking the time to look at my code and make suggestions. I will look at them and learn from them. I am grateful that this community exists.
Re: Isolating DNA cont.
by talexb (Chancellor) on Dec 21, 2018 at 15:51 UTC

    A general comment on developing software that works for me (and I suggest it whenever I can), is to always create balanced operations or braces, then go back and fill in the middle. So you would start with your boilerplate opening:

    #!/usr/bin/perl -w use strict; use diagnostics;
    You would have collected your input, opened and closed the file:
    my $file = $ARGV[0]; open (my $input, '<', $file) or die("Cannot open file $file: $!"); close ( $input );
    Then you would have added the skeleton of your while loop:
    my $file = $ARGV[0]; open (my $input, '<', $file) or die("Cannot open file $file: $!"); while($input>) { } close ( $input );
    Even stopping there, you'd have something that opened a file, read through each record, and closed it. You don't have to remember to finish the loop later, or think about closing the file .. you've already done that, and you can then go back and put the meat into the sandwich.

    Adding the if statement (without a condition) brings us to this:

    my $file = $ARGV[0]; open (my $input, '<', $file) or die("Cannot open file $file: $!"); while($input>) { if () { } } close ( $input );
    To me, that's way easier than writing the opening part of a brace or an operation, then later, trying to remember what I have left to go back and clean up.

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

      G'day Alex,

      ++ I just wanted to add my agreement with, and recommendation of, this method of coding.

      I started doing this many years ago — in fact, possibly a couple of decades ago — and it's made my (programming) life a whole lot easier.

      Nowadays, without really having to think about it, I start sections of code like:

      while (<$fh>) { }

      or

      if ($condition) { } else { }

      I then don't have to remember to add closing braces. I also don't need to worry about aligning the closing braces to get the proper indentation: auto-indent has already done that for me.

      The same works for other balanced characters. Building up a complex expression incrementally like this:

      if () { if (is_whatever()) { if (is_whatever(get_whatever())) { if (is_whatever(get_whatever(get_options()))) {

      almost never results in a "mis-matched parentheses" type error.

      Attempting to type that final expression all at once, often will; and leaves you having to scan back and forth over the expression, counting opening and closing parentheses.

      I also use this method for building up complex regular expressions (using the x or xx modifiers).

      This may not work for everyone; for example, perhaps syntax-highlighting is set up to indicate mis-match problems. However, it certainly works for me and clearly works for you. I would recommend people at least try it.

      — Ken

        Nowadays, without really having to think about it, I start sections of code like

        I literally don't have to think about it because my editor does it for me. It is worth the investment of getting to know your editor of choice so that such boilerplate can be automated away.