BeckyLynn has asked for the wisdom of the Perl Monks concerning the following question:

Hello! I'm trying to write code that creates a reverse decoy sequence. So far this is what I've come up with:

    #! usr/bin/perl -w
    use strict;

    open (INPUT, "proteins.txt") or die "Couldn't open proteins.txt for reading: $! \n";
    open (OUTPUT, ">reverse_decoy.txt") or die "Couldn't open reverse_decoy.txt for writing: $! \n";

    my @protein = <INPUT>;
    print OUTPUT "@protein \n";

    foreach my $protein (@protein) {
        if ($protein =~ s/^>(.*\n)/> REVERSE DECOY $1/g) {
            my $header = $protein;
            print OUTPUT "$header";
        }
        else {
            my $sequence = $protein;
            $sequence =~ s/\n//;
            my @reversed_sequence = split(/\s+/, reverse($sequence));
            print OUTPUT "@reversed_sequence\n";
        }
    }

    close INPUT;
    close OUTPUT;

My main difficulty is that even though it reverses the sequence, it only reverses each line, instead of reversing the entire file. For example:

    >Sequence 1
    ABCDEFG
    HIJKLMN

OUTPUT:

    >Reverse Decoy Sequence 1
    GFEDCBA
    NMLKJIH

Goal:

    >Reverse Decoy Sequence 1
    NMLKJIH
    GFEDCBA

If there's any way to switch the order of my lines, I think that would help a lot! This way the lines will already be reversed, and then the order of them will be too.

Replies are listed 'Best First'.
Re: Switching the Order of Lines
by davido (Cardinal) on Oct 28, 2013 at 21:54 UTC

    You can use reverse on an array or list too, and consequently get the lines to be processed in reverse order:

    foreach my $protein ( reverse @protein ) { ...

    Or you could use File::ReadBackwards, and do something more like this:

    while( defined( my $protein = $backward_fh->readline ) ) { if( ....

    I would probably prefer the latter, because it totally removes the need to slurp the entire file all at once. Maybe that's not an issue for you, but if proteins.txt has the potential of growing really large, line-by-line processing is advantageous.

    (Be sure to read the documentation on File::ReadBackwards; I didn't demonstrate instantiating the iterator because it's pretty clear in the POD)
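
    The first suggestion relies on reverse behaving differently in list and scalar context. A minimal sketch of that difference, assuming the two sample sequence lines from the question are already chomped and held in an array (the variable names are illustrative, not from the original script):

```perl
use strict;
use warnings;

# Sample data taken from the question; a real script would read these
# lines from proteins.txt instead.
my @lines = ("ABCDEFG", "HIJKLMN");

# List context: reverse returns the elements in the opposite order.
my @line_order_flipped = reverse @lines;

# Scalar context: reverse flips the characters of the string it is given.
# Combining both flips the line order and the characters in each line.
my @both_flipped = map { scalar reverse $_ } reverse @lines;

print "$_\n" for @both_flipped;
```

    With the sample data this prints NMLKJIH followed by GFEDCBA, which matches the goal stated in the question. A header line would still need to be handled separately, because reversing the whole array moves it to the end.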


    Dave

Re: Switching the Order of Lines
by atcroft (Abbot) on Oct 28, 2013 at 21:51 UTC

    Welcome to the monastery. Here is a quick test snippet that appears to give results similar to those you seek:

        my $fn = shift @ARGV;    # @ARGV holds the command-line arguments (not @ARGS)
        my @lines;
        open my $df, '<', $fn or die $!;
        while ( my $l = <$df> ) {
            chomp $l;
            unshift @lines, $l;    # prepend, so the lines end up in reverse order
        }
        close $df;
        foreach my $l (@lines) {
            # scalar context makes reverse flip the characters of the line
            print scalar( reverse $l ), "\n";
        }

    Hope that helps.

Re: Switching the Order of Lines
by Laurent_R (Canon) on Oct 28, 2013 at 22:53 UTC

    If you reverse the line order and the order of characters within your lines, it should do the trick (if I understood what you are looking for). This is a Perl one-liner, run on a random program file sitting in my main directory:

    $ perl -e '@c = reverse <>; for (@c) {$d = reverse $_; print $d}' coderef.pl

    On my file, I get the following result:

        ;)f$(evitatnet
        }
        ;))0 == ruoter_edoc$( && ) 1 => sevitatnet_bn$(( elihw }
        ;--sevitatnet_bn$
        ;ruoter_edoc$ tnirp
        ;))(>-g$( lave = ruoter_edoc$ ym
        { od
        ;3 = sevitatnet_bn$ ym
        ;tfihs = g$ ym
        { evitatnet bus
        ;};"otot" tnirp{ bus = f$ ym
        ;sgninraw esu
        ;tcirts esu

    The last line above is the reversed 'use strict;' line that was at the beginning of my file.

      In order to print the whole file reversed, it is easier to slurp it into a string:

      perl -e "local $/; print scalar reverse <>" coderef.pl

        Yes, indeed, you are right. I wanted to keep the two-step approach shown by others before, but there is no real reason to do so.
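
    The slurp approach can also be written out as a script. The sketch below is only an illustration: reverse_whole is a hypothetical helper name, and the text is a literal string rather than a file so that the example is self-contained. It removes the final newline before reversing, since otherwise that newline would end up at the start of the output. A header line would be reversed along with everything else, so it would have to be split off first.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse an entire block of text in one step.
sub reverse_whole {
    my ($text) = @_;
    chomp $text;                            # drop the final newline first
    return scalar( reverse $text ) . "\n";  # flip every character, newlines included
}

# In a script, the file would typically be slurped by localising $/ in a block:
#   my $text = do { local $/; <$fh> };
my $text = "ABCDEFG\nHIJKLMN\n";
print reverse_whole($text);
```

    Because the newlines between lines are reversed along with the characters, the line order and the character order are both flipped in a single pass.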

Re: Switching the Order of Lines
by kcott (Archbishop) on Oct 29, 2013 at 06:10 UTC

    G'day BeckyLynn,

    Welcome to the monastery.

    I was unable to determine, from either your code or description, whether your input file contained one or many ">Sequence n" lines. The following code handles any number of such lines.

        #!/usr/bin/env perl

        use strict;
        use warnings;
        use autodie;

        my ($in_file, $out_file) = qw{temp_in.txt temp_out.txt};

        open my $in_fh, '<', $in_file;
        open my $out_fh, '>', $out_file;

        my @protein;

        while (<$in_fh>) {
            chomp;
            if (/^>(.*)$/) {
                print_reverse_decoy(\@protein, $out_fh);
                print $out_fh ">Reverse Decoy $1\n";
            }
            else {
                push @protein, $_;
            }
        }

        print_reverse_decoy(\@protein, $out_fh);

        sub print_reverse_decoy {
            my ($protein, $out_fh) = @_;
            print $out_fh "$_\n" for map { scalar reverse split '' } reverse @$protein;
            @$protein = ();
            return;
        }

    A test run with this input:

        $ cat temp_in.txt
        >Sequence 1
        ABCDEFG
        HIJKLMN
        >Sequence 2
        OPQRST
        UVWXYZ
        >Sequence N
        1234567890
        !@#$%^&*()

    Produces this output:

        $ cat temp_out.txt
        >Reverse Decoy Sequence 1
        NMLKJIH
        GFEDCBA
        >Reverse Decoy Sequence 2
        ZYXWVU
        TSRQPO
        >Reverse Decoy Sequence N
        )(*&^%$#@!
        0987654321

    Notes:

    • Your shebang line looks wrong. I think you want '#!/usr/bin/perl', not '#! usr/bin/perl'.
    • See perlrun for preference of warnings pragma over -w switch.
    • Consider using the autodie pragma. It saves having to repeatedly type '... or die "Some custom error message: $!";' code. It also saves having to maintain the messages or check whether any have been omitted.
    • Use the 3-argument form of open. There are a number of benefits from doing this, including being able to pass filehandles to a subroutine (as I've done in the code I've shown here).
    • See how I've used chomp. Note that I've needed no other code to match or remove newlines with regular expressions.
    • See perlre: Modifiers for usage of the 'g' modifier.
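
    The last two notes can be illustrated with a small sketch. It assumes a hypothetical helper, decoy_header, which is not part of the code above. After chomp, a plain match anchored with ^ is enough to rewrite a header; without the /m modifier, ^ matches only at the start of the string, so the 'g' modifier adds nothing here.

```perl
use strict;
use warnings;

# Hypothetical helper: return the rewritten header for a ">..." line,
# or undef if the line is not a header.
sub decoy_header {
    my ($line) = @_;
    chomp $line;                        # strip the trailing newline once
    if ( $line =~ /^>(.*)$/ ) {         # no /g needed: ^ can match only once
        return ">Reverse Decoy $1";
    }
    return undef;
}

my $header = decoy_header(">Sequence 1\n");
print "$header\n" if defined $header;
```

    The newline is added back explicitly when printing, so the regular expression never has to match or remove it.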

    -- Ken