in reply to missing character when reading input file

It now appears that your data file was created on a Windows system and that you are reading it on a Unix-like system. Perl can handle this, but you have to tell it so by specifying an I/O layer in the open statement (see PerlIO).

    use strict;
    use warnings;

    BEGIN {
        # In-memory file that simulates a Windows (CRLF) file.
        my $win_file
            = ">symbB.v1.2.017277.t1|scaffold1325.1|size176917|3\r\n"
            . "acggaccgcggcatttgccaatttgcgcgt"
            . "cgtcgggggtcgccatgatgtttcgcttgg"
            . "caggcttttttgctttggcactgctggtcg"
            . "cgggaaagcc\r\n"
            . "caagggtggcaaaggtgcaaaaggagaaca"
            . "agaccccttctctgagcttagccgcctcgc"
            . "agacaatttgaaagatgctaaagaacagcc"
            . "ggagaaggcc\r\n"
            . "aagaatgctctgaacatgatggatccagaa"
            . "agtttaggcgattctatggccaacatgatg"
            . "gtgatggcaatggataaggaccaggatggt"
            . "gtgttgtcag\r\n"
        ;
        $ARGV[0] = \do{$win_file};
    }

    # The :crlf layer translates CRLF to LF on input.
    open( my $input, '<:crlf', $ARGV[0] )
        or die( "Could not open input file $ARGV[0].\n" );

    my $seq;
    while ( my $line = <$input> ) {
        chomp($line);
        unless ( $line =~ m/>/ ) {    # skip the header line
            $line = uc($line);
            $seq .= $line;
        }
    }
    print "Length of \$seq is ", length($seq), " characters\n";

Bill
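
To make the effect of the layer visible, here is a minimal sketch that reads the same in-memory data twice, once with a plain '<' mode and once with '<:crlf'. The sample data and the helper name read_seq are made up for illustration, and the comparison assumes a Unix-like system where no CRLF translation happens by default.

```perl
use strict;
use warnings;

# Hypothetical sample with Windows (CRLF) line endings.
my $win_data = ">seq1\r\nacgt\r\nttga\r\n";

# Read every non-header line, upper-case it, and concatenate.
# $mode is the open mode, including any I/O layers.
sub read_seq {
    my ($mode) = @_;
    open( my $in, $mode, \$win_data )
        or die "Could not open in-memory file: $!\n";
    my $seq = '';
    while ( my $line = <$in> ) {
        chomp $line;                # removes "\n" only
        next if $line =~ m/^>/;     # skip the header line
        $seq .= uc $line;
    }
    close $in;
    return $seq;
}

my $raw   = read_seq('<');          # no layer: "\r" stays in the data
my $clean = read_seq('<:crlf');     # CRLF translated to LF on read

printf "without :crlf: length %d\n", length $raw;
printf "with :crlf:    length %d\n", length $clean;
```

Without the layer, chomp strips only the "\n", so each sequence line should keep a stray carriage return that ends up inside the concatenated string.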

Re^2: [OT]: missing character when reading input file
by AnomalousMonk (Archbishop) on Sep 12, 2018 at 17:41 UTC

    I agree with your example of using open to handle a CRLFish file on a *nix system. The rest of this post is just to satisfy my curiosity.

    $ARGV[0] = \do{$win_file};

    I don't understand the purpose of munging @ARGV in this way. Any scalar can be opened by reference as a RAM file. If the reason for initializing the scalar in a BEGIN block was to create a lexically private scalar, then assigning a reference to it to an element of the global @ARGV array defeats this purpose.
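
    For comparison, a minimal sketch of opening the scalar directly by reference, with no @ARGV and no do block. The sample data here is made up for illustration; the variable name $win_file follows the post above.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample data; any scalar will do.
    my $win_file = ">seq1\r\nacgt\r\n";

    # Open the scalar itself as an in-memory file.
    open( my $input, '<:crlf', \$win_file )
        or die "Could not open in-memory file: $!\n";

    my @lines = <$input>;    # read all lines at once
    close $input;
    chomp @lines;            # the :crlf layer has already dropped the "\r"

    print "$_\n" for @lines;
    ```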


    Give a man a fish:  <%-{-{-{-<

      The array @ARGV was used to preserve as much of the original OP as possible. The 'BEGIN' is unnecessary, but I feel that it serves to separate the file simulation from the relevant code. No excuse for the 'do' block. It is a leftover from an earlier attempt at the file simulation.
      Bill
        The array @ARGV was used to preserve as much of the original OP as possible.

        Ah, ok. I understand better now. It occurred to me after I posted that this might have been part of the motivation.


        Give a man a fish:  <%-{-{-{-<

Re^2: missing character when reading input file
by bliako (Abbot) on Sep 12, 2018 at 09:03 UTC

    hey that's a cool new trick I learned today, setting $ARGV[0] = \do{$win_file};

    I like your method as an easy way to produce a CRLF file on a non-Windows machine (e.g. for tests).