cmarra has asked for the wisdom of the Perl Monks concerning the following question:

Hi -- Disclaimer: I'm very new to Perl. Please forgive any offensive coding practices.

I'm working through a file parse. The first few lines of the file are below the "***". Note that the first 2 lines of the file have blank spaces after the visible text, whereas on the following lines there are no spaces after the last visible character. (You'll have to trust me on that; I'm not sure it'll be obvious in this post.)

This is all happening in a subroutine to which I pass in the array containing the file contents. When I shift through the array and get to the 4th line (" Year 2019") and print it, I get ("Year 201"). I've confirmed the same phenomenon further down in the file, where the last character is dropped when it's the last character on the line.

Thanks in advance,

Carol

Code is as follows:

sub read_gage_header {
    my ($data, $header, @headers, $ettb_no, $year);
    $data = shift;

    #Now get ettb_no
    $header = $data->[0];
    @headers = split / /, $header;
    #NOTE: this print yields expected results
    print "HEADERS before ettb_no @headers\n" if defined ($debug);
    #NOTE: this gets the proper ettb_no
    $ettb_no = $headers[4];

    #Skip to get to Year
    shift @$data;
    shift @$data;
    shift @$data;
    $header = shift @$data;
    #NOTE: this gives me "Year 201"
    print "HEADER before year $header\n" if defined ($debug);

    ... more code ...

****************** FILE STARTS HERE ********************************

Gage Information - 240CN - 240 FEEDER CANAL
SUPPLY TO 240 FEEDER FROM BELEN HIGH LINE CANAL

Year 2019
Month Day Time Height Discharge
(mst) (HP ft) (QR cfs)
----- --- ---- ------ ---------
July 29 1230 5.54 80
... more data ...

Replies are listed 'Best First'.
Re: Perl appears to be dropping last character of line
by LanX (Saint) on Dec 06, 2019 at 01:48 UTC
    The question is: what is actually inside the array ref $data, and how did you put it in there?

    Please try something like

    use Data::Dumper;
    print Dumper $data;

    And show us the first 10 lines.
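
    A minimal sketch of that check, assuming the array holds raw lines read from the file. The sample contents below are hypothetical, not the actual data. Setting $Data::Dumper::Useqq makes the dump print control characters as escapes, so a stray "\r" shows up as visible text instead of moving the cursor:

```perl
use strict;
use warnings;
use Data::Dumper;

# Print control characters as escapes ("\r", "\n") rather than raw bytes.
$Data::Dumper::Useqq = 1;

# Hypothetical sample standing in for the real $data array ref.
my $data = [ "Year 2019\r\n", "Month Day\n" ];

my $dump = Dumper($data);
print $dump;
```

    If any element ends in "\r\n" rather than "\n", the file has Windows-style line endings.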

    From your demonstration it doesn't make sense that you had to do 3 shifts to skip lines, so I'm assuming your concept of end-of-line is somehow broken.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

        Good point! :)

        I mostly use Data::Dump, which is unfortunately not core, hence not easy for beginners.


Re: Perl appears to be dropping last character of line
by hippo (Archbishop) on Dec 06, 2019 at 09:23 UTC

    Here is an SSCCE showing that it works fine for me.

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my $debug = 1;
    my @input = <DATA>;
    read_gage_header (\@input);

    sub read_gage_header {
        my ($data, $header, @headers, $ettb_no, $year);
        $data = shift;

        #Now get ettb_no
        $header = $data->[0];
        @headers = split / /, $header;
        #NOTE: this print yields expected results
        print "HEADERS before ettb_no @headers\n" if defined ($debug);
        #NOTE: this gets the proper ettb_no
        $ettb_no = $headers[4];

        #Skip to get to Year
        shift @$data;
        shift @$data;
        shift @$data;
        $header = shift @$data;
        #NOTE: this gives me "Year 201"
        print "HEADER before year $header\n" if defined ($debug);
    }

    __DATA__
    Gage Information - 240CN - 240 FEEDER CANAL
    SUPPLY TO 240 FEEDER FROM BELEN HIGH LINE CANAL

    Year 2019
    Month Day Time Height Discharge
    (mst) (HP ft) (QR cfs)
    ----- --- ---- ------ ---------
    July 29 1230 5.54 80
    ... more data ...

    Which produces this output when run:

    $ perl 11109714.pl
    HEADERS before ettb_no Gage Information - 240CN - 240 FEEDER CANAL

    HEADER before year Year 2019

    In other words, the problem is with the code that you haven't shown us - the part where you read in the input and populate your $data.

Re: Perl appears to be dropping last character of line
by kcott (Archbishop) on Dec 06, 2019 at 10:21 UTC

    G'day Carol,

    Welcome to the Monastery.

    What you're describing is typically caused by embedded characters. A carriage-return ("\r") is the most usual culprit:

    $ perl -e 'my $x = "abc\r"; print "|$x|"'
    |abc

    Without the carriage-return ("\r") the problem of losing the final character ("|") disappears:

    $ perl -e 'my $x = "abc"; print "|$x|"'
    |abc|

    The problem usually arises when processing data from one OS (operating system) on a different OS. They have different line-endings: "\n" (Unix-style systems, including Linux & Mac OS X); "\r" (Mac versions earlier than Mac OS X); "\r\n" (MSWin).

    The problem may not be a carriage-return but some other control character; for instance, something like an embedded backspace could potentially cause similar problems. Use ord to identify which, if any, embedded characters may exist:

    $ perl -E 'my @x = ("A", "B\n", "C\r", "D\r\n"); for (@x) { say ord for split //; }'
    65
    66
    10
    67
    13
    68
    13
    10

    Say, on a Unix-like system, you have an entire, unconverted record from an MSWin file that looks like: "abc\r\n". Using chomp (which, on a Unix-like system, will consider a newline to be the line-ending character) will only remove the terminal "\n" resulting in "abc\r" (which I used in the original example).

    $ perl -E 'my @x = ("A", "B\n", "C\r", "D\r\n"); for (@x) { chomp; say ord for split //; }'
    65
    66
    67
    13
    68
    13

    In these cases, I often find it useful to remove the generic line-ending (\R) — see perlrebackslash: \R for more information.

    $ perl -E 'my @x = ("A", "B\n", "C\r", "D\r\n"); for (@x) { s/\R$//; say ord for split //; }'
    65
    66
    67
    68

    Do note that you'll need Perl 5.10 or later to use \R. If you have an older version of Perl, you can do this:

    $ perl -E 'my @x = ("A", "B\n", "C\r", "D\r\n"); for (@x) { s/[\r\n]*$//; say ord for split //; }'
    65
    66
    67
    68
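
    In a script rather than a one-liner, the same idea might look like the sketch below. The helper name strip_eol and the sample lines are hypothetical, chosen only for illustration. It assumes Perl 5.10 or later for \R:

```perl
use strict;
use warnings;

# Hypothetical helper: remove one trailing line ending of any style
# ("\n", "\r\n", or "\r"). \R requires Perl 5.10 or later.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\R\z//;
    return $line;
}

# Hypothetical sample lines with mixed line endings.
my @raw   = ("Year 2019\r\n", "Month Day\n", "July 29\r", "no ending");
my @clean = map { strip_eol($_) } @raw;

# Brackets make any leftover invisible characters easier to spot.
print "[$_]\n" for @clean;
```

    Unlike chop, this removes nothing when the line has no line ending at all, so the last visible character is never lost.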

    Finally, as this is your first post, I won't harp on it too much but, if you provide more information, you'll generally get a better answer that doesn't involve a lot of guesswork. See "How do I post a question effectively?" and "SSCCE".

    — Ken

      Hi Everyone --

      Thanks for the responses; all of them were helpful in understanding the problem and finding the solution (removing an unneeded chop).

      Also, thanks for your patience with the lack of information; I'll do better next time!

      Carol

Re: Perl appears to be dropping last character of line
by perl-diddler (Chaplain) on Dec 06, 2019 at 02:53 UTC
    Your code doesn't really show us how you read in the data.

    I'm wondering if it was read in a while loop followed by a 'chop' (or two), when a chomp might be safer.
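
    A small sketch of why that would matter, using hypothetical sample lines since the reading code wasn't shown. chomp only removes a trailing newline if one is there, whereas chop removes the last character whatever it is:

```perl
use strict;
use warnings;

my (@after_chomp, @after_chop);

# Hypothetical lines: one ends in a newline, one does not.
for my $line ("Year 2019\n", "Year 2019") {
    my $copy = $line;

    chomp $copy;                 # strips "\n" only when present
    push @after_chomp, $copy;

    chop $copy;                  # always strips one more character
    push @after_chop, $copy;
}

print "chomp only:      [$_]\n" for @after_chomp;
print "chomp then chop: [$_]\n" for @after_chop;
```

    After chomp alone both lines read "Year 2019"; the extra chop turns both into "Year 201".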

    Where's the routine that reads this from disk and how are the lines passed to this routine?

    Is this on unix (or linux) or on Windows or what? I doubt it is related, but different systems use different line endings...

Re: Perl appears to be dropping last character of line
by holli (Abbot) on Dec 06, 2019 at 13:25 UTC
    Crossposted on Stack Overflow, where I already told the OP the problem is in the data.


    holli

    You can lead your users to water, but alas, you cannot drown them.