Perl appears to be dropping last character of line

cmarra has asked for the wisdom of the Perl Monks concerning the following question:

Hi -- Disclaimer: I'm very new to Perl. Please forgive any offensive coding practices.

I'm working through a file parse. The first few lines of the file are below the "***". Note that the first 2 lines of the file have blank spaces after the visible text, whereas on the following lines there are no spaces after the last visible character. (You'll have to trust me on that; I'm not sure it'll be obvious in this post.)

This is all happening in a subroutine to which I pass in the array containing the file contents. When I shift through the array and get to the 4th line (" Year 2019") and print it, I get ("Year 201"). I've confirmed the same phenomenon further down in the file, where the last character is dropped when it's the last character on the line.

Thanks in advance,

Carol

Code is as follows:

sub read_gage_header {

  my ($data, $header, @headers, $ettb_no, $year);
    
  $data = shift;
  
  #Now get ettb_no
  $header = $data->[0];
  @headers = split / /, $header;

  #NOTE: this print yields expected results
  print "HEADERS before ettb_no @headers\n" if defined ($debug); 
 
  #NOTE: this gets the proper ettb_no
  $ettb_no = $headers[4];                                        

  #Skip to get to Year
  shift @$data; 
  shift @$data;
  shift @$data;  
 
  $header = shift @$data;
  #NOTE: this gives me "Year 201"
  print "HEADER before year $header\n" if defined ($debug);       

  ... more code ...

****************** FILE STARTS HERE ********************************

 Gage Information - 240CN - 240 FEEDER CANAL                        
  SUPPLY TO 240 FEEDER FROM BELEN HIGH LINE CANAL                     
                                        
  Year 2019

  Month  Day  Time   Height   Discharge
             (mst)   (HP ft)   (QR cfs)
  -----  ---  ----   ------   ---------
  July    29  1230    5.54        80

  ... more data ...


Replies are listed 'Best First'.
Re: Perl appears to be dropping last character of line by LanX (Saint) on Dec 06, 2019 at 01:48 UTC
The question is: what is actually inside the array ref `$data`, and how did you put it in there?

Please try something like `use Data::Dumper; print Dumper $data;` and show us the first 10 lines.

From your demonstration it doesn't make sense that you had to do 3 shifts to skip lines, so I'm assuming your concept of end-of-line is somehow broken.

Cheers Rolf
Re^2: Perl appears to be dropping last character of line by haukex (Archbishop) on Dec 06, 2019 at 13:21 UTC
cmarra: Adding `$Data::Dumper::Useqq=1;`, or using Data::Dump instead, will make whitespace and other special characters more visible in the debug output. See also the Basic debugging checklist.
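To illustrate the suggestion above, here is a minimal sketch of how `$Data::Dumper::Useqq` can expose a trailing carriage return. The sample strings are made up for illustration and are not taken from the OP's file.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample data: the second element ends in CR+LF, as a line
# from a Windows-style file would when read on a Unix-like system.
my $data = [ "  Year 2019\n", "  Year 2019\r\n" ];

$Data::Dumper::Useqq  = 1;    # dump strings double-quoted, with escapes
$Data::Dumper::Indent = 1;    # simpler, fixed indentation

my $dump = Dumper($data);
print $dump;
```

With `Useqq` enabled, the second element should be dumped with a visible `\r\n` at the end, whereas printing the raw string to a terminal would hide the carriage return.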
Re^3: Perl appears to be dropping last character of line by LanX (Saint) on Dec 06, 2019 at 13:56 UTC
Good point! :) I mostly use Data::Dump, which is unfortunately not core, hence not easy for beginners.

Cheers Rolf
Re: Perl appears to be dropping last character of line by hippo (Archbishop) on Dec 06, 2019 at 09:23 UTC
Here is an SSCCE showing that it works fine for me.

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my $debug = 1;
    my @input = <DATA>;
    read_gage_header (\@input);

    sub read_gage_header {

      my ($data, $header, @headers, $ettb_no, $year);

      $data = shift;

      #Now get ettb_no
      $header = $data->[0];
      @headers = split / /, $header;

      #NOTE: this print yields expected results
      print "HEADERS before ettb_no @headers\n" if defined ($debug);

      #NOTE: this gets the proper ettb_no
      $ettb_no = $headers[4];

      #Skip to get to Year
      shift @$data;
      shift @$data;
      shift @$data;

      $header = shift @$data;
      #NOTE: this gives me "Year 201"
      print "HEADER before year $header\n" if defined ($debug);
    }

    __DATA__
     Gage Information - 240CN - 240 FEEDER CANAL
      SUPPLY TO 240 FEEDER FROM BELEN HIGH LINE CANAL

      Year 2019

      Month  Day  Time   Height   Discharge
                 (mst)   (HP ft)   (QR cfs)
      -----  ---  ----   ------   ---------
      July    29  1230    5.54        80

      ... more data ...

Which produces this output when run:

    $ perl 11109714.pl
    HEADERS before ettb_no Gage Information - 240CN - 240 FEEDER CANAL

    HEADER before year Year 2019

In other words, the problem is with the code that you haven't shown us - the part where you read in the input and populate your `$data`.
Re: Perl appears to be dropping last character of line by kcott (Archbishop) on Dec 06, 2019 at 10:21 UTC
G'day Carol,

Welcome to the Monastery.

What you're describing is typically caused by embedded characters. A carriage-return (`"\r"`) is the most usual culprit:

    $ perl -e 'my $x = "abc\r"; print "|$x|"'
    |abc

Without the carriage-return (`"\r"`) the problem of losing the final character (`"|"`) disappears:

    $ perl -e 'my $x = "abc"; print "|$x|"'
    |abc|

The problem usually arises when processing data from one OS (operating system) on a different OS. They have different line-endings: `"\n"` (Unix-style systems, including Linux & Mac OS X); `"\r"` (Mac versions earlier than Mac OS X); `"\r\n"` (MSWin).

The problem may not be a carriage-return but some other control character; for instance, something like an embedded backspace could potentially cause similar problems. Use ord to identify which, if any, embedded characters may exist:

    $ perl -E 'my @x = ("A", "B\n", "C\r", "D\r\n"); for (@x) { say ord for split //; }'
    65
    66
    10
    67
    13
    68
    13
    10

Say, on a Unix-like system, you have an entire, unconverted record from an MSWin file that looks like: `"abc\r\n"`. Using chomp (which, on a Unix-like system, will consider a newline to be the line-ending character) will only remove the terminal `"\n"`, resulting in `"abc\r"` (which I used in the original example).

    $ perl -E 'my @x = ("A", "B\n", "C\r", "D\r\n"); for (@x) { chomp; say ord for split //; }'
    65
    66
    67
    13
    68
    13

In these cases, I often find it useful to remove the generic line-ending (`\R`); see perlrebackslash: \R for more information.

    $ perl -E 'my @x = ("A", "B\n", "C\r", "D\r\n"); for (@x) { s/\R$//; say ord for split //; }'
    65
    66
    67
    68

Do note that you'll need Perl 5.10 or later to use `\R`. If you have an older version of Perl, you can do this:

    $ perl -E 'my @x = ("A", "B\n", "C\r", "D\r\n"); for (@x) { s/[\r\n]*$//; say ord for split //; }'
    65
    66
    67
    68

Finally, as this is your first post, I won't harp on it too much but, if you provide more information, you'll generally get a better answer that doesn't involve a lot of guesswork. See "How do I post a question effectively?" and "SSCCE".

-- Ken
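The one-liners above work on hard-coded strings. As a rough sketch of how the same `\R` idea might be applied while reading a file, the following uses a hypothetical helper, `read_lines`, and an in-memory filehandle standing in for a real file; neither appears in the OP's code.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read every line from a
# filehandle and strip a trailing LF, CR or CRLF from each one.
sub read_lines {
    my ($fh) = @_;
    my @lines;
    while ( my $line = <$fh> ) {
        $line =~ s/\R\z//;    # \R needs Perl 5.10 or later
        push @lines, $line;
    }
    return \@lines;
}

# Simulate a Windows-style file with an in-memory filehandle.
my $content = "  Year 2019\r\nJuly    29\r\n";
open my $fh, '<', \$content or die "Cannot open in-memory file: $!";
my $lines = read_lines($fh);
close $fh;

# Brackets make any leftover trailing characters easy to spot.
print "[$_]\n" for @$lines;
```

This assumes lines are still separated by `"\n"` (so CRLF and LF files both work); a file using bare `"\r"` as its only line ending would be read as a single line and would need a different `$/`.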
Re^2: Perl appears to be dropping last character of line by cmarra (Initiate) on Dec 06, 2019 at 18:34 UTC
Hi Everyone -- Thanks for the responses; all of them were helpful in understanding the problem and finding the solution (removing an unneeded chop). Also, thanks for your patience with the lack of information; I'll do better next time! Carol
Re: Perl appears to be dropping last character of line by perl-diddler (Chaplain) on Dec 06, 2019 at 02:53 UTC
Your code doesn't really show us how you read in the data. I'm wondering if it was read in a while loop followed by a 'chop' (or two), when a chomp might be safer. Where's the routine that reads this from disk, and how are the lines passed to this routine? Is this on Unix (or Linux) or on Windows or what? I doubt it is related, but different systems use different line endings...
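For readers unsure of the difference, here is a small sketch of chop versus chomp on a line that has no trailing newline. The string is a made-up stand-in for the OP's "Year" line, and this is only one way the reported symptom can arise.

```perl
use strict;
use warnings;

# Hypothetical line with no trailing newline (for example, one that
# has already been chomped, or the last line of a file).
my $chopped = "  Year 2019";
my $chomped = "  Year 2019";

chop  $chopped;    # always removes the last character, whatever it is
chomp $chomped;    # removes a trailing newline only if one is present

print "chop:  [$chopped]\n";    # prints: chop:  [  Year 201]
print "chomp: [$chomped]\n";    # prints: chomp: [  Year 2019]
```

Because chomp leaves the string alone when there is nothing to strip, it is the safer default when it is not certain that every line still carries its newline.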
Re: Perl appears to be dropping last character of line by holli (Abbot) on Dec 06, 2019 at 13:25 UTC
Crossposted on Stack Overflow, where I already told the OP the problem is in the data.