chomp problem

piranhas has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: chomp problem by davidrw (Prior) on Aug 17, 2005 at 20:45 UTC
It's probably the Windows newline characters in there. Try `s/[\r\n]+$//;` instead of `chomp` (you could probably change the `$/` variable instead; see `perldoc perlvar` and `perldoc -f chomp`). Side note: you can run `./test.plx < symbols.txt` instead of the useless `cat`. Note also that this can be written as a one-liner: `perl -pe 's/([\r\n]+)$/ is the symbol$1/' symbols.txt`
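A minimal sketch of the difference between the two approaches, assuming a record read from a CR-LF file while `$/` still has its default value of `"\n"`. The sample string and the helper name `strip_eol` are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: remove any run of CR/LF characters at the end of a line.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/[\r\n]+$//;
    return $line;
}

my $record = "FORM\r\n";    # a line as it would arrive from a DOS-format file

my $chomped = $record;
chomp $chomped;             # with the default $/ of "\n", only the LF is removed

printf "chomp left %d characters\n",     length($chomped);             # 5: "FORM\r"
printf "strip_eol left %d characters\n", length(strip_eol($record));   # 4: "FORM"
```

The extra character left behind by `chomp` is the carriage return, which is invisible when printed but still part of the string.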
Re^2: chomp problem by sgifford (Prior) on Aug 17, 2005 at 20:52 UTC
To elaborate a little on why you'd get this strange behavior: Windows encodes line endings as `CR/LF`, that is, a carriage return (return to the beginning of the line) followed by a linefeed (advance to the next line). So if you strip off just the linefeed, you end up with this in your string: `FORM<CR> is the symbol<LF>`. That tells your terminal to print `FORM`, then return the cursor to the beginning of the line and print ` is the symbol` (which now lands right on top of `FORM`), and finally advance to the next line. If you pipe the output through `less`, `od`, or `hexdump`, you should be able to see that this is what's going on.
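If none of those tools is at hand, the same thing can be made visible from Perl itself. This is only a sketch: the helper `show_controls` is a hypothetical name, and the sample line is assumed to match what the original script read from a CR-LF file.

```perl
use strict;
use warnings;

# Hypothetical helper: replace CR and LF with visible markers.
sub show_controls {
    my ($text) = @_;
    $text =~ s/\r/<CR>/g;
    $text =~ s/\n/<LF>/g;
    return $text;
}

my $line = "FORM\r\n";      # assumed input: one line from a DOS-format file
chomp $line;                # removes the LF only, leaving "FORM\r"
my $output = "$line is the symbol\n";

print show_controls($output), "\n";   # prints: FORM<CR> is the symbol<LF>
```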
Re: chomp problem by jdhedden (Deacon) on Aug 17, 2005 at 20:49 UTC
The problem is that symbols.txt is in DOS format (CR-LF). Run `d2u symbols.txt` and then your code will work. One alternative is to replace `chomp` with `s/\r*\n//;` and then your code will work with both DOS- and Unix-formatted text files. Remember: There's always one more bug.
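For readers without a conversion utility installed, the idea behind such a conversion can be sketched in a few lines of Perl. This illustrates the concept only and is not the implementation of any real tool. The function name `dos_to_unix` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: convert DOS (CR-LF) line endings to Unix (LF) in a string.
sub dos_to_unix {
    my ($text) = @_;
    $text =~ s/\r+\n/\n/g;   # drop the CRs directly before each LF; the LF stays
    return $text;
}

my $dos  = "FORM\r\nIDTI\r\nONNN\r\n";
my $unix = dos_to_unix($dos);

print $unix eq "FORM\nIDTI\nONNN\n" ? "converted\n" : "unexpected result\n";
```

Text that is already in Unix format passes through unchanged, because the pattern only matches where a CR precedes the LF.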
Re: chomp problem by GrandFather (Saint) on Aug 17, 2005 at 20:54 UTC
Apart from using `type` rather than `cat`, I get the expected output with the ActiveState install detailed below:

    C:\...\PerlMonks>perl -v
    This is perl, v5.8.7 built for MSWin32-x86-multi-thread
    (with 7 registered patches, see perl -V for more detail)
    Copyright 1987-2005, Larry Wall
    Binary build 813 [148120] provided by ActiveState http://www.ActiveState.com
    ActiveState is a division of Sophos.
    Built Jun 6 2005 13:36:37
    Perl may be copied only under the terms of either the Artistic License or the
    GNU General Public License, which may be found in the Perl 5 source kit.
    Complete documentation for Perl, including FAQ lists, should be found on
    this system using `man perl' or `perldoc perl'. If you have access to the
    Internet, point your browser at http://www.perl.org/, the Perl Home Page.

    C:\...\PerlMonks>type noname.txt | perl noname.pl
    FORM is the symbol
    IDTI is the symbol
    ONNN is the symbol
    CHRT is the symbol

Perl is Huffman encoded by design.
Re: chomp problem by jira0004 (Monk) on Aug 17, 2005 at 21:12 UTC
Typically, when I read data from a file that holds one "record" per line, I use the substitutions `=~ s/\A\s+//s;` and `=~ s/\s+\Z//s;`. These strip all whitespace characters from the beginning and end of the line and avoid pesky errors caused by stray whitespace at the start or end of a line.

Thus, with the file handle `FH` open on an input file and the list `@contents` receiving one entry per line, I would write: `my @contents; my $line; while ($line = <FH>) { $line =~ s/\A\s+//s; $line =~ s/\s+\Z//s; push @contents, ( $line ); }`

The above snippet populates `@contents` with one entry for each line read via `FH`. Each line has the whitespace removed from its beginning and end by the substitutions `$line =~ s/\A\s+//s;` and `$line =~ s/\s+\Z//s;`. `\A` at the start of a pattern matches the absolute beginning of a string, even if the string contains vertical space characters. `\Z` at the end of a pattern matches at the end of a string (or just before a final newline), even if the string contains vertical space characters. `\s` matches any single whitespace character: `\f`, `\r`, `\n`, space and tab. The `+` after `\s` makes the pattern match one or more whitespace characters. The `s` modifier at the end of each substitution makes `.` match newlines as well, so the string is treated as a single line; these patterns contain no `.`, so it is harmless here but not required.

The approach I have suggested will work, although it is a little meticulous and could be overkill for what you are trying to accomplish.
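A self-contained version of that loop might look like the sketch below. It keeps the two substitutions from the post but swaps in a lexical, in-memory filehandle so that it runs without an external file. The sample data and that filehandle are assumptions made for the example.

```perl
use strict;
use warnings;

# Assumed sample data standing in for the input file: mixed leading and
# trailing whitespace, with both CR-LF and LF line endings.
my $data = "  FORM\r\n\tIDTI \r\nONNN\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @contents;
while (my $line = <$fh>) {
    $line =~ s/\A\s+//s;    # strip leading whitespace
    $line =~ s/\s+\Z//s;    # strip trailing whitespace, including CR and LF
    push @contents, $line;
}
close $fh;

print join('|', @contents), "\n";   # prints: FORM|IDTI|ONNN
```

Because `\s` matches both `\r` and `\n`, the same loop copes with DOS and Unix line endings without any special casing.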
Re: chomp problem by GrandFather (Saint) on Aug 17, 2005 at 21:35 UTC
Looks to me like you have a Mac file, which uses CR characters as line ends. You need either to remove the `chomp` or to convert the file to Windows line ends (CR/LF). Update: it is well to remember that Perl is line-end savvy: it does the right thing for the local operating system, which may be the wrong thing if the file is not native. Perl is Huffman encoded by design.
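If the file really does use bare CR line ends, another option is the `$/` variable mentioned earlier in the thread: setting it to `"\r"` makes both the readline operator and `chomp` treat CR as the record separator. The sketch below assumes CR-only sample data and uses an in-memory filehandle so it needs no external file. Both are illustrative assumptions, not details taken from the original script.

```perl
use strict;
use warnings;

# Assumed sample data with CR-only line endings.
my $data = "FORM\rIDTI\rONNN\r";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @symbols;
{
    local $/ = "\r";            # treat a lone CR as the input record separator
    while (my $line = <$fh>) {
        chomp $line;            # chomp removes the current $/, here "\r"
        push @symbols, $line;
    }
}
close $fh;

print "$_ is the symbol\n" for @symbols;
```

Using `local` limits the change to the enclosing block, so code elsewhere in the program still sees the default separator.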
Re: chomp problem by spiritway (Vicar) on Aug 18, 2005 at 00:24 UTC
It's ironic that someone named "Piranhas" has a problem with chomp.