piranhas has asked for the wisdom of the Perl Monks concerning the following question:

Assume the input is a list of stock symbols, one symbol per line:

symbols.txt

FORM
IDTI
ONNN
CHRT
.
.
test.plx
while (<STDIN>) {
    chomp;
    print STDERR $_ . " is the symbol\n";
}

$ cat symbols.txt | ./test.plx
 is the symbol
 is the symbol
 is the symbol
 is the symbol
.
.

while (<STDIN>) {
    #chomp;
    print STDERR $_ . " is the symbol\n";
}

$ cat symbols.txt | ./test.plx
FORM
 is the symbol
IDTI
 is the symbol
ONNN
 is the symbol
CHRT
 is the symbol
ADI
.
.
chomp seems to affect $_ such that whenever I concatenate something to the right of $_ (e.g. $_ . "somestring"), the right-hand string overwrites the first characters of the left-hand one. Concatenations to the left of $_ work.
I am running:
Win XP Professional
perl, v5.8.7 built for cygwin-thread-multi-64int

Re: chomp problem
by davidrw (Prior) on Aug 17, 2005 at 20:45 UTC
    It's probably the Windows newline characters in there. Try s/[\r\n]+$//; instead of chomp (you could probably change the $/ variable instead; see perldoc perlvar and perldoc -f chomp).

    side note: you can do ./test.plx < symbols.txt instead of the useless cat. note also that this can be written as:
    perl -pe 's/([\r\n]+)$/ is the symbol$1/' symbols.txt
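
The $/ alternative mentioned above can be sketched roughly as follows. This is only an illustration, assuming the input really is CR/LF-terminated; the helper name read_crlf_records and the in-memory sample data are made up for the example and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: read every record from a filehandle, treating
# CR/LF as the record separator so that chomp removes both characters.
sub read_crlf_records {
    my ($fh) = @_;
    local $/ = "\r\n";        # changed only inside this sub
    my @records;
    while (my $line = <$fh>) {
        chomp $line;          # strips a trailing "\r\n" if present
        push @records, $line;
    }
    return @records;
}

# In-memory filehandle standing in for a DOS-format symbols.txt (assumed data).
my $data = "FORM\r\nIDTI\r\nONNN\r\nCHRT\r\n";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
print "$_ is the symbol\n" for read_crlf_records($fh);
close $fh;
```

The catch is that with $/ set to "\r\n", a file that uses bare LF endings is read as one long record, so this only suits input known to be in DOS format.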
      To elaborate a little bit on why you'd get this strange behavior, Windows encodes line endings as CR/LF, that is a carriage-return (return to the beginning of the line) followed by a linefeed (advance to the next line). So if you strip off just the linefeed, you end up with this in your string:
      FORM<CR> is the symbol<LF>
      That tells your terminal to print FORM, then return the cursor to the beginning of the line and print " is the symbol" (which now lands right on top of FORM), and finally advance to the next line.

      If you pipe this through less, od, or hexdump, you should be able to see this is what's going on.
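
If od or hexdump is not handy, a few lines of Perl can make the control characters visible instead. This is a rough sketch; show_controls is a hypothetical helper name, and the sample line is assumed to match what a DOS-format symbols.txt would contain.

```perl
use strict;
use warnings;

# Hypothetical helper: replace CR and LF with visible markers.
sub show_controls {
    my ($s) = @_;
    $s =~ s/\r/<CR>/g;
    $s =~ s/\n/<LF>/g;
    return $s;
}

my $line = "FORM\r\n";           # one line as read from a CR/LF file
chomp(my $chomped = $line);      # default $/ is "\n", so only the LF goes
print show_controls($chomped . " is the symbol\n"), "\n";
# prints: FORM<CR> is the symbol<LF>
```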

Re: chomp problem
by jdhedden (Deacon) on Aug 17, 2005 at 20:49 UTC
    The problem is that symbols.txt is in DOS format (CR-LF). Run:
    d2u symbols.txt
    and then your code will work. One alternative is to replace chomp with:
    s/\r*\n//;
    and then your code will work with both DOS and Unix formatted text files.
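
A minimal sketch of a line-ending strip that tolerates DOS, Unix, and old Mac endings, as well as a last line with no ending at all. The helper name strip_eol and the sample lines are assumptions for illustration; the pattern is the character-class approach suggested earlier in the thread, anchored with \z.

```perl
use strict;
use warnings;

# Hypothetical helper: remove any run of CR/LF characters at the very end.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;
    return $line;
}

# Assumed sample lines: CR/LF, LF only, CR only, and no line ending.
for my $raw ("FORM\r\n", "IDTI\n", "ONNN\r", "CHRT") {
    print strip_eol($raw), " is the symbol\n";
}
```

Because the helper works on a copy, the original line is left untouched, which is handy if the raw record is needed later.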

    Remember: There's always one more bug.
Re: chomp problem
by GrandFather (Saint) on Aug 17, 2005 at 20:54 UTC

    Except that I used type rather than cat, I get the expected output with an ActiveState install, as detailed below:

    C:\...\PerlMonks>perl -v

    This is perl, v5.8.7 built for MSWin32-x86-multi-thread
    (with 7 registered patches, see perl -V for more detail)

    Copyright 1987-2005, Larry Wall

    Binary build 813 [148120] provided by ActiveState http://www.ActiveState.com
    ActiveState is a division of Sophos.
    Built Jun 6 2005 13:36:37

    Perl may be copied only under the terms of either the Artistic License or the
    GNU General Public License, which may be found in the Perl 5 source kit.

    Complete documentation for Perl, including FAQ lists, should be found on
    this system using `man perl' or `perldoc perl'. If you have access to the
    Internet, point your browser at http://www.perl.org/, the Perl Home Page.

    C:\...\PerlMonks>type noname.txt | perl noname.pl
    FORM is the symbol
    IDTI is the symbol
    ONNN is the symbol
    CHRT is the symbol

    Perl is Huffman encoded by design.
Re: chomp problem
by jira0004 (Monk) on Aug 17, 2005 at 21:12 UTC

    Typically, when I am reading data from a file with one "record" per line, I use the substitutions =~ s/\A\s+//s; and =~ s/\s+\Z//s;. This strips all white space characters from the beginning and end of each line and avoids pesky errors caused by stray white space at the start or end of a line.

    Thus, in the case where I had the file handle FH open to an input file and I was populating the list @contents with one entry for each line in the file, I would have the following:

    my @contents;
    my $line;
    while ($line = <FH>) {
        $line =~ s/\A\s+//s;
        $line =~ s/\s+\Z//s;
        push @contents, ( $line );
    }

    The above code snippet will populate @contents with one entry for each line read from the file handle FH. Each line has the white space removed from both its beginning and its end by the substitutions $line =~ s/\A\s+//s; and $line =~ s/\s+\Z//s;. \A at the start of a pattern matches the absolute beginning of a string, even if the string contains vertical space characters. \Z at the end of a pattern matches at the end of a string or just before a final newline (\z is the strictly absolute end); because the greedy \s+ in front of it consumes any trailing newline, the effect here is the same. \s matches any white space character: \f, \r, \n, space and tab. The + after \s makes the pattern match one or more white space characters. The s modifier at the end of each substitution only changes what . matches (it lets . match a newline); since neither pattern contains a ., it is harmless here but not required.

    The approach I have suggested will work, although it is a little meticulous and could be overkill for what you are trying to accomplish.

Re: chomp problem
by GrandFather (Saint) on Aug 17, 2005 at 21:35 UTC

    Looks to me like you have a Mac file which uses CR characters as line ends. You need either to remove the chomp or convert the file to Windows line ends CR/LF.

    Update: it is well to remember that Perl is line-end savvy: it does the right thing for the local operating system, which may be the wrong thing if the file is not native.


    Perl is Huffman encoded by design.
Re: chomp problem
by spiritway (Vicar) on Aug 18, 2005 at 00:24 UTC

    It's ironic that someone named "Piranhas" has a problem with chomp.