in reply to Re: Strange character beginning text files
in thread Strange character beginning text files

Actually, chomp typically eats only \n, which is the line feed character (LF), not the carriage return character (CR):

    printf "CR \\r \\%03o 0x%02x\n", ord("\r"), ord("\r");
    printf "LF \\n \\%03o 0x%02x\n\n", ord("\n"), ord("\n");

    my $str = "str\015\012";
    for( 1..2 ) {
        print "string '$str'\n";
        print "length ", length $str, "\n";
        chomp $str;
        print "string '$str'\n";
        print "length ", length $str, "\n\n";
    }

Technically chomp removes any trailing string that corresponds to the current value of $/ (also known as $INPUT_RECORD_SEPARATOR in the English module).
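
To make the $/ point concrete, here is a minimal sketch. The helper name chomp_with is made up for this illustration (it is not a Perl builtin); it just localizes $/ and calls chomp, so you can see that only a trailing string equal to $/ is removed.

```perl
use strict;
use warnings;

# chomp_with() is a hypothetical helper for this sketch only.
# It returns a chomped copy of $text, using $sep as the value of $/.
sub chomp_with {
    my ($text, $sep) = @_;
    local $/ = $sep;    # $/ is restored when the sub returns
    chomp $text;
    return $text;
}

# $/ = "\n": only the LF is removed, the CR stays behind.
print length( chomp_with("str\015\012", "\n") ), "\n";        # 4

# $/ = "\r\n": the whole CRLF pair is removed.
print length( chomp_with("str\015\012", "\015\012") ), "\n";  # 3

# $/ = "\r": nothing is removed, the string does not end in CR.
print length( chomp_with("str\015\012", "\015") ), "\n";      # 5

# $/ can be any string, not just a line ending.
print chomp_with("a,b,", ","), "\n";                          # a,b
```

Using local keeps the change to $/ from leaking into the rest of the program, which matters because $/ also controls how readline splits input.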

cheers

tachyon

Re^3: Strange character beginning text files
by graff (Chancellor) on Jul 20, 2004 at 05:12 UTC
    Well, to be exact, chomp removes whatever string happens to match the current value of "$/" (input record separator), which defaults to "\015\012" for windows text-mode, "\n" for unix. (update: see replies below for correct info)

    And it only does this when the string matching $/ happens to occur at the end of the scalar value being chomped.

        perl -e '$/ = "\n"; $_ = "str\015\012"; chomp; s/(\s)/sprintf("%o",ord($1))/eg; print $_,$/'
        # prints "str15"

        perl -e '$/ = "\r\n"; $_ = "str\015\012"; chomp; s/(\s)/sprintf("%o",ord($1))/eg; print $_,$/'
        # prints "str"

        perl -e '$/="\r\n"; $_ = "foo\015\012str\015\012"; chomp; s/(\s)/sprintf("%o",ord($1))/eg; print $_,$/'
        # prints "foo1512str"

    Update: Honest, I really did (start to) post this before tachyon made it redundant. And I confess I was not speaking from personal experience (lucky me) about the default value of $/ on ms-win -- thanks to tachyon for the correction.

      Also, for your interest, your assertion that $/ is CRLF on Win32 is wrong, and chomp does not remove the \r either. As I understand it, there is some internal magic that means non-binmode file reads and writes get converted (CRLF to LF on read, LF to CRLF on write), but you can see that $/ is "\n", at least on my system. I have been bitten by \r not getting eaten by chomp on multiple occasions, usually related to Win32->*nix issues.

          C:\>type test.pl
          printf "CR \\r \\%03o 0x%02x\n", ord("\r"), ord("\r");
          printf "LF \\n \\%03o 0x%02x\n", ord("\n"), ord("\n");
          print $^O, $/;
          print "\$/ length ", length $/, " is ", (unpack "H*", $/), "\n\n";
          my $str = "str\015\012";
          for( 1..2 ) {
              print "string '$str'\n";
              print "length ", length $str, "\n";
              chomp $str;
              print "string '$str'\n";
              print "length ", length $str, "\n\n";
          }

          C:\>test.pl
          CR \r \015 0x0d
          LF \n \012 0x0a
          MSWin32
          $/ length 1 is 0a

          string 'str
          '
          length 5
          'tring 'str
          length 4

          'tring 'str
          length 4
          'tring 'str
          length 4

          C:\>
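
      For the Win32->*nix case, where a CRLF file is read on a system that does no conversion, one common workaround is to strip the line ending with a regex instead of relying on chomp. This is only a sketch: strip_eol is a hypothetical helper name, and it assumes you want to drop exactly one trailing CRLF, LF, or lone CR.

      ```perl
      use strict;
      use warnings;

      # strip_eol() is a hypothetical helper for this sketch only.
      # It removes one trailing line ending (CRLF, LF, or lone CR),
      # regardless of the current value of $/.
      sub strip_eol {
          my ($line) = @_;
          $line =~ s/(?:\015\012|\012|\015)\z//;
          return $line;
      }

      print length( strip_eol("str\015\012") ), "\n";   # 3
      print length( strip_eol("str\012") ), "\n";       # 3
      print length( strip_eol("str\015") ), "\n";       # 3
      print length( strip_eol("str") ), "\n";           # 3
      ```

      The octal escapes are used instead of \r and \n so that the pattern means the same bytes on every platform.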

      cheers

      tachyon

        ...bitten...

        Me too. When stripping text from an MS Word file, chomp ignores \r. I have a 'trim' function for leading and trailing whitespace, and I 'chomp' it there.

        There were also em dashes, ellipses, opening and closing single and double quotes, and all the rest. As the text was being prepared for a web page, HTML::Entities came to the rescue!
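
        A 'trim' function of the kind mentioned above might look roughly like this. It is a guess at the general shape, not the poster's actual code; it relies on \s matching both \r and \n, so a stray CR at the end goes away along with other trailing whitespace.

        ```perl
        use strict;
        use warnings;

        # trim() is an illustrative sketch, not the original poster's function.
        # It returns a copy of $text without leading or trailing whitespace.
        sub trim {
            my ($text) = @_;
            $text =~ s/\A\s+//;    # leading whitespace
            $text =~ s/\s+\z//;    # trailing whitespace, including \r and \n
            return $text;
        }

        my $raw = "  some text from Word \015\012";
        printf "[%s] length %d\n", trim($raw), length( trim($raw) );
        # prints "[some text from Word] length 19"
        ```

        Note that this removes all trailing whitespace, not just the line ending, which is usually fine for text headed to a web page but not for data where trailing spaces matter.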

      Wasn't that what I said at the end? (Or did you see that post in the 30-odd seconds or so before I posted that clarification? ;-)

      cheers

      tachyon