
Well, to be exact, chomp removes whatever string happens to match the current value of "$/" (input record separator), which defaults to "\015\012" for windows text-mode, "\n" for unix. (update: see replies below for correct info)

And it only does this when the string matching $/ happens to occur at the end of the scalar value being chomped.

perl -e '$/ = "\n"; $_ = "str\015\012"; chomp; s/(\s)/sprintf("%o",ord($1))/eg; print $_,$/' # prints "str15"
perl -e '$/ = "\r\n"; $_ = "str\015\012"; chomp; s/(\s)/sprintf("%o",ord($1))/eg; print $_,$/' # prints "str"
perl -e '$/="\r\n"; $_ = "foo\015\012str\015\012"; chomp; s/(\s)/sprintf("%o",ord($1))/eg; print $_,$/' # prints "foo1512str"
Update: Honest, I really did (start to) post this before tachyon made it redundant. And I confess I was not speaking from personal experience (lucky me) about the default value of $/ on ms-win -- thanks to tachyon for the correction.

Re^4: Strange character beginning text files
by tachyon (Chancellor) on Jul 20, 2004 at 05:26 UTC

    Also, for your interest: your assertion that $/ is CRLF on Win32 is wrong, and chomp does not remove the \r. As I understand it, there is some internal magic that means non-binmode file reads and writes get converted, but you can see that $/ is "\n", at least on my system. I have been bitten by \r not getting eaten by chomp on multiple occasions, usually related to Win32->*nix issues.

    C:\>type test.pl
    printf "CR \\r \\%03o 0x%02x\n", ord("\r"), ord("\r");
    printf "LF \\n \\%03o 0x%02x\n", ord("\n"), ord("\n");
    print $^O, $/;
    print "\$/ length ", length $/, " is ", (unpack "H*", $/), "\n\n";
    my $str = "str\015\012";
    for( 1..2 ) {
        print "string '$str'\n";
        print "length ", length $str, "\n";
        chomp $str;
        print "string '$str'\n";
        print "length ", length $str, "\n\n";
    }

    C:\>test.pl
    CR \r \015 0x0d
    LF \n \012 0x0a
    MSWin32
    $/ length 1 is 0a

    string 'str
    '
    length 5
    'tring 'str
    length 4

    'tring 'str
    length 4
    'tring 'str
    length 4

    C:\>
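
For anyone hitting the same leftover \r, here is a minimal sketch of one common workaround: strip the line ending with a regex instead of relying on chomp and $/. The helper name strip_eol is made up for illustration and is not something from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): remove one trailing line
# ending, either LF or CRLF, without consulting $/ at all.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # optional CR, then LF, anchored at the very end
    return $line;
}

# Try it on a CRLF-terminated, an LF-terminated, and an unterminated string.
for my $raw ("str\015\012", "str\012", "str") {
    my $clean = strip_eol($raw);
    printf "%d -> %d chars: '%s'\n", length($raw), length($clean), $clean;
}
```

Like chomp, this only touches the end of the string, so an embedded "\015\012" in the middle is left alone.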

    cheers

    tachyon

      ...bitten...

      Me too. When stripping text from an MS Word file, chomp ignores the \r. I have a 'trim' function for leading and trailing whitespace, and I 'chomp' it there.

      There were also em dashes, ellipses, opening and closing single and double quotes, and all the rest. As the text was being prepared for a web page, HTML::Entities came to the rescue!
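
The poster's actual 'trim' function is not shown, so the following is only a guess at the general idea: a whitespace trim also takes care of a stray \r, because \s matches both \r and \n. The function body and the sample string are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical trim helper (the real one from the post is not shown).
# \s matches space, tab, \r and \n, so a trailing CRLF goes away too.
sub trim {
    my ($text) = @_;
    $text =~ s/\A\s+//;    # leading whitespace
    $text =~ s/\s+\z//;    # trailing whitespace, including \r\n
    return $text;
}

my $line = "  some text from Word \015\012";
print "[", trim($line), "]\n";
```

Unlike chomp, this removes all trailing whitespace, which is fine for web-page text but not for data where trailing spaces matter.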

Re^4: Strange character beginning text files
by tachyon (Chancellor) on Jul 20, 2004 at 05:17 UTC

    Wasn't that what I said at the end (or did you see that post in the 30-odd seconds or so before I posted that clarification ;-)

    cheers

    tachyon