PerlMonks  

Re: Unicode strangeness

by graff (Chancellor)
on Oct 15, 2005 at 22:14 UTC ([id://500506])


in reply to Unicode strangeness

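pg is right. I don't have a Windows machine to try it on, but it would seem that when you use the mode spec ">:encoding(ucs2le)" in the open call, the encoding layer gets pushed on top of the default Windows ":crlf" layer. Output then passes through the encoding first, so the CRLF translation is applied to bytes that are already UCS-2LE rather than to characters.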

Another thing to try would be one of the following (I'm not sure which, because again, I don't have a Windows box to try it on):

    # either this:
    open( my $fh, ">:encoding(ucs2le):crlf", "filename" );

    # or, if that doesn't work, this:
    open( my $fh, ">:raw:encoding(ucs2le):crlf", "filename" );

In either case, by putting ":crlf" after the encoding spec, the crlf layer (which converts "\n" in your code to "\r\n" on output) sits above the encoding layer, so it should produce proper 16-bit renderings of the CR and LF characters (0d 00 0a 00).
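To see why the order matters, here is a rough model of the two orderings in plain Perl, using Encode instead of real I/O layers. The crlf_translate and hex_bytes helpers are hypothetical, written only for this illustration, and the model assumes a :crlf layer simply puts a CR in front of every LF it is handed; it is a sketch of the idea, not of PerlIO's internals.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (a model, not PerlIO): do to a string what a :crlf
# output layer does, i.e. put a CR (0x0d) in front of every LF (0x0a).
sub crlf_translate {
    my ($s) = @_;
    $s =~ s/\x0a/\x0d\x0a/g;
    return $s;
}

# Hypothetical helper: show a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf( '%02x', ord($_) ) } split //, $bytes;
}

my $text = "A\n";

# Translation applied to the already-encoded *bytes*: roughly what happens
# when the implicit :crlf layer sits below :encoding(ucs2le).
my $bytes_first = crlf_translate( encode( 'UCS-2LE', $text ) );

# Translation applied to the *characters*, then encoded: what putting
# ":crlf" after the encoding spec is meant to achieve.
my $chars_first = encode( 'UCS-2LE', crlf_translate($text) );

print 'bytes first: ', hex_bytes($bytes_first), "\n";   # 41 00 0d 0a 00
print 'chars first: ', hex_bytes($chars_first), "\n";   # 41 00 0d 00 0a 00
```

In the first line the lone 0x0d byte lands between the two bytes of the encoded LINE FEED, so everything after it is misaligned by one byte.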

It does seem unfortunate that this is not the default behavior.

(updated to fix spelling error in code sample)

Re^2: Unicode strangeness
by pg (Canon) on Oct 16, 2005 at 00:40 UTC

    Tested on Windows XP. Neither worked. However, the idea is definitely a sound one, and I think I know where it came from: in the old days, :raw reversed :crlf, but it no longer does.

        use strict;
        use warnings;
        use charnames ':full';

        open( my $fh, ">:raw:encoding(ucs2le):crlf", "test.plp" );
        print $fh "\N{CARRIAGE RETURN}\N{LINE FEED}";
        close $fh;

        # test
        open( PLP, "<", "test.plp" );
        my $string;
        sysread( PLP, $string, 100 );
        printf( "0x%02x ", ord($_) ) for ( split //, $string );

    This prints:

    0x0d 0x00 0x0d 0x00 0x0a 0x00

        use strict;
        use warnings;
        use charnames ':full';

        open( my $fh, ">:encoding(ucs2le):crlf", "test.plp" );
        print $fh "\N{CARRIAGE RETURN}\N{LINE FEED}";
        close $fh;

        # test
        open( PLP, "<", "test.plp" );
        my $string;
        sysread( PLP, $string, 100 );
        printf( "0x%02x ", ord($_) ) for ( split //, $string );

    This prints:

    0x0d 0x00 0x0d 0x0a 0x00
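Reading these two dumps as little-endian 16-bit code units makes the difference easy to see. This small sketch only reuses the byte values printed above; the hash keys are labels made up for the illustration.

```perl
use strict;
use warnings;

# Byte dumps copied from the two test runs above.
my %dumps = (
    'with :raw'    => [ 0x0d, 0x00, 0x0d, 0x00, 0x0a, 0x00 ],
    'without :raw' => [ 0x0d, 0x00, 0x0d, 0x0a, 0x00 ],
);

my %decoded;
for my $name ( sort keys %dumps ) {
    my $bytes = pack 'C*', @{ $dumps{$name} };

    # 'v*' reads little-endian 16-bit units; an odd trailing byte is skipped.
    my @units = unpack 'v*', $bytes;
    $decoded{$name} = join ' ', map { sprintf( 'U+%04X', $_ ) } @units;

    my $leftover = length($bytes) % 2 ? ' + 1 leftover byte' : '';
    print "$name: $decoded{$name}$leftover\n";
}
```

The first dump decodes to CR, CR, LF (valid UCS-2LE, just with an extra CR), while the second decodes to CR followed by the bogus unit U+0A0D and a dangling byte.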
      The whole point of ":crlf" mode is that when you write "\n" (LINE FEED) in your code, perl takes it to mean a "newline event", which on output comes out as "CARRIAGE RETURN LINE FEED" (hence the name ":crlf"). When you use this mode, you would never explicitly print a "\r" (carriage return) to such a file handle unless you really want an "extra" carriage return in the output.

      OTOH, you can leave off ":crlf" and explicitly print "\r" wherever and whenever you want; then no carriage return is added automatically when you print "\n".

      Since you seem fixated on explicitly printing the carriage returns yourself, and not having them added automatically to every line feed that you print, just leave out ":crlf".

      Based on the tests you've shown, it is essential in any case to make sure the mode begins with ":raw". Without it, the default (actually implicit) ":crlf" layer stays underneath the ucs2le layer and works on the already-encoded bytes: it inserts a 0x0d byte in front of the 0x0a byte of the encoded LINE FEED, so the "crlf" sequence does not come out as a valid sequence of two 16-bit Unicode characters (hence the "0x0d 0x0a 0x00" at the end of your second test). In terms of the code you're showing:

          ## instead of this:
          open( my $fh, ">:raw:encoding(ucs2le):crlf", "testa.plp" );
          print $fh "\N{CARRIAGE RETURN}\N{LINE FEED}";
          close $fh;

          ## you want either this:
          open( my $fh, ">:raw:encoding(ucs2le)", "testb.plp" );
          print $fh "\N{CARRIAGE RETURN}\N{LINE FEED}";
          close $fh;

          ## or this:
          open( my $fh, ">:raw:encoding(ucs2le):crlf", "testc.plp" );
          print $fh "\N{LINE FEED}";   # :crlf adds CARRIAGE RETURN for you
          close $fh;
      Just for the sake of parsimony and lower probability of screwing things up, I'd prefer the last approach, personally.
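A minimal sketch for checking the three variants on any platform. It spells out every layer explicitly, so it does not depend on the Windows default, and it writes "\r" and "\n" in place of the \N{...} names. The temporary directory and the dump_bytes helper are illustrative additions, not part of the code above. Under this layering, testa.plp should repeat the extra CR from pg's first test, while testb.plp and testc.plp should each contain just 0d 00 0a 00.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: return a file's raw bytes as space-separated hex.
sub dump_bytes {
    my ($path) = @_;
    open( my $in, '<:raw', $path ) or die "Cannot read $path: $!";
    local $/;                       # slurp the whole file at once
    my $bytes = <$in>;
    close $in;
    return join ' ', map { sprintf( '%02x', ord($_) ) } split //, $bytes;
}

my $dir = tempdir( CLEANUP => 1 );  # removed automatically at exit
my %dump;

# [ file name, open mode with layers, text printed ]
my @cases = (
    [ 'testa.plp', '>:raw:encoding(UCS-2LE):crlf', "\r\n" ],  # explicit CR *and* :crlf
    [ 'testb.plp', '>:raw:encoding(UCS-2LE)',      "\r\n" ],  # explicit CR, no :crlf
    [ 'testc.plp', '>:raw:encoding(UCS-2LE):crlf', "\n"   ],  # let :crlf add the CR
);

for my $case (@cases) {
    my ( $name, $mode, $text ) = @$case;
    my $path = File::Spec->catfile( $dir, $name );
    open( my $fh, $mode, $path ) or die "Cannot write $path: $!";
    print $fh $text;
    close $fh or die "Cannot close $path: $!";
    $dump{$name} = dump_bytes($path);
    print "$name: $dump{$name}\n";
}
```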
