This works for me, going from utf8 to ucs-2be:

    #!/usr/bin/perl
    use Encode;
    binmode STDIN, ":utf8";   # decode utf8 input into characters
    binmode STDOUT, ":raw";   # encode() already returns bytes; ":ucs-2be" is not a valid layer
    while (<>) {
        print encode( "ucs-2be", $_ );
    }

If the input happens to be straight ASCII (which is really just a subset of utf8), the output is exactly twice as many bytes as the input, and every even-numbered byte offset (starting at offset 0) holds a null byte. Both unix and dos style line terminations are treated consistently: every character, CR and LF included, is converted to two bytes.
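For input that actually has some wide characters in it, the difference in size between input and output will vary: a character that takes two bytes in utf8 stays at two bytes, and one that takes three bytes shrinks to two. Each character above U+00FF will of course have a non-null high byte in the output.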
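The size claims above can be checked in memory, without any files. This is only a minimal sketch, assuming the core Encode module; the `sizes` helper and the sample strings are made up for illustration and are not part of the scripts in this post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Return (utf8 byte count, ucs-2be byte count) for a character string.
# "sizes" is a hypothetical helper name used only in this sketch.
sub sizes {
    my ($chars) = @_;
    my $utf8 = encode( "UTF-8",   $chars );
    my $ucs2 = encode( "UCS-2BE", $chars );
    return ( length($utf8), length($ucs2) );
}

# Hypothetical samples: pure ASCII with a dos-style line ending, and a
# string containing U+263A, which takes three bytes in utf8.
my $ascii = "abc\r\n";
my $wide  = "a\x{263A}\n";

printf "ascii: %d -> %d bytes\n", sizes($ascii);
printf "wide:  %d -> %d bytes\n", sizes($wide);

# Collect the bytes at even offsets of the ASCII result; for ASCII input
# they should all be null.
my @bytes = unpack "C*", encode( "UCS-2BE", $ascii );
my @high  = @bytes[ grep { $_ % 2 == 0 } 0 .. $#bytes ];
my $nonnull = grep { $_ != 0 } @high;
print "high bytes all null: ", ( $nonnull ? "no" : "yes" ), "\n";
```

For the ASCII sample the byte count should double, while for the second sample the three-byte utf8 character shrinks to two bytes.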
It's not clear to me what's wrong with your code. (Maybe that's because I saw it before anyone added "<code>" tags, or maybe it's just that you didn't show all the relevant stuff.) Or maybe you're using 5.8.0, and that version might have had some trouble with handling line termination? (I'm not sure about that...)
update: I forgot about the "return trip"... this works for me too, going the other direction:
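
    #!/usr/bin/perl
    use Encode;
    binmode STDIN, ":raw";    # read ucs-2be as plain bytes; decode() does the conversion
    binmode STDOUT, ":utf8";  # write the decoded characters as utf8
    # note: <> splits on byte 0x0A, which is safe for ASCII-range text
    while (<>) {
        print decode( "ucs-2be", $_ );
    }

I checked a dos-style ASCII file on the round trip: the ucs-2be version was valid, and the return from that to utf8 came out identical to the original data.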
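The same round-trip check can be sketched in memory. This assumes only the core Encode module; the two helper subs and the sample text are hypothetical names chosen for illustration, not the scripts above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Convert utf8-encoded bytes to ucs-2be bytes (hypothetical helper).
sub utf8_to_ucs2be {
    my ($bytes) = @_;
    return encode( "UCS-2BE", decode( "UTF-8", $bytes ) );
}

# Convert ucs-2be bytes back to utf8-encoded bytes (hypothetical helper).
sub ucs2be_to_utf8 {
    my ($bytes) = @_;
    return encode( "UTF-8", decode( "UCS-2BE", $bytes ) );
}

# Hypothetical sample: ASCII text with dos-style (CRLF) line endings.
my $original = "line one\r\nline two\r\n";
my $ucs2     = utf8_to_ucs2be($original);
my $back     = ucs2be_to_utf8($ucs2);

print "utf8 length:    ", length($original), "\n";
print "ucs-2be length: ", length($ucs2),     "\n";
print "round trip ", ( $back eq $original ? "matches" : "differs" ), "\n";
```

Because the conversion happens on whole strings here, no I/O layers or line splitting are involved, which makes it a convenient way to separate encoding problems from file-handling problems.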
In reply to Re: ucs-2be <-> utf8 ascii by graff, in thread ucs-2be <-> utf8 ascii by germanuser