The following works for me (on Perl 5.8.1, Mac OS X) -- it's a simple stdin->stdout filter:

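#!/usr/bin/perl

use strict;
use warnings;
use Encode;

binmode STDIN, ":utf8";   # decode incoming utf8 into characters
binmode STDOUT, ":raw";   # encode() below already produces bytes (":ucs-2be" is not a PerlIO layer)

while (<STDIN>) {
  print encode( "ucs-2be", $_ );
}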

If the input happens to be straight ASCII (which is really just a subset of utf8), the output is exactly twice as many bytes as the input, and every even-numbered byte offset (starting at offset 0) holds a null byte. Unix and DOS style line terminations are treated consistently: CR and LF are converted like any other character, each becoming a null byte followed by the original byte.
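If you want to check the doubling without creating any files, something like the following should do it. It's only a sketch: the helper names `ucs2be_bytes` and `hex_dump` are made up for this example, and the sample string is arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a character string as UCS-2BE bytes.
sub ucs2be_bytes {
    my ($text) = @_;
    return encode("UCS-2BE", $text);
}

# Hypothetical helper: show a byte string as space-separated hex pairs.
sub hex_dump {
    my ($bytes) = @_;
    return join " ", map { sprintf "%02x", $_ } unpack "C*", $bytes;
}

my $ascii = "hi\r\n";    # DOS-style line ending, 4 characters
my $bytes = ucs2be_bytes($ascii);

# Each ASCII character becomes a null high byte followed by the ASCII byte.
printf "%d chars -> %d bytes: %s\n",
    length($ascii), length($bytes), hex_dump($bytes);
```

For the sample string this prints `4 chars -> 8 bytes: 00 68 00 69 00 0d 00 0a`, so the CR and LF get the same treatment as the letters.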

For input that actually has some non-ASCII characters in it, the difference in size between input and output will vary (a character that takes two or three bytes in utf8 always takes two in ucs-2be), and each character above U+00FF will of course have a non-null high byte in the output.
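To see how the size and the high byte change from one character to the next, a small comparison like this might help. Again just a sketch: `describe_char` is a made-up helper, and the three sample characters are arbitrary picks (one ASCII, one in the Latin-1 range, one above U+00FF).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return (UTF-8 byte count, UCS-2BE bytes as hex)
# for a single character.
sub describe_char {
    my ($char) = @_;
    my $utf8 = encode("UTF-8",   $char);
    my $ucs2 = encode("UCS-2BE", $char);
    my $hex  = join " ", map { sprintf "%02x", $_ } unpack "C*", $ucs2;
    return (length($utf8), $hex);
}

# Arbitrary samples: "A", e-acute (U+00E9), white smiling face (U+263A).
for my $char ("A", "\x{e9}", "\x{263a}") {
    my ($utf8_len, $hex) = describe_char($char);
    printf "U+%04X  utf8: %d byte(s)  ucs-2be: %s\n",
        ord($char), $utf8_len, $hex;
}
```

The ucs-2be column comes out as `00 41`, `00 e9` and `26 3a`: only the last character, which is above U+00FF, has a non-null high byte, while the utf8 byte count goes 1, 2, 3.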

It's not clear to me what's wrong with your code. (Maybe that's because I saw it before anyone added "<code>" tags, or maybe it's just that you didn't show all the relevant stuff.) Or maybe you're using 5.8.0, and that version might have had some trouble with handling line termination? (I'm not sure about that...)

update: I forgot about the "return trip"... this works for me too, going the other direction:

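#!/usr/bin/perl

use strict;
use warnings;

# Let the input layer do the decoding, so that lines are split on characters;
# splitting raw ucs-2be bytes on 0x0a could cut a two-byte character in half.
binmode STDIN, ":encoding(ucs-2be)";
binmode STDOUT, ":utf8";

while (<STDIN>) {
  print;
}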

I checked a dos-style ASCII file on the round-trip -- the ucs-2be version was valid, and the return from that to utf8 came out identical to the original data.
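The same round-trip check can be done in memory, which is handy if you want to try other sample data. This is a rough sketch rather than what I actually ran: `round_trip` is a made-up helper, and the sample text is just a stand-in for a DOS-style ASCII file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: UTF-8 bytes -> UCS-2BE bytes -> UTF-8 bytes.
sub round_trip {
    my ($utf8_in) = @_;
    my $text  = decode("UTF-8",   $utf8_in);   # bytes -> characters
    my $ucs2  = encode("UCS-2BE", $text);      # characters -> UCS-2BE bytes
    my $text2 = decode("UCS-2BE", $ucs2);      # UCS-2BE bytes -> characters
    return encode("UTF-8", $text2);            # characters -> UTF-8 bytes
}

my $original = "line one\r\nline two\r\n";     # DOS-style ASCII sample
print round_trip($original) eq $original ? "identical\n" : "different\n";
```

For the sample above this prints `identical`.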

In reply to Re: ucs-2be <-> utf8 ascii by graff
in thread ucs-2be <-> utf8 ascii by germanuser
