
I often get confused by Unicode issues, such as when a file stream is in Unicode and when it isn't. The latest came up in an interactive terminal program, where I tried to print a Unicode character and got a 'wide-char' error.

Of course I can easily work around the problem by adding:

binmode STDOUT, ':encoding(UTF-8)';
binmode STDERR, ':encoding(UTF-8)';

to the beginning of my program, but I'm not sure why it isn't *defaulting* to UTF-8.

I'm connecting from Windows to Linux using SecureCRT, which, in its session options, has its 'character encoding' set to UTF-8.
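To make the symptom concrete, here is a minimal sketch that reproduces it without a terminal. It assumes an in-memory filehandle behaves like STDOUT for this purpose; `bytes_written` is a hypothetical helper, not anything from the original program. It prints one character above code point 255 to a handle with and without an encoding layer and counts the warnings.

```perl
use strict;
use warnings;

# Hypothetical helper: write $text to an in-memory handle opened with the
# given mode, then return the raw bytes stored plus any warnings raised.
sub bytes_written {
    my ($mode, $text) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    my $buffer = '';
    open my $fh, $mode, \$buffer or die "open failed: $!";
    print {$fh} $text;
    close $fh;    # flush the layer before looking at the buffer
    return ($buffer, @warnings);
}

my $smiley = "\x{263A}";    # WHITE SMILING FACE, a code point above 255

my ($plain,   @plain_warnings)   = bytes_written('>', $smiley);
my ($layered, @layered_warnings) = bytes_written('>:encoding(UTF-8)', $smiley);

printf "no layer:   %d bytes, %d warning(s)\n",
    length $plain, scalar @plain_warnings;
printf "with layer: %d bytes, %d warning(s)\n",
    length $layered, scalar @layered_warnings;
```

Both handles end up with the same bytes here; the difference is that the handle without a layer was never told what encoding to use, so perl warns. A `use open qw(:std :encoding(UTF-8));` line near the top of a program is another way to set the layer on the standard handles (it covers STDIN as well).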

When I log in, if I type locale, I get:

LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

That **looks** like it's saying UTF-8 for the character encoding (this is a Suse11.2 system I'm logging into, BTW, from a Win7, i.e. Unicode-supporting, system).
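Perl does not consult these variables for I/O layers unless asked to (the `-CL` switch and `use open ':locale'` are the documented ways to ask). As a rough sketch of doing the check by hand, the code below reads the same variables in the order LC_ALL, LC_CTYPE, LANG and only then sets the layers. `codeset_of` and `effective_ctype` are hypothetical helper names, and the regex is a simplification of locale-name syntax.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the codeset out of a locale name such as
# "en_US.UTF-8" or "de_DE.ISO-8859-1@euro". Returns undef if there is none.
sub codeset_of {
    my ($locale) = @_;
    return undef unless defined $locale;
    return $locale =~ /\.([^@]+)/ ? $1 : undef;
}

# Hypothetical helper: pick the locale that governs character type,
# checking LC_ALL, then LC_CTYPE, then LANG. Empty values count as unset.
sub effective_ctype {
    my (%env) = @_;
    for my $name (qw(LC_ALL LC_CTYPE LANG)) {
        return $env{$name} if defined $env{$name} && length $env{$name};
    }
    return undef;
}

my $codeset = codeset_of(effective_ctype(%ENV));

# Only switch the output handles to UTF-8 when the environment says so.
if (defined $codeset && $codeset =~ /^utf-?8$/i) {
    binmode $_, ':encoding(UTF-8)' for \*STDOUT, \*STDERR;
}

print "locale codeset: ", $codeset // '(none)', "\n";
```

With the `locale` output shown above, `LC_ALL` is empty, so `LC_CTYPE=en_US.UTF-8` decides and the sketch would enable the UTF-8 layers.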

So why is perl *defaulting* to STDOUT being non-unicode?

Why do I need the binmode?

Sorry if this is Unicode first grade, but this stuff looks like it should be so 'simple' -- yet *blech*. I've had other issues when operating on 'internet data', where I've experienced UTF nightmares, since you don't know the character encoding of a website's response until you look at the header. I mostly worked around those, until perl worked itself into a serious coredump about 3,500,000 statements / 70,000 data lines into the program. Some suggested I get to know "perl -d" ... *cough* ... I do, but I wasn't about to try tracking that down. I just shelved the program to wait for a more reliable perl. (I did, FWIW, file a bug against Perl, which has yet to be addressed that I know of.)
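For the 'look at the header first' step, a minimal sketch might look like the following. It assumes the raw response body and the Content-Type header value are already in hand; `charset_from_content_type` and `decode_body` are hypothetical helpers, and the UTF-8 fallback is an assumption rather than anything a server guarantees. Only `decode` and `find_encoding` from the core Encode module are used.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Hypothetical helper: extract the charset parameter from a Content-Type
# value such as "text/html; charset=ISO-8859-1". Returns undef if absent.
sub charset_from_content_type {
    my ($header) = @_;
    return undef unless defined $header;
    return $header =~ /;\s*charset\s*=\s*"?([^\s;"]+)"?/i ? $1 : undef;
}

# Hypothetical helper: turn raw response bytes into Perl characters,
# falling back to a default when the header names no usable charset.
sub decode_body {
    my ($bytes, $content_type, $fallback) = @_;
    $fallback //= 'UTF-8';    # assumed default, not promised by the server
    my $charset = charset_from_content_type($content_type);
    $charset = $fallback unless defined $charset && find_encoding($charset);
    return decode($charset, $bytes);
}

# "caf\xE9" is "café" as Latin-1 bytes.
my $text = decode_body("caf\xE9", 'text/html; charset=ISO-8859-1');
printf "%d characters, last is U+%04X\n", length $text, ord substr($text, -1);
```

Once the body is decoded this way, the program works with characters internally and only needs an encoding layer again at the point of output.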

Any idea why perl isn't just 'doing the right thing' as it is so famous for doing? Thanks...

In reply to why no default unicode? by perl-diddler
