On Unix, a filename is always just bytes, but nearly all popular modern software assumes (depending on the locale environment variables, LANG and LC_*) that those bytes are UTF-8 and decodes them accordingly. On Windows, all paths are Unicode, but a program sees them in an 8-bit locale encoding unless it uses the 16-bit wide-character APIs. Perl has always been fairly broken with international filenames on Windows because perl uses the 8-bit APIs. Only recently did Windows 10 introduce the UTF-8 application code page, which lets Perl see UTF-8 through those 8-bit APIs.

To the best of my knowledge, Perl only ever sees filenames as bytes, and the user must handle all decoding and encoding, which results in a lot of ugly code. I wrote a whole investigative meditation about it and looked at Python's handling of the problem for comparison. I also suggested solving it as part of a virtual filesystem module for Perl.

Meanwhile, I'm a native English speaker, and the only times I run into these problems are when filenames in my music collection use foreign characters, or in a few cases where I was trying to make backups of client files whose names contain smart quotes. I can only imagine how frustrating this would be to someone whose native language is an Asian one and who probably has non-ASCII UTF-8 in every directory and filename. Python 3 has "solved" the problem about as much as it can be solved, and I wouldn't expect to get many new Perl users from Asian countries if this is one of the problems they run into regularly. In other words, I think fixing this ought to be a higher priority.
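For anyone who hasn't looked at the Python side, here is a minimal sketch of what I mean by "solved". It assumes a Unix-style filesystem where names are arbitrary bytes. The helper names `decode_name` and `encode_name` are made up for this illustration; on Unix, the standard `os.fsdecode` and `os.fsencode` apply the same `surrogateescape` error handler using the filesystem encoding. I spell out UTF-8 explicitly here so the result doesn't depend on the locale.

```python
# Hypothetical helpers (not part of any library) showing the
# "surrogateescape" round trip Python 3 uses for filenames on Unix.

def decode_name(raw: bytes) -> str:
    """Decode filename bytes as UTF-8, keeping undecodable bytes as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_name(name: str) -> bytes:
    """Turn a decoded name back into the exact bytes it came from."""
    return name.encode("utf-8", errors="surrogateescape")


# A valid UTF-8 name decodes to ordinary text.
utf8_raw = "日本語.txt".encode("utf-8")
utf8_name = decode_name(utf8_raw)

# A Latin-1 name: 0xE9 is "é" in Latin-1 but is not valid UTF-8 here.
latin1_raw = b"caf\xe9.txt"
latin1_name = decode_name(latin1_raw)

# The bad byte survives as the lone surrogate U+DCE9, so nothing is lost
# and the original bytes can be recovered for open(), rename(), etc.
latin1_round_trip = encode_name(latin1_name)

# The cost: the decoded string is not clean text, and a strict encode fails.
try:
    latin1_name.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True
```

The point is that the program always gets a string, and an invalid name still round-trips byte for byte. The problem only resurfaces when you try to print or store such a name as real text, which is why I put "solved" in quotes.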

In reply to Re^10: Converting Unicode by NERDVANA
in thread Converting Unicode by BernieC
