in reply to utf8::downgrade() and file system operators
Perl compiled for Windows unfortunately uses the 8-bit ("ANSI") variants of the Windows filesystem functions, so (except for a recent change in Win10 described below) it generally cannot use characters outside of your 8-bit codepage, whatever that happens to be. In short, while utf8::downgrade() might work for the particular character you are running into here, it won't work in general.
Windows of course has a second filesystem API that uses UTF-16, and if perl used that it would have full Unicode support ... but it can't really do that either, because it would break backward compatibility and perl's APIs would start behaving differently between Linux and Windows. As dasgar mentions, you can access this API using Win32::LongPath. The downside is that your script is then using a nonstandard API and is less portable.
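For illustration, here is a minimal sketch of going through Win32::LongPath instead of the built-in mkdir() and open(). The directory and file names are made up for the example, and I'm assuming the module's mkdirL() and openL() calls as I remember them from its documentation, so check the POD before relying on it. The Windows-only part is guarded so the script still runs elsewhere.

```perl
use strict;
use warnings;

# Hypothetical directory name with characters that no single
# 8-bit codepage covers (two CJK characters plus a Greek Omega).
sub sample_name {
    return "demo_\x{4E2D}\x{6587}_\x{3A9}";
}

if ($^O eq 'MSWin32') {
    # Loaded at runtime so the script still compiles on other systems.
    require Win32::LongPath;

    my $dir = sample_name();

    # mkdirL() takes a character string and calls the wide (UTF-16) API.
    Win32::LongPath::mkdirL($dir)
        or die "mkdirL failed: $^E";

    # openL() takes a reference to the filehandle variable, then mode and path.
    my $fh;
    Win32::LongPath::openL(\$fh, '>', "$dir\\note.txt")
        or die "openL failed: $^E";
    print {$fh} "hello\n";
    close $fh;
}
else {
    print "Win32::LongPath is Windows-only; skipping\n";
}
```

Note the portability cost mentioned above: every filesystem call in the script has to be swapped for its ...L() counterpart.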
The "recent change in Win10" is that there is now an option in the executable's metadata (its application manifest) where you can set a custom codepage on the application itself (as opposed to just the codepage of its terminal), and one of those codepages is UTF-8! Giving perl.exe a codepage of UTF-8 causes all its normal filesystem functions to suddenly just start working: wide characters get encoded as UTF-8 sequences when passed to a filesystem API, Windows now understands those sequences, and so it all just works. This is a fairly recent change to Win10 and (AFAIK) Strawberry Perl does not yet build perl.exe with this as the default codepage, so you have to set it yourself.
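If you want to check which codepage your perl.exe is actually running under, something like the following sketch should do it. It assumes Win32::GetACP() from the Win32 module reports the process codepage, and that 65001 (Microsoft's identifier for UTF-8) is what comes back once the UTF-8 setting is in effect. describe_codepage() is just a helper name I made up for the example.

```perl
use strict;
use warnings;

# Turn a Windows ANSI codepage number into a readable label.
# 65001 is the identifier Windows uses for UTF-8.
sub describe_codepage {
    my ($acp) = @_;
    return $acp == 65001 ? 'UTF-8' : "cp$acp";
}

if ($^O eq 'MSWin32') {
    # Loaded at runtime so the script also runs on non-Windows systems.
    require Win32;
    my $acp = Win32::GetACP();
    print "perl.exe active codepage: ", describe_codepage($acp), "\n";
}
else {
    print "Not on Windows; there is no ANSI codepage to report\n";
}
```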
If you'd like to become a force of positive change, the right thing to ask for is for perl porters to change the Windows build settings to set the UTF-8 codepage on the executable by default. I don't know how to do this, and there's a good chance they also don't know how to do this, so if you did the research for them or submitted a patch, it would help everyone out.
It should be noted that this change would break scripts that were using high-bit ("upper-ASCII") characters in the local codepage! For example, if a Windows perl script wrote mkdir("\x{A9}") (Latin-1 copyright symbol) as a single byte, perl would not know that it needed to be encoded as UTF-8 before being passed to the mkdir() function. You would need to utf8::upgrade() or utf8::encode() it first. Or, use a Unicode text editor to write the character literally in the string and declare use utf8; at the top of the file. Then, along with the patch, ask the maintainers of Strawberry to start releasing two versions of perl.exe, one with the UTF-8 codepage set, and one without.
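To make the byte-level difference concrete, here is a small sketch (hex_bytes() is a helper invented for this example) that shows what "\x{A9}" looks like before and after utf8::encode(). It only inspects the strings and does not touch the filesystem, so it runs on any platform.

```perl
use strict;
use warnings;

# Show each element of a string as two hex digits.
sub hex_bytes {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $s;
}

my $name = "\x{A9}";       # one character: the Latin-1 copyright sign

my $encoded = $name;
utf8::encode($encoded);    # in place: replace characters with their UTF-8 bytes

print "as written:   ", hex_bytes($name),    "\n";   # A9
print "UTF-8 bytes:  ", hex_bytes($encoded), "\n";   # C2 A9
```

Under a legacy codepage such as cp1252, the single byte A9 is what the old script relied on; under a UTF-8 codepage, the filesystem call needs to see the two bytes C2 A9 instead.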
(I'm not currently using Windows, nor am I an active user of this new feature. I'm just relaying information I've gathered in other threads around here.)
Re^2: utf8::downgrade() and file system operators
by hexcoder (Curate) on Feb 17, 2024 at 12:45 UTC
by NERDVANA (Priest) on Feb 18, 2024 at 03:38 UTC