PerlMonks  

Re: Accent file names issue

by vr (Curate)
on Sep 20, 2017 at 13:18 UTC


in reply to Accent file names issue

To add to the link jahero provided, there's a "Language for non-Unicode programs" setting in the Control Panel. If your paths use only characters belonging to the "code page" chosen there (as is probably the case for most people), try this:

```perl
use strict;
use warnings;
use feature 'say';
use utf8;                        # source saved as UTF-8: literals are character strings
use Win32;
use Encode qw/ encode decode /;
use File::Spec::Functions;

my $parent = canonpath 'c:/Users/someuser/Documents';
my $folder = 'documentação';
my $path   = catdir $parent, $folder;

say Win32::GetACP();     # 'ANSI Code Page'
say Win32::GetOEMCP();   # 'OEM Code Page'

# File test: encode the Unicode path to the ANSI code page first
say 'ok' if -d encode('CP' . Win32::GetACP(), $path);

# Output of a Windows command: decode it from the OEM code page
say 'ok' if decode('CP' . Win32::GetOEMCP(), qx(dir $parent)) =~ /\Q$folder\E/;
```

Decode whatever Windows commands ('dir', etc.) return from the OEM code page, if you ever need their output.

Decode whatever Perl's built-ins ('readdir', etc.) return from the ANSI code page. And encode to the ANSI code page, as above, whenever you pass strings from Perl's Unicode world out to Windows and "non-Unicode programs": file tests, opening files, copying, etc.
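A minimal sketch of "encode at the boundary" for file access and copying, assuming the ANSI code page comes from Win32::GetACP on Windows; elsewhere cp1252 is used only as a stand-in so the sketch can run. The file names are hypothetical, and the temporary directory is assumed to have an ASCII-only path.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Copy qw(copy);
use File::Spec::Functions qw(catfile);
use File::Temp qw(tempdir);

# ANSI code page: Win32::GetACP on Windows; cp1252 elsewhere is an ASSUMED
# stand-in so the sketch runs anywhere.
my $acp = $^O eq 'MSWin32'
    ? do { require Win32; 'cp' . Win32::GetACP() }
    : 'cp1252';

my $dir = tempdir(CLEANUP => 1);     # assumed to be an ASCII-only path

# Hypothetical names, written with \x{...} escapes so the source encoding
# does not matter: "relatório.txt" and "cópia.txt".
my $src = catfile($dir, "relat\x{f3}rio.txt");
my $dst = catfile($dir, "c\x{f3}pia.txt");

# Encode once, at the boundary: Perl keeps characters, the OS gets ACP bytes.
my ($src_os, $dst_os) = map { encode($acp, $_) } $src, $dst;

open my $fh, '>', $src_os or die "open: $!";   # file access
print {$fh} "hello\n";
close $fh or die "close: $!";

copy($src_os, $dst_os) or die "copy: $!";      # copying
print "copy exists\n" if -e $dst_os;          # file test
```

Only the encoded forms ($src_os, $dst_os) are handed to open, copy and -e; the Unicode forms stay available for anything done inside Perl.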

Things get messier if your paths use characters outside that code page.

> If I use opendir/readdir in the "c:\users\someuser\documents" directory it will read "documentação" perfectly

No. What it returns is not a Unicode (character) string (no utf8 flag); it's bytes encoded in the 'ANSI Code Page'. That's why "-d will work fine" on it.
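To make the difference concrete, here is a small sketch comparing the bytes readdir would return under ANSI code page 1252 (assumed here) with the bytes a literal contains when the script is saved as UTF-8 but lacks `use utf8`. Because the two byte sequences differ, a -d test on such a literal asks Windows about a different name.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The folder name as a Perl character string; \x{...} escapes avoid any
# dependence on how this source file is saved.
my $name = "documenta\x{e7}\x{e3}o";          # documentação, 12 characters

# What readdir would return if the ANSI code page is 1252 (assumed here).
my $acp_bytes  = encode('cp1252', $name);    # ç -> E7, ã -> E3
# What a literal in a UTF-8-saved script contains WITHOUT `use utf8`.
my $utf8_bytes = encode('UTF-8', $name);     # ç -> C3 A7, ã -> C3 A3

printf "cp1252: %d bytes, UTF-8: %d bytes, same bytes? %s\n",
    length $acp_bytes, length $utf8_bytes,
    ($acp_bytes eq $utf8_bytes ? 'yes' : 'no');

# Decoding the readdir-style bytes gives back the original characters.
print "decoded ok\n" if decode('cp1252', $acp_bytes) eq $name;
```

This prints "cp1252: 12 bytes, UTF-8: 14 bytes, same bytes? no" followed by "decoded ok".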

Edit: minor clarifications.

P.S. So, first you encode a Unicode path to the ACP before passing it to, e.g., opendir, and then decode each element of readdir's return list from the ACP, so that in Perl you work with normal Unicode strings.

P.P.S. Oh, the $parent in dir $parent must be encoded too, if non-ASCII characters are involved. Which 'code page' to encode it to is left as an exercise for the reader :).
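The opendir/readdir round trip from the P.S. can be sketched like this. list_dir_unicode is a hypothetical helper, the code page is taken from Win32::GetACP on Windows and assumed to be cp1252 elsewhere (a stand-in so the sketch runs), and the temporary parent directory is assumed to have an ASCII-only path.

```perl
use strict;
use warnings;
use utf8;                                # this source file is saved as UTF-8
use Encode qw(encode decode);
use File::Spec::Functions qw(catdir);
use File::Temp qw(tempdir);

# ANSI code page: Win32::GetACP on Windows; cp1252 elsewhere is an
# ASSUMED stand-in so the sketch can run anywhere.
my $acp = $^O eq 'MSWin32'
    ? do { require Win32; 'cp' . Win32::GetACP() }
    : 'cp1252';

# Hypothetical helper: takes a Unicode (character-string) directory path and
# returns its entries as Unicode strings, excluding '.' and '..'.
sub list_dir_unicode {
    my ($dir) = @_;
    opendir my $dh, encode($acp, $dir) or die "opendir $dir: $!";
    my @names = map { decode($acp, $_) }           # ACP bytes -> characters
                grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return @names;
}

# Usage: create 'documentação' in a temporary directory and read it back.
my $parent = tempdir(CLEANUP => 1);
my $folder = 'documentação';
mkdir encode($acp, catdir($parent, $folder)) or die "mkdir: $!";

my @names = list_dir_unicode($parent);
print "round trip ok\n" if grep { $_ eq $folder } @names;
```

The names coming back from list_dir_unicode compare equal to the `use utf8` literal, so they can be matched, printed (with a suitable output layer) or re-encoded for further file operations.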

Re^2: Accent file names issue
by ruimelo73 (Novice) on Sep 20, 2017 at 18:19 UTC

    Thank you for your reply. When you look at all these "tricks", you start thinking that Perl's Unicode support (at least in the Windows universe) is heading the wrong way. In the old days of code pages, people knew what was going on from the OS itself; Perl did not have much to do with it. With all this Unicode stuff going into Perl's string internals, people have lost control and can't move on with simple solutions. I have never run into such an annoying problem; this is not what Unicode was created for.

    Look at the pieces of code that people are publishing here... it is madness... simple scripts now have to include weird code like "utf8", "Encode", "Decode", etc. (like a secret project) just to handle string variables... I understand the utf8 and other requirements posted here, but this is not the way, really... this is not the old Perl glamour I once fell in love with... ç, ã and other characters of Latin languages are used by billions of people worldwide; it's a huge problem, and I still can't find a simple and elegant solution for handling file names. Future development of Perl should change this radically, or people in Latin-language countries will quickly get fed up with Perl. Unicode handling is difficult, we all know this, but in Perl it's going nuts.

    Sorry if I am exaggerating, but I am stuck on some projects because of this ridiculous problem. I'm wasting hours searching for tricks instead of working on code.

      Hello ruimelo73,

      my warmest welcome to the monastery!!

      > it is madness... Unicode handling is difficult.. ridiculous problem..

      welcome to the post-Tower-of-Babel era!

      I'm with you: it is difficult, but it's reality that is difficult, not the Perl way.

      I suggest some very informative reading: tchrist on Perl and Unicode: No magic bullet (Stack Overflow)

      You must be patient and diligent to get it right; it's a narrow path, but with Perl it's possible.

      Many monks here are skilled at this kind of problem (not me), and you can learn a lot from them.

      L*

      There are no rules, there are no thumbs..
      Reinvent the wheel, then learn The Wheel; maybe one day you will reinvent one of THE WHEELS.
      perl6 -e '$_ = "/bäçelor"; mkdir $_ or die $!; say .IO.d && .IO.e'
      True


      holli

      You can lead your users to water, but alas, you cannot drown them.

        In the original post I mentioned that passing the string's value through the arguments, or by similar means that do not involve setting the string directly, works. Your code...

        perl6 -e '$_ = "/bäçelor"; mkdir $_ or die $!; say .IO.d && .IO.e'

        ...works for me too.

        The problem is with setting the string variable directly and then doing a -d test. I know it's hard to grasp, but that is what I'm dealing with, not just on one computer or one version of Perl, but on different computers and different Perl versions, all on Windows.

        Again, I am sure there is a solution to the problem, but it will require lines of code that raise the whole situation to a ridiculous level. These are just characters; it should not be necessary to deal with this confusion.

        A friend of mine jokes that it will be necessary to create a Win32::PtUnicodePoo module to help all Portuguese Perl programmers.

        Even so, thank you for your reply.
