in reply to Unicode File Names

As I understand it, the cause lies in the way the OS handles those file names. Unfortunately it re-encodes file names on the fly; for Russian, at least, it switches between CP866, CP1251 and Unicode.

In my opinion this is mostly done for compatibility with the C libraries, which are console-oriented.

For encodings other than Russian, including the Far East ones, things are more complicated.
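To make the re-encoding concrete, here is a small sketch (not part of the original script) that prints the bytes of one name in each of the three encodings mentioned above. The file name and the helper hex_bytes are made up for illustration, and it assumes a perl with the core Encode module (5.8 or later).

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical file name: four Cyrillic letters plus ".txt", written with
# escapes so that this source stays pure ASCII.
my $name = "\x{444}\x{430}\x{439}\x{43b}.txt";

# Illustrative helper: the bytes of $str in $encoding, as space-separated hex.
sub hex_bytes {
    my ($encoding, $str) = @_;
    my $bytes = encode($encoding, $str);
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
}

# Same name, three different byte sequences.
printf "%-8s %s\n", $_, hex_bytes($_, $name) for qw(cp866 cp1251 UTF-16LE);
```

Only the Cyrillic letters differ between CP866 and CP1251; the ASCII part of the name is the same in both.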

Here is how I get around the problem using the OLE interface, with elder perl:

use Win32::OLE qw(in CP_UTF8);
use Win32::OLE::Const;
use Unicode::String qw/utf8/;

# Exchange strings with OLE as UTF-8 rather than in the ANSI code page.
Win32::OLE->Option(CP => CP_UTF8);

my $oshell = Win32::OLE->new('Shell.Application')
    or die Win32::OLE->LastError();
my $f = $oshell->NameSpace(Win32::GetCwd());
print "[$f]";
my $fi = $f->Items;
print $fi->Count;
print "\n";

for (0 .. $fi->Count - 1) {
    my $item = $fi->Item($_);
    my $name = $item->Name;
    my $u = utf8($name);
    my $s = $u->hex;    # e.g. "U+0061 U+0416"

    # Turn U+00xx back into the literal character when it is a safe one.
    $s =~ s/U\+00(\w\w)/my($r,$p)=((pack 'H*',$1),$&);if($r=~m(^[()\w .;\- ++!]$)){$r}else{$p}/eg;
    # Wrap every remaining code point in parentheses.
    $s =~ s/(U\+[\da-f][\da-f][\da-f][\da-f])/($1)/ig;

    # Rename only if something outside U+0000..U+00FF is left.
    my $ren = 0;
    $ren = 1 if $s =~ /U\+(?!00)/;
    $s =~ s/[ +]//g;
    print "$ren|$s\n";
    if ($ren) { $item->{Name} = $s }
}

You may well find another solution in the Win32::xxxxxx modules.

Re^2: Unicode File Names
by John M. Dlugosz (Monsignor) on Feb 08, 2005 at 21:48 UTC
    Probably you're running into the OEM vs. ANSI code pages. Just using Unicode completely eliminates that problem! There are two versions of the Windows API entry points: The -A form maps the 8-bit string based on current settings. The -W form takes a 16-bit string and doesn't mess with it.
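A rough sketch of the difference, using nothing but the Encode module (perl 5.8 or later). It makes no Windows API calls; it only imitates what the two paths do to a string. The assumptions: a Russian system where ANSI is CP1251 and OEM is CP866, a made-up file name, and two illustrative helpers, ansi_read_as_oem and to_wide.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical file name: four Cyrillic letters plus ".txt".
my $name = "\x{444}\x{430}\x{439}\x{43b}.txt";

# 8-bit path: bytes produced under the ANSI code page (assumed CP1251)
# get interpreted under the OEM code page (assumed CP866).
sub ansi_read_as_oem {
    my ($str) = @_;
    my $bytes = encode('cp1251', $str);
    return decode('cp866', $bytes);
}

# 16-bit path: UTF-16LE with a terminating NUL, the kind of buffer a
# -W entry point takes.
sub to_wide {
    my ($str) = @_;
    return encode('UTF-16LE', $str . "\0");
}

my $mangled = ansi_read_as_oem($name);
print "8-bit round trip ",
    ($mangled eq $name ? "preserved" : "mangled"), " the name\n";

my $wide = to_wide($name);
(my $back = decode('UTF-16LE', $wide)) =~ s/\0\z//;
print "16-bit round trip ",
    ($back eq $name ? "preserved" : "mangled"), " the name\n";
```

The 16-bit buffer never passes through a code page, so there is nothing for the current settings to reinterpret.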

    Good idea using OLE! What is "elder perl" though?

    --John

      Exactly right: OEM vs. ANSI. But perl's open does not go through the Unicode file-name calls, if I am right.

      "elder perl" is 5.6.1, which is what I had when I wrote that script.
      Hence use Unicode::String;

        I used to. In Work Backup I found that turning on ${^WIDE_SYSTEM_CALLS} made File::Find and File::Copy work correctly even if file names contained strange characters. That was on ActiveState Perl 623, Perl 5.6.1.

        The latter ends up calling Win32::CopyFile, but the former ends up calling readdir, a built-in primitive. So I know that readdir, at least, returned Unicode when getting a list of file names. I also assume it gave consistent results, with no not-found errors, when such a name was passed to stat or lstat.

        Trying that now, I see that readdir is not working. It is giving mangled results consistent with using the ANSI form of the underlying Win32 calls.
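One quick way to spot that kind of mangling, as a sketch only: a '?' is not allowed in a real Windows file name, so a '?' in readdir output suggests that a character had no equivalent in the 8-bit code page. The helpers looks_mangled and mangled_entries are made up for illustration, and the check is a heuristic that only makes sense on Windows.

```perl
use strict;
use warnings;

# Heuristic (assumption): a '?' in a name returned on Windows stands for
# a character that was lost in an 8-bit conversion.
sub looks_mangled {
    my ($name) = @_;
    return $name =~ /\?/ ? 1 : 0;
}

# Return the entries of $dir whose names look mangled.
sub mangled_entries {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @bad = grep { looks_mangled($_) } readdir $dh;
    closedir $dh;
    return @bad;
}

print "$_\n" for mangled_entries('.');
```

A name flagged this way will normally also fail a -e test, since no file with a literal '?' in its name can exist there.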

        It used to work. Now it's crippled.

        --John