Did you try UTF-16? I have some vague recollection that it's a popular encoding for Windows filesystems. Wikipedia seems to agree: UTF-16.
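To make the UTF-16 suggestion concrete, here is a small sketch using the core Encode module. It prints the Greek folder name that comes up later in this thread as bytes in three encodings. Treating cp1253 as the relevant "ANSI" code page assumes a Greek-locale Windows. As I understand it, NTFS stores names as 16-bit units, and the narrow ("A") Win32 calls convert names through the ANSI code page, so this is meant as an illustration rather than a statement of how any particular perl build calls the OS.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The Greek folder name from this thread, built from code points
# (U+0391 ... U+03BF) so the script's own encoding does not matter.
my $name = "\x{391}\x{3BD}\x{3C4}\x{3AF}\x{3B3}\x{3C1}\x{3B1}\x{3C6}\x{3BF}";

# The same nine characters as bytes in three encodings:
#   cp1253   - Greek Windows "ANSI" code page (assumed locale)
#   UTF-8    - a typical text-editor encoding
#   UTF-16LE - 16-bit little-endian units, as used by the wide (W) APIs
for my $enc ('cp1253', 'UTF-8', 'UTF-16LE') {
    my $bytes = encode($enc, $name);
    printf "%-8s %2d bytes: %s\n", $enc, length $bytes,
        join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
}
```

The point is that "the file name" has no single byte form; which one works depends on which API finally receives it.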

-sam

Re^4: Can't Find File When Non-ASCII Letters Appear in Path
by ikegami (Patriarch) on Apr 16, 2007 at 20:01 UTC
    It's actually UCS-2LE. UCS-2 is UTF-16's predecessor. While very similar, UCS-2 can only represent a subset of the characters that UTF-16 can represent, because UCS-2 uses a fixed number of bytes for each character (2), while UTF-16 uses a variable number of bytes for each character (2 or 4).
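A quick way to see the difference: a character fits in UCS-2 only if its code point is in the Basic Multilingual Plane (at most U+FFFF) and is not a surrogate (U+D800 to U+DFFF). Anything above that needs a surrogate pair, i.e. 4 bytes, in UTF-16. The helper name `fits_ucs2` below is hypothetical, just for this sketch.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: can this code point be stored as one UCS-2 unit?
# Only BMP code points outside the surrogate range D800-DFFF qualify.
sub fits_ucs2 {
    my ($cp) = @_;
    return $cp <= 0xFFFF && !($cp >= 0xD800 && $cp <= 0xDFFF);
}

# Latin A, Greek capital Omega, and a character outside the BMP.
for my $cp (0x41, 0x3A9, 0x1D11E) {
    my $bytes = length encode('UTF-16LE', chr $cp);
    printf "U+%04X: %d bytes in UTF-16, %s in UCS-2\n",
        $cp, $bytes, fits_ucs2($cp) ? 'fits' : 'does not fit';
}
```

Greek letters are all in the BMP, so for this thread's path UCS-2 and UTF-16 produce identical bytes.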

    Update: oh, I should have read your link more carefully. It may have changed in newer versions of Windows. I wonder whether wcslen (strlen for wide chars) returns the number of characters or the number of bytes divided by two.
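For what it's worth, wcslen counts wchar_t elements up to the terminating null, and wchar_t is 16 bits on Windows, so for UTF-16 data it reports 16-bit code units, which is the byte count divided by two. A character outside the BMP therefore counts as two. The sketch below mimics that count in Perl; `utf16_units` is a hypothetical helper, not a real API.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: number of 16-bit code units in the UTF-16 form
# of a string, i.e. what a wcslen-style count over 16-bit wchar_t gives.
sub utf16_units { length(encode('UTF-16LE', $_[0])) / 2 }

my $s = "A\x{3A9}\x{1D11E}";   # Latin A, Greek Omega, a non-BMP symbol
printf "characters: %d, UTF-16 code units: %d\n", length $s, utf16_units($s);
```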

Re^4: Can't Find File When Non-ASCII Letters Appear in Path
by emav (Pilgrim) on Apr 16, 2007 at 19:20 UTC
    I had never heard of that. Worse still, I have never worked with UTF-16. This is my first attempt:
    #!/usr/bin/perl -w
    use strict;
    use Unicode::String qw(utf8 latin1 utf16);

    my $u = utf8("C:\\&#913;&#957;&#964;&#943;&#947;&#961;&#945;&#966;&#959;\\VOLINFO.TXT");
    my $file = $u->utf16;
    open FILE, $file or die "can't open $file: $!";
    while (<FILE>) {
    }
    close FILE;
    The only output I got from ActivePerl was this incomplete message: can't open
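One possible reason the message stops short: in UTF-16 (either byte order) every ASCII character becomes two bytes, one of which is NUL. Perl's open hands the name to the OS as a C string, which ends at the first NUL, and printing the die message to a console may stop there too. The sketch below just counts those NULs for the ASCII parts of the path; it does not use Unicode::String, and whether this fully explains the truncated output is an assumption.

```perl
use strict;
use warnings;
use Encode qw(encode);

# ASCII-only part of the path from the attempt above.
my $name = "C:\\VOLINFO.TXT";

for my $enc ('UTF-16BE', 'UTF-16LE') {
    my $bytes = encode($enc, $name);
    my $nuls  = () = $bytes =~ /\0/g;   # count NUL bytes
    printf "%s: %d bytes, %d of them NUL\n", $enc, length $bytes, $nuls;
}
```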
      You're not really using HTML entities, right? So what are you using? If you're trying to enter the characters in a text editor, that might not work. Try using Perl escapes, like "\x{080}", to construct your string.
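Here is a sketch of the escape approach for the folder name in the attempt above, with the entity numbers converted to hex code points. It also shows, under the assumption that the editor saves UTF-8 and the script lacks `use utf8`, why typing the letters directly can go wrong: perl then sees each Greek letter as two separate bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Built from escapes (&#913; = U+0391, ..., &#959; = U+03BF):
# nine characters, independent of the editor's encoding.
my $dir = "\x{391}\x{3BD}\x{3C4}\x{3AF}\x{3B3}\x{3C1}\x{3B1}\x{3C6}\x{3BF}";

# Roughly what perl reads if the same word is typed into a UTF-8 editor
# and the script does not say "use utf8": two bytes per letter.
my $typed = encode('UTF-8', $dir);

printf "escaped: %d chars, typed as UTF-8 without 'use utf8': %d chars\n",
    length $dir, length $typed;
```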

      Also, did you try my readdir() suggestion? I still think it could help...

      -sam

        Thanks again for the reply, Sam.

        Of course, I'm not using HTML entities. ;-)

        Yes, I tried your readdir() suggestion, but because non-ASCII characters appear in the path, I get a similar error message (directory not found, etc.).
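The readdir() suggestion itself is not quoted in this excerpt, so this is an assumed reading: list the directory and open the file using the name exactly as readdir() returned it, instead of retyping it in the script. The helper `open_via_readdir` and the `/VOLINFO/` pattern are hypothetical. As the reply above says, this may still fail when a directory higher up the path has a non-ASCII name, or when the names perl gets back cannot round-trip through the API that open uses.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: open the first entry in $dir matching $pattern,
# reusing the byte string readdir() returned rather than a retyped name.
sub open_via_readdir {
    my ($dir, $pattern) = @_;
    opendir my $dh, $dir or die "can't opendir $dir: $!";
    my ($name) = grep { /$pattern/ } readdir $dh;
    closedir $dh;
    return unless defined $name;
    my $path = File::Spec->catfile($dir, $name);
    open my $fh, '<', $path or die "can't open $path: $!";
    return $fh;
}

# Demo in a scratch directory (ASCII names, so it runs anywhere).
my $dir = tempdir(CLEANUP => 1);
open my $out, '>', File::Spec->catfile($dir, 'VOLINFO.TXT') or die $!;
print $out "hello\n";
close $out;

my $fh   = open_via_readdir($dir, qr/^VOLINFO/i);
my $line = <$fh>;
print $line;
```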