Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to get a Perl script to work with file names that contain Unicode characters (UTF-8 encoded, I believe). I test whether a file exists with the "-e" test, and the test fails even though I know the file is there. Starting Perl with -C does not help either. Does anyone have any recommendations? This is on Windows 2000 and Windows XP with Perl 5.8.0.

Re: Perl support for unicode file name handling
by Anonymous Monk on Jan 24, 2004 at 22:23 UTC
    How? You need to show some code. I don't think Perl deals with Unicode filenames at all, but that hardly matters, because readdir and glob won't return Unicode filenames anyway (magic of NTFS).

    Read up on Win32API::File if you want to deal with Unicode filenames.
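    A quick way to check what readdir actually hands back is to print each name next to its code points in hex and whether -e finds it. This is a minimal diagnostic sketch, not a fix; the helper name hex_bytes is hypothetical, and the directory defaults to the current one (pass c:/work/intl-testout as an argument to match the question).

```perl
use strict;
use warnings;

# Hypothetical helper: show each character's code point in hex
# (for a byte string from readdir, these are its raw bytes).
sub hex_bytes {
    my ($s) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $s;
}

# Directory to inspect: first argument, else the current directory.
my $dir = @ARGV ? $ARGV[0] : '.';

opendir my $dh, $dir or die "Cannot open $dir: $!";
while (defined(my $name = readdir $dh)) {
    next if $name eq '.' or $name eq '..';
    my $status = -e "$dir/$name" ? 'found' : 'NOT found';
    # A '?' (3f) where an accented or non-Latin character should be
    # suggests the name was squeezed into the ANSI code page.
    printf "%-9s %s  [%s]\n", $status, $name, hex_bytes($name);
}
closedir $dh;
```

    Names that show a literal 3f in place of a non-Latin character are most likely the ones -e cannot find.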

      Thanks. Here is some sample code:
      ${^WIDE_SYSTEM_CALLS};   # NB: this only reads the variable; "${^WIDE_SYSTEM_CALLS} = 1;" would enable it
      #use Encode;
      #use utf8;

      # Loop 1: names taken from a DOS "dir /B" listing
      foreach $FILE ( `dir /B c:\\work\\intl-testout` )
      #foreach $FILE ( `dir /B c:\\sigtools` )
      {
          chomp($FILE);
          $FULLPATH = "c:/work/intl-testout/" . $FILE;
          #my $DECODE = decode("utf8",$FULLPATH);
          #if ( -e $DECODE )
          #if ( -e $FULLPATH )
          if ( -e $FULLPATH )
          {
              print "DOS Dir listing File exists - $FULLPATH\n";
          }
          else
          {
              print "DOS Dir listing File DOES NOT exist - $FULLPATH\n";
          }
      }

      # Loop 2: names taken from Perl's readdir
      if ( ( opendir (HDIR,"c:/work/intl-testout" ) ) )
      {
          while ( $FILE = readdir(HDIR) )
          {
              $FULLPATH = "c:/work/intl-testout/" . $FILE;
              if ( -e $FULLPATH )
              {
                  print "Perl Dir listing File exists - $FULLPATH\n";
              }
              else
              {
                  print "Perl Dir listing File DOES NOT exist - $FULLPATH\n";
              }
          }
      }

      # Test 3: a hard-coded file name
      $FILE = q|c:/work/intl-testout/resumé.txt|;
      print "$FILE\n";
      if ( -e $FILE )
      {
          print "File exists\n";
      }
      else
      {
          print "Can not find File\n";
      }
      The code above lists a directory containing some files with wide characters in their names and tests whether each file exists with "-e", which should always return true. When I test the names from the first loop, which uses the DOS dir listing, none of the files are found with -e. When I test the names returned by Perl's readdir, some files are found and others are not. The final test, with the hard-coded name, also works. I suspect the characters returned by the DOS listing are being translated to another character set in Perl, so Windows cannot resolve the file name when it is used in the "-e" test. I am not very familiar with Unicode, or with Perl for that matter.
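      One likely cause of the DOS-listing failures is a code-page mismatch: cmd.exe writes piped "dir" output in the OEM code page, while the ANSI file APIs behind -e read bytes in the ANSI code page. The sketch below shows the same "é" in both, assuming the Western European defaults cp850 (OEM) and cp1252 (ANSI); "chcp" shows the actual OEM code page on a given machine. The helper oem_to_ansi is hypothetical, and it only helps for characters that exist in both code pages.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The same character, U+00E9 ("é"), as bytes in two code pages.
my $oem  = encode('cp850',  "\x{e9}");   # assumed OEM code page (dir output)
my $ansi = encode('cp1252', "\x{e9}");   # assumed ANSI code page (file APIs)

# Prints: cp850: 82  cp1252: e9
printf "cp850: %02x  cp1252: %02x\n", ord $oem, ord $ansi;

# Hypothetical helper: re-encode one line of `dir /B` output from the
# OEM code page to the ANSI code page before passing it to -e.
sub oem_to_ansi {
    my ($bytes) = @_;
    return encode('cp1252', decode('cp850', $bytes));
}
```

      In the first loop, that would mean calling oem_to_ansi($FILE) after the chomp. Byte 0x82 read as cp1252 is a low quotation mark, not "é", which fits the observation that none of those names are found.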

        I couldn't find any information on file globbing in Win32API::File. Nor can I find anything to explain what is meant by "magic of NTFS." Can someone explain that? Thanks.

Re: Perl support for unicode file name handling
by Anonymous Monk on Jun 09, 2017 at 11:01 UTC