isync has asked for the wisdom of the Perl Monks concerning the following question:

Using the CORE::stat function on Windows fails for me on filenames containing accents or umlauts. What am I doing wrong? Are there any modules out there that solve this? What's the proper way of telling Perl that the filesystem returns UTF-8/cp1252 so that rename etc. work?

related?

Replies are listed 'Best First'.
Re: stat() and utf8 filenames on Win32 fails for me, why?
by ikegami (Patriarch) on Feb 22, 2010 at 16:15 UTC
    Windows provides two interfaces ("A" and "W"), and Perl uses the "A" one, where file names have to be encoded using the current ANSI code page. That would be the same as on unix, except Windows's UTF-8 support sucks. Switching your console to the UTF-8 cp gives all kinds of problems.
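To illustrate the code-page point, here is a minimal sketch that encodes a decoded file name into a single-byte code page before handing it to the builtin stat. The helper name `ansi_name` and the default of cp1252 are assumptions made for the example; on a real system the code page should be looked up (I believe `Win32::GetACP()` reports it) rather than hard-coded.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Hypothetical helper: encode a decoded (character) file name into the
# code page that the "A" interface expects. cp1252 is only an assumed
# default; croaks if the name cannot be represented in that code page.
sub ansi_name {
    my ($name, $code_page) = @_;
    $code_page ||= 'cp1252';
    return encode($code_page, $name, Encode::FB_CROAK() | Encode::LEAVE_SRC());
}

# Assume the name arrived as UTF-8 bytes (e.g. from a UTF-8 source file).
my $name  = decode('UTF-8', "h\xC3\xA9llo.txt");
my $bytes = ansi_name($name);    # the byte string "h\xE9llo.txt"

my @st = stat($bytes);
print @st ? "size: $st[7]\n" : "stat failed: $!\n";
```

Names containing characters outside the code page make `ansi_name` die instead of silently turning into something else, which is the case where the "W" interface becomes necessary.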

    Windows does have great support for UTF-16le (historically UCS-2le). That's what it uses internally. You can use the "W" interface to work with this encoding. This gives access to the full range of characters supported by Windows. It's what you have to use to create files with names that lie outside your code page.

    In short, Perl uses a unix-centric approach, so you have to go outside the box on Windows.

    Decode the file name from whatever encoding your source uses, encode it using UTF-16le, get a system handle to the file using Win32API::File's "W" functions, convert it to a Perl file using the function provided by the same module, then stat the handle.
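The steps above can be sketched roughly as follows. This is an untested-on-your-box sketch, not a drop-in solution: the helper names `wide_path` and `stat_w` are made up for the example, the UTF-8 source encoding is an assumption, and it only handles plain files (not directories). The Win32API::File import is guarded so the file still compiles elsewhere.

```perl
use strict;
use warnings;
use Encode qw( decode encode );
use Symbol qw( gensym );
use if $^O eq 'MSWin32', 'Win32API::File' => qw(
    CreateFileW OsFHandleOpen GENERIC_READ FILE_SHARE_READ OPEN_EXISTING
);

# Hypothetical helper: turn a decoded Perl string into the NUL-terminated
# UTF-16le byte string that the "W" functions expect.
sub wide_path {
    my ($name) = @_;
    return encode('UTF-16LE', $name . "\0");
}

# Hypothetical helper: stat a file through the "W" interface (Windows only).
sub stat_w {
    my ($name) = @_;

    # Get a system handle to the existing file, opened for reading.
    my $os_handle = CreateFileW(
        wide_path($name), GENERIC_READ(), FILE_SHARE_READ(),
        [], OPEN_EXISTING(), 0, [],
    ) or die "CreateFileW failed: $^E\n";

    # Wrap the system handle in a Perl file handle.
    my $fh = gensym();
    OsFHandleOpen($fh, $os_handle, 'r')
        or die "OsFHandleOpen failed: $^E\n";

    # stat the handle rather than the name.
    my @st = stat($fh);
    close $fh;
    return @st;
}
```

On Windows, usage would look like `my @st = stat_w(decode('UTF-8', $utf8_bytes));`, where `$utf8_bytes` stands for a name read from a UTF-8 source.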

    Update: Fleshed out details.

Re: stat() and utf8 filenames on Win32 fails for me, why?
by kennethk (Abbot) on Feb 22, 2010 at 16:02 UTC
    I cannot replicate your issue. Specifically, I placed a file named héllo.txt ("h\xE9llo.txt", set in the file explorer) in my working directory and then successfully executed the command stat "h\xE9llo.txt";. What version of Windows, Perl?

    Microsoft Windows XP [Version 5.1.2600]
    (C) Copyright 1985-2001 Microsoft Corp.
    C:\>perl -v
    This is perl, v5.8.9 built for MSWin32-x86-multi-thread
    (with 9 registered patches, see perl -V for more detail)
    Copyright 1987-2008, Larry Wall
    Binary build 825 [288577] provided by ActiveState http://www.ActiveState.com
    Built Dec 14 2008 21:07:41

    run under XP SP3. If my test case works for you, you may find perlunitut and perluniintro illuminating.

Re: stat() and utf8 filenames on Win32 fails for me, why?
by isync (Hermit) on Feb 23, 2010 at 00:03 UTC
    Mmh, your answers tell me that I am alone with this behaviour, or there's some bug in my code. I will look into it and provide additional details soon.

      You could use native Windows functionality; a workaround is to use the COM facilities provided by Windows (in this case Scripting.FileSystemObject), which provide a much higher level of abstraction than the Win32 API calls.

      In your case you could use the GetFile method and then work with the properties of the returned object such as "DateCreated".

      Take a look at the PerlMonks node "Opening files with japanese/chinese chars in filename" and at the MSDN "File Object" documentation.
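A rough sketch of the COM route described above, using Win32::OLE. The helper name `fso_info` is made up for the example, and setting the `CP` option to `CP_UTF8` is my assumption about how to get non-ANSI names across the OLE boundary; check it against your Win32::OLE version. The import is guarded so the file still compiles on other systems.

```perl
use strict;
use warnings;
use if $^O eq 'MSWin32', 'Win32::OLE' => qw( CP_UTF8 );

# Hypothetical helper: fetch a few file properties via
# Scripting.FileSystemObject instead of the builtin stat (Windows only).
sub fso_info {
    my ($path) = @_;

    # Assumption: have Win32::OLE translate strings as UTF-8.
    Win32::OLE->Option(CP => CP_UTF8());

    my $fso = Win32::OLE->new('Scripting.FileSystemObject')
        or die "Cannot create Scripting.FileSystemObject: "
             . Win32::OLE->LastError . "\n";

    my $file = $fso->GetFile($path)
        or die "GetFile failed: " . Win32::OLE->LastError . "\n";

    # Properties of the File object returned by GetFile.
    return {
        size     => $file->Size,
        created  => $file->DateCreated,
        modified => $file->DateLastModified,
    };
}
```

On Windows, `my $info = fso_info($decoded_name);` would then give access to `$info->{created}` and friends without going through stat at all.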

Re: stat() and utf8 filenames on Win32 fails for me, why?
by isync (Hermit) on Feb 24, 2010 at 18:39 UTC
    Have a look at the source of File::Glob::Windows here. I am currently digging through it but it seems to use various tricks to get the strange Windows codepages right.