
Sorry, I don't have a Windows perl to hand.

I agree that the problem is that your subdirectory names are coming back as bytes. You need to know their charset, then call Encode::decode to map them from that charset (probably UTF-8 or UCS-2) into Perl characters.

If you hex dump the bytes and look them up at http://www.fileformat.info/info/unicode/ you should be able to work out which encoding readdir is giving you on each platform. Then do:

use Encode;
my $encoding = "xxx"; # Probably 'UTF-8' or 'UTF-16LE' for windows
my @files = map { Encode::decode($encoding, $_) } readdir DIR;

The scalars in @files will then be kosher Perl Unicode strings, and when you concatenate them with the Unicode strings from your parameter file, all should be well.
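
For the hex-dump step, something like the following rough sketch might help. It prints each name from readdir as hex bytes and reports whether those bytes happen to be valid UTF-8. The helper names hex_dump and try_decode are made up for this illustration, and the choice of 'UTF-8' as the encoding to probe is only a guess; swap in whatever candidate you want to test.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: show a byte string as space-separated hex pairs.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', unpack '(H2)*', $bytes;
}

# Hypothetical helper: strictly decode $bytes as $encoding.
# Returns the decoded character string, or undef if the bytes
# are not valid in that encoding.
sub try_decode {
    my ($encoding, $bytes) = @_;
    my $chars = eval {
        # FB_CROAK makes decode die on malformed input;
        # LEAVE_SRC stops it from modifying $bytes in place.
        Encode::decode($encoding, $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $chars;
}

# Directory to inspect; defaults to the current directory.
my $dir = @ARGV ? $ARGV[0] : '.';
opendir my $dh, $dir or die "Cannot open $dir: $!";
for my $raw (readdir $dh) {
    my $name = try_decode('UTF-8', $raw);   # 'UTF-8' is only a guess
    printf "%s => %s\n", hex_dump($raw),
        defined $name
            ? 'valid UTF-8, ' . length($name) . ' character(s)'
            : 'not valid UTF-8';
}
closedir $dh;
```

Decoding strictly means a wrong guess at the encoding shows up as a failure rather than as silently mangled characters. Bear in mind that plain ASCII names are valid in most candidate encodings, so only the names with accented or other non-ASCII characters tell you anything.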

Good luck.

Re^4: directories and charsets
by soliplaya (Beadle) on Mar 15, 2007 at 21:18 UTC
    Many thanks to all, I believe I am starting to see the heavenly light.
    It is still at the end of a long tunnel, because what I really want to do in the end is read filenames in a directory which is a few steps away:
    WWW users (presumably mostly on Windows workstations) drop files via drag-and-drop onto an HTTP server using DAV. The HTTP/DAV server is a Linux box. My perl script runs on a nearby Windows machine, and sees those Linux directories via a Samba share on the Linux machine.
    So now all I have to figure out is which character set these filenames are really in under Linux (in other words, what MS Explorer and DAV do to them), how this looks through the Samba share, and how my perl script eventually sees them.
    But I will bear that chalice happily now that I can see that there is some heavenly principle behind it all.