in reply to Opening files with japanese/chinese chars in filename

Since you are using Perl's readdir and getting some sort of string back, you might try a small test script that shows the actual byte values in the file names. Something like this would do. While we're at it, it also checks whether the string returned by readdir can actually be used to stat the file and open it. Note that readdir returns bare names, so the directory has to be prefixed before testing or opening them:
```perl
#!/usr/bin/perl
use strict;
use warnings;

( @ARGV == 1 and -d $ARGV[0] ) or die "Usage: $0 pathname\n";
my $dir = shift;
opendir( my $dh, $dir ) or die "$dir: $!";

# "defined" keeps a file named "0" from ending the loop early
while ( defined( my $name = readdir $dh ) ) {
    next if $name =~ /^\.\.?$/;    # skip "." and ".."

    # print every byte of the name as a two-digit hex value
    print join( " ", map { sprintf( "%02x", ord ) } split //, $name );

    # readdir returns names relative to $dir, not to the current directory
    my $path = "$dir/$name";
    if ( -f $path ) {
        open( my $fh, '<', $path ) or do { warn "$path: $!\n"; next };
        binmode $fh;    # count raw bytes, no CRLF translation
        my $sum = 0;
        $sum += length while <$fh>;
        close $fh;
        printf( " : %d bytes (%d read)\n", ( -s $path ) || 0, $sum );
    }
    elsif ( -d $path ) {
        print " : directory\n";
    }
    else {
        print " : not sure what this is\n";
    }
}
closedir $dh;
```
If you don't know how to interpret the output, post a reply with a few example lines for non-ASCII file names (look for lines containing hex values greater than 7f).
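To have something to compare the dump against, you can encode a known character yourself and print its bytes in the same hex style. This is only a sketch: `hex_bytes` is a hypothetical helper, and the character (刚, U+521A) and the list of candidate encodings are arbitrary choices for illustration. UTF-16LE is included because that is how Windows stores names on disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a character string as $enc and return the
# resulting bytes as space-separated two-digit hex values.
sub hex_bytes {
    my ( $enc, $chars ) = @_;
    my $octets = encode( $enc, $chars );
    return join " ", map { sprintf "%02x", ord } split //, $octets;
}

# Compare how one CJK character looks in a few candidate encodings
my $char = "\x{521A}";
for my $enc (qw(UTF-8 UTF-16LE cp936)) {
    printf "%-9s %s\n", $enc, hex_bytes( $enc, $char );
}
```

If the bytes in your dump match one of these patterns for a name you can read in Explorer, that suggests which encoding readdir is handing you.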

You might also take a bunch of these odd file names (as returned by readdir), join them with spaces into a single long string, and pass that to the guess_encoding function provided by Encode::Guess. By default it only checks ascii, utf8 and UTF-16/32 with a BOM, so pass the legacy encodings you suspect (e.g. shiftjis, euc-jp, big5-eten, cp936) as extra arguments. If the names really are in a non-Unicode Asian encoding or in some form of Unicode, there's a good chance it will give you the right answer, which you can then use with Encode's decode function to turn the byte strings into Perl's internal character strings (in case that's helpful for anything).
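A minimal sketch of that approach, assuming the names are already in a list of raw byte strings. `guess_and_decode` is a hypothetical helper, and the sample name is built here as UTF-8 bytes only so the example runs anywhere. For Shift-JIS or GBK data you would pass something like `['shiftjis']` or `['cp936']` instead of `[]`; listing several CJK encodings at once can easily end in an "Encodings too ambiguous" result, so start with the one or two you actually suspect.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);
use Encode::Guess;

# Hypothetical helper: guess one encoding for all the raw names together,
# then decode each name into a Perl character string.
# $suspects is an array ref of extra candidate encodings.
sub guess_and_decode {
    my ( $suspects, @names ) = @_;
    my $sample = join " ", @names;    # one long string gives the guesser more to work with
    my $enc = guess_encoding( $sample, @$suspects );
    ref $enc or die "guess_encoding failed: $enc\n";    # on failure it returns a message
    return ( $enc, map { $enc->decode($_) } @names );
}

# Sample data (assumption): a name stored as UTF-8 bytes, as readdir
# would return it on a system that keeps file names in UTF-8.
my @raw = map { encode( 'UTF-8', $_ ) } "file1_\x{521A}\x{5F62}\x{53D8}.txt";
my ( $enc, @decoded ) = guess_and_decode( [], @raw );

binmode STDOUT, ':encoding(UTF-8)';
print $enc->name, ": $_\n" for @decoded;
```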

Re^2: Opening files with japanese/chinese chars in filename
by Anonymous Monk on Jan 26, 2008 at 01:23 UTC
    OK, I found this, which works as a replacement for readdir:

```perl
use Win32::OLE qw(in);
use Encode;

Win32::OLE->Option(CP => Win32::OLE::CP_UTF8);

#Input:  -dir to read files from
#Output: -array ref with files
sub ReadDirWithWin32OLE {
    my $dir = shift;

    #backslashes only in dir
    $dir =~ s-\/-\\-g;
    #remove last backslash
    $dir =~ s-\\\s*$--;

    if (not -e $dir) {
        warn "dir ($dir) does not exist";
        return;
    }

    my $fso = Win32::OLE->new("Scripting.FileSystemObject");
    #won't work if $dir contains unicode chars :(
    my $folder = $fso->GetFolder($dir);
    if (!$folder) {
        warn "Problem creating Win32::OLE (folder) object";
        return;
    }

    my @filesFound = ();
    foreach my $file (in $folder->Files) {
        my $shortFilename = $file->ShortName;
        #my $shortFilename = $file->Name;
        $shortFilename = $dir . "\\" . $shortFilename;
        if (-e $shortFilename) {
            print "\nFile Found", $shortFilename;
            push @filesFound, $shortFilename;
        }
        else {
            print "\nFILE NOT FOUND!! (this should not happen):", $shortFilename;
        }
    }
    return \@filesFound;
}
```
    Example file names:
    file1_刚形变.txt
    file2_ מדהימה .txt

    BUT:
    1) If the directory path ($dir) itself contains non-ASCII Unicode characters, it doesn't work.
    2) We're forced to use short (8.3) file names. The Win32API::File trick mentioned by Corion (used here) didn't seem to work with the "weird" file2 above.
    3) There's still no way to open specific files with Unicode characters in their names when they are dragged and dropped into a Perl/Tk window (but that may be a whole different topic).

    @graff:
    readdir gives us the ? character (ASCII 63, hex 3f) in place of *any* non-ASCII Unicode character.
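Because "?" is not a legal character in Windows file names, a readdir result containing "?" is a reasonable sign that a character was lost in conversion and that the name cannot be opened as given. A small sketch that separates those names out, so only they need a fallback such as the ShortName approach in the reply above; `split_lossy_names` is a hypothetical helper, and treating every "?" as a substitution is an assumption based on the behaviour described here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split names into those usable as-is and those
# containing "?" (assumed to be substitution characters from readdir).
sub split_lossy_names {
    my ( @usable, @lossy );
    for my $name (@_) {
        if ( $name =~ /\?/ ) { push @lossy,  $name }
        else                 { push @usable, $name }
    }
    return ( \@usable, \@lossy );
}

my $dir = @ARGV ? $ARGV[0] : '.';
opendir( my $dh, $dir ) or die "$dir: $!";
my ( $usable, $lossy ) = split_lossy_names( grep { !/^\.\.?$/ } readdir $dh );
closedir $dh;

printf "%d usable, %d lossy\n", scalar @$usable, scalar @$lossy;
print "  lossy: $_\n" for @$lossy;
```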