I had a Win32 Perl coding challenge to search a directory(WinXP, NTFS) that may contain files with unicode / wide filenames in addition to ANSI filenames.
Through trial, error, and searches against Perl mailing list archives, it seems apparent that Win32 Perl's builtin directory functions do not support return Win32 unicode / wide filenames. More specifically, the builtin
functions return filenames like "??????_HostID_2006-01-19_213218.xls" when the filename contains unicode / wide characters (same behavior as the DOS "dir" command). The problem is that you can't pass these filenames to something like stat or via OLE to ask Excel to open it.
After much trial and error, I found a solution using Win32::API, the FindFirstFileW, FindNextFileW, and FindClose Win32 API calls -- with an "unpack" workaround suggested by $Bill Luebkert.
use strict;
use Win32::API;
use Unicode::String;
use English;
$OUTPUT_AUTOFLUSH=1;
$Win32::API::DEBUG = 0;
binmode(STDOUT, ":utf8");
use constant ERROR_NO_MORE_FILES => 18;
use constant INVALID_HANDLE_VALUE => -1;
Win32::API::Struct-> typedef('FILETIME', qw(
DWORD dwLowDateTime;
DWORD dwHighDateTime;
)); # 8 bytes
use constant FILE_ATTRIBUTE_READONLY => 0x00000001;
use constant FILE_ATTRIBUTE_HIDDEN => 0x00000002;
use constant FILE_ATTRIBUTE_SYSTEM => 0x00000004;
use constant FILE_ATTRIBUTE_DIRECTORY => 0x00000010;
use constant FILE_ATTRIBUTE_ARCHIVE => 0x00000020;
use constant FILE_ATTRIBUTE_NORMAL => 0x00000080;
use constant FILE_ATTRIBUTE_TEMPORARY => 0x00000100;
use constant FILE_ATTRIBUTE_COMPRESSED => 0x00000800;
use constant MAX_PATH => 260;
Win32::API::Struct-> typedef('WIN32_FIND_DATAW', qw(
DWORD dwFileAttributes;
FILETIME ftCreationTime;
FILETIME ftLastAccessTime;
FILETIME ftLastWriteTime;
DWORD nFileSizeHigh;
DWORD nFileSizeLow;
DWORD dwReserved0;
DWORD dwReserved1;
WCHAR cFileName[520];
WCHAR cAlternateFileName[28];
)); # 4 + 8 x 3 + 4 x 4 + 520 + 28 = 592 bytes
# Note: Win32::API::Struct->Typedef should seemingly
# allocate 2 bytes per WCHAR, the sizeof('WCHAR'), but it
# actually allocates only 1 byte -- same as TCHAR
my $FindFirstFile = Win32::API->new('kernel32.dll', 'FindFirstFileW',
+'PS', 'N') or die "FindFirstFile: $^E";
my $FindNextFile = Win32::API->new('kernel32.dll', 'FindNextFileW', '
+NS', 'I') or die "FindNextFile $^E";
my $FindClose = Win32::API->new('kernel32.dll', 'FindClose', 'N',
+'I') or die "FileClose $^E";
# set your own value here...
my $FileSpec = "//?/C:/My Documents/Tool/*.xls";
my $FileInfo = Win32::API::Struct-> new('WIN32_FIND_DATAW');
my $uFileSpec = Unicode::String->new;
$uFileSpec->utf8($FileSpec);
print "FileSpec = ", $uFileSpec->as_string, "\n";
my $handle = $FindFirstFile-> Call($uFileSpec->utf16le, $FileInfo);
if ($handle == INVALID_HANDLE_VALUE) {
printf "Error is %d - %s\n", Win32::GetLastError (),
Win32::FormatMessage (Win32::GetLastError ());
exit(1);
} else {
print "FindFirstFile worked\n";
my $count = 1;
my $ufn = Unicode::String->new;
my $ualtfn = Unicode::String->new;
# to get the filename in unicode UTF-16LE format, must
# unpack the $FileInfo 'buffer' hash element because
# Win32 API doesn't recognize the cFileName hash
# element as an UTF-16LE format character string
# It treats it as an ASCII Z / null terminated string
my ($cFileName, $cAlternateFileName) = unpack( "x44A520A28", $File
+Info->{buffer} );
# if length is odd, pad filenames with null byte as utf16le char s
+tring byte length must be even
# occurs when last UTF-16LE char of filename is ANSI (e.g., "S") a
+nd unpack A template strips off following null byte
# e.g., "S\0"
if (length($cFileName) & 1) {
$cFileName .= "\x00";
}
if (length($cAlternateFileName) & 1) {
$cAlternateFileName .= "\x00";
}
$ufn->utf16le($cFileName);
$ualtfn->utf16le($cAlternateFileName);
print "($count) filename = ", $ufn->as_string, "\n";
print "\talt = ", $ualtfn->as_string, "\n";
while (my $result = $FindNextFile->Call($handle,$FileInfo)) {
$count++;
($cFileName, $cAlternateFileName) = unpack( "x44A520A28", $Fil
+eInfo->{buffer} );
if (length($cFileName) & 1) {
$cFileName .= "\x00";
}
if (length($cAlternateFileName) & 1) {
$cAlternateFileName .= "\x00";
}
$ufn->utf16le($cFileName);
$ualtfn->utf16le($cAlternateFileName);
print "($count) filename = ", $ufn->as_string, "\n";
print "\talt = ", $ualtfn->as_string, "\n";
}
}
$FindClose->Call($handle) or die "FindClose $^E";
exit(0);
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
| |
For: |
|
Use: |
| & | | & |
| < | | < |
| > | | > |
| [ | | [ |
| ] | | ] |
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.