in reply to Re^5: how are ARGV and filename strings represented?
in thread how are ARGV and filename strings represented?
So now I'm starting to lean to the position that one should decode ARGV
Yes, but that means you might not be able to access/create some files.
File names in unix systems are arbitrary sequences of bytes, so a given file name might not be decodable. Imagine two processes/users/machines using different locales accessing the same volume. That said, the de facto standardization on UTF-8 makes such problems the exception rather than the rule.
File names in Windows are encoded using UTF-16le, but Perl uses a different encoding to talk to the OS. For example, Perl uses Windows-1252 on my machine, so using @ARGV will invariably limit me to files whose names can be encoded using Windows-1252.
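To illustrate the "might not be decodable" point, here is a minimal sketch that assumes the program expects UTF-8 names. The helper try_decode_name and the two sample names are made up for the illustration; only Encode's decode and FB_CROAK are real.

```perl
use strict;
use warnings;
use Encode qw( decode FB_CROAK );

# Hypothetical helper: returns the decoded name,
# or undef if the bytes are not valid UTF-8.
sub try_decode_name {
    my ($bytes) = @_;   # Work on a copy; decode may modify its input when a check is given.
    return eval { decode('UTF-8', $bytes, FB_CROAK) };
}

my $utf8_name   = "caf\xC3\xA9.txt";   # "café.txt" as created under a UTF-8 locale
my $latin1_name = "caf\xE9.txt";       # "café.txt" as created under an ISO-8859-1 locale

for my $bytes ($utf8_name, $latin1_name) {
    my $hex = join ' ', map { sprintf '%02X', ord } split //, $bytes;
    print "$hex: ", ( defined(try_decode_name($bytes)) ? "decodable" : "not valid UTF-8" ), "\n";
}
```

Both names are legal on the same volume, but only the first one survives decoding, so a program that insists on decoding every name has no lossless way to refer to the second file.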
In Windows, you can use CommandLineToArgvW instead of @ARGV, plus Win32::LongPath to work with any path, and you can decode them without issue.
In unix, you can work with any path by making sure to provide downgraded strings, but there's no simple solution to decoding them without loss.
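A minimal sketch of what providing a downgraded string could look like, assuming the path is already a string of bytes (one character per byte). The helper name downgraded_path is made up for the illustration; utf8::upgrade, utf8::downgrade and utf8::is_utf8 are core functions.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of $path in Perl's downgraded
# internal format. Dies if the string contains a character above 0xFF,
# since such a string cannot be a sequence of bytes.
sub downgraded_path {
    my ($path) = @_;
    utf8::downgrade($path);
    return $path;
}

my $bytes = "caf\xE9.txt";   # File name bytes as obtained from readdir, @ARGV, etc.

my $upgraded = $bytes;
utf8::upgrade($upgraded);    # Same string, but stored differently internally.

my $safe = downgraded_path($upgraded);

print "Same string: ", ( $safe eq $upgraded ? "yes" : "no" ), "\n";
print "Upgraded storage before: ", ( utf8::is_utf8($upgraded) ? "yes" : "no" ), "\n";
print "Upgraded storage after: ",  ( utf8::is_utf8($safe)     ? "yes" : "no" ), "\n";
```

The two strings compare equal, but only the downgraded one is guaranteed to reach the OS as the original bytes. This only covers passing names to the system; it does nothing for the decoding problem.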
One way of accessing CommandLineToArgvW (from here):
use strict;
use warnings;
use feature qw( say state );

use open ':std', ':encoding('.do { require Win32; "cp".Win32::GetConsoleOutputCP() }.')';

use Config     qw( %Config );
use Encode     qw( decode encode );
use Win32::API qw( ReadMemory );

use constant PTR_SIZE => $Config{ptrsize};

use constant PTR_PACK_FORMAT =>
     PTR_SIZE == 8 ? 'Q'
   : PTR_SIZE == 4 ? 'L'
   : die("Unrecognized ptrsize\n");

use constant PTR_WIN32API_TYPE =>
     PTR_SIZE == 8 ? 'Q'
   : PTR_SIZE == 4 ? 'N'
   : die("Unrecognized ptrsize\n");

sub lstrlenW {
   my ($ptr) = @_;

   state $lstrlenW = Win32::API->new('kernel32', 'lstrlenW', PTR_WIN32API_TYPE, 'i')
      or die($^E);

   return $lstrlenW->Call($ptr);
}

sub decode_LPCWSTR {
   my ($ptr) = @_;
   return undef if !$ptr;

   my $num_chars = lstrlenW($ptr)
      or return '';

   return decode('UTF-16le', ReadMemory($ptr, $num_chars * 2));
}

# Returns true on success. Returns false and sets $^E on error.
sub LocalFree {
   my ($ptr) = @_;

   state $LocalFree = Win32::API->new('kernel32', 'LocalFree', PTR_WIN32API_TYPE, PTR_WIN32API_TYPE)
      or die($^E);

   return $LocalFree->Call($ptr) == 0;
}

sub GetCommandLine {
   state $GetCommandLine = Win32::API->new('kernel32', 'GetCommandLineW', '', PTR_WIN32API_TYPE)
      or die($^E);

   return decode_LPCWSTR($GetCommandLine->Call());
}

# Returns a reference to an array on success. Returns undef and sets $^E on error.
sub CommandLineToArgv {
   my ($cmd_line) = @_;

   state $CommandLineToArgv = Win32::API->new('shell32', 'CommandLineToArgvW', 'PP', PTR_WIN32API_TYPE)
      or die($^E);

   my $cmd_line_encoded = encode('UTF-16le', $cmd_line."\0");
   my $num_args_buf = pack('i', 0);  # Allocate space for an "int".

   my $arg_ptrs_ptr = $CommandLineToArgv->Call($cmd_line_encoded, $num_args_buf)
      or return undef;

   my $num_args = unpack('i', $num_args_buf);
   my @args =
      map { decode_LPCWSTR($_) }
         unpack PTR_PACK_FORMAT.'*',
            ReadMemory($arg_ptrs_ptr, PTR_SIZE * $num_args);

   LocalFree($arg_ptrs_ptr);
   return \@args;
}

{
   my $cmd_line = GetCommandLine();

   say $cmd_line;

   my $args = CommandLineToArgv($cmd_line)
      or die("CommandLineToArgv: $^E\n");

   for my $arg (@$args) {
      say "<$arg>";
   }
}