PerlMonks  

Re: Win32 encoding conversion mystery

by ikegami (Patriarch)
on Aug 13, 2018 at 11:48 UTC [id://1220290]


in reply to Win32 encoding conversion mystery

The Windows API has two versions of each function that takes text strings: an "(A)NSI" version and a "(W)ide" version (for example, CreateFileA and CreateFileW).

The ANSI version expects strings encoded using the Active Code Page (ACP). The ACP can be obtained from Win32::GetACP(), and an encoding name suitable for Encode::encode and Encode::decode can be obtained from "cp".Win32::GetACP(). For Western machines, these are 1252 and cp1252 respectively (not latin1, as you said). This is also the encoding Perl's built-in functions use, since they call the ANSI versions.
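A minimal sketch of the conversion described above, using only the core Encode module. It calls Win32::GetACP(), the name under which the Win32 module exposes the ACP query; off Windows that function does not exist, so the sketch simply assumes cp1252 there so it can run anywhere. The file name "café.txt" is an arbitrary illustration.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Encoding name for the Active Code Page, built as "cp" . ACP number.
# Off Windows, assume cp1252 (the usual Western ACP) so the sketch runs anywhere.
sub acp_encoding {
    if ( $^O eq 'MSWin32' ) {
        require Win32;
        return 'cp' . Win32::GetACP();
    }
    return 'cp1252';
}

my $enc  = acp_encoding();
my $name = "caf\x{E9}.txt";    # "café.txt" as a Perl text string

# Bytes an ANSI ("A") function, or a Perl builtin such as open, expects:
my $bytes = encode( $enc, $name );
printf "%s: %s\n", $enc, unpack( 'H*', $bytes );

# Bytes coming back from a builtin such as readdir are decoded the same way:
my $text = decode( $enc, $bytes );
print $text eq $name ? "round trip ok\n" : "round trip lost data\n";
```

On a cp1252 machine the name survives the round trip, because every character in it exists in cp1252.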

To manipulate any file, including ones whose names contain characters the ACP cannot represent, you need to use the Wide versions. These are made accessible by Win32::Unicode and other modules from the same distribution.
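Why the ANSI route fails for such names can be simulated with Encode alone. This sketch assumes a Western machine (ACP 1252) and a hypothetical file name made of two katakana characters; Windows' own conversion may substitute differently from Encode's default '?' fallback, but either way the original name cannot be recovered.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Assumption: a Western machine whose ACP is 1252.
my $acp = 'cp1252';

# Hypothetical file name with characters (U+30DC, U+30AF) that cp1252 lacks.
my $wide_name = "\x{30DC}\x{30AF}.txt";

# What an ANSI ("A") call would be handed: with Encode's default
# fallback, unmappable characters become the substitution character '?'.
my $ansi_bytes = encode( $acp, $wide_name );
print "ANSI bytes: $ansi_bytes\n";

# Decoding cannot bring the lost characters back, so an A function
# would be asked about a different (probably nonexistent) file.
my $round_trip = decode( $acp, $ansi_bytes );
print $round_trip eq $wide_name ? "same name\n" : "name was lost\n";
```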

Re^2: Win32 encoding conversion mystery
by mithaldu (Monk) on Aug 13, 2018 at 16:26 UTC
    Based on your recommendation, I wrote this:
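        use strict;
        use warnings;
        use Win32::Unicode;

        # Open the directory given on the command line through the Wide API.
        my $wdir = Win32::Unicode::Dir->new;
        $wdir->open(@ARGV);

        # Take the third entry; this assumes "." and ".." are listed first.
        my ( undef, undef, $file ) = $wdir->fetch;

        # Print the name's length, then the code point of each character.
        print join "\n", "length: " . length( $file ) . ", code points:",
            map +ord, split //, $file;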
    This gives the result below, which is identical to the weirdly "upconverted" Unicode above and, without further processing, useless: it doesn't map to anything but mojibake in any code page I know of. (And the ACP is indeed 1252; sorry for speaking inaccurately about that.)
        d:\>perl filename_check.pl RJ209072
        length: 17, code points:
        226
        123
        226
        78
        233
        9524
        196
        113
        233
        9568
        196
        191
        233
        8976
        233
        189
        50
      What is it supposed to give you?
        I need the string as it is represented on the right side of the original post, since those are the actual cp932 bytes, which I can convert to real UTF-8. I could get these by using Windows' `dir`, so they are in the file system data, but `dir` is of limited use. Meanwhile, all the Windows API calls seem to give me is the "mangled for visualizing" UTF-8 mojibake.
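A diagnostic sketch, not a confirmed explanation: every code point in the output above is a character of the cp437 (US OEM) code page, including U+2310 (⌐), which cp850 lacks. That suggests the names were created by decoding cp932 bytes as cp437 somewhere along the way; the tool responsible is unknown. It might also explain the `dir` observation, since cmd presumably writes captured output in the console's OEM code page. If the hypothesis holds, encoding the name back to cp437 recovers the bytes, and those decode cleanly as cp932.

```perl
use strict;
use warnings;
use Encode qw( encode decode FB_CROAK LEAVE_SRC );

# Code points reported by filename_check.pl above.
my @cps = ( 226, 123, 226, 78, 233, 9524, 196, 113, 233, 9568,
            196, 191, 233, 8976, 233, 189, 50 );
my $name = join '', map { chr } @cps;

# Hypothesis: the name is cp932 bytes that were decoded as cp437.
# Re-encoding as cp437 (croaking on any unmappable character) undoes that.
my $bytes = encode( 'cp437', $name, FB_CROAK | LEAVE_SRC );
print unpack( 'H*', $bytes ), "\n";

# If the hypothesis is right, these bytes are valid cp932 text.
my $fixed = decode( 'cp932', $bytes, FB_CROAK | LEAVE_SRC );
binmode STDOUT, ':encoding(UTF-8)';
print $fixed, "\n";
```

With the Wide API names from Win32::Unicode::Dir, the same two calls could be applied to each entry before use, again assuming the cp437 hypothesis holds for all of them.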
