Re: Win32::OLE with non-ANSI data

by freonpsandoz (Beadle)
on Mar 29, 2022 at 21:35 UTC (#11142514)

in reply to Win32::OLE with non-ANSI data

It gets weirder. The CP_UTF8 option seems to work for data passed in to the OLE object, but not for returned data. It appears that data is returned as CP_ACP octets if possible, and only returned as a Perl character string (UTF-8 flag on) if conversion to CP_ACP fails. Nothing seems to be returned to tell the caller how the returned data is encoded. Please check whether I'm missing something or whether this is a bug. In test1.mp3, the 'artist' tag is "The Crüxshadows" and in test2.mp3 it's the Cyrillic text "Издатель". Is there a way for me to supply the files I'm using for testing? Thanks.

    use strict;
    use warnings;
    use Encode qw( is_utf8 );
    use Win32::OLE ();

    Win32::OLE->Option ( CP => Win32::OLE::CP_UTF8 );
    binmode( STDOUT, ':raw' );

    my $filename = shift or die "No file specified\n";
    my $dmcconverter = Win32::OLE->new('dMCScripting.Converter')
        or die "Can't create dMCScripting.Converter object: $!\n";

    my $data = $dmcconverter->AudioProperties($filename);
    printf( STDERR "The UTF-8 flag for converter output is %d\n", is_utf8($data) // 0 );
    print "$data";

Output:

    d:\Mp3\Encode>perl -S D:\Mp3\Encode\test1.mp3 >test1.txt
    The UTF-8 flag for converter output is 0

    d:\Mp3\Encode>perl -S D:\Mp3\Encode\test2.mp3 >test2.txt
    The UTF-8 flag for converter output is 1
    Wide character in print at D:\Batch/ line 15.
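If the behaviour described above really is what happens (flagged strings are already characters, unflagged strings are octets in the ANSI code page), one possible workaround is to normalise the return value before printing. The sketch below is only that: `ole_to_characters` is a hypothetical helper, cp1252 is an assumed ANSI code page, and the two test values are stand-ins for the observed return values rather than real OLE output. Encoding explicitly before `print` is what avoids the "Wide character in print" warning on a `:raw` handle.

```perl
use strict;
use warnings;
use Encode qw( decode encode is_utf8 );

# Hypothetical helper: turn a value returned by the OLE call into a Perl
# character string, ASSUMING flagged strings are already characters and
# unflagged strings are octets in the ANSI code page named by $ansi.
sub ole_to_characters {
    my ( $value, $ansi ) = @_;
    return $value if is_utf8($value);
    return decode( $ansi, $value );
}

# Stand-ins for the two observed return values (no OLE object involved).
my $ansi_octets = "The Cr\xFCxshadows";    # cp1252 octets, UTF-8 flag off
my $wide_chars  = "\x{418}\x{437}\x{434}\x{430}\x{442}\x{435}\x{43B}\x{44C}";    # Cyrillic, flag on

binmode( STDOUT, ':raw' );
for my $value ( $ansi_octets, $wide_chars ) {
    my $chars = ole_to_characters( $value, 'cp1252' );    # cp1252 is an assumption
    print encode( 'UTF-8', $chars ), "\n";                # always emit UTF-8 octets
}
```

Relying on the UTF-8 flag to guess an encoding is fragile in general, so this should be treated as a diagnostic aid until the actual Win32::OLE behaviour is confirmed.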

UPDATE: I just realized that part of the weirdness is in how Perl represents strings internally. I had been led to believe that it was (almost) UTF-8, but that doesn't seem to be the case. If I read the string "The Crüxshadows" from a file in raw mode, the string data is UTF-8 octets, with the UTF-8 flag off. If I read the same data with an ":encoding(UTF-8)" layer specified, the string data is cp1252 octets with the UTF-8 flag on.
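One way to inspect what each read mode produces, without relying on how a terminal renders the result, is to look at lengths and code points rather than printed text. The following is a minimal sketch using an in-memory filehandle in place of a real file; `read_with_layer` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw( encode is_utf8 );

# UTF-8 octets for "The Crüxshadows"; the ü (U+00FC) becomes 0xC3 0xBC.
my $octets = encode( 'UTF-8', "The Cr\x{FC}xshadows" );

# Hypothetical helper: read one line from the in-memory "file" via $layer.
sub read_with_layer {
    my ($layer) = @_;
    open( my $fh, "<$layer", \$octets )
        or die "Can't open in-memory file: $!\n";
    my $line = <$fh>;
    close $fh;
    return $line;
}

my $raw     = read_with_layer(':raw');
my $decoded = read_with_layer(':encoding(UTF-8)');

# Raw read: one element per octet, so the ü counts as two.
printf "raw:     length %d, UTF-8 flag %d\n",
    length($raw), is_utf8($raw) ? 1 : 0;

# Decoded read: one element per character; the 7th is code point U+00FC.
printf "decoded: length %d, UTF-8 flag %d, 7th char U+%04X\n",
    length($decoded), is_utf8($decoded) ? 1 : 0,
    ord( substr( $decoded, 6, 1 ) );
```

The decoded string holds the code point U+00FC, which happens to have the same numeric value as the cp1252 and Latin-1 byte for ü. That coincidence may be why a decoded string can look like "cp1252 octets" when dumped.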

Re^2: Win32::OLE with non-ANSI data
by Anonymous Monk on Mar 30, 2022 at 06:18 UTC

I see now that "by default, the internal format is either ISO-8859-1 (latin-1), or utf8, depending on the history of the string." I hadn't seen that before. I was under the impression that if the utf8 flag is not set, the string consists of octets that should be decoded. It now appears that this impression was incorrect. That brings me back to the question: exactly what does the Win32::OLE documentation mean when it talks about the CP option for "translations between Perl strings and Unicode strings"? Does the CP_UTF8 option actually mean "character strings in Perl's internal format"? Thanks.
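The quoted point about the internal format depending on the string's history can be illustrated with the core `utf8::upgrade` and `utf8::is_utf8` functions. This is a small sketch with made-up variable names: the same characters are held in two internal representations, and Perl still treats them as equal.

```perl
use strict;
use warnings;

# "Crüx" with ü written as \x{FC}: stored as one byte per character, flag off.
my $native = "Cr\x{FC}x";

# Same characters, but force the internal storage to UTF-8 (flag on).
my $upgraded = $native;
utf8::upgrade($upgraded);

printf "UTF-8 flag: native %d, upgraded %d\n",
    utf8::is_utf8($native)   ? 1 : 0,
    utf8::is_utf8($upgraded) ? 1 : 0;

# At the character level nothing has changed.
printf "lengths: %d and %d\n", length($native), length($upgraded);
print "strings compare equal\n" if $native eq $upgraded;
```

Under this reading the flag describes storage rather than meaning, so on its own it cannot say whether a string holds decoded characters or still-encoded octets.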