
Thanks for that. According to is_utf8() the string is in UTF-8; however, running encode_utf8() doesn't resolve the problem. It *does* remove the four-digit hex codes, but it doesn't restore the bytes to what they were originally:
encode_utf8() version:
    00000000  50 4B 03 04 14 00 09 00 08 00 67 EF BF BD 25 46  PK........g...%F
Original version:
    0000000 50 4b 03 04 14 00 09 00 08 00 67 8d 25 46 00 00
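The three bytes EF BF BD in the encode_utf8() dump are the UTF-8 encoding of U+FFFD, the Unicode replacement character. That suggests (an assumption, not something confirmed from the RT::Client source) that the raw byte 8D was decoded as UTF-8 somewhere upstream: 8D cannot start a UTF-8 sequence, so the decoder substituted U+FFFD and the original byte was discarded. This minimal sketch reproduces that with the core Encode module, using the first bytes of the original dump:

```perl
use strict;
use warnings;
use Encode qw(decode encode_utf8);

# First bytes of the original zip, copied from the "Original version" dump.
my $raw = "\x50\x4B\x03\x04\x14\x00\x09\x00\x08\x00\x67\x8D\x25\x46";

# Decode the raw bytes as UTF-8. 0x8D is a continuation byte with no lead
# byte, so the default (non-CHECK) decoder replaces it with U+FFFD.
my $decoded = decode('UTF-8', $raw);

# Re-encoding produces valid UTF-8, but U+FFFD becomes EF BF BD:
# the original byte 0x8D cannot be recovered from the decoded string.
my $reencoded = encode_utf8($decoded);

print join(' ', map { sprintf('%02X', ord($_)) } split //, $reencoded), "\n";
# prints: 50 4B 03 04 14 00 09 00 08 00 67 EF BF BD 25 46
```

This matches the encode_utf8() dump above, which is why encode_utf8() alone cannot put the file back together: the information was lost at decode time, not at encode time.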
I took a look at the attributes of the file, as @Veltro suggested, and got the following:
content_type is: application/octet-stream
content_encoding is: none
file_name is: screenshot-172 21 242 64.zip
headers is:
    Content-Type: application/octet-stream; name="screenshot-172 21 242 64.zip"
    Content-Disposition: attachment; filename="screenshot-172 21 242 64.zip"
    Content-Transfer-Encoding: base64
    Content-Length: 460749

That "base64" string in the headers section looked interesting, although the string does not seem to be encoded, insofar as it has characters in it that do not match the Base64 character set (A-Za-z0-9+/=).
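Content-Transfer-Encoding: base64 describes how the attachment was encoded inside the MIME message in transit; a MIME parser normally decodes it before handing the body over, so a string full of non-Base64 characters is consistent with that header. Whether RT::Client already performs this decoding is an assumption here, not something checked against its source. The sketch below uses the core MIME::Base64 module plus a hypothetical helper, looks_like_base64, to tell Base64 text apart from already-decoded binary before calling decode_base64:

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: true if the string contains only Base64 alphabet
# characters (A-Za-z0-9+/), up to two '=' padding characters at the end,
# and the line breaks/whitespace that MIME encoders insert.
sub looks_like_base64 {
    my ($s) = @_;
    (my $compact = $s) =~ s/\s+//g;
    return length($compact) % 4 == 0
        && $compact =~ m{\A[A-Za-z0-9+/]*={0,2}\z};
}

my $binary  = "PK\x03\x04\x14\x00\x09\x00\x08\x00\x67\x8D\x25\x46";
my $encoded = encode_base64($binary);    # roughly what travels on the wire

for my $candidate ($binary, $encoded) {
    if (looks_like_base64($candidate)) {
        my $bytes = decode_base64($candidate);
        printf "Base64 text, decodes to %d bytes\n", length $bytes;
    } else {
        print "Not Base64: already decoded (binary) data\n";
    }
}
```

The raw zip bytes are reported as binary and the encoded copy as Base64; running decode_base64 on data that is already binary would only garble it further.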

I tried encoding and decoding using the MIME functions but to no avail.

The stated content length is the exact size of the actual binary file (460749 bytes), but the string provided by the RT libraries is shorter (442958 bytes), a difference of 17791. I would be willing to believe that the missing 17791 characters were absorbed into the wide characters in the RT string; that is to say, I expect there to be 17791 wide characters in the octet stream.
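One way to test that expectation is to tally the characters in the RT string by range. The sketch below uses a hypothetical helper, char_profile, and a small synthetic sample in place of the real attachment (the actual RT string is not available here). It also shows why the counts need not match one-for-one: the eight raw bytes below come back as six characters, because the three-byte sequence E2 82 AC collapses into one wide character while the stray 8D becomes a U+FFFD without shortening the string.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Figures quoted in the post.
my $header_length = 460_749;   # Content-Length from the attachment headers
my $rt_length     = 442_958;   # length of the string returned by RT
printf "Missing characters: %d\n", $header_length - $rt_length;   # 17791

# Hypothetical diagnostic: count characters by codepoint range.
sub char_profile {
    my ($str) = @_;
    my %n = (bytes_range => 0, wide => 0, replacement => 0);
    for my $cp (map { ord($_) } split //, $str) {
        if    ($cp == 0xFFFD) { $n{replacement}++ }
        elsif ($cp > 0xFF)    { $n{wide}++ }
        else                  { $n{bytes_range}++ }
    }
    return \%n;
}

# Synthetic stand-in for the RT string: E2 82 AC is a valid 3-byte UTF-8
# sequence (U+20AC) and 8D is a continuation byte with no lead byte.
my $sample = decode('UTF-8', "ab\xE2\x82\xAC\x8Dcd");
my $p = char_profile($sample);
printf "length %d: %d <= U+FF, %d wide, %d U+FFFD\n",
    length $sample, @{$p}{qw(bytes_range wide replacement)};
# prints: length 6: 4 <= U+FF, 1 wide, 1 U+FFFD
```

Run against the real string, a large U+FFFD count would mean those bytes are already lost, whatever the length difference suggests.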

Re^3: RT::Client turns occasional binary characters in to wide characters
by Anonymous Monk on Oct 03, 2018 at 22:14 UTC
    This is another reason why is_utf8 is a trap. It does not indicate that the string is "in UTF-8". It is an internal flag that describes how Perl is storing the string internally. utf8::upgrade and utf8::downgrade enable and disable this flag, respectively, without changing the string as seen by Perl code (provided the string can be represented in your native encoding; otherwise utf8::downgrade will croak). So in fact, the only thing you can determine for sure from is_utf8 is that every Perl string with codepoints above U+FF *must* have it enabled (but not the other way around): a false is_utf8 rules out such codepoints, while a true one tells you nothing about the string's content.
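    A minimal sketch of the behaviour described above, using only the utf8:: functions built into Perl and Encode's encode_utf8; the sample strings are illustrative, not taken from the RT data. It shows the flag flipping while eq still holds, utf8::downgrade failing on a codepoint above U+FF, and a string of genuine UTF-8 bytes reporting is_utf8 as false.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

my $s    = "caf\x{E9}";       # four characters, all <= U+FF
my $copy = $s;

utf8::upgrade($copy);         # switch internal storage to UTF-8
printf "flag on: %d, still equal: %d\n",
    utf8::is_utf8($copy) ? 1 : 0, $copy eq $s ? 1 : 0;   # flag on: 1, still equal: 1

utf8::downgrade($copy);       # back to one byte per character
printf "flag on: %d, still equal: %d\n",
    utf8::is_utf8($copy) ? 1 : 0, $copy eq $s ? 1 : 0;   # flag on: 0, still equal: 1

# A codepoint above U+FF cannot be stored without the flag, so downgrade fails.
my $wide = "\x{FFFD}";
my $ok   = utf8::downgrade($wide, 1);   # FAIL_OK true: return false instead of croaking
printf "downgrade ok: %d, flag on: %d\n",
    $ok ? 1 : 0, utf8::is_utf8($wide) ? 1 : 0;           # downgrade ok: 0, flag on: 1

# Real UTF-8 bytes (63 61 66 C3 A9) are an ordinary byte string: flag off.
my $bytes = encode_utf8($s);
printf "UTF-8 encoded bytes, flag on: %d\n", utf8::is_utf8($bytes) ? 1 : 0;   # 0
```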