Thanks for the response. Given that the string I am retrieving is actually the contents of a binary file, I should be OK to ignore anything to do with UTF-8, since my source code contains no 8-bit (non-ASCII) characters.
Working from that, I removed every call to UTF8-related subroutines from my code, but I still get the "Wide character" warning when I try to write the string out to a binary (or any) file. So I have removed one potential cause (UTF-8), but the problem remains.
While I take you at your word that this is not a UTF-8 problem (as I understand it), it's odd that running encode('UTF-8', ...) on the string and writing the result out does not produce the wide character warning.
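The observation that encode('UTF-8', ...) makes the warning go away can be reproduced in a few lines. This is a minimal sketch, assuming the retrieved string contains at least one character above U+00FF; the hypothetical `$data` below uses U+FFFD as a stand-in for whatever the module actually returns. Printing such a string to a handle with no encoding layer triggers "Wide character in print", while printing the output of `encode` does not, because `encode` returns a string in which every character is in the range 0x00 to 0xFF.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Spec;

# Hypothetical stand-in for the retrieved data: two ASCII characters around
# U+FFFD, a character that does not fit in a single byte.
my $data = "A\x{FFFD}B";

# Capture warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# 1. Print the string as-is to a handle with no encoding layer.
open(my $raw_fh, '>:raw', File::Spec->devnull()) or die "open: $!";
print {$raw_fh} $data;                    # warns: Wide character in print
close $raw_fh;
my $raw_warnings = @warnings;
my $raw_message  = $warnings[0] // '';

# 2. Encode first: every character of the result is in 0x00..0xFF.
@warnings = ();
my $bytes = encode('UTF-8', $data);
open(my $enc_fh, '>:raw', File::Spec->devnull()) or die "open: $!";
print {$enc_fh} $bytes;                   # no warning
close $enc_fh;
my $enc_warnings = @warnings;

print "raw print warnings: $raw_warnings\n";          # raw print warnings: 1
print "encoded print warnings: $enc_warnings\n";      # encoded print warnings: 0
print "encoded length: ", length($bytes), " bytes\n"; # encoded length: 5 bytes
```

Encoding only hides the symptom: the wide character is written as its UTF-8 bytes (EF BF BD for U+FFFD), so a file written this way would likely not match the original binary.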
Given that the string I am retrieving is actually the contents of a binary file, I should be OK to ignore anything to do with UTF-8, since my source code contains no 8-bit (non-ASCII) characters.
It depends on how the data is handed to you. In both examples below, the string's buffer holds the same two bytes, \304\243, but Perl interprets them differently depending on its internal UTF8 flag: with the flag set they are the single character U+0123, without it they are the two characters U+00C4 and U+00A3. If the module hands you binary data that has gone through an unwanted encoding/decoding step, or that has the UTF8 flag incorrectly enabled, you will see exactly these kinds of strange issues, and that may explain the U+FFFD REPLACEMENT CHARACTER in your original hex dump. Could you show your data with Devel::Peek?
$ perl -CSD -MDevel::Peek -le 'my $x="\x{123}"; print $x; Dump($x)'
ģ
SV = PV(0x1337d70) at 0x1357518
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK,UTF8)
  PV = 0x1359790 "\304\243"\0 [UTF8 "\x{123}"]
  CUR = 2
  LEN = 10
  COW_REFCNT = 1
$ perl -CSD -MDevel::Peek -le 'my $x="\304\243"; print $x; Dump($x)'
ģ
SV = PV(0x1e28d70) at 0x1e48518
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x1e4a790 "\304\243"\0
  CUR = 2
  LEN = 10
  COW_REFCNT = 1
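Alongside Devel::Peek, a check along these lines may help locate the offending characters. It is a sketch, assuming the data sits in a scalar (the hypothetical `$data`, with a made-up sample value) that is supposed to hold raw bytes. `utf8::is_utf8` reports the internal flag, any character with an ordinal above 0xFF is something a byte-oriented file cannot hold, and `utf8::downgrade` with a true second argument returns false instead of dying when such characters are present. The flag alone is not an error; the wide characters are what trigger the warning.

```perl
use strict;
use warnings;

# Hypothetical sample: data that should be binary, but where one byte has
# been replaced by U+FFFD somewhere upstream.
my $data = "\x{89}PNG\x{FFFD}\x{0A}";

# Perl's internal flag (an implementation detail, reported for diagnosis only).
my $flagged = utf8::is_utf8($data) ? 1 : 0;

# Offsets and code points of every character that does not fit in one byte.
my @wide;
for my $i (0 .. length($data) - 1) {
    my $cp = ord substr($data, $i, 1);
    push @wide, sprintf('offset %d: U+%04X', $i, $cp) if $cp > 0xFF;
}

# Try to turn a copy back into a plain byte string; with the second
# argument true, utf8::downgrade returns false rather than dying on failure.
my $copy     = $data;
my $is_bytes = utf8::downgrade($copy, 1) ? 1 : 0;

print "UTF8 flag: $flagged\n";                                         # UTF8 flag: 1
print "wide characters: ", (@wide ? join(', ', @wide) : 'none'), "\n"; # wide characters: offset 4: U+FFFD
print "downgrade ok: $is_bytes\n";                                     # downgrade ok: 0
```

If the downgrade succeeds, every character already fits in a byte and the data can be written through a `:raw` handle without the warning; if it fails, the wide characters most likely came from how the module produced the string.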