deal with incorrectly set utf8 flag

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: deal with incorrectly set utf8 flag by ikegami (Patriarch) on Mar 27, 2009 at 16:15 UTC
It's not clear what you want.
`utf8::downgrade` switches from bytes internally encoded as UTF-8 to just bytes.
`$ perl -MDevel::Peek -e'utf8::upgrade $x="\200\201"; Dump $x; utf8::downgrade $x; Dump $x'`
`PV = 0x81623f0 [UTF8 "\x{80}\x{81}"]`
`PV = 0x81623f0 "\200\201"\0`
`utf8::encode` will re-encode the data that has been decoded from UTF-8.
`$ perl -MDevel::Peek -e'utf8::decode $x="\x{2660}"; Dump $x; utf8::encode $x; Dump $x'`
`PV = 0x81651c0 [UTF8 "\x{2660}"]`
`PV = 0x81651c0 "\342\231\240"\0`
If it's truly an incorrectly set flag, there's also `Encode::_utf8_off`. It should only be used if the above two don't work.
`$ perl -MDevel::Peek -MEncode=_utf8_on,_utf8_off -e'_utf8_on( $x="\200\201" ); Dump $x; _utf8_off $x; Dump $x'`
`PV = 0x81651e8 [UTF8 "\x{1}@"]`
`PV = 0x81651e8 "\200\201"\0`
References: utf8 (you don't need to load the module to use its subs), Encode
Update: Added code.
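The three cases above can also be checked without `Devel::Peek`. This is a minimal sketch (the variable names are illustrative) that tests `utf8::is_utf8` and `length` instead of dumping the internals:

```perl
use strict;
use warnings;
use Encode ();    # for Encode::_utf8_on / Encode::_utf8_off

# utf8::downgrade: an upgraded byte string goes back to one byte per char.
my $x = "\200\201";
utf8::upgrade($x);      # same two characters, internal buffer now UTF-8
utf8::downgrade($x);    # internal buffer back to the two bytes 0x80 0x81
printf "downgrade: flag %s, length %d\n",
    (utf8::is_utf8($x) ? 'on' : 'off'), length $x;    # flag off, length 2

# utf8::encode: a character string becomes its UTF-8 octets.
my $y = "\x{2660}";     # one character, U+2660 BLACK SPADE SUIT
utf8::encode($y);       # now the three octets 0xE2 0x99 0xA0
printf "encode: flag %s, length %d\n",
    (utf8::is_utf8($y) ? 'on' : 'off'), length $y;    # flag off, length 3

# Encode::_utf8_off: undo a flag that was set on plain bytes.
my $z = "\200\201";
Encode::_utf8_on($z);   # wrongly flagged; the buffer is not valid UTF-8
Encode::_utf8_off($z);  # flag cleared, buffer never touched
print "utf8_off: ", ($z eq "\200\201" ? 'bytes intact' : 'bytes changed'), "\n";
```

Note that `$z` is never used while the flag is wrongly set, since Perl would then try to read the malformed buffer as UTF-8.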
Re^2: deal with incorrectly set utf8 flag by Anonymous Monk on Mar 27, 2009 at 16:37 UTC
It appears that `utf8::downgrade` alters the bytes in the scalar. That is exactly what I'm trying to avoid: I want to use the bytes in the scalar without Perl's utf8 handling kicking in. The library in question reads binary (non-character) data from a database field. It had no business marking the data as utf8, but it did so anyway. Now I want to use the binary data without any conversions or warnings. It looks like I can use `Encode::_utf8_off`, but it is documented as an internal function that shouldn't be relied on. `use bytes` also appears to work, but I don't know whether that is how it should be done. I'm looking for the right way.
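One point worth separating out: when a string has been legitimately upgraded (rather than flagged with `_utf8_on`), `utf8::downgrade` changes only the internal buffer, not the string value Perl code sees, while `use bytes` exposes that internal buffer. A minimal sketch, assuming some layer upgraded a copy of the binary data (the `$blob` contents are made up for illustration):

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length without turning on the pragma

my $blob = "\x00\x80\xFF";    # made-up binary data, one byte per character
my $x = $blob;
utf8::upgrade($x);            # as if some layer upgraded the scalar

# Perl still sees 3 characters equal to the original, but the internal
# buffer is now 5 octets (00 C2 80 C3 BF), which is what bytes:: sees.
printf "upgraded:   length %d, bytes::length %d, %s\n",
    length $x, bytes::length($x), ($x eq $blob ? 'equal' : 'different');

utf8::downgrade($x);          # repack internally; the value is unchanged
printf "downgraded: length %d, bytes::length %d, %s\n",
    length $x, bytes::length($x), ($x eq $blob ? 'equal' : 'different');
```

Under that assumption both lines report length 3 and `equal`, while `bytes::length` drops from 5 to 3. In other words, in the upgraded case `use bytes` would hand back the internal UTF-8 octets rather than the original binary data.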
Re^3: deal with incorrectly set utf8 flag by ikegami (Patriarch) on Mar 27, 2009 at 16:40 UTC
"this is documented as an internal function that shouldn't be relied on."
It means you should normally use `utf8::encode` or `Encode::encode 'UTF-8'`.
`$ perl -MDevel::Peek -MEncode=_utf8_on,_utf8_off,encode -e' _utf8_on( $x = "\342\231\240" ); utf8::encode( my $utf8 = $x ); my $enc = encode("UTF-8", $x); _utf8_off( my $off = $x ); Dump $x; Dump $utf8; Dump $enc; Dump $off; '`
`PV = 0x8165280 "\342\231\240"\0 [UTF8 "\x{2660}"]`
`PV = 0x81623f0 "\342\231\240"\0`
`PV = 0x81920e8 "\342\231\240"\0`
`PV = 0x81ff040 "\342\231\240"\0`
But if `_utf8_on` or equivalent was wrongly used, `_utf8_off` is appropriate.
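The one-liner above can be written as a small script that compares the three results with `eq` instead of `Devel::Peek`. This is a sketch; the variable names are illustrative, and it assumes the flag was set on octets that really are valid UTF-8:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Valid UTF-8 octets for U+2660, wrongly flagged as characters.
my $x = "\342\231\240";
Encode::_utf8_on($x);                     # Perl now reads one char, U+2660

utf8::encode(my $via_utf8 = $x);          # encodes a copy in place
my $via_encode = encode('UTF-8', $x);     # returns a new octet string
Encode::_utf8_off(my $via_off = $x);      # clears the flag on a copy

for my $s ($via_utf8, $via_encode, $via_off) {
    printf "flag %s, length %d, %s\n",
        (utf8::is_utf8($s) ? 'on' : 'off'), length $s,
        ($s eq "\342\231\240" ? 'same octets' : 'different');
}
```

All three lines should print `flag off, length 3, same octets`. `_utf8_off` simply clears the flag without examining the buffer, which is why it fits the case where the flag itself was set wrongly.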
Re^4: deal with incorrectly set utf8 flag by Anonymous Monk on Mar 27, 2009 at 16:48 UTC
Re^5: deal with incorrectly set utf8 flag by ikegami (Patriarch) on Mar 27, 2009 at 16:53 UTC