It appears that utf8::downgrade alters the bytes in the scalar, which is exactly what I'm trying to avoid. I want to use the bytes in the scalar as they are, without Perl's utf8 handling kicking in.

The library in question read binary (non-character) data from a database field. It had no business marking the data as utf8, but it did so anyway. Now I'm looking to use the binary data without any conversions or warnings.

It looks like I can use Encode::_utf8_off, but this is documented as an internal function that shouldn't be relied on. "use bytes" also seems to work, but I don't know whether that is how it should be done. I'm looking for the correct way to do this.
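The difference between utf8::downgrade and merely clearing the flag can be seen in a small sketch. The sample bytes and variable names are made up for illustration, and _utf8_on is used here only to simulate the library that wrongly flags its data.

```perl
use strict;
use warnings;
use Encode ();

# Two raw bytes (0xC3 0xA9) that happen to be the UTF-8 encoding of U+00E9.
my $flagged = "\303\251";
Encode::_utf8_on($flagged);    # simulate the library wrongly setting the flag

# utf8::downgrade re-encodes the buffer: the two bytes become one (0xE9).
my $down = $flagged;
utf8::downgrade($down);

# _utf8_off only clears the flag: the buffer stays C3 A9.
my $off = $flagged;
Encode::_utf8_off($off);

printf "downgrade: %s\n", unpack( "H*", $down );    # downgrade: e9
printf "_utf8_off: %s\n", unpack( "H*", $off );     # _utf8_off: c3a9
```

If the flagged buffer decodes to a character above 255, utf8::downgrade dies instead (unless a true FAIL_OK argument is passed), so it cannot be used on arbitrary binary data.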

Re^3: deal with incorrectly set utf8 flag
by ikegami (Patriarch) on Mar 27, 2009 at 16:40 UTC

    this is documented as an internal function that shouldn't be relied on.

    It means you should normally use utf8::encode or Encode::encode 'UTF-8'.

    $ perl -MDevel::Peek -MEncode=_utf8_on,_utf8_off,encode -e'
       _utf8_on( $x = "\342\231\240" );
       utf8::encode( my $utf8 = $x );
       my $enc = encode("UTF-8", $x);
       _utf8_off( my $off = $x );
       Dump $x;
       Dump $utf8;
       Dump $enc;
       Dump $off;
    '
    PV = 0x8165280 "\342\231\240"\0 [UTF8 "\x{2660}"]
    PV = 0x81623f0 "\342\231\240"\0
    PV = 0x81920e8 "\342\231\240"\0
    PV = 0x81ff040 "\342\231\240"\0

    But if _utf8_on or equivalent was wrongly used, _utf8_off is appropriate.
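For that last case, a minimal sketch of undoing a wrongly set flag might look like the following. The helper name raw_bytes and the sample data are hypothetical, and _utf8_on again only stands in for the misbehaving library.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return a copy of the data with the UTF8 flag
# cleared, leaving the underlying bytes exactly as they were.
sub raw_bytes {
    my ($data) = @_;    # copy, so the caller's scalar is not modified
    Encode::_utf8_off($data) if utf8::is_utf8($data);
    return $data;
}

# Simulate binary data that a library wrongly marked as utf8.
my $blob = "\342\231\240";
Encode::_utf8_on($blob);

my $fixed = raw_bytes($blob);

printf "flagged length: %d\n", length $blob;     # flagged length: 1
printf "fixed length:   %d\n", length $fixed;    # fixed length:   3
```

Checking utf8::is_utf8 first is not strictly needed for _utf8_off, but it makes the intent explicit.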

      Thank you for your wisdom.
        Actually, looks like utf8::encode is just as dumb as _utf8_off.
        $ perl -MDevel::Peek -MEncode=_utf8_on,_utf8_off,encode -e'
           _utf8_on( my $x = "\200\201" );  # Invalid UTF-8
           utf8::encode( my $utf8 = $x );
           my $enc = encode("UTF-8", $x);
           _utf8_off( my $off = $x );
           Dump $x;
           Dump $utf8;
           Dump $enc;
           Dump $off;
        '
        PV = 0x8165280 "\200\201"\0 [UTF8 "\x{1}@"]
        PV = 0x81623f0 "\200\201"\0
        PV = 0x81920f8 "\357\277\275\357\277\275"\0
        PV = 0x81ff040 "\200\201"\0

        Except when the flag isn't on.

        $ perl -MDevel::Peek -MEncode=_utf8_on,_utf8_off,encode -e'
           my $x = "\200\201";
           utf8::encode( my $utf8 = $x );
           my $enc = encode("UTF-8", $x);
           _utf8_off( my $off = $x );
           Dump $x;
           Dump $utf8;
           Dump $enc;
           Dump $off;
        '
        PV = 0x8165278 "\200\201"\0
        PV = 0x816b608 "\302\200\302\201"\0
        PV = 0x8195050 "\302\200\302\201"\0
        PV = 0x81ff038 "\200\201"\0

        Your pick.
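If the documented utf8::encode is preferred over _utf8_off, the flag-off case shown above suggests guarding the call with utf8::is_utf8. This is only a sketch under that assumption; the helper name internal_bytes is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: get the scalar's bytes using only documented
# utf8:: functions. utf8::encode is applied only when the flag is on,
# so unflagged binary data is returned untouched.
sub internal_bytes {
    my ($data) = @_;    # copy, so the caller's scalar is not modified
    utf8::encode($data) if utf8::is_utf8($data);
    return $data;
}

my $plain = "\200\201";    # binary data, flag off

my $guarded = internal_bytes($plain);
utf8::encode( my $unguarded = $plain );    # treats each byte as a character

print unpack( "H*", $guarded ),   "\n";    # 8081
print unpack( "H*", $unguarded ), "\n";    # c280c281
```

Without the guard, every byte above 0x7F in unflagged data is expanded to two bytes, which is the conversion the original question wants to avoid.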