Re^3: Encode double encoding?

I suppose the question is: why can't I use decode_utf8() on output from XML::LibXML?

Because XML::LibXML already gives you characters (decoded Perl strings), not bytes/octets/raw, so calling decode_utf8() on its output is one decode too many; see https://metacpan.org/pod/XML::LibXML#ENCODINGS-SUPPORT-IN-XML::LIBXML
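A minimal sketch of "decode exactly once", using only core Encode. Here a plain decode() stands in for what XML::LibXML does internally before handing you text (that substitution is an assumption for illustration; decode_utf8() does the same kind of decoding). The sample string is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( decode FB_CROAK LEAVE_SRC );

# Raw bytes as they sit on disk: "ü" (U+00FC) encoded as UTF-8 is C3 BC
my $bytes = "\xC3\xBC";

# Decode ONCE; a parser such as XML::LibXML has already done this step
my $chars = decode( 'UTF-8', $bytes );
printf "after one decode: %d char, U+%04X\n", length $chars, ord $chars;

# Decoding the characters AGAIN treats U+00FC as the lone byte 0xFC,
# which is not valid UTF-8. FB_CROAK turns that into an error instead of
# silently substituting U+FFFD; LEAVE_SRC keeps $chars unmodified.
my $ok = eval { decode( 'UTF-8', $chars, FB_CROAK | LEAVE_SRC ); 1 };
print $ok ? "second decode: ok\n" : "second decode: malformed UTF-8, died\n";
```

With other input the second decode does not necessarily die: characters that happen to look like valid UTF-8 byte sequences are silently turned into different characters, which is the slurp_utf8 "surprise" in the script in this reply.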

... yields ...

See the replies by McA and farang; thanks, monks :)

Also: terminals lie, even browsers lie, but the bytes do not lie. See the comments in this code (and the code itself), and keep the 5-minute tutorial in mind.

#!/usr/bin/perl --
use strict;
use warnings;
use Data::Dump qw/ dd /;
use Encode qw/ encode decode /;
use Path::Tiny qw/ path /;

my $tmpfile = path( 'deleteme.txt' );

## CHARACTERS HERE
#~ ordinal= ord( chr( 115 ) ) alias \N{U+0073} alias \163 alias LATIN SMALL LETTER S alias s
#~ ordinal= ord( chr( 195 ) ) alias \N{U+00C3} alias \303 alias LATIN CAPITAL LETTER A TILDE alias Ã
#~ ordinal= ord( chr( 188 ) ) alias \N{U+00BC} alias \274 alias FRACTION ONE QUARTER alias ¼
#~ ordinal= ord( chr( 195 ) ) alias \N{U+00C3} alias \303 alias LATIN CAPITAL LETTER A TILDE alias Ã
#~ ordinal= ord( chr( 159 ) ) alias \N{U+009F} alias \237 alias APPLICATION PROGRAM COMMAND alias (unprintable control character)
#~ ordinal= ord( chr( 101 ) ) alias \N{U+0065} alias \145 alias LATIN SMALL LETTER E alias e
my $ords = join q{}, map { chr $_ } ( 115, 195, 188, 195, 159, 101 );

$tmpfile->spew_raw( $ords );
dd(
    {
        ords => $ords,
        raw  => $tmpfile->slurp_raw,
        utf8 => $tmpfile->slurp_utf8
    }
);

#~ {
#~   ords => "s\xC3\xBC\xC3\x9Fe",
#~   raw  => "s\xC3\xBC\xC3\x9Fe",
#~   utf8 => "s\xFC\xDFe",
#~ }
## when you write raw without encoding,
## then read that stuff back as UTF-8, you get a surprise:
#~ ordinal= ord( chr( 223 ) ) alias \N{U+00DF} alias \337 alias LATIN SMALL LETTER SHARP S alias ß
#~ ordinal= ord( chr( 252 ) ) alias \N{U+00FC} alias \374 alias LATIN SMALL LETTER U WITH DIAERESIS alias ü

## >>>> OUTPUT encoded, the raw bytes change
$tmpfile->spew_utf8( $ords );
dd(
    {
        ords => $ords,
        raw  => $tmpfile->slurp_raw,
        utf8 => $tmpfile->slurp_utf8
    }
);

#~ {
#~   ords => "s\xC3\xBC\xC3\x9Fe",
#~   raw  => "s\xC3\x83\xC2\xBC\xC3\x83\xC2\x9Fe",
#~   utf8 => "s\xC3\xBC\xC3\x9Fe",
#~ }
## UTF-8 is an encoding: a way of representing characters (ordinals) as bytes,
## so encoding by hand and then writing raw gives the same bytes as spew_utf8

$tmpfile->spew_raw( encode 'UTF-8', $ords );
dd(
    {
        ords => $ords,
        raw  => $tmpfile->slurp_raw,
        utf8 => $tmpfile->slurp_utf8
    }
);

#~ {
#~   ords => "s\xC3\xBC\xC3\x9Fe",
#~   raw  => "s\xC3\x83\xC2\xBC\xC3\x83\xC2\x9Fe",
#~   utf8 => "s\xC3\xBC\xC3\x9Fe",
#~ }

## decode raw bytes     to get characters
## encode characters    to get raw bytes/octets
dd(
    {
        ords            => $ords,
        decode_utf8_raw => decode( 'UTF-8', $tmpfile->slurp_raw ),
        utf8            => $tmpfile->slurp_utf8,
    }
);

#~ {
#~   decode_utf8_raw => "s\xC3\xBC\xC3\x9Fe",
#~   ords  => "s\xC3\xBC\xC3\x9Fe",
#~   utf8  => "s\xC3\xBC\xC3\x9Fe",
#~ }
## hooray

$tmpfile->remove;

__END__
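If you want to check the bytes yourself instead of trusting what a terminal draws, print ordinals as hex. A small sketch using only core Encode (the helper name hexords and the sample string are made up for illustration); it also shows the double-encoding signature, where each byte of the first encoding becomes a C2/C3 pair, matching the raw output of the spew_utf8 step in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical helper: ordinals as two-digit hex, which cannot "lie" like a glyph
sub hexords { join ' ', map { sprintf( '%02X', ord $_ ) } split //, $_[0] }

my $chars = "s\x{FC}\x{DF}e";            # "süße" as 4 characters
my $once  = encode( 'UTF-8', $chars );   # encoded once: 6 bytes
my $twice = encode( 'UTF-8', $once );    # encoded again by mistake: 10 bytes

print 'chars: ', hexords($chars), "\n";  # 73 FC DF 65
print 'once:  ', hexords($once),  "\n";  # 73 C3 BC C3 9F 65
print 'twice: ', hexords($twice), "\n";  # 73 C3 83 C2 BC C3 83 C2 9F 65
```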

_Is_ any of this covered in perlunitut, and if so, under what section?

You can start with the "I/O flow" section (the actual 5-minute tutorial).
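The I/O flow boils down to: decode on input, work with characters, encode on output. A minimal sketch of that using PerlIO :encoding(UTF-8) layers, with in-memory filehandles standing in for real files (the sample data and variable names are illustrative, not from perlunitut).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: "süße\n" as UTF-8 bytes, standing in for a file
my $in_bytes  = "s\xC3\xBC\xC3\x9Fe\n";
my $out_bytes = '';

# 1. Decode on input: the :encoding(UTF-8) layer turns bytes into characters
open my $in, '<:encoding(UTF-8)', \$in_bytes or die "open input: $!";

# 3. Encode on output: the same layer turns characters back into bytes
open my $out, '>:encoding(UTF-8)', \$out_bytes or die "open output: $!";

while ( my $line = <$in> ) {
    chomp $line;
    # 2. Work with characters in between: length() counts 4 characters,
    #    although the input held 6 bytes before the newline
    print {$out} $line, ' has ', length($line), " characters\n";
}
close $in;
close $out;    # flush the encoding layer into $out_bytes

print $out_bytes;    # already encoded bytes, so no extra layer on STDOUT
```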

You should read the whole thing, plus the documents it links to.

Also, download the tarball from "Perl Unicode Essentials" (OSCON 2011, O'Reilly Conferences, July 25-29, 2011, Portland, OR) for even more Unicode info.
