Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I used Data::Dump::dd to dump a large data structure that included Unicode strings, but all is not well, as the following program (play with it) demonstrates:

```perl
#!/usr/bin/perl --
use strict;
use warnings;
#~ use utf8;                       # no help
#~ use feature 'unicode_strings';  # no help
# turns off wide but CORRUPTS output
#~ binmode STDOUT, ':encoding(UTF-8)';
binmode STDOUT;                    # wide character in print
print qq!\x{FEFF}!;                # print BOM
print "Foo \xE2\x80\x94 Bar";
```

use utf8 is no help.

use feature 'unicode_strings' is no help.

Using only binmode STDOUT produces a "Wide character in print" warning.

binmode STDOUT, ':encoding(UTF-8)' silences the warning but corrupts the output.
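Why does the :encoding(UTF-8) layer corrupt this output? A minimal sketch using only core Perl: the escaped literal "\xE2\x80\x94" is three separate characters (the UTF-8 octets), not one character, so the layer encodes each octet on its own. The sketch writes to an in-memory filehandle instead of STDOUT so the bytes can be counted; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $octets = "Foo \xE2\x80\x94 Bar";   # 3 separate octets, not one character
my $chars  = "Foo \x{2014} Bar";       # one character, U+2014

# Write the *octet* string through an :encoding(UTF-8) layer,
# just as binmode STDOUT, ':encoding(UTF-8)' would.
my $buf = '';
open my $fh, '>:encoding(UTF-8)', \$buf or die "open: $!";
print $fh $octets;
close $fh;

# Each of \xE2, \x80, \x94 is treated as a Latin-1 character and
# encoded separately: 3 octets become 6 bytes (double encoding).
printf "%d chars in, %d bytes out\n", length $octets, length $buf;   # 11 chars in, 14 bytes out

# The decoded string holds the dash as a single character.
printf "%d chars in the decoded string\n", length $chars;           # 9 chars in the decoded string
```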

I do not want to iterate over my huge hash of hashes to Encode::decode('UTF-8')

I must be missing something, but what? How do I get Unicode strings without manually calling decode, and print them without lame warnings or corruption?

Re: unicode strings without decoding or warnings or corruption
by Corion (Patriarch) on May 13, 2012 at 10:17 UTC

    You can't mix binary octets (your "undecoded UTF-8 strings") and UTF-8 output on a filehandle.

    Either manually encode your wide characters to UTF-8 and output everything as octets, or decode your UTF-8 octets to Unicode strings and output them through an :encoding(UTF-8) layer.
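Both options can be sketched side by side, assuming the core Encode module and using in-memory filehandles in place of STDOUT (the variable names are illustrative). The point is that each handle only ever receives one kind of data: octets on the raw handle, characters on the encoding handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

my $octets = "Foo \xE2\x80\x94 Bar";   # undecoded UTF-8 octets
my $bom    = "\x{FEFF}";               # a wide character

# Option 1: everything as octets on a raw handle.
# The wide character is encoded to UTF-8 bytes before printing.
my $raw_out = '';
{
    open my $fh, '>:raw', \$raw_out or die "open: $!";
    print $fh encode('UTF-8', $bom), $octets;    # no "Wide character" warning
    close $fh;
}

# Option 2: decode the octets to characters and let the
# :encoding(UTF-8) layer do all of the encoding on output.
my $enc_out = '';
{
    open my $fh, '>:encoding(UTF-8)', \$enc_out or die "open: $!";
    print $fh $bom, decode('UTF-8', $octets);
    close $fh;
}

# Both handles end up with the same correctly encoded bytes.
printf "%s\n", $raw_out eq $enc_out ? 'identical' : 'different';
```

Option 1 leaves existing octet strings untouched, which suits a structure you would rather not walk; option 2 follows the usual "decode on input, encode on output" pattern.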

    As a last resort, maybe switch off warnings.
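If the bytes written by the original program are already what you want and only the message bothers you, the warning can be switched off for a single block. A minimal sketch, assuming the original raw binmode STDOUT setup: perldiag lists "Wide character in %s" under the utf8 warnings category, so no warnings 'utf8' silences it. The $SIG{__WARN__} handler is only there to show that nothing is emitted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

binmode STDOUT;    # raw handle, as in the original program

# Collect warnings instead of printing them, just to show none are raised.
my @caught;
$SIG{__WARN__} = sub { push @caught, @_ };

{
    # "Wide character in print" is in the 'utf8' warnings category,
    # so this disables it for this block only.
    no warnings 'utf8';
    print qq!\x{FEFF}!;    # still written as its UTF-8 bytes, just silently
}
print "Foo \xE2\x80\x94 Bar\n";    # already octets: written unchanged

print STDERR scalar(@caught), " warning(s) caught\n";
```

This only hides the symptom; the handle still mixes characters and octets.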

Re: unicode strings without decoding or warnings or corruption
by Anonymous Monk on May 13, 2012 at 10:45 UTC

    Answering myself again

```perl
sub fixUTFness {
    use Data::Visitor::Callback;
    my $decodeVisitor = Data::Visitor::Callback->new(
        ignore_return_values => 1,
        value => sub { utf8::decode($_); return },
    );
    $decodeVisitor->visit( @_ );
}
```

    Then you get strings like "Foo \x{2014} Bar" (characters instead of raw UTF-8 octets), and printing them through binmode STDOUT, ':encoding(UTF-8)' all works.
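For readers without Data::Visitor::Callback, here is a module-free sketch of the same idea (a swapped-in technique, not the code above): walk hashes and arrays recursively and run utf8::decode on each defined string value in place. It still visits every value; it just avoids the dependency. decode_in_place and _visit are hypothetical names. This sketch leaves hash keys and other reference types (scalar refs, code refs) alone, and walks blessed hashes and arrays like plain ones.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(reftype refaddr);

# Hypothetical helper: decode every UTF-8 string value in a nested
# hash/array structure in place. Hash keys are NOT decoded.
sub decode_in_place {
    my ($data, $seen) = @_;
    $seen ||= {};
    my $type = reftype($data);
    return unless defined $type;              # not a reference
    return if $seen->{ refaddr($data) }++;    # already visited (cycles, shared refs)

    if ($type eq 'HASH') {
        _visit($_, $seen) for values %$data;  # values() aliases the hash values
    }
    elsif ($type eq 'ARRAY') {
        _visit($_, $seen) for @$data;         # foreach aliases the array elements
    }
    return;
}

# Recurse into references; decode plain defined scalars in place.
# $_[0] aliases the caller's element, so changes land in the structure.
sub _visit {
    if (ref $_[0]) {
        decode_in_place($_[0], $_[1]);
    }
    elsif (defined $_[0]) {
        utf8::decode($_[0]);    # returns false if the string is not valid UTF-8
    }
    return;
}

# Example: one call, then print through an encoding layer without warnings.
my %data = ( title => "Foo \xE2\x80\x94 Bar", tags => [ "caf\xC3\xA9", undef ] );
decode_in_place(\%data);
binmode STDOUT, ':encoding(UTF-8)';
print "$data{title}\n";
```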

      How is your code a solution, given the stated (and weird) restriction of

      I do not want to iterate over my huge hash of hashes to Encode::decode('UTF-8')

      While I don't doubt its correctness, if you are really the same person as the original poster, can you maybe explain how your approach of using Data::Visitor is not iterating over your hash?

        How is your code a solution, given the stated (and weird) restriction of

        I looked through my stuff and found that snippet for iterating over an arbitrary structure; I had thought I would have to write something from scratch.