in reply to Re^2: JSON, Data::Dumper and accented chars in utf-8
in thread JSON, Data::Dumper and accented chars in utf-8

Oh, I do highly disapprove of the site mangling my Unicode character ő inside a code block.

  • Comment on Re^3: JSON, Data::Dumper and accented chars in utf-8 [OFF/Gripe]

Re^4: JSON, Data::Dumper and accented chars in utf-8 [OFF/Gripe]
by silentius (Monk) on Jan 21, 2012 at 22:42 UTC

    Thank you both for your replies, although they did not solve my problem.

    I kept searching and solved it simply like this:

        use Encode;
        use Encode::Escape;
        use Data::Dumper;
        use JSON;
        ...
        while ($line = <IN>) {
            $strut = from_json($line);
            print decode('unicode-escape', Dumper($strut)) . "\n";
        }

    This now gives me the output I need: the accented chars are displayed as they are. That matters because the output is redirected to a text file and read by humans in a regular text editor.

    Thank you all once again.

      And why isn’t the JSON data enough? JSON has a pretty option, which would pretty-print the data for you, not requiring you to (ab)use Dumper.
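A minimal sketch of the pretty-printing route, assuming the JSON module from CPAN is installed. The sample record is hypothetical and not taken from the thread; `canonical` is added here only so the key order is stable.

```perl
use strict;
use warnings;
use JSON;

# Hypothetical input line; \u0151 is the JSON escape for U+0151.
my $line = '{"name":"Erd\u0151s","city":"Gy\u0151r"}';

# from_json takes a character string and returns a Perl data structure.
my $data = from_json($line);

# pretty() adds indentation and newlines; canonical() sorts hash keys.
# Without utf8(), encode() returns a character string, so the output
# handle needs an encoding layer of its own.
my $pretty = JSON->new->pretty->canonical->encode($data);

binmode STDOUT, ':encoding(UTF-8)';
print $pretty;
```

Because the `ascii` option is not enabled, the accented characters should appear literally in the output rather than as `\u` escapes.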

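      Keep in mind that JSON text produced by encode_json (or by a JSON object with the utf8 option enabled) is a UTF-8 byte stream and not a (Unicode) character string; to_json, by default, returns a character string instead.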

      • To print it in a terminal you should decode it into a string and tell Perl your terminal is UTF-8, as suggested above. Sending byte streams to terminals is a bad thing, definitely don’t do that.
      • To print it into a file,
        • you might be best off with a file that you opened as a binary (open my $FH, '>:raw', "myfilename.txt"). Any byte thrown at a raw file will appear there as intended.
        • Or of course, you can decode it into a string and print that into the file. Perl then has to turn the string back into bytes on output: either you leave the choice of byte stream format (aka. encoding) to Perl's default behaviour, or you specify one yourself, e.g. '>:encoding(UTF-8)'. Letting Perl do it for you is not necessarily a good idea :)

      Yes, fully aware it looks extremely complicated :)
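The two file-writing options above can be sketched roughly as follows. This assumes the JSON module from CPAN; the file names `out_raw.json` and `out_enc.json` and the sample data are hypothetical.

```perl
use strict;
use warnings;
use JSON;
use Encode qw(decode);

my $data = { name => "Erd\x{151}s" };   # hypothetical sample data

# encode_json returns UTF-8 encoded bytes, not characters.
my $bytes = encode_json($data);

# Option 1: write the bytes untouched through a raw handle.
open my $raw_fh, '>:raw', 'out_raw.json' or die "open: $!";
print {$raw_fh} $bytes;
close $raw_fh or die "close: $!";

# Option 2: decode to characters, then let an explicit layer
# encode them again on the way out.
my $chars = decode('UTF-8', $bytes);
open my $enc_fh, '>:encoding(UTF-8)', 'out_enc.json' or die "open: $!";
print {$enc_fh} $chars;
close $enc_fh or die "close: $!";
```

Both files should end up with identical bytes; the difference is only in where the encoding step happens.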

Re^4: JSON, Data::Dumper and accented chars in utf-8 [OFF/Gripe]
by ikegami (Patriarch) on Jan 22, 2012 at 06:24 UTC
    It's not the site that did that; it's your browser. "ő" doesn't exist in Windows-1252, so your browser decided to send "&#337;" instead. PerlMonks is displaying "<code>&#337;</code>" as "&#337;" as it should.
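The browser behaviour described here can be imitated in Perl with Encode's HTML character reference fallback. This is only an illustration of the mechanism, not what any particular browser runs internally.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $char = "\x{151}";   # U+0151 LATIN SMALL LETTER O WITH DOUBLE ACUTE

# FB_HTMLCREF replaces characters the target encoding cannot represent
# with decimal character references; LEAVE_SRC keeps encode() from
# modifying $char in place.
my $check = Encode::FB_HTMLCREF() | Encode::LEAVE_SRC();
my $sent  = encode('cp1252', $char, $check);

# U+00E9 does exist in cp1252, so it becomes a single byte instead.
my $e_acute = encode('cp1252', "\x{e9}", $check);

print "$sent\n";                                  # prints "&#337;"
printf "%d byte(s) for U+00E9\n", length $e_acute;
```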

      Ah, cp1252, how retro!

      Still, in the end, the process mangles perfectly viable characters that are by no means special to node syntax, and that’s just bad.