I'm struggling with something that I thought would be very simple. I have a legacy system that sends data as JSON. The underlying data, which I can't change, uses HTML entities. I need to convert these to UTF-8, because the receiving system can't handle the entities. I wrote a one-line test for this, which is failing, and I don't know why.
When I do the conversion on the text itself, it looks fine. When I do the conversion on the JSON, it also looks fine, but when I decode the JSON for the test, the UTF-8 values seem to get converted again into something wrong. A simple test case:
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::Entities;
use Encode;
use JSON::MaybeXS;
my $original_string = "E&ouml;tv&ouml;s Lor&aacute;nd University";
my $converted_string = encode_utf8( decode_entities($original_string) );
print "Original string: [$original_string]\n"; # shows the entities
print "Converted string: [$converted_string]\n"; # shows the special characters
my $entities_json = '{"school":"E&ouml;tv&ouml;s Lor&aacute;nd University"}';
my $converted_json = encode_utf8(decode_entities($entities_json));
print "Original JSON: [$entities_json]\n"; # shows the entities
print "Converted JSON: [$converted_json]\n"; # looks right: shows the special characters
my $decoded_json = decode_json($converted_json);
print "School: " . $decoded_json->{'school'} . "\n"; # should be "Eötvös Loránd University" but is actually "E�tv�s Lor�nd University", with the special characters messed up (N.B. Perlmonks is showing this incorrectly as well)
What is going on here? And, how am I supposed to convert my JSON-with-entities to something, well, correct?
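One way to narrow this down, as a sketch rather than a confirmed diagnosis: decode_json takes UTF-8 bytes and returns Perl character strings, so the decoded value may be correct and only the final print may be missing an output encoding. The check below assumes a terminal that expects UTF-8 and that the entities in the data are the named forms &ouml; and &aacute;. The variable names $bytes, $data and $school are mine, for illustration only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);
use Encode qw(encode_utf8);
use JSON::MaybeXS qw(decode_json);

# Assumed input: JSON whose string values contain named HTML entities.
my $entities_json = '{"school":"E&ouml;tv&ouml;s Lor&aacute;nd University"}';

# decode_entities returns characters; encode_utf8 turns them into UTF-8 bytes,
# which is the form decode_json expects.
my $bytes = encode_utf8( decode_entities($entities_json) );

# decode_json hands back character strings again, not UTF-8 bytes.
my $data   = decode_json($bytes);
my $school = $data->{school};

# If the second character is U+00F6, the data survived the round trip intact.
printf "second character is U+%04X\n", ord( substr( $school, 1, 1 ) );

# Encode on output: without this layer, characters below U+0100 are written
# as single Latin-1 bytes, which a UTF-8 terminal cannot display.
binmode STDOUT, ':encoding(UTF-8)';
print "School: $school\n";
```

If the code point check reports U+00F6, the decoded data is intact and the problem is in how the string is printed, not in the conversion. The alternative to the binmode layer would be calling encode_utf8 on the value at the point of printing.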