=head2 data utf8_decode (data)

Recursively UTF-8 decode a data structure. Hash and array references are
traversed at any depth; everything else is treated as a scalar leaf. Decoding
turns on Perl's UTF-8 flag and makes Perl treat the data as a character string
(so functions like C<length> count characters accurately). If you want to send
the data over the network, you need to B<encode> it into bytes first.

=cut

# Encode::encode and Encode::decode are called by their full names below.
use Encode ();

sub utf8_decode {
    my $data   = shift;
    my $encode = shift || 0;

    # If it's a data structure (hash or array), recurse over its contents.
    if (ref($data) eq "HASH") {
        foreach my $key (keys %{$data}) {
            # Nested structures and scalar leaves are both handled by the
            # recursive call.
            $data->{$key} = utf8_decode($data->{$key}, $encode);
        }
    }
    elsif (ref($data) eq "ARRAY") {
        foreach my $item (@{$data}) {
            # $item aliases the array element, so assigning to it updates
            # the array in place.
            $item = utf8_decode($item, $encode);
        }
    }
    else {
        # This is a leaf node in our data (a scalar).
        my $is_utf8 = utf8::is_utf8($data);

        # Are they *encoding* (turning into bytes) instead of *decoding*?
        if ($encode) {
            # Encoding (making a byte stream): only encode if the scalar is
            # currently flagged as UTF-8.
            return $data unless $is_utf8;
            $data = Encode::encode("UTF-8", $data);
        }
        else {
            # Decoding: if it's already flagged as UTF-8, do not decode it
            # again.
            return $data if $is_utf8;
            $data = Encode::decode("UTF-8", $data);
        }
    }

    return $data;
}

=head2 data utf8_encode (data)

Recursively UTF-8 encode a data structure. B<Encoding> means turning the data
into a byte stream (so string functions like C<length> will count bytes rather
than characters). Encoding is necessary to transmit a Unicode string over a
network.

=cut

sub utf8_encode {
    my $data = shift;
    return utf8_decode($data, 1);
}
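
=pod

A minimal usage sketch for the two functions above, assuming they are called
from inside the package that defines them. The variable name and sample data
are illustrative only and are not part of the module.

    # Raw UTF-8 bytes, e.g. as read from a socket.
    # "caf\xC3\xA9" is five bytes long.
    my $data = {
        word => "caf\xC3\xA9",
        list => [ "na\xC3\xAFve" ],
    };

    # Decode: every leaf becomes a character string.
    utf8_decode($data);
    print length($data->{word}), "\n";   # 4 (characters)

    # Encode again before writing to a socket or file handle.
    utf8_encode($data);
    print length($data->{word}), "\n";   # 5 (bytes)

Hashes and arrays are modified in place, and the same reference is returned.
A plain scalar argument is not modified; use the return value instead. Leaves
are skipped when their UTF-8 flag already matches the requested direction, so
calling either function twice in a row should not double-convert the data.

=cut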