8.1. Character Encoding
JSON text exchanged between systems that are not part of a closed
ecosystem MUST be encoded using UTF-8 [RFC3629].
Previous specifications of JSON have not required the use of UTF-8
when transmitting JSON text. However, the vast majority of JSON-
based software implementations have chosen to use the UTF-8 encoding,
to the extent that it is the only encoding that achieves
interoperability.
Implementations MUST NOT add a byte order mark (U+FEFF) to the
beginning of a networked-transmitted JSON text. In the interests of
interoperability, implementations that parse JSON texts MAY ignore
the presence of a byte order mark rather than treating it as an
error.
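
As a rough illustration of the sender-side rule quoted above, the following sketch serialises a Perl data structure to UTF-8 octets and checks that the result does not begin with a byte order mark. It assumes the core modules Encode and JSON::PP; the helper name json_octets_for_wire and the sample data are hypothetical and chosen only for this example.

```perl
use strict;
use warnings;
use Encode qw(encode);
use JSON::PP ();

# U+FEFF encoded as UTF-8 is the three octets EF BB BF.
my $UTF8_BOM = encode('UTF-8', "\x{FEFF}");

# Hypothetical helper: turn a Perl data structure into UTF-8 encoded
# JSON octets. With the utf8 option, JSON::PP's encode() returns octets
# and does not prepend a byte order mark.
sub json_octets_for_wire {
    my ($data) = @_;
    return JSON::PP->new->utf8->canonical->encode($data);
}

# Sample data (an assumption for illustration): one non-ASCII character.
my $octets = json_octets_for_wire({ k => "caf\x{e9}" });

printf "starts with BOM: %s\n",
    index($octets, $UTF8_BOM) == 0 ? 'yes' : 'no';
printf "octets: %s\n",
    join '-', map { sprintf '%02x', ord } split //, $octets;
```

The hex dump uses the same idea as a character dump with ord, but here each element is one octet of the encoded text rather than one decoded character.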
####
2. JSON Grammar
...
####
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;
my @json_tests = (
    '',
    'crap',
    '[]',
    '{}',
    "  []",    # two leading spaces
    "\t[]",
    "\x{feff}[]",
    qq<\x{feff}\t{"k":"v"}>,
);
for my $test (@json_tests) {
    _json_chars($test);
    my $clean_json = clean_json($test);
    _json_chars($clean_json);
    say '-' x 40;
}
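sub clean_json {
    my ($json) = @_;

    return '' unless length $json;

    # $1 is an optional BOM; $2 is the JSON text: optional whitespace,
    # then the first token of any JSON value (numbers included).
    # The s flag lets .* span newlines, so multi-line text is kept whole.
    state $re = qr{(?xs:
        ^
        (
            (?: \x{feff}| )
        )
        (
            [\x{20}\x{09}\x{0a}\x{0d}]*
            (?: false|null|true|\[|\{|"|-|[0-9] )
            .*
        )
    )};

    if ($json =~ $re) {
        my ($bom, $text) = ($1, $2);

        if ($bom eq '') {
            say "JSON good as is.";
        }
        else {
            $json = $text;
            say "JSON cleaned -- BOM removed.";
        }
    }
    else {
        say 'Invalid JSON! Nothing cleaned.';
    }

    return $json;
}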
sub _json_chars {
    my ($json) = @_;

    if (! length $json) {
        say 'Zero-length JSON';
    }
    else {
        say 'JSON chars: ',
            join '-', map sprintf('%x', ord), split //, $json;
    }

    return;
}
####
Zero-length JSON
Zero-length JSON
----------------------------------------
JSON chars: 63-72-61-70
Invalid JSON! Nothing cleaned.
JSON chars: 63-72-61-70
----------------------------------------
JSON chars: 5b-5d
JSON good as is.
JSON chars: 5b-5d
----------------------------------------
JSON chars: 7b-7d
JSON good as is.
JSON chars: 7b-7d
----------------------------------------
JSON chars: 20-20-5b-5d
JSON good as is.
JSON chars: 20-20-5b-5d
----------------------------------------
JSON chars: 9-5b-5d
JSON good as is.
JSON chars: 9-5b-5d
----------------------------------------
JSON chars: feff-5b-5d
JSON cleaned -- BOM removed.
JSON chars: 5b-5d
----------------------------------------
JSON chars: feff-9-7b-22-6b-22-3a-22-76-22-7d
JSON cleaned -- BOM removed.
JSON chars: 9-7b-22-6b-22-3a-22-76-22-7d
----------------------------------------
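
The script above works on decoded characters, and its regex only inspects the first token of the text. For input that arrives as raw octets, one possible approach is to strip the UTF-8 form of the BOM (EF BB BF) and hand the rest to a real parser for validation. The following is a minimal sketch under those assumptions, using the core module JSON::PP; the helper name decode_json_octets and its return convention are hypothetical.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical helper: takes raw octets, drops one leading UTF-8 BOM
# (EF BB BF) if present, then lets JSON::PP validate and decode the rest.
# Returns ($data, $had_bom) on success and an empty list on failure.
sub decode_json_octets {
    my ($octets) = @_;

    my $had_bom = $octets =~ s/\A\xEF\xBB\xBF// ? 1 : 0;

    my $data;
    my $ok = eval {
        # utf8: input is UTF-8 octets; allow_nonref: accept any JSON
        # value at the top level, not only objects and arrays.
        $data = JSON::PP->new->utf8->allow_nonref->decode($octets);
        1;
    };
    return unless $ok;

    return ($data, $had_bom);
}

for my $octets ("[]", "\xEF\xBB\xBF\t{\"k\":\"v\"}", "{oops", "") {
    my @result = decode_json_octets($octets);
    if (@result) {
        print $result[1] ? "Parsed; BOM ignored.\n" : "Parsed; no BOM.\n";
    }
    else {
        print "Invalid JSON.\n";
    }
}
```

Returning an empty list on failure, rather than undef, keeps a successfully parsed top-level null distinguishable from a parse error.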