8.1.  Character Encoding

   JSON text exchanged between systems that are not part of a closed
   ecosystem MUST be encoded using UTF-8 [RFC3629].

   Previous specifications of JSON have not required the use of UTF-8
   when transmitting JSON text.  However, the vast majority of JSON-based
   software implementations have chosen to use the UTF-8 encoding, to the
   extent that it is the only encoding that achieves interoperability.

   Implementations MUST NOT add a byte order mark (U+FEFF) to the
   beginning of a networked-transmitted JSON text.  In the interests of
   interoperability, implementations that parse JSON texts MAY ignore
   the presence of a byte order mark rather than treating it as an
   error.
####
2.  JSON Grammar

...
####
#!/usr/bin/env perl

use 5.010;
use strict;
use warnings;

# Test inputs: empty, non-JSON, bare JSON, leading whitespace, leading BOM.
my @json_tests = (
    '', 'crap', '[]', '{}', "  []", "\t[]",
    "\x{feff}[]", qq<\x{feff}\t{"k":"v"}>,
);

for my $test (@json_tests) {
    _json_chars($test);
    my $clean_json = clean_json($test);
    _json_chars($clean_json);
    say '-' x 40;
}

# Strip a leading BOM (U+FEFF) from a decoded JSON string.
# Only the start of the text is checked; this is not a full validation.
sub clean_json {
    my ($json) = @_;

    return '' unless length $json;

    # $1: optional BOM.  $2: optional JSON whitespace, the first
    # character(s) of any JSON value (literal, array, object, string,
    # or number), then the rest of the text.  The "s" flag lets ".*"
    # cross newlines, so multi-line JSON is not truncated when a BOM
    # is removed.
    state $re = qr{(?xs:
        ^
        ( (?: \x{feff}| ) )
        (
            [\x{20}\x{09}\x{0a}\x{0d}]*
            (?: false|null|true|\[|\{|"|-|[0-9] )
            .*
        )
    )};

    if ($json =~ $re) {
        my ($bom, $text) = ($1, $2);

        if ($bom eq '') {
            say "JSON good as is.";
        }
        else {
            $json = $text;
            say "JSON cleaned -- BOM removed.";
        }
    }
    else {
        say 'Invalid JSON! Nothing cleaned.';
    }

    return $json;
}

# Print the code point of each character in hex, joined with '-'.
sub _json_chars {
    my ($json) = @_;

    if (! length $json) {
        say 'Zero-length JSON';
    }
    else {
        say 'JSON chars: ', join '-', map sprintf('%x', ord), split //, $json;
    }

    return;
}
####
Zero-length JSON
Zero-length JSON
----------------------------------------
JSON chars: 63-72-61-70
Invalid JSON! Nothing cleaned.
JSON chars: 63-72-61-70
----------------------------------------
JSON chars: 5b-5d
JSON good as is.
JSON chars: 5b-5d
----------------------------------------
JSON chars: 7b-7d
JSON good as is.
JSON chars: 7b-7d
----------------------------------------
JSON chars: 20-20-5b-5d
JSON good as is.
JSON chars: 20-20-5b-5d
----------------------------------------
JSON chars: 9-5b-5d
JSON good as is.
JSON chars: 9-5b-5d
----------------------------------------
JSON chars: feff-5b-5d
JSON cleaned -- BOM removed.
JSON chars: 5b-5d
----------------------------------------
JSON chars: feff-9-7b-22-6b-22-3a-22-76-22-7d
JSON cleaned -- BOM removed.
JSON chars: 9-7b-22-6b-22-3a-22-76-22-7d
----------------------------------------
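####
The script above works on already-decoded characters, where the BOM is the single code point U+FEFF. JSON read from a socket or file usually arrives as raw octets, where a UTF-8 BOM is the three bytes EF BB BF. The following is a minimal sketch of handling that case, assuming the input really is meant to be UTF-8; the helper name decode_json_octets is hypothetical and not part of the original code.

```perl
#!/usr/bin/env perl

use 5.010;
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (an assumption, not from the original post):
# take raw octets, drop a leading UTF-8 encoded BOM (EF BB BF) if
# present, then decode the remainder strictly as UTF-8.
sub decode_json_octets {
    my ($octets) = @_;

    # Remove the BOM bytes only at the very start of the input.
    $octets =~ s/\A\xEF\xBB\xBF//;

    # FB_CROAK makes decode() die on malformed UTF-8 instead of
    # silently substituting U+FFFD.
    return decode('UTF-8', $octets, Encode::FB_CROAK);
}

my $with_bom = "\xEF\xBB\xBF" . '{"k":"v"}';
my $text     = decode_json_octets($with_bom);

say $text;           # the JSON text without the BOM
say length $text;    # character count after decoding
```

Stripping the BOM before decoding keeps the two concerns separate: the byte-level check tolerates the BOM, as the quoted section permits parsers to do, and the strict decode rejects input that is not valid UTF-8.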