
something is otherwise wrong

plz use Devel::Peek to find out if it's properly encoded and show us the result here.
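A minimal sketch of what to look for in the `Dump` output. It uses a made-up in-memory string instead of the real file, and calls `Encode::decode` only to produce a decoded copy for comparison; both are assumptions for illustration. Compare two things: whether the UTF8 flag is set, and whether the BOM shows up as three bytes or as one character.

```perl
use strict;
use warnings;
use Devel::Peek;          # Dump() prints the internal SV structure to STDERR
use Encode qw(decode);

# Made-up sample: a UTF-8 BOM (bytes EF BB BF) followed by a tiny JSON array,
# as it would look after reading a file without an :encoding layer.
my $bytes = "\xEF\xBB\xBF[1,2]";

# Decode the bytes into a Perl character string (for comparison only).
my $chars = decode('UTF-8', $bytes);

Dump($bytes);   # FLAGS has no UTF8; PV starts with "\357\273\277"
Dump($chars);   # FLAGS includes UTF8; the dump adds [UTF8 "\x{feff}[1,2]"]

printf "bytes: length %d, UTF8 flag %s\n",
    length $bytes, utf8::is_utf8($bytes) ? 'on' : 'off';
printf "chars: length %d, UTF8 flag %s\n",
    length $chars, utf8::is_utf8($chars) ? 'on' : 'off';
```

If the real data looks like the first dump, the file was read as bytes and has not been decoded.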

Cheers Rolf
(addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
Wikisyntax for the Monastery

Re^4: Rogue character(s) at start of JSON file (BOM; dumping references)
by Bod (Parson) on Jan 19, 2023 at 17:29 UTC
    > plz use Devel::Peek to find out if it's properly encoded and show us the result here

    I've not come across Devel::Peek before, let alone used it - so please bear with me if this is not right...

    #!/usr/bin/perl
    use CGI::Carp qw(fatalsToBrowser);
    use strict;
    use warnings;
    use Site::Utils;
    use JSON;
    use Devel::Peek;

    print "Content-type: text/plain\n\n";

    open my $fh, '<', '../data/publicextract.charity.json' or die "Unable to read Charity JSON File";
    my $data = <$fh>;          # reads only the first line of the file
    print "$data\n\n";

    # Devel::Peek's Dump() writes to STDERR, so send STDERR to a file
    open STDERR, ">", 'output.txt' or die $!;
    print STDERR "Before\n";
    Dump ($data);
    $data =~ s/^\x{feff}//;    # Strip off BOM
    print STDERR "\n\nAfter\n";
    Dump ($data);
    exit;

    This gives this output...

    Before
    SV = PV(0x1569cf0) at 0x15877a0
      REFCNT = 1
      FLAGS = (PADMY,POK,pPOK)
      PV = 0x1c56920 "\357\273\277[{\"date_of_extract\":\"2023-01-16T00:00:00\",\"organisation_number\":1,\"registered_charity_number\":200027,\"linked_charity_number\":1,\"charity_name\":\"POTTERNE MISSION ROOM AND TRUST\",\"charity_type\":null,\"charity_registration_status\":\"Removed\",\"date_of_registration\":\"1962-05-17T00:00:00\",\"date_of_removal\":\"2014-04-16T00:00:00\",\"charity_reporting_status\":null,\"latest_acc_fin_period_start_date\":null,\"latest_acc_fin_period_end_date\":null,\"latest_income\":null,\"latest_expenditure\":null,\"charity_contact_address1\":null,\"charity_contact_address2\":null,\"charity_contact_address3\":null,\"charity_contact_address4\":null,\"charity_contact_address5\":null,\"charity_contact_postcode\":null,\"charity_contact_phone\":null,\"charity_contact_email\":null,\"charity_contact_web\":null,\"charity_company_registration_number\":null,\"charity_insolvent\":false,\"charity_in_administration\":false,\"charity_previously_excepted\":null,\"charity_is_cdf_or_cif\":null,\"charity_is_cio\":null,\"cio_is_dissolved\":null,\"date_cio_dissolution_notice\":null,\"charity_activities\":null,\"charity_gift_aid\":null,\"charity_has_land\":null}\r\n"\0
      CUR = 1082
      LEN = 1122

    After
    SV = PV(0x1569cf0) at 0x15877a0
      REFCNT = 1
      FLAGS = (PADMY,POK,pPOK)
      PV = 0x1c56920 "\357\273\277[{\"date_of_extract\":\"2023-01-16T00:00:00\",\"organisation_number\":1,\"registered_charity_number\":200027,\"linked_charity_number\":1,\"charity_name\":\"POTTERNE MISSION ROOM AND TRUST\",\"charity_type\":null,\"charity_registration_status\":\"Removed\",\"date_of_registration\":\"1962-05-17T00:00:00\",\"date_of_removal\":\"2014-04-16T00:00:00\",\"charity_reporting_status\":null,\"latest_acc_fin_period_start_date\":null,\"latest_acc_fin_period_end_date\":null,\"latest_income\":null,\"latest_expenditure\":null,\"charity_contact_address1\":null,\"charity_contact_address2\":null,\"charity_contact_address3\":null,\"charity_contact_address4\":null,\"charity_contact_address5\":null,\"charity_contact_postcode\":null,\"charity_contact_phone\":null,\"charity_contact_email\":null,\"charity_contact_web\":null,\"charity_company_registration_number\":null,\"charity_insolvent\":false,\"charity_in_administration\":false,\"charity_previously_excepted\":null,\"charity_is_cdf_or_cif\":null,\"charity_is_cio\":null,\"cio_is_dissolved\":null,\"date_cio_dissolution_notice\":null,\"charity_activities\":null,\"charity_gift_aid\":null,\"charity_has_land\":null}\r\n"\0
      CUR = 1082
      LEN = 1122

    Does that help?

    UPDATE:

    I've realised that because I am reading just the first line of the JSON file, it is malformed: it doesn't have the trailing ']' character. So I have added $data .= ']'; to add it back manually. This still doesn't solve the BOM issue at the start of the file, but the truncation might complicate testing...
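Instead of appending ']' by hand, the whole file can be slurped. Here is a minimal sketch that keeps the data as raw bytes and strips the BOM in its byte form. It makes three assumptions. A temporary file with made-up records stands in for the real JSON file. `JSON::PP` is used so that only core modules are needed. `JSON`'s `decode_json` also expects UTF-8 bytes, so it is assumed to behave the same way here.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use JSON::PP qw(decode_json);   # core stand-in for JSON; decode_json expects bytes

# Stand-in for the real JSON file: a BOM, then a small JSON array spread
# over two lines with CRLF endings (made-up sample data).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "\xEF\xBB\xBF[{\"organisation_number\":1},\r\n{\"organisation_number\":2}]\r\n";
close $tmp_fh;

# Read the *whole* file as raw bytes instead of just the first line.
open my $fh, '<:raw', $path or die "Unable to read $path: $!";
my $data = do { local $/; <$fh> };
close $fh;

# Undecoded, the BOM is the three bytes EF BB BF, so match those, not \x{feff}.
$data =~ s/^\xEF\xBB\xBF//;

# decode_json() takes UTF-8 encoded bytes and decodes them itself.
my $charities = decode_json($data);
print scalar(@$charities), " records\n";
```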

      > Does that help?

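      It doesn't seem to be properly decoded: FLAGS has no UTF8, and the BOM shows up as the three raw bytes \357\273\277 (EF BB BF) rather than the single character \x{FEFF}, so s/^\x{feff}// can't match. And as you already said, the JSON is malformed because you didn't read it completely.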

      use v5.12;
      use warnings;
      use Devel::Peek;
      use utf8;

      my $str = "\x{FEFF}['what','ever']";
      $str =~ s/^\x{feff}//;
      Dump($str);

      SV = PV(0xe9cea8) at 0x24f1008
        REFCNT = 1
        FLAGS = (POK,pPOK,UTF8)
        PV = 0x2565b28 "['what','ever']"\0 [UTF8 "['what','ever']"]
        CUR = 15
        LEN = 24

      so you should take care to read the data properly, i.e. decode it from UTF-8 as you read it.
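Here is a minimal sketch of "reading the data properly". It lets the `:encoding(UTF-8)` layer do the decoding, which turns the BOM into the single character U+FEFF so that the substitution above matches. It assumes a temporary file with a made-up sample in place of the real file, and uses `JSON::PP` because it is a core module. Note that this layer does not remove a BOM by itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use JSON::PP;

# Made-up sample file: BOM + JSON array, written as UTF-8 bytes.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "\xEF\xBB\xBF[\"what\",\"ever\"]\n";
close $tmp_fh;

# Let the I/O layer decode UTF-8 so $text holds characters, not bytes.
open my $fh, '<:encoding(UTF-8)', $path or die "Unable to read $path: $!";
my $text = do { local $/; <$fh> };
close $fh;

# The BOM is now the single character U+FEFF, so this substitution matches.
$text =~ s/^\x{FEFF}//;

# $text is already decoded, so use a JSON object *without* ->utf8;
# decode_json() expects bytes and would try to decode a second time.
my $list = JSON::PP->new->decode($text);
print join(',', @$list), "\n";   # prints what,ever
```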

      Cheers Rolf

        But the \357\273\277 at the start of the file is not removed by $str =~ s/^\x{feff}//;

        ...and... I cannot be sure that \357\273\277 will be at the start of all the JSON files - or is it safe to assume that? I suspect not!
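A UTF-8 BOM is optional, and whether a file has one depends on whatever wrote it. You can't assume it will be there, but you don't need to. Because the pattern is anchored with `^`, running it on every file is harmless. Here is a minimal sketch. It assumes the text has already been decoded; for undecoded bytes, match `\xEF\xBB\xBF` instead. `strip_bom` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a leading U+FEFF from an already-decoded string.
# The ^ anchor means only a BOM at the very start is touched; if there is no
# BOM, the substitution simply doesn't match and the string is unchanged.
sub strip_bom {
    my ($text) = @_;
    $text =~ s/^\x{FEFF}//;
    return $text;
}

my @samples = (
    "\x{FEFF}[1,2,3]",   # file saved with a BOM
    "[1,2,3]",           # file saved without one
);

for my $s (@samples) {
    print strip_bom($s), "\n";   # prints [1,2,3] both times
}
```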