in reply to Convert \u characters into utf8

You don't show us any code we could use to replicate your problem.

Maybe $file does not contain the file contents?

I recommend saving files as UTF-8 and reading them as raw bytes. The JSON modules either expect raw UTF-8 bytes (JSON::XS) or can be configured to accept Latin-1 (JSON).
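A minimal sketch of that advice, assuming the core JSON::PP module (it exports the same decode_json function as JSON and JSON::XS). Given raw UTF-8 bytes, decode_json turns both a \u00e9 escape and the literal UTF-8 bytes for é into the same Perl character. The sample data is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP qw(decode_json);   # core module; JSON and JSON::XS export the same function

# Raw UTF-8 bytes, as they would arrive from a file opened with '<:raw'.
# "escaped" uses a JSON \u escape; "literal" contains the bytes C3 A9 directly.
my $bytes = qq({"escaped":"caf\\u00e9","literal":"caf\xC3\xA9"});

my $data = decode_json($bytes);

# Both values are now the same 4-character Perl string; no manual s/\\u.../ pass is needed.
printf "%s: %d chars, last is U+%04X\n", $_, length $data->{$_}, ord substr($data->{$_}, -1)
    for sort keys %$data;
```

It should report 4 characters ending in U+00E9 for both keys.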

Re^2: Convert \u characters into utf8
by ultranerds (Hermit) on Feb 02, 2016 at 13:14 UTC
    Hi,

    Sorry, the data is just being grabbed using wget:

    `wget -O/srv/www/site.net/www/cgi-bin/admin/tmp/in.txt 'https://openapi.etsy.com/v2/shops/Syrestria/listings/active?method=GET&api_key=xxxxx&limit=200&includes=MainImage'`;

    ...and a basic script I wrote does:

    #!/usr/bin/perl
    use File::Slurp;
    use Encode;
    my $file = read_file("./in.txt");
    $file =~ s/\\u(....)/chr hex $1/ge;
    print "$file\n";


    However, as I explained, that does not work well :) (some get converted, but the vast majority do not)

    Are you suggesting I do something like this?

    use File::Slurp;
    use Encode;
    use JSON;
    use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset);
    my $file = read_file("./in.txt");
    my $json_var = decode_json($file);
    foreach (@{$json_var->{results}}) {
        $_->{description} =~ s/([\200-\377]+)/from_utf8({ -string => $1, -charset => 'ISO-8859-1'})/eg;
        print "BLA - $_->{description} \n";
    }


    Cheers

    Andy

      No. I'm suggesting that you use a JSON module for loading JSON data. At least with the two JSON modules I mentioned, there should be no need to convert \uXXXX escapes to their Unicode equivalents manually.
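One concrete case where a manual s/// falls short, sketched with the core JSON::PP module: a character outside the Basic Multilingual Plane is written in JSON as a UTF-16 surrogate pair. A per-escape substitution like the one in the question turns the pair into two separate characters, while decode_json combines it into the single intended code point. The emoji sample is an assumption for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# U+1F600 written as a surrogate pair; single quotes keep the backslashes literal.
my $json = '{"smiley":"\uD83D\uDE00"}';

# Per-escape substitution: each \uXXXX becomes its own (lone surrogate) character.
(my $naive = $json) =~ s/\\u(....)/chr hex $1/ge;

# decode_json recognises the pair and produces one character.
my $parsed = decode_json($json)->{smiley};

printf "naive: %d chars; decode_json: %d char U+%X\n",
    length($naive) - length('{"smiley":""}'), length $parsed, ord $parsed;
```

The naive version should leave two characters where decode_json yields one, U+1F600.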

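      use JSON;
      use Data::Dumper;
      $Data::Dumper::Useqq = 1;   # print non-ASCII characters as escapes
      # $file_content holds the raw (undecoded) bytes read from in.txt
      my $data = decode_json( $file_content );
      warn Dumper $data;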

      Note that File::Slurp is horribly broken regarding encodings. Some people recommend File::Slurper instead, but I roll my own, which isn't rocket surgery either.
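A roll-your-own reader might look something like this sketch. The helper name slurp_raw and the temporary demo file are hypothetical, and the core JSON::PP module stands in for JSON. The helper opens the file with the :raw layer, undefines $/ so a single readline returns the whole file, and hands those bytes straight to decode_json.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use JSON::PP qw(decode_json);
use Data::Dumper;
$Data::Dumper::Useqq = 1;    # show non-ASCII characters as \x{...} escapes

# Hypothetical helper: read a whole file as undecoded bytes.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open '$path': $!";
    local $/;                # undef record separator: read everything at once
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Demo only: write a small UTF-8 JSON file (removed automatically at exit).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} qq({"title":"Caf\\u00e9 \xC3\xA9clair"});
close $tmp_fh;

my $data = decode_json( slurp_raw($path) );
warn Dumper $data;
```

Keeping the read as raw bytes leaves all decoding to decode_json, which is the one place that knows the data is UTF-8 JSON.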

        Ahhh beautiful! I didn't realise they did the job for you. So I have this now:

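        use JSON;
        # Slurp the whole file as raw bytes; without "local $/"
        # the readline would only return the first line.
        my $as_str = do {
            local $/;
            open( my $in, '<:raw', './in.txt' ) or die "Cannot open in.txt: $!";
            <$in>;
        };
        my $json_var = decode_json($as_str);
        open( my $out, '>', './foo.txt' ) or die "Cannot write foo.txt: $!";
        print $out encode_json($json_var);
        close($out);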


        That seems to have done the trick :)

        Just out of interest: in Notepad++, if I look at the encoding, it shows:

        Encode in UTF-8 without BOM

        I'm a bit confused as to how it was saved like that, when I didn't specifically tell it to save as UTF-8.

        Cheers

        Andy
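A likely explanation, sketched with the core JSON::PP module (which behaves like JSON here): encode_json returns UTF-8 encoded bytes, and printing bytes to a handle opened with a plain ">" writes them unchanged. So foo.txt ends up as UTF-8 without anyone asking for it, and since no byte-order mark is written, Notepad++ presumably recognises the encoding from the byte patterns alone. The sample data is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP qw(encode_json);

# A Perl character string containing U+00E9 (é).
my $data = { title => "Caf\x{e9}" };

# encode_json returns UTF-8 bytes: é comes out as C3 A9, not as a \u00e9 escape.
my $bytes = encode_json($data);

print join(' ', map { sprintf '%02X', ord($_) } split //, $bytes), "\n";

# No byte-order mark (EF BB BF) is prepended to the output.
print substr($bytes, 0, 3) eq "\xEF\xBB\xBF" ? "BOM\n" : "no BOM\n";
```

The hex dump should end with ... 43 61 66 C3 A9 22 7D, followed by "no BOM".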