in reply to Re: Convert \u characters into utf8
in thread Convert \u characters into utf8

Hi,

Sorry - the JSON is just being grabbed using wget:

`wget -O/srv/www/site.net/www/cgi-bin/admin/tmp/in.txt 'https://openapi.etsy.com/v2/shops/Syrestria/listings/active?method=GET&api_key=xxxxx&limit=200&includes=MainImage'`;

...and a basic script I wrote does:

#!/usr/bin/perl
use File::Slurp;
use Encode;

my $file = read_file("./in.txt");
$file =~ s/\\u(....)/chr hex $1/ge;
print "$file\n";


However, as I explained, that does not work well :) (some get encoded, but the vast majority do not)

Are you suggesting I do something like this?

use File::Slurp;
use Encode;
use JSON;
use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset);

my $file = read_file("./in.txt");
my $json_var = decode_json($file);

foreach (@{$json_var->{results}}) {
    $_->{description} =~ s/([\200-\377]+)/from_utf8({ -string => $1, -charset => 'ISO-8859-1'})/eg;
    print "BLA - $_->{description} \n";
}


Cheers

Andy

Re^3: Convert \u characters into utf8
by Corion (Patriarch) on Feb 02, 2016 at 13:23 UTC

    No. I'm suggesting that you use a JSON module for loading JSON data. With the two JSON modules I mentioned, at least, there should be no need to manually convert \uXXXX escapes to their Unicode equivalents.

    use JSON;
    use Data::Dumper;
    $Data::Dumper::Useqq = 1;

    my $data = decode_json( $file_content );
    warn Dumper $data;
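To illustrate the point about not converting \uXXXX by hand, here is a minimal sketch. It assumes JSON::PP (which ships with Perl and exports the same decode_json function as JSON) and a made-up sample string, not the real Etsy response. It also shows one case a hand-written `s/\\u(....)/chr hex $1/ge` cannot handle: a character outside the Basic Multilingual Plane is written in JSON as two \u escapes (a surrogate pair), and the substitution would turn each half into a separate character.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical sample input, not taken from the Etsy response.
# Single quotes keep the backslashes literal, as they would be in the file.
my $json = '{"title":"Caf\u00e9 \u2603 \ud83d\ude00"}';

my $data  = decode_json($json);
my $title = $data->{title};

# Each \uXXXX escape is now one character, and the surrogate pair
# \ud83d\ude00 has been combined into the single code point U+1F600.
printf "U+%04X\n", ord($_) for split //, $title;
```

Printing code points rather than the string itself sidesteps the separate question of which encoding the output handle should use.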

    Note that File::Slurp is horribly broken regarding encodings. Some comments recommend File::Slurper, but I instead roll my own, which isn't rocket surgery either.
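A hand-rolled slurp of the kind mentioned above might look roughly like the following sketch. The name `slurp_raw` and the temporary sample file are hypothetical, and JSON::PP stands in for whichever JSON module is in use. The file is read as raw bytes because decode_json expects UTF-8 encoded bytes, and the record separator is undefined so that the whole file is read rather than just its first line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use JSON::PP qw(decode_json);

# Hypothetical helper: return the entire contents of a file as raw bytes.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open '$path': $!";
    local $/;                  # no record separator: read everything at once
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Write a small multi-line sample file so the sketch is self-contained.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} '{"results":[', "\n", '{"description":"na\u00efve"}', "\n", ']}';
close $tmp_fh;

my $data = decode_json(slurp_raw($tmp_path));
my $desc = $data->{results}[0]{description};
printf "%d characters\n", length $desc;
```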

      Ahhh beautiful! I didn't realise they did the job for you. So I have this now:

      my $as_str;
      open (READIT,"./in.txt");
      $as_str = <READIT>;
      close(READIT);

      my $json_var = decode_json($as_str);

      open (OUT, ">./foo.txt") || die $!;
      print OUT encode_json($json_var);
      close (OUT);


      That seems to have done the trick :)

      Just out of interest - in Notepad++, if I look at the encoding, I see it as:

      Encode in UTF-8 without BOM

      I'm a bit confused as to how it got saved like that, when I didn't specifically tell it to save in UTF-8 format?

      Cheers

      Andy
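On the encoding question above, a likely explanation is that encode_json is documented to return a UTF-8 encoded byte string, so the file ends up as UTF-8 without any explicit request, and nothing in the script writes a byte order mark. The sketch below checks this on a made-up sample value, again assuming JSON::PP in place of JSON.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json encode_json);

# Hypothetical sample value with one non-ASCII character (U+00E9).
my $data  = decode_json('{"name":"caf\u00e9"}');
my $bytes = encode_json($data);

# Show the output byte by byte. The e-acute appears as the two
# UTF-8 bytes C3 A9, not as a \u00e9 escape and not as one Latin-1 byte.
print join(' ', map { sprintf '%02X', ord } split //, $bytes), "\n";

# A UTF-8 BOM would be the bytes EF BB BF at the very start.
my $has_bom = substr($bytes, 0, 3) eq "\xEF\xBB\xBF" ? 1 : 0;
print "BOM present: $has_bom\n";
```

An editor that detects encodings from content would then report such a file as UTF-8 without BOM.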