SBECK has asked for the wisdom of the Perl Monks concerning the following question:
In one of my modules (Date::Manip) I store a bunch of UTF-8 data in a YAML file, which I then load into a Perl data structure. The basic form looks like this:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use YAML::Syck;

    my @in  = <DATA>;
    my $in  = join("",@in);
    my $dat = Load($in);

    1;
    __DATA__
    ---
    x : ă
Note: I entered the ă in the question as a literal UTF-8 character, but inside a code block the site may display it differently. There's probably some markup I could use to get it to display properly, but I didn't want to spend too much time getting sidetracked from the problem, so just treat whatever appears in place of ă inside a code block as that same character.
YAML::Syck has one property that I haven't found in any of the other YAML (or JSON) modules: it doesn't do any UTF-8 handling (that is, it doesn't decode the data into Perl's internal character strings). What you put in is what you get out, so if you run the above script in the debugger and dump the value of $dat, you get:
    DB<1> p Dumper $dat
    $VAR1 = {
              'x' => 'Äƒ'
            };
Unfortunately, YAML::Syck is perhaps the least supported of the YAML modules and I'd like to switch to one of the more recent modules. If I change the above script to use YAML or YAML::XS (my preferred module), and then run it in the debugger, I get:
    DB<1> p Dumper $dat
    $VAR1 = {
              'x' => "\x{103}"
            };
That is, the value comes back as a decoded Perl character string (the single character U+0103) rather than as UTF-8 encoded bytes. I'm completely open to converting the YAML to JSON, but the JSON and JSON::XS modules do the same thing. I've tried the following script with similar results:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use JSON::XS;

    my @in = <DATA>;
    my $in = join("",@in);

    my $dat  = JSON::XS->new->decode($in);
    my $dat2 = JSON::XS->new->utf8(0)->decode($in);
    my $dat3 = JSON::XS->new->utf8(1)->decode($in);

    1;
    __DATA__
    { "x" : "ă" }
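To make the difference between the two dumps concrete, here is a small sketch using only the core Encode module. It assumes nothing about the YAML or JSON modules themselves; it only shows how the byte form and the character form of ă relate to each other.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# The UTF-8 encoding of U+0103 (ă) is the two bytes 0xC4 0x83.
my $bytes = "\xC4\x83";

# Decoding turns those bytes into a one-character Perl string.
my $chars = decode('UTF-8', $bytes);

# Encoding goes the other way and recovers the original bytes.
my $again = encode('UTF-8', $chars);

printf "bytes: length %d\n", length $bytes;
printf "chars: length %d, code point U+%04X\n", length $chars, ord $chars;
printf "round trip ok: %s\n", ($again eq $bytes ? "yes" : "no");
```

The byte form is what YAML::Syck hands back untouched; the character form is what Data::Dumper prints as "\x{103}".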
Obviously, once the data structure is produced, I could recurse through it and re-encode the Perl character strings as UTF-8, but rather than do that, I'll probably just stick with YAML::Syck.
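For reference, the recursive approach could look roughly like the minimal sketch below. It assumes the core Encode module and a structure made only of hashes, arrays, and plain scalars; `encode_utf8_deep` is a hypothetical helper name, not something provided by YAML::XS or JSON::XS.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: returns a copy of a nested hash/array structure
# with every hash key and plain scalar re-encoded as UTF-8 bytes.
sub encode_utf8_deep {
    my ($node) = @_;
    my $type = ref $node;

    if ($type eq 'HASH') {
        my %copy;
        for my $key (keys %$node) {
            $copy{ encode('UTF-8', $key) } = encode_utf8_deep($node->{$key});
        }
        return \%copy;
    }
    if ($type eq 'ARRAY') {
        return [ map { encode_utf8_deep($_) } @$node ];
    }

    # Leave undef and any other kind of reference untouched.
    return $node if $type || !defined $node;

    # Plain scalar: note that numbers come back as strings.
    return encode('UTF-8', $node);
}

# Stand-in for what a decoding loader would return.
my $dat = { 'x' => "\x{103}", 'list' => [ "\x{103}b" ] };
my $raw = encode_utf8_deep($dat);

# Show the bytes of the re-encoded value as hex.
print join(' ', map { sprintf '%02X', ord } split //, $raw->{x}), "\n";
```

Building a copy instead of modifying in place keeps the decoded structure usable for code that wants character strings; an in-place variant would save memory at the cost of that flexibility.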
Any suggestions, or do I just stick to YAML::Syck?
Replies are listed 'Best First'.

Re: UTF8 with YAML or JSON
by zentara (Cardinal) on Jun 30, 2012 at 09:13 UTC

Re: UTF8 with YAML or JSON
by tobyink (Canon) on Jun 30, 2012 at 06:48 UTC

Re: UTF8 with YAML or JSON
by zwon (Abbot) on Jun 29, 2012 at 17:29 UTC
  by SBECK (Chaplain) on Jun 29, 2012 at 17:36 UTC
  by zwon (Abbot) on Jun 29, 2012 at 17:59 UTC
  by SBECK (Chaplain) on Jun 29, 2012 at 18:06 UTC
  by zwon (Abbot) on Jun 30, 2012 at 04:35 UTC

Re: UTF8 with YAML or JSON
by Your Mother (Archbishop) on Jul 02, 2012 at 13:47 UTC

Re: UTF8 with YAML or JSON
by linuxkid (Sexton) on Jun 29, 2012 at 17:45 UTC
  by SBECK (Chaplain) on Jun 29, 2012 at 18:10 UTC