PerlMonks  

Release: JSON::SIMD

by kikuchiyo (Hermit)
on Apr 18, 2023 at 20:10 UTC ([id://11151740], perlnews)

I hereby announce the release of a new JSON encoder/decoder module called JSON::SIMD.

It is a fork of JSON::XS where the decoder was replaced with simdjson, a recent high-performance C++ JSON parser that uses SIMD instructions found in newer CPUs to make decoding as fast as possible.

The module is intended as a drop-in replacement for JSON::XS: the encoder is unmodified, and the legacy decoder is kept to support the functionality that could not be implemented with the simdjson decoder. For the most common scenarios, however, the simdjson decoder is the default, offering a speedup ranging from a few percent to over 100%, depending on the size and structure of the JSON document and the available instruction set. See the documentation for benchmarks.
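Because the interface is shared, switching backends can be as small as changing the class name. A minimal sketch of that, assuming only that JSON::SIMD keeps JSON::XS's constructor and option methods as described above: it tries JSON::SIMD, then JSON::XS, then the core JSON::PP, and uses whichever loads first. The fallback order is an illustration, not how JSON or JSON::MaybeXS pick a backend.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Illustrative fallback chain (not the logic of JSON or JSON::MaybeXS):
# prefer JSON::SIMD, then JSON::XS, then the core JSON::PP.
my $json_class = first { eval "require $_; 1" } qw(JSON::SIMD JSON::XS JSON::PP);
die "no JSON backend available\n" unless defined $json_class;

# new, utf8, canonical, encode and decode exist in all three classes,
# so the calling code stays the same whichever backend was loaded.
my $coder = $json_class->new->utf8->canonical;

my $data = $coder->decode('{"tags":["fast","simd"],"name":"JSON::SIMD"}');
my $text = $coder->encode($data);
print "$json_class: $text\n";
```

With canonical enabled, every backend should produce the same key order, so output can be compared across them.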

The interface is mostly the same as that of JSON::XS, with one notable addition: a new method, decode_at_pointer, which uses simdjson's ability to scan quickly through the document to return just one part of it (addressed by a JSON Pointer such as /foo/1), without decoding and allocating the rest.

Example:

    use JSON::SIMD;

    my $large_json = q({ "ignore": "this", "don't need": ["these", "either"], "foo": ["bar", {"baz": "quux"}] });
    my $part = JSON::SIMD->new->use_simdjson->decode_at_pointer($large_json, '/foo/1');
    # $part is { baz => 'quux' }
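For comparison, here is roughly what the same lookup costs without decode_at_pointer: decode the whole document, then walk the pointer through the resulting Perl structure. This sketch uses the core JSON::PP and a hypothetical helper, resolve_pointer, that follows the JSON Pointer rules of RFC 6901 (tokens separated by /, with ~1 and ~0 standing for / and ~). It is not part of JSON::SIMD.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical helper: resolve an RFC 6901 JSON Pointer against an
# already-decoded Perl structure. Returns undef if a step does not exist.
sub resolve_pointer {
    my ($data, $pointer) = @_;
    return $data if $pointer eq '';    # empty pointer = whole document
    die "pointer must start with '/'\n" unless $pointer =~ m{\A/};

    my @tokens = split m{/}, $pointer, -1;
    shift @tokens;                     # drop the empty field before the leading '/'

    for my $token (@tokens) {
        # Unescape per RFC 6901: ~1 -> '/' first, then ~0 -> '~'
        $token =~ s{~1}{/}g;
        $token =~ s{~0}{~}g;
        if (ref $data eq 'HASH') {
            return unless exists $data->{$token};
            $data = $data->{$token};
        }
        elsif (ref $data eq 'ARRAY') {
            return unless $token =~ /\A(?:0|[1-9][0-9]*)\z/ && $token < @$data;
            $data = $data->[$token];
        }
        else {
            return;                    # cannot descend into a scalar
        }
    }
    return $data;
}

my $large_json = q({ "ignore": "this", "don't need": ["these", "either"], "foo": ["bar", {"baz": "quux"}] });

# Unlike decode_at_pointer, this decodes and allocates the whole document first.
my $all  = JSON::PP->new->decode($large_json);
my $part = resolve_pointer($all, '/foo/1');
print JSON::PP->new->canonical->encode($part), "\n";    # {"baz":"quux"}
```

A JSON null and a missing path both come back as undef in this sketch; a real implementation would want to tell them apart.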

I'd be interested in seeing benchmarks; if you run any, please include the output of simdjson_version as well.
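A minimal benchmark sketch using the core Benchmark module. It times whichever of JSON::SIMD, JSON::XS and JSON::PP are installed on the same synthetic document. The document shape is an arbitrary assumption, and since the speedup depends on size and structure, real files may behave quite differently. The simdjson_version output requested above is not collected here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Load whichever of these decoders are installed; JSON::PP is core.
my %coder;
for my $class (qw(JSON::SIMD JSON::XS JSON::PP)) {
    next unless eval "require $class; 1";
    $coder{$class} = $class->new->utf8;
}

# Hypothetical test document: an array of 1000 small records.
my $doc = '['
    . join(',', map { qq({"id":$_,"name":"item$_","tags":["a","b"],"price":$_.5}) } 1 .. 1000)
    . ']';

# One closure per backend, each decoding the same document.
my %subs;
for my $class (keys %coder) {
    my $c = $coder{$class};
    $subs{$class} = sub { $c->decode($doc) };
}

# Run each for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%subs);
```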

FAQ (Facetiously Anticipated Questions):

Why a new module?

When I came across simdjson, I looked at the section of their documentation that lists the available ports and bindings, and noticed that one particular language was conspicuously missing. I wanted to remedy that situation.

Why JSON::XS?

I didn't want to start from a clean slate: I saw value in having the encoder and decoder in the same all-in-one package, and I didn't want to rewrite the encoder from scratch, so it was logical to start from an established module. JSON::XS was chosen as the basis for the fork because I have been using it at $work and elsewhere without problems, so preserving compatibility with it was an important goal.

Re: Release: JSON::SIMD
by Tux (Canon) on Apr 19, 2023 at 08:12 UTC
    • YEAH!
    • Please get rid of junk like "use common::sense;": it isn't needed, and it will be a reason for many to refuse this module. There is no need to be 100% compatible with JSON::XS :)
      Replacing use common::sense; with use strict; use warnings; and removing it from Makefile.PL still lets the new module build.
    • There are still some test issues; see below.

    First, problems with -Duselongdouble:

    Update: problems with -Dusequadmath:

    Looking forward to using it!


    Enjoy, Have FUN! H.Merijn

      Replacing use common::sense; with use strict; use warnings;

      I've thought about that. Personally, I don't use common::sense, because I happen to disagree with its warnings policy. However, while developing this module I've tried to make as few changes to JSON::XS as possible, so I haven't touched this part yet. But perhaps you're right: it's not really needed, and dropping the non-core dependency would be an advantage. In fact, given that the module's actual Perl code consists of exactly 10 assignments and a few use statements, perhaps even strict and warnings could be omitted.

      There are still some test issues.

      Yes, I saw those among the test reports. The problem is that simdjson returns floating-point numbers as doubles, which is fine if Perl's NV happens to be a double, but not when -Duselongdouble or -Dusequadmath is in effect. Based on the test results, the -Duselongdouble case appears to be mostly fine, because the failing tests merely betray a slight loss of precision. The intermittent garbage in the -Dusequadmath case is more worrying. I do recall an (unanswered) bug report to JSON::XS with exactly the same symptoms, so there might be deeper problems with this mode.
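To check whether a given perl is affected, one can look at its NV type and simulate a value passing through a C double. This is a rough sketch using only the core Config module plus pack and unpack; pack 'd' stands in for simdjson's double conversion and is not what JSON::SIMD does internally.

```perl
use strict;
use warnings;
use Config;

# What is Perl's NV on this build? 'double' (8 bytes) matches simdjson's
# output exactly; 'long double' or '__float128' builds can differ.
print "nvtype=$Config{nvtype} nvsize=$Config{nvsize}\n";

# Simulate a value passing through a C double by packing it into a
# native double and unpacking it again.
my $literal    = '0.1';
my $as_nv      = $literal + 0;                      # parsed at full NV precision
my $via_double = unpack 'd', pack 'd', $literal;    # rounded to double first

printf "NV:     %.36g\n", $as_nv;
printf "double: %.36g\n", $via_double;
print $as_nv == $via_double
    ? "no precision lost\n"
    : "precision lost going through double\n";
```

On a plain double build the two lines should match; on -Duselongdouble or -Dusequadmath builds they are expected to differ in the trailing digits, which is the kind of small precision loss described above.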

      In any case, number parsing needs more care; I'll try to do something about these failures.

      Thanks for the reports!

Re: Release: JSON::SIMD
by kcott (Archbishop) on Apr 19, 2023 at 06:39 UTC

    G'day kikuchiyo,

    Successfully installed on Cygwin running Perl v5.36.0. I was rather surprised that the CPAN Testers Matrix was updated almost immediately; anyway, there are further details there if you want them.

    From "JSON::SIMD vs JSON::XS":

    "At this time JSON::SIMD is not supported by JSON, JSON::MaybeXS or any other wrapper or compatibility modules, you have to use it explicitly."

    That's a pity. I like to code "use JSON;"; then in POD, recommend installation of JSON::XS for the speed bonus. Are you intending to add such support in a future release?

    — Ken

      That's a pity. I like to code "use JSON;"; then in POD, recommend installation of JSON::XS for the speed bonus.

      It is a pity. I, OTOH, like to just use JSON::MaybeXS everywhere and then never have to worry about it. Once JSON::SIMD proves to be stable, I would like to see it added to JSON::MaybeXS's list at a higher priority than JSON::XS, which would then be the obvious fallback if JSON::SIMD isn't installed.


      🦛

        That's fair enough. I wasn't trying to push any particular JSON::* module; just reporting my typical usage.

        I suppose, with JSON, you could use the environment variable, $PERL_JSON_BACKEND, along these lines:

        export PERL_JSON_BACKEND=JSON::SIMD,JSON::XS,JSON::PP

        I haven't tried that, though.

        — Ken

      I was rather surprised that the CPAN Testers Matrix was updated almost immediately

      I was surprised that it worked at all; for the last few weeks it had thrown errors whenever I tried to check anything. It appears to work right now, though.

      On the other hand, I've found that the fast-matrix view is indeed faster and it seemed to work even when the main one didn't, e.g. see here. Even now it shows more tests than the main view.

      Are you intending to add such support in a future release?

      I'd be all for it, but I'm afraid it's outside my jurisdiction. All I can do is raise the issue with the maintainers of these modules, and hope for the best.

        "... the CPAN Testers Matrix was updated almost immediately"
        "... I've found that the fast-matrix view is indeed faster ..."

        I may have misinterpreted your intent, but I wasn't referring to how quickly the page rendered.

        After successfully installing, probably just a minute or two later, I followed the Testers link purely out of curiosity. I didn't expect my report to be there; normally there's a delay of at least a few hours.

        — Ken

Re: Release: JSON::SIMD
by choroba (Cardinal) on Apr 19, 2023 at 22:53 UTC
    Do you plan to implement typing as in Cpanel::JSON::XS::Type?

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

      No.

      To support that, I'd essentially have to rebase the whole thing onto Cpanel::JSON::XS. That wouldn't be impossible, given that the interface between the simdjson wrapper and the XS part is relatively small and clean, but it would still be a lot of work (adapting the tests, etc.).

Re: Release: JSON::SIMD
by kikuchiyo (Hermit) on Apr 18, 2023 at 20:14 UTC

    I'd be grateful for tips on building on OSX - the problem is that the gcc version bundled with the system is very old, so it apparently doesn't support C++11, which is an absolute requirement for simdjson. Linux with various perl versions down to 5.8, FreeBSD and even Windows with Strawberry Perl all seem to work.

    I'd also like to know how to set the repository and issue tracker links on Metacpan.

        The metadata for Metacpan is set via the META.json file.

        What made me curious is why Makefile.PL didn't set META.json properly: the repo's Makefile.PL sets METAMERGE => { resources => { bugtracker => {...} } }, but the repo's META.json shows resources: {} as empty.

        I've never had ExtUtils::MakeMaker do that to me, and five minutes of looking into it was enough to replicate the problem, but I don't see why it's not propagating from Makefile.PL to META.json. Weird.
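One plausible cause, offered as an assumption rather than a diagnosis: ExtUtils::MakeMaker's documented key is META_MERGE, and it reads that data as META spec 1.4 unless 'meta-spec' => { version => 2 } is given. Under 1.4, repository and bugtracker are plain URL strings, so v2-style hashes may be dropped during conversion. A minimal Makefile.PL fragment showing only the metadata part; the URLs and the VERSION_FROM file are placeholders.

```perl
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME         => 'JSON::SIMD',
    VERSION_FROM => 'SIMD.pm',    # placeholder: point at the real module file
    # (keep the existing compiler/linker arguments here)
    META_MERGE   => {
        # Declare spec v2 so hash-style resources are kept as written.
        'meta-spec' => { version => 2 },
        resources   => {
            # Hypothetical URLs: replace with the real repository.
            repository => {
                type => 'git',
                url  => 'https://github.com/EXAMPLE/JSON-SIMD.git',
                web  => 'https://github.com/EXAMPLE/JSON-SIMD',
            },
            bugtracker => {
                web => 'https://github.com/EXAMPLE/JSON-SIMD/issues',
            },
        },
    },
);
```

After running perl Makefile.PL and make distmeta (or make dist), the generated META.json should then carry a non-empty resources section that Metacpan can display.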

Re: Release: JSON::SIMD
by Anonymous Monk on Apr 18, 2023 at 23:44 UTC

    I'm interested in trying this, as I process tonnes of 50MiB+ JSON files each day.

    Using the latest Strawberry Perl (perl 5, version 32, subversion 1 (v5.32.1) built for MSWin32-x64-multi-thread) on Windows, it fails to compile:

      Thanks for the report!

      This is somewhat strange, because I've seen successful builds on Windows with the same versions of Strawberry Perl (I've set up a bunch of automated tests in the github repo).

      Out of curiosity, what versions of as and gcc do you use, on what kind of machine?

      In any case, there are some Stack Overflow questions and even a gcc bug report for that error message; I'll try one of the workarounds mentioned there.

        Thanks,

        gcc version 8.3.0 (x86_64-posix-seh, Built by strawberryperl.com project)

        CPU is i5-1135G7, windows 10

        I'll try again with a fresh copy of everything as I think this install has had a lot of strawberryperl updates, hmmm
