JSON::Path and node iteration

frazap has asked for the wisdom of the Perl Monks concerning the following question:

I have the following json data

{"status":"ok","message-type":"work-list","message-version":"1.0.0","m
+essage":{"facets":{},"total-results":1,"items":[{"author":[{"given":"
+Brahim","family":"Oubaha","sequence":"first","affiliation":[{"name":"
+Laboratory of Biology and Biotechnology of Microorganisms Faculty of 
+Sciences Semlalia Cadi Ayyad University Marrakech Morocco"},{"name":"
+Laboratory of Interaction Plant\u2010Microorganisms, Department of Bi
+ology University of Fribourg Fribourg Switzerland"}]},{"given":"Ahmed
+","family":"Nafis","sequence":"additional","affiliation":[{"name":"La
+boratory of Biology and Biotechnology of Microorganisms Faculty of Sc
+iences Semlalia Cadi Ayyad University Marrakech Morocco"}]},{"given":
+"Mohamed","family":"Baz","sequence":"additional","affiliation":[{"nam
+e":"Laboratory of Biology and Biotechnology of Microorganisms Faculty
+ of Sciences Semlalia Cadi Ayyad University Marrakech Morocco"}]},{"g
+iven":"Felix","family":"Mauch","sequence":"additional","affiliation":
+[{"name":"Laboratory of Interaction Plant\u2010Microorganisms, Depart
+ment of Biology University of Fribourg Fribourg Switzerland"}]},{"ORC
+ID":"http:\/\/orcid.org\/0000-0001-7949-5592","authenticated-orcid":f
+alse,"given":"Mustapha","family":"Barakate","sequence":"additional","
+affiliation":[{"name":"Laboratory of Biology and Biotechnology of Mic
+roorganisms Faculty of Sciences Semlalia Cadi Ayyad University Marrak
+ech Morocco"},{"name":"Agrobiosciences &amp; Fertilizers Program Univ
+ersity Mohammed IV Polytechnic (UM6P) Benguerir Morocco"}]}],"title":
+["The potential of antagonistic moroccan\n            Streptomyces\n 
+           isolates for the biological control of damping\u2010off di
+sease of pea (\n            Pisum sativum\n            L.) caused by\
+n            Aphanomyces euteiches"]}],"items-per-page":20,"query":{"
+start-index":0,"search-terms":null}}}
[download]

That comes from the CrossRef metadata repository

I want to extract the names and affiliations with the format name, firstname; affiliation1 - affiliation2 - ...

I have the following code that works

sub unfoldauthors {
    my ($data_ar) = @_;
    return unless ($data_ar);
    $data_ar = $data_ar->[0];

    # die Dumper $data_ar;
    my @paths = qw( $..family $..given $..affiliation );
    my @selectors;
    for my $p (@paths) {
        push @selectors, JSON::Path->new($p);
    }
    my $aff_name = JSON::Path->new('$..name');
    my $name;

    my @lines;
    my @sep = ( ", ", "; ", " " );
    my $col = 0;
    for my $s (@selectors) {

        my @text = $s->values($data_ar);
        #print $col, " ", Dumper(@text), "\n";
        my $authors_limit = 50;
        my $last          = $authors_limit - 1;
        if ( @text > $authors_limit ) {
            @text = @text[ 0 .. $last ];
            $text[$last] .= " ..." if ( $col == 1 );
        }
        my $pos = 0;
        for my $name (@text) {
            #print "$col $name\n";
            if ( $col == 2 ) {
                my @aff = $aff_name->values( $text[$pos] );
                #print "aff:", Dumper( @aff ), "\n";
                $name = join( " - ", @aff );
                #print "\$name $name\n";
            }
            $lines[ $pos++ ] .= $name . $sep[$col];
        }
        $col++;
    }
    return \@lines;
}
[download]

I first iterate over the nodes at the "upper level" family, given, affiliation, and for the affiliation array ref, I extract the name values.

My question: could I have done this more directly in one step ?

Thanks

François

Here is a complete working example

use strict;
use warnings;
use open qw<:std :encoding(UTF-8) >;
use Data::Dumper;
use REST::Client::CrossRef;
use Log::Any::Adapter( 'File', './log_auth.txt', "log_level" => "info"
+ );
use JSON::Path;

my $cr = REST::Client::CrossRef->new(
    mailto        => 'd...h',
    spit_raw_data => 0,
    add_end_flag  => 1,
    json_path     => [ ['$.title'], ['$.author'], ],
    json_path_callback => { '$.author' => \&unfoldauthors },
);

$cr->init_cache(
    {   BasePath => "C:\\Windows\\Temp\\perl",
        NoUpdate => 15 * 60,
        verbose  => 0
    }
);

sub unfoldauthors {
    my ($data_ar) = @_;
    return unless ($data_ar);
    $data_ar = $data_ar->[0];
    my @paths = qw( $..family $..given $..affiliation );
    my @selectors;
    for my $p (@paths) {
        push @selectors, JSON::Path->new($p);
    }
    my $aff_name = JSON::Path->new('$..name');
    my $name;

    my @lines;
    my @sep = ( ", ", "; ", " " );
    my $col = 0;
    for my $s (@selectors) {
        my @text = $s->values($data_ar);
        my $authors_limit = 50;
        my $last          = $authors_limit - 1;
        if ( @text > $authors_limit ) {
            @text = @text[ 0 .. $last ];
            $text[$last] .= " ..." if ( $col == 1 );
        }
        my $pos = 0;
        for my $name (@text) {
            print "$col $name\n";
            if ( $col == 2 ) {
                my @aff = $aff_name->values( $text[$pos] );
                $name = join( " - ", @aff );
            }
            $lines[ $pos++ ] .= $name . $sep[$col];
        }
        $col++;
    }
    return \@lines;
}

$cr->init_cache(
    {   BasePath => "C:\\Windows\\Temp\\perl",
        NoUpdate => 15 * 60,
        verbose  => 0
    }
);
my $select = "author,title";
while (<DATA>) {
    chomp;
    my $data =
        $cr->works_from_doi( $_, { 'has-affiliation' => 'true' }, $sel
+ect );
    next unless ($data);
    for my $row (@$data) {
        print "\n" unless ($row);
        while ( my ( $f, $v ) = each %$row ) {
            if ( $f eq '$.title' ) {
                print "***$v\n\n";
            }
            else {
                print "$v \n";
            }

        }
    }
}
__DATA__
10.1111/jph.12775
[download]

Comment on JSON::Path and node iteration Select or Download Code

Replies are listed 'Best First'.
Re: JSON::Path and node iteration by haukex (Archbishop) on Mar 21, 2019 at 15:31 UTC
You don't say what your expected output for that input is, but based on running your code against your sample data, it seems that you want only the first author (`$data_ar = $data_ar->[0]`), with the return value of `unfoldauthors` in this format (Perl): `["Oubaha, Brahim; Laboratory of Biology and Biotechnology of Microorga +nisms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech Mo +rocco - Laboratory of Interaction Plant\x{2010}Microorganisms, Depart +ment of Biology University of Fribourg Fribourg Switzerland "]` [download] Your stated requirement seems fairly straightforward, and so I'm not really sure what all the code in `for my $s (@selectors)` is doing - maybe you could explain that? Here, I've written `unfoldauthors2` to give you just the first author, and `unfoldauthors3` to give you all authors: use warnings; use strict; use Data::Dump; use JSON::Path; use JSON; my $data = decode_json(<<'ENDJSON'); {"status":"ok","message-type":"work-list","message-version":"1.0.0", "message":{"facets":{},"total-results":1,"items": [ {"author":[ {"given":"Brahim","family":"Oubaha","sequence":"first","affili +ation":[ {"name":"Laboratory of Biology and Biotechnology of Microo +rganisms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech + Morocco"}, {"name":"Laboratory of Interaction Plant\u2010Microorganis +ms, Department of Biology University of Fribourg Fribourg Switzerland +"} ]}, {"given":"Ahmed","family":"Nafis","sequence":"additional","aff +iliation":[ {"name":"Laboratory of Biology and Biotechnology of Microo +rganisms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech + Morocco"} ]}, {"given":"Mohamed","family":"Baz","sequence":"additional","aff +iliation":[ {"name":"Laboratory of Biology and Biotechnology of Microo +rganisms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech + Morocco"} ]}, {"given":"Felix","family":"Mauch","sequence":"additional","aff +iliation":[ {"name":"Laboratory of Interaction Plant\u2010Microorganis +ms, Department of Biology University of Fribourg Fribourg Switzerland +"}]}, {"given":"Mustapha","family":"Barakate","ORCID":"http:\/\/orci +d.org\/0000-0001-7949-5592","authenticated-orcid":false,"sequence":"a +dditional","affiliation":[ {"name":"Laboratory of Biology and Biotechnology of Microo +rganisms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech + Morocco"}, {"name":"Agrobiosciences & Fertilizers Program Univers +ity Mohammed IV Polytechnic (UM6P) Benguerir Morocco"}]} ],"title":["The potential of antagonistic moroccan\n St +reptomyces\n isolates for the biological control of dampin +g\u2010off disease of pea (\n Pisum sativum\n L +.) caused by\n Aphanomyces euteiches"]} ], "items-per-page":20,"query":{"start-index":0,"search-terms":null}}} ENDJSON dd unfoldauthors($data->{message}{items}[0]{author}); dd unfoldauthors2($data->{message}{items}[0]{author}); dd unfoldauthors3($data->{message}{items}[0]{author}); sub unfoldauthors { my ($data_ar) = @_; return unless ($data_ar); $data_ar = $data_ar->[0]; my @paths = qw( $..family $..given $..affiliation ); my @selectors; for my $p (@paths) { push @selectors, JSON::Path->new($p); } my $aff_name = JSON::Path->new('$..name'); my $name; my @lines; my @sep = ( ", ", "; ", " " ); my $col = 0; for my $s (@selectors) { my @text = $s->values($data_ar); my $authors_limit = 50; my $last = $authors_limit - 1; if ( @text > $authors_limit ) { @text = @text[ 0 .. $last ]; $text[$last] .= " ..." if ( $col == 1 ); } my $pos = 0; for my $name (@text) { #print "$col $name\n"; if ( $col == 2 ) { my @aff = $aff_name->values( $text[$pos] ); $name = join( " - ", @aff ); } $lines[ $pos++ ] .= $name . $sep[$col]; } $col++; } return \@lines; } sub unfoldauthors2 { my ($authors) = @_; return [ $authors->[0]{family}.', '.$authors->[0]{given}.'; '. join(' - ', map {$_->{name}} @{$authors->[0]{affiliation}}) ]; } sub unfoldauthors3 { my ($authors) = @_; return [ map { $_->{family}.', '.$_->{given}.'; '.join(' - ', map {$_->{name}} @{ $_->{affiliation} }) } @$authors ]; } __END__ [ "Oubaha, Brahim; Laboratory of Biology and Biotechnology of Microorg +anisms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech M +orocco - Laboratory of Interaction Plant\x{2010}Microorganisms, Depar +tment of Biology University of Fribourg Fribourg Switzerland ", ] [ "Oubaha, Brahim; Laboratory of Biology and Biotechnology of Microorg +anisms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech M +orocco - Laboratory of Interaction Plant\x{2010}Microorganisms, Depar +tment of Biology University of Fribourg Fribourg Switzerland", ] [ "Oubaha, Brahim; Laboratory of Biology and Biotechnology of Microorg +anisms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech M +orocco - Laboratory of Interaction Plant\x{2010}Microorganisms, Depar +tment of Biology University of Fribourg Fribourg Switzerland", "Nafis, Ahmed; Laboratory of Biology and Biotechnology of Microorgan +isms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech Mor +occo", "Baz, Mohamed; Laboratory of Biology and Biotechnology of Microorgan +isms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech Mor +occo", "Mauch, Felix; Laboratory of Interaction Plant\x{2010}Microorganisms +, Department of Biology University of Fribourg Fribourg Switzerland", "Barakate, Mustapha; Laboratory of Biology and Biotechnology of Micr +oorganisms Faculty of Sciences Semlalia Cadi Ayyad University Marrake +ch Morocco - Agrobiosciences & Fertilizers Program University Moh +ammed IV Polytechnic (UM6P) Benguerir Morocco", ] [download]	[reply] [d/l] [select]
Re^2: JSON::Path and node iteration by frazap (Monk) on Mar 21, 2019 at 15:46 UTC
Thanks for helping! I want to extract the names and affiliations with the format `name, firstname; affiliation1 - affiliation2 - ...` I wanted all the authors, not the first. The unfoldauthors code take only the first 50, but I should have removed these line. I think that my values are in the first element of the array because of the selector `$.author` here `my $cr = REST::Client::CrossRef->new( mailto => 'd...h', spit_raw_data => 0, add_end_flag => 1, json_path => [ ['$.title'], ['$.author'], ], json_path_callback => { '$.author' => \&unfoldauthors }, );` [download] The `unfoldauthors` function is a callback pass to the `$cr` object. The two json_path arg are used to extract data from the whole json hashref. The data extracted with the second argument is given to the callback function. This is the output (from the complete example) I'm having now, which is ok, but I'm asking if my callback function could have been made simpler: ***The potential of antagonistic moroccan Streptomyces isolates for the biological control of dampingÔÇÉoff disea +se of pea ( Pisum sativum L.) caused by Aphanomyces euteiches Oubaha, Brahim; Laboratory of Biology and Biotechnology of Microorgani +sms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech Moro +cco - Laboratory of Interaction PlantÔÇÉMicroorganisms, Department of + Biology University of Fribourg Fribourg Switzerland Nafis, Ahmed; Laboratory of Biology and Biotechnology of Microorganism +s Faculty of Sciences Semlalia Cadi Ayyad University Marrakech Morocc +o Baz, Mohamed; Laboratory of Biology and Biotechnology of Microorganism +s Faculty of Sciences Semlalia Cadi Ayyad University Marrakech Morocc +o Mauch, Felix; Laboratory of Interaction PlantÔÇÉMicroorganisms, Depart +ment of Biology University of Fribourg Fribourg Switzerland Barakate, Mustapha; Laboratory of Biology and Biotechnology of Microor +ganisms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech +Morocco - Agrobiosciences & Fertilizers Program University Mohamm +ed IV Polytechnic (UM6P) Benguerir Morocco [download]	[reply] [d/l] [select]
Re^3: JSON::Path and node iteration (updated) by haukex (Archbishop) on Mar 21, 2019 at 15:51 UTC
I think that my values are in the first element of the array because of the selector `$.author` here Ah yes, I didn't play around with JSON::Path because I haven't used that yet (and it seemed like it might be a little overkill in this case). Anyway, I think then my `unfoldauthors3` should do what you want? (Update: The same `$data_ar = $data_ar->[0];` step will probably be necessary with `$authors`.) Update 2: I took a look at the source of REST::Client::CrossRef, and it seems like that's the code that wraps the argument of the callback in an extra arrayref. In `unfoldauthors3`, you could probably just say `@{$authors->[0]}` instead of `@$authors`.	[reply] [d/l] [select]
Re^4: JSON::Path and node iteration (updated) by frazap (Monk) on Mar 22, 2019 at 06:53 UTC
Re^5: JSON::Path and node iteration (updated) by hdb (Monsignor) on Mar 22, 2019 at 14:15 UTC
Some notes below your chosen depth have not been shown here