gabecz has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks

I have two JSON files. It's easiest if I show the first one; here are two of its objects:

[ { "externalId": "52787", "watchers": ["3348", "3639"] }, { "externalId": "52803", "watchers": ["2778"] } ]

...and about 250.000 more of these.

The other JSON is more complicated, but it has objects with matching "externalId" keys. That is, if there is an externalId in one JSON there is one in the other and vice versa, and there is no "watchers" key anywhere in the second JSON.

Basically, what I want to achieve is to add the "watchers" array to its matching object in the second JSON. My PowerShell script looks like it will take forever and a day, hence I'm looking for a Perl script to do the trick.

To be more clear, the second - more complex - JSON is unfortunately missing the "watchers" data which is in a separate file and I'd like to combine the two files into one.

Thanks so much for the suggestions.

Re: Combine 2 JSON files
by choroba (Cardinal) on Sep 25, 2024 at 07:39 UTC
    You didn't provide the second JSON, so I had to hallucinate one myself. The comment shows where you need to adjust your code.

    The basic idea is to index the first json by the external id using a hash.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use experimental qw( signatures );

    use Cpanel::JSON::XS qw{ encode_json decode_json };
    use List::Util qw{ shuffle };

    # Generate a sample "watchers" file.
    sub create_json1($name, $size) {
        my @watchers;
        for my $i (shuffle(1 .. $size)) {
            push @watchers, {
                externalId => $i,
                watchers   => [map "$_", map 1 + int rand $size, 0 .. rand 4]
            };
        }
        open my $out, '>', $name or die "$name: $!";
        print {$out} encode_json(\@watchers);
    }

    # Generate a sample second file.
    sub create_json2($name, $size) {
        my @list;
        for my $i (shuffle(1 .. $size)) {
            push @list, { id => $i };
        }
        open my $out, '>', $name or die "$name: $!";
        print {$out} encode_json(\@list);
    }

    sub merge ($name1, $name2) {
        open my $in1, '<', $name1 or die "$name1: $!";
        my $struct1 = decode_json(do { local $/; <$in1> });

        open my $in2, '<', $name2 or die "$name2: $!";
        my $struct2 = decode_json(do { local $/; <$in2> });

        # Index the first file by external id.
        my %by_id = map { $_->{externalId}, $_->{watchers} } @$struct1;

        for my $member (@$struct2) {
            # This depends on the structure of the 2nd file.
            if (exists $by_id{ $member->{id} }) {
                $member->{watchers} = $by_id{ $member->{id} };
            } else {
                warn "$member->{id} not found in $name1";
            }
        }
        print encode_json($struct2);
    }

    my $SIZE = 100_000;
    create_json1('1.json', $SIZE);
    create_json2('2.json', $SIZE);
    merge('1.json', '2.json');

    Runs in under 2 seconds on my old machine.

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: Combine 2 JSON files
by Corion (Patriarch) on Sep 25, 2024 at 07:30 UTC

    In Perl (and likely in Powershell too), you will want to put the shorter of the two lists into a hash ("Hashtable" in Powershell diction). I'm using the JSON::Tiny module, but there are others in the JSON namespace:

    #!perl
    use 5.024;    # postfix dereference (->@*) is stable from 5.24 on
    use JSON::Tiny 'decode_json', 'encode_json';

    sub read_json {
        my ($filename) = @_;
        open my $fh, '<', $filename
            or die "Couldn't read '$filename': $!";
        return decode_json( do { local $/; <$fh> } ); # sluurp
    }

    my $first_json_file  = 'first.json';
    my $second_json_file = 'second.json';

    my $data_first  = read_json( $first_json_file );
    my $data_second = read_json( $second_json_file );

    # Index the watchers by externalId.
    my %watchers;
    for my $w ($data_first->@* ) {
        $watchers{ $w->{ externalId } } = $w->{ watchers };
    }

    for my $o ($data_second->@* ) {
        my $id = $o->{externalId};
        if( ! $watchers{ $id } ) {
            warn "Second item '$id' does not have watchers?!";
        } else {
            $o->{ watchers } = $watchers{ $id };
        }
    };

    say encode_json( $data_second );
Re: Combine 2 JSON files
by GrandFather (Saint) on Sep 25, 2024 at 07:26 UTC

    You will get better answers if you show us a "before and after" of the input and output files. Wordy descriptions are likely to be neither accurate nor accurately understood.

    Note that PerlMonks isn't a code writing service. We would much rather help you improve your own code and learn in the process than simply write the code for you. Perhaps you could do enough work to at least sketch out a solution yourself? Note that CPAN is your friend for tasks like this.

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: Combine 2 JSON files
by Marshall (Canon) on Sep 25, 2024 at 07:34 UTC
    As a general approach, I would read the first JSON file and construct a hash of arrays, where externalId is the hash key and the value is a reference to an array of the watchers values.

    Then read the 2nd file one record at a time and look up the watcher data to add to each one from the hash table.

    A hash table with 250,000 keys is a "doable thing". The size of the 2nd file doesn't matter since you can read a record, add watchers, then output the record.
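
    A minimal sketch of this approach, under one loud assumption that the thread does not confirm: the second file is read as one JSON object per line. A single large top-level array would instead need an incremental parser. It uses the core JSON::PP module, and the sub names build_watcher_lookup and merge_stream are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP ();

# Build the lookup: externalId => reference to the array of watchers.
sub build_watcher_lookup {
    my ($watchers_json) = @_;
    my $records = JSON::PP->new->utf8->decode($watchers_json);
    my %lookup;
    $lookup{ $_->{externalId} } = $_->{watchers} for @$records;
    return \%lookup;
}

# Read one record at a time from $in, add its watchers, write it to $out.
# ASSUMPTION: each line of the second file is one complete JSON object.
sub merge_stream {
    my ($lookup, $in, $out) = @_;
    my $json = JSON::PP->new->utf8->canonical;   # canonical = sorted keys
    while (my $line = <$in>) {
        chomp $line;
        next unless $line =~ /\S/;               # skip blank lines
        my $record = $json->decode($line);
        my $id     = $record->{externalId};
        if (exists $lookup->{$id}) {
            $record->{watchers} = $lookup->{$id};
        }
        else {
            warn "No watchers found for externalId '$id'\n";
        }
        print {$out} $json->encode($record), "\n";
    }
    return;
}

# Usage with in-memory filehandles; the "title" field is hypothetical.
my $lookup = build_watcher_lookup(
    '[{"externalId":"52787","watchers":["3348","3639"]},'
  . '{"externalId":"52803","watchers":["2778"]}]'
);
my $second = qq({"externalId":"52803","title":"hypothetical"}\n)
           . qq({"externalId":"52787"}\n);
open my $in, '<', \$second or die "Cannot read from scalar: $!";
merge_stream($lookup, $in, \*STDOUT);
close $in;
```

    Only the lookup hash stays in memory; each record of the second file is decoded, extended and printed before the next one is read, so the size of that file does not matter.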

    Hope this helps.

Re: Combine 2 JSON files
by ikegami (Patriarch) on Sep 25, 2024 at 12:16 UTC

    You want something like

    use Cpanel::JSON::XS qw( decode_json encode_json );
    use File::Slurper    qw( read_binary );

    my $watchers_data = decode_json( read_binary( $ARGV[0] ) );
    my %watchers_lkup = map { $_->{ externalId } => $_->{ watchers } } @$watchers_data;

    my $data = decode_json( read_binary( $ARGV[1] ) );
    $_->{ watchers } = $watchers_lkup{ $_->{ externalId } } for @$data;

    print encode_json( $data );

    Using jq:

    jq -n '
       ( input | map( { key: .externalId, value: .watchers } ) | from_entries ) as $watchers_lkup |
       inputs | map( .watchers = $watchers_lkup[ .externalId ] )
    ' watchers.json file.json

    Demo on jqplay.

Re: Combine 2 JSON files
by cavac (Prior) on Sep 28, 2024 at 19:32 UTC

    The other JSON is a more complicated but it has objects with matching "externalId" keys.

    Is this a one-time thing, or is it a growing collection of data?

    If it is the latter, I recommend a couple of tables in a database. Just add the new data to the db, and export as needed.

    With a proper example of BOTH files WITH the matching IDs, I could probably cobble together a simple example of scripts.

    PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP
    Also check out my sisters artwork and my weekly webcomics
Re: Combine 2 JSON files
by gabecz (Initiate) on Sep 26, 2024 at 07:21 UTC
    Thank you all for the suggestions and code samples. To be honest, with Perl I'm like with Spanish or Dutch: I can understand it, and if I have enough 'samples' I can modify them to my needs, but I can't come up with anything all by myself. Yet. But I'm learning. I can't learn by reading syntax references or theory; I learn by seeing tons of code. That's how I learned PHP, SQL, C, C#, PowerShell, bash, and a few others, though none at a very high level.

    Once again, great input from all of you, thanks! I'll be able to put together the code that does the magic now.

    Gabe