(OT) Perl and creating a query for MongoDB

by ovedpo15 (Pilgrim)
on Jul 30, 2019 at 11:10 UTC

ovedpo15 has asked for the wisdom of the Perl Monks concerning the following question:

I'm working with Perl and MongoDB but I have a direct question about querying so I hope someone here can help. I have created the following report with Perl:
{
  "name": "test1",
  "all_data": [
    {
      "sub_data": [
        {
          "sub_name": "Test1",
          "sub_path": "GROUP1/Test1",
          "info": [
            { "group": "pkgs", "values": [ "tcsh" ] },
            { "group": "tcsh", "values": [ "6.13.00" ] }
          ]
        },
        {
          "sub_name": "GROUP2",
          "sub_path": "GROUP2",
          "info": [
            { "group": "pkgs", "values": [ "tcsh" ] },
            { "group": "tcsh", "values": [ "6.13.00" ] }
          ]
        }
      ],
      "all_data_name": "ROOT",
      "all_data_path": "/PATH/TO/ROOT"
    }
  ],
  "username": "erwerwcsd",
  "timestamp": "1564475903"
}
As you can see, the all_data level contains an array of objects, each of which contains a sub_data array plus the all_data_name and all_data_path fields.
sub_data is an array of objects, each of which contains sub_name, sub_path and an info array.
My goal is to create a query which gets all reports with the name "test1" and the username "erwerwcsd" (I guess we need to use $match). Then I want to combine those reports in the following way:
Merge all reports (the implementation could be different) into one main report and remove duplicates by timestamp, so that only the latest blocks remain.
In order to explain it, I will use the following example (I marked the blocks as all_data<index> and sub_data<index>):

First report:
all_data1:
    sub_data1:
        sub_name: sub1
        sub_path: path/to/sub1
        info: { group = "ABC", version = "4.2.1" }
    all_data_name: ROOT
    all_data_path: /PATH/TO/ROOT
Second report:
all_data1:
    sub_data1:
        sub_name: sub2
        sub_path: path/to/sub2
        info: { group = "ABC", version = "1.5.6", "4.2.1" }
    all_data_name: ROOT
    all_data_path: /PATH/TO/ROOT
Third report:
all_data1:
    sub_data1:
        sub_name: sub1
        sub_path: path/to/sub1
        info: { group = "ABC", version = "1.5.6", "4.2.1" }
    all_data_name: ROOT
    all_data_path: /PATH/TO/ROOT
Fourth report:
all_data1:
    sub_data1:
        sub_name: sub1
        sub_path: path/to/sub1
        info: { group = "XYZ", version = "1.5.6", "4.2.1" }
    all_data_name: ROOT
    all_data_path: /PATH/TO/ROOT
Fifth report:
all_data1:
    sub_data1:
        sub_name: sub1
        sub_path: path/to/sub1
        info: { group = "XYZ", version = "1.5.6", "4.2.1" }
    all_data_name: ROOT_OTHER
    all_data_path: /PATH/TO/ROOT
Then the merges will be as follows:
Merge of first and second: (Explanation: they have the same all_data_name and all_data_path but not the same sub_name and sub_path)
all_data1:
    sub_data1:
        sub_name: sub1
        sub_path: path/to/sub1
        info: { group = "ABC", version = "4.2.1" }
    sub_data2:
        sub_name: sub2
        sub_path: path/to/sub2
        info: { group = "ABC", version = "1.5.6", "4.2.1" }
    all_data_name: ROOT
    all_data_path: /PATH/TO/ROOT
Merge of first and third: (Explanation: the result is the same as the first report because we take the latest. In this case they have the same all_data, the same sub_data and the same info level)
all_data1:
    sub_data1:
        sub_name: sub1
        sub_path: path/to/sub1
        info: { group = "ABC", version = "4.2.1" }
    all_data_name: ROOT
    all_data_path: /PATH/TO/ROOT
Merge of first and fourth: (Explanation: in this case they have the same all_data and the same sub_data, but not the same info level)
all_data1:
    sub_data1:
        sub_name: sub1
        sub_path: path/to/sub1
        info: { group = "XYZ", version = "1.5.6", "4.2.1" }, { group = "ABC", version = "4.2.1" }
    all_data_name: ROOT
    all_data_path: /PATH/TO/ROOT
Merge of first and fifth: (Explanation: they have different all_data_name values)
all_data1:
    sub_data1:
        sub_name: sub1
        sub_path: path/to/sub1
        info: { group = "ABC", version = "4.2.1" }
    all_data_name: ROOT
    all_data_path: /PATH/TO/ROOT
all_data2:
    sub_data1:
        sub_name: sub1
        sub_path: path/to/sub1
        info: { group = "XYZ", version = "1.5.6", "4.2.1" }
    all_data_name: ROOT_OTHER
    all_data_path: /PATH/TO/ROOT
Because of the multiple levels of nesting, it does not feel efficient to just iterate over each block, and I'm also not sure which query operators I should use for that.
I'm looking for a way of combining those reports into one main report (at least just to understand the logic). I hope my question is readable and not too hard to understand (I tried to show all possible cases).
Thank you.
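
For the selection step described above (all reports with a given name and username), a minimal sketch of the filter as a Perl data structure might look like the following. The helper name build_pipeline is hypothetical, and the $sort stage is an assumption added so that later reports come last. Note that timestamp is stored as a string in the sample report, so the database would sort it as a string; that matches numeric order only while all values have the same number of digits. The sketch only builds and prints the pipeline; passing it to a driver is left out.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: builds the first stages of an aggregation pipeline
# that selects reports by name and username, oldest first.
sub build_pipeline {
    my ($name, $username) = @_;
    return [
        { '$match' => { name => $name, username => $username } },
        { '$sort'  => { timestamp => 1 } },    # assumption: ascending by timestamp
    ];
}

my $pipeline = build_pipeline('test1', 'erwerwcsd');

# Print the pipeline as JSON so it can be pasted into a MongoDB shell or playground.
print JSON::PP->new->canonical->pretty->encode($pipeline);
```

Single quotes around '$match' and '$sort' matter here: in double quotes Perl would try to interpolate them as variables.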

EDIT: I now understand that I should not do those operations on the Mongo side, and that it is better to just fetch the needed reports and build the wanted report with Perl.
So I will fetch all the reports and put them into a hash. Then I would have to iterate through the all_data and sub_data arrays. Those arrays can be very big, so I feel that this is not very efficient.
I would love to hear suggestions on how to look at this problem, ideally some interesting and efficient approach.
Thank you all!
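
One way to look at the Perl-side merge is to key each nesting level by the fields that identify it, so every block is visited exactly once and duplicates are resolved by hash lookup instead of nested searching. The following is only a sketch under assumptions: merge_reports is a hypothetical helper, reports are assumed to carry a numeric-looking timestamp, and for the same all_data, sub_data and info group the report with the later timestamp is assumed to win.

```perl
use strict;
use warnings;

# Hypothetical helper: merge reports level by level.
# Keys: all_data by (name, path), sub_data by (sub_name, sub_path), info by group.
sub merge_reports {
    # Oldest first, so later reports overwrite earlier ones.
    my @reports = sort { $a->{timestamp} <=> $b->{timestamp} } @_;
    my %merged;

    for my $report (@reports) {
        for my $block (@{ $report->{all_data} || [] }) {
            my $bkey = join "\0", $block->{all_data_name}, $block->{all_data_path};
            my $slot = $merged{$bkey} ||= {
                all_data_name => $block->{all_data_name},
                all_data_path => $block->{all_data_path},
                subs          => {},
            };
            for my $sub (@{ $block->{sub_data} || [] }) {
                my $skey  = join "\0", $sub->{sub_name}, $sub->{sub_path};
                my $entry = $slot->{subs}{$skey} ||= {
                    sub_name => $sub->{sub_name},
                    sub_path => $sub->{sub_path},
                    groups   => {},
                };
                # Same group seen again: the later report replaces the values.
                $entry->{groups}{ $_->{group} } = $_->{values}
                    for @{ $sub->{info} || [] };
            }
        }
    }

    # Turn the keyed hashes back into the nested-array layout of a report.
    my @all_data;
    for my $bkey (sort keys %merged) {
        my $slot = $merged{$bkey};
        my @sub_data;
        for my $skey (sort keys %{ $slot->{subs} }) {
            my $entry = $slot->{subs}{$skey};
            my @info  = map { +{ group => $_, values => $entry->{groups}{$_} } }
                        sort keys %{ $entry->{groups} };
            push @sub_data, {
                sub_name => $entry->{sub_name},
                sub_path => $entry->{sub_path},
                info     => \@info,
            };
        }
        push @all_data, {
            all_data_name => $slot->{all_data_name},
            all_data_path => $slot->{all_data_path},
            sub_data      => \@sub_data,
        };
    }
    return \@all_data;
}

# Usage: the "first" and "fourth" reports from the example, with made-up timestamps.
my @blocks = map {
    +{ timestamp => $_->[0], all_data => [ {
        all_data_name => 'ROOT', all_data_path => '/PATH/TO/ROOT',
        sub_data => [ { sub_name => 'sub1', sub_path => 'path/to/sub1',
                        info => [ { group => $_->[1], values => $_->[2] } ] } ],
    } ] }
} ( [ 1, 'ABC', ['4.2.1'] ], [ 2, 'XYZ', [ '1.5.6', '4.2.1' ] ] );

my $merged = merge_reports(@blocks);
print join(' ', map { $_->{group} } @{ $merged->[0]{sub_data}[0]{info} }), "\n";  # prints "ABC XYZ"
```

The work is proportional to the total number of info entries across all reports, which is hard to beat since each entry has to be read at least once. The output order here is alphabetical by key, which is a choice of this sketch and not something the question specifies.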

Replies are listed 'Best First'.
Re: (OT) Perl and creating a query for MongoDB
by poj (Abbot) on Jul 30, 2019 at 12:21 UTC
    I have created the following report with Perl:

    Was that report created from a MongoDB database?

    poj
      Thanks for the reply. It was not created from a MongoDB database; I created that report with a Perl script and uploaded it there. Now I have some reports like this in MongoDB and I want to do the opposite operation: get the data back in a specific format. I'm just looking for a query suggestion so I can play with it in the MongoDB playground. It does not feel so hard, but I got stuck on it for some reason.

        Are you asking how to do aggregation in MongoDB? If so, how does this involve Perl?

        poj
