Trying to make the code more clear and clean

ovedpo15 has asked for the wisdom of the Perl Monks concerning the following question:

I'm tyring to create the following format of hash:

    {                                      
        "test": [          
            {                              
                "group": "ABC1",            
                "values": [
                    "abc",                
                    "xyz"
                ]                          
            },                             
            {                              
                "group": "ABC2",            
                "values": [
                    "abc2",                
                ]                            
            },                             
            {                              
                "group": "ABC3",            
                "values": [            
                    "xyz3"
                ]                             
            }                              
        ]                                  
    }
[download]

I iterate over my data the perform the following code on each line:

    open(my $fh, '<', "$file_path") or return 0;
    while (my $line = <$fh>) {
    
        # ...........
        # ...........
        # Additional operations of the lines of the file
        # ...........
        # ...........
        
        
        my %a; # temp hash
        $a{"group"} = $group;
        if (defined($href->{"values"})) {
            $a{"values"} = [sort(uniq($value,@{$href->{"values"}}))];
        } else {
            push(@{$a{"values"}},$value);
        }
        push(@{$href->{"test"}},\%a);
    }
[download]

I don't like how it looks, it is not clear and a bit mess.
Is there a way to make the code shorter and maybe without using the temp hash and if-else block?
EDIT: I found out that it actually won't do what I need - it will not push the value in the same array of the group. Not sure how to fix it yet.
EDIT-2: I'm sorry if my question was not understandable. I though about it and from the comments I leraned that I need to clerify my question (sorry again). I have talked about the inner structure but actually the final structure which I need is:

{                                                                     
+           
    "root": [                                  
        {                                      
            "test": [          
                {                              
                    "group": "XYZ",            
                    "values": [              
                        "1234"                 
                    ]                          
                },                             
                {                              
                    "group": "ABC",            
                    "values": [              
                        "6.13.00"              
                    ]                          
                }                              
            ]                                  
        },                                     
        {                                      
            "test": [          
                {                              
                    "group": "XYZ",            
                    "values": [              
                        "tcsh"                 
                    ]                          
                },                             
                {                              
                    "tool": "WEA",            
                    "values": [              
                        "6.13.00"              
                    ]                          
                }                              
            ]                                  
        },                                     
        {                                      
            "test": [          
                {                              
                    "group": "BAB",            
                    "values": [              
                        "ASDAS",
                        "12312321"
                    ]                          
                },                             
                {                              
                    "group": "SADA",            
                    "values": [              
                        "6.13.00",
                        "1231231"
                    ]                          
                }
            ]
        }
    ]
}
[download]

I want to iterate though a file of data, parse it and extract group and values. The currect way I use:

sub parse_file {
    my ($file_path,$href) = @_;

    open(my $fh, '<', "$file_path") or return 0;
    while (my $line = <$fh>) {
        chomp($line);
        unless ($line =~ /\A[^,]+(?:,[^,]+){5}\z/) {
            next;
        }

        my ($key,$group,$value,$version,$file,$count) = split(/,/,$lin
+e);

        push @{ $href->{test} }, {
            group  => $group,
            values => [ sort uniq $value, @{ $href->{values} // [] } ]
        };
    }
    close ($fh);

    return 1;
}

foreach my $dir (sort(@list_of_dirs)) {
    my ($all,%data);
    $all = $dir."/"."galish";
    prase_results_raw($all,\%data);
    push(@all_data,\%data);
}
[download]

It does what I need when there are no multiple values for the same group.
For multiple values for the same group I'll get:

{                                                                     
+           
    "root": [                                  
        {                                      
            "test": [          
                {                              
                    "group": "XYZ",            
                    "values": [              
                        "1234"                 
                    ]                          
                },                             
                {                              
                    "group": "ABC",            
                    "values": [              
                        "6.13.00"              
                    ]                          
                }                              
            ]                                  
        },                                     
        {                                      
            "test": [          
                {                              
                    "group": "XYZ",            
                    "values": [              
                        "tcsh"                 
                    ]                          
                },                             
                {                              
                    "tool": "WEA",            
                    "values": [              
                        "6.13.00"              
                    ]                          
                }                              
            ]                                  
        },                                     
        {                                      
            "test": [          
                {                              
                    "group": "BAB",            
                    "values": [              
                        "ASDAS",
                    ]                          
                },
                {                              
                    "group": "BAB",            
                    "values": [              
                        "12312321"
                    ]                          
                },                 
                {                              
                    "group": "SADA",            
                    "values": [              
                        "6.13.00",
                    ]                          
                }
                {                              
                    "group": "SADA",            
                    "values": [              
                        "1231231"
                    ]                          
                }
            ]
        }
    ]
}
[download]

If you take a look closely, you'll see duplicates of the same group name in the same test.
How to solve it?

Comment on Trying to make the code more clear and clean Select or Download Code

Replies are listed 'Best First'.
Re: Trying to make the code more clear and clean by davido (Cardinal) on Jul 29, 2019 at 15:49 UTC
It's possible I missed some nuance but I think you can totally eliminate the temporary hash, and the `if` conditional if you just guard a single array-dereference point against definedness: `#!usr/bin/env perl use strict; use warnings; use List::Util 'uniq'; sub do_thing { my ($file_path, $href) = @_; open my $fh, '<', $file_path or return 0; while (my $line = <$fh>) { # stuff my $group = 'some setup for this line.'; my $value = 'some setup for this line.'; push @{$href->{test}}, { group => $group, values => [ $value, sort uniq(@{$href->{values} // []}), # This will be em +pty if the definedness test would have been false. ], }; } }` [download] For me I like to ask myself if the temporary variable is adding clarity or not. In this case I didn't think it was, and in fact after removing it, it made it more clear to me that the `if` conditional was overkill. Instead of the conditional, we simply guard against `$href->{values}` not being defined. If it is undefined, we provide an empty array reference in `@{$href->{values} // []}`. The functions `sort` and `uniq` are not going to complain about being handed an empty list. And your conditional was really only dealing with the possibility that `$href->{values}` was undefined, or that the list was empty. We've handled both of those cases here. If your code can be made to avoid conditional branches, it's usually going to be easier to follow. And often (but not always), temporary variables increase the amount of state that one has to keep in mind. I am wondering, though, why `$values` isn't being passed through the `uniq` filter. Is it desired behavior to possibly add a duplicate but then filter it out later? Update: EDIT: I found out that it actually won't do what I need - it will not push the value in the same array of the group. Not sure how to fix it yet. EDIT-2: I'm sorry if my question was not understandable. I though about it and from the comments I leraned that I need to clerify my question (sorry again). I have talked about the inner structure but actually the final structure which I need is: Now the question really is hard to follow, and hard to answer. Dave	[reply] [d/l] [select]
Re: Trying to make the code more clear and clean by AnomalousMonk (Archbishop) on Jul 29, 2019 at 16:15 UTC
LanX has pointed out that what is shown in the OP is almost JSON. It would be useful to know exactly what the format of the input data file is. Is it in fact almost-JSON? Is it something that could easily be converted to strict JSON? Is it something that could easily be converted to a Perl expression string and fed to eval (Perl is more forgiving of some things than JSON)? A more simple and/or elegant solution might emerge from more detail. Update: Some wording changes/enhancements. Give a man a fish: `<%-{-{-{-<`	[reply] [d/l]
Re: Trying to make the code more clear and clean by 1nickt (Canon) on Jul 29, 2019 at 14:25 UTC
Hi, often you can skip a `defined` check to decide whether or not you can dereference something, without autovivifying it if it doesn't exist, by supplying a default. Note use of the defined-or operator `//`. `$ perl -MData::Dumper -Mstrict -wE 'my $foo = {}; say for @{ $foo->{ba +r} }; say Dumper $foo' $VAR1 = { 'bar' => [] };` [download] `$ perl -MData::Dumper -Mstrict -wE 'my $foo = {}; say for @{ $foo->{ba +r} // [] }; say Dumper $foo' $VAR1 = {};` [download] This should work if I understand your partial code sample: `open(my $fh, '<', "$file_path") or return 0; my $href = {}; while (my $line = <$fh>) { # ........... # ........... # Additional operations of the lines of the file # ........... # ........... push @{ $href->{test} }, { group => $group, values => [ sort uniq $value, @{ $href->{values} // [] } ] +, }; }` [download] (...although it seems a bit odd to have a top-level key with values that are then added to the sub-hashes. Could you build the combined list when you use the contents of `$href`?) Hope this helps! The way forward always starts with a minimal test.	[reply] [d/l] [select]
Re^2: Trying to make the code more clear and clean by ovedpo15 (Pilgrim) on Jul 29, 2019 at 22:16 UTC
Thanks all for the replies! My question was not understandable enough so I have edited it. please take a look if you can, thanks, and sorry again!	[reply]
Re: Trying to make the code more clear and clean by LanX (Saint) on Jul 29, 2019 at 15:09 UTC
are you aware that the data dump you are showing is not Perl, but rather kind of JSON? Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery FootballPerl is like chess, only without the dice}	[reply]
Re^2: Trying to make the code more clear and clean by 1nickt (Canon) on Jul 29, 2019 at 16:41 UTC
Maybe the OP set `$Data::Dumper::Pair`. Maybe it's a sample of the JSON s/he wants to get. The way forward always starts with a minimal test.	[reply] [d/l]
Re: Trying to make the code more clear and clean by BillKSmith (Monsignor) on Jul 29, 2019 at 16:20 UTC
This duplicates my understanding of your code without the temporary hash. `use strict; use warnings; use List::Util qw(uniq); use Data::Dumper; my $group = q(ABC1); my $value = q(xyz); my $href = { values=> [qw(abc xyz abc)], }; push @{$href->{test}}, {group=>$group}; $href->{test}[-1]{values} = ( defined $href->{"values"} ) ? [sort(uniq($value,@{$href->{"values"}}))] : $value ; print Dumper $href;` [download] Bill	[reply] [d/l]
Re: Trying to make the code more clear and clean by hippo (Archbishop) on Jul 29, 2019 at 14:06 UTC
Is there a way to make the code shorter Sure, just remove all the double-quotes. None of those in the excerpt you've shown are required.	[reply]
Re^2: Trying to make the code more clear and clean by ovedpo15 (Pilgrim) on Jul 29, 2019 at 14:10 UTC
I'll make them constans, left them as strings so it will be more readable for the readers of the thread.	[reply]
Re^3: Trying to make the code more clear and clean by hippo (Archbishop) on Jul 29, 2019 at 14:34 UTC
Personally I find that `open(my $fh, '<', "$file_path") or return 0;` [download] is much less readable than `open(my $fh, '<', $file_path) or return 0;` [download] With the former one is forced to ponder why the author has enclosed nothing more than a single scalar in double-quotes. Perhaps `$file_path` isn't really a scalar but is instead some path object requiring stringification. So then we have to go looking for the declaration of `$file_path` and assignments to it, etc. ... only to find that it's a just a plain string after all and the double-quotes are superfluous. Editing is an art for sure but cutting for clarity is one of the easiest tasks.	[reply] [d/l] [select]
Re^3: Trying to make the code more clear and clean by 1nickt (Canon) on Jul 29, 2019 at 14:32 UTC
Nothing to do with constants. They are always strings, just quoted or not. You don't need to quote them when in use as hash key names. The way forward always starts with a minimal test.	[reply]
Re: Trying to make the code more clear and clean by bliako (Abbot) on Jul 29, 2019 at 22:53 UTC
If you take a look closely, you'll see duplicates of the same group name in the same test. How to solve it? Because you keep pushing them in without checking if that group exists already: `push @{ $href->{test} }, { group => $group, values => [ sort uniq $value, @{ $href->{values} // [] } ] };` [download] Also, this keeps sorting every time you insert a new value: `$a{"values"} = [sort(uniq($value,@{$href->{"values"}}))];` [download] `uniq()` can perhaps be eliminated if you used a hash instead of the array. And you can do the sorting once, after you have read/inserted your data, as a second stage processing.	[reply] [d/l] [select]
Re^2: Trying to make the code more clear and clean by ovedpo15 (Pilgrim) on Jul 29, 2019 at 23:01 UTC
Thank you for the fast reply. I wanted to use hash at first so the key is group and the value is an array of values. But the problem is that this format should be of JSON that will be sent to Mongo and Mongo does not allow dots and the dollar sign to be in the key name. But we want to allow those special chars to be in groups so we had to rechange the structure and they can't be keys. I can't see any other structure than the one I suggested in the post. Anyway, I know that I keep pushing it each time, my question here is how I can make sure it gets to the right place? I'll have to iterate over the array and that does not feel efficient. Maybe you can suggest how the structure should look like? Thanks again!	[reply]
Re: Trying to make the code more clear and clean by 1nickt (Canon) on Jul 29, 2019 at 23:11 UTC
Hi, try changing your output structure, yes your output structure, to a much more manageable hash of hashes (of hashes of arrays). If you want to be able easily to pull out data later by group or by test, you should only use an array for the innermost values. The, um, values, haha. Anyway, you don't need all those layers of key names. If the object is of test results, you don't need to name it "test". Just get in there and index the tests. If a test hash can only have values by group, key the results by group name. It's much more natural and way easier to build. Something like this: `{ + "root": { 1 => { "XYZ": [ "1234" ], "ABC": [ "6.13.00" ] }, 2 => { "BAB": [ "ASDAS", "12312321" ], "SADA": [ "6.13.00", "1231231" ] } ] }` [download] Then you can change `push @{ $href->{test} }, { group => $group, values => [ sort uniq $value, @{ $href->{values} // [] } ] };` [download] to `my $test_num = 0; for my $file ( @test_files ) { $test_num++; ... for my $line (<FILE>) { push @{ $href->{ $test_num }{ $group } }, $value; } }` [download] Later when you reuse the data to build a report or whatever, you can give it column headers without having to store them in the data object itself. Hope this helps! The way forward always starts with a minimal test.	[reply] [d/l] [select]
Re^2: Trying to make the code more clear and clean by ovedpo15 (Pilgrim) on Jul 29, 2019 at 23:29 UTC
Thanks for the detailed answer! We can't use groups/values as a key of the hashes because we send this JSON format to MongoDB. But MongoDB can't handle with dots or dollar signs in the key section so it will fail (we want to allow those chars). Also we want users to user those reports so we need them to be readable and indexing the keys feels less readable (at least to me).	[reply]
Re^3: Trying to make the code more clear and clean by 1nickt (Canon) on Jul 30, 2019 at 03:29 UTC
I see. OK, that answers one question, but it is not clear from what you have posted so far what constitutes a "test". Is a 'test" the same as a "file"? You still haven't posted any sample data with the expected output from that data. That would be helpful in trying to understand your situation. Also, are you aware of how fragile it is to try to handle comma-separated values manually by just splitting? Use Text::CSV! (In fact you could encapsulate this whole program into a `Text::CSV` after_parse callback, but that's a different story...) Assuming that one file is one test, testing one href, and that you do not want multiple hashes each keyed with 'test', but just one, with one listing for each group, maybe you could use something like: use strict; use warnings; use feature 'say'; use JSON; my $results = {}; for my $href ('root') { my $by_group = {}; while (my $line = <DATA>) { my ( $key, $group, $value, $version, $file, $count ) = split(/ +,/, $line); push @{ $by_group->{ $group } }, $value; } my $test = []; for my $group ( keys %{ $by_group } ) { push @{ $test }, { group => $group, values => $by_group->{ $gr +oup } } } $results->{ $href } = $test; } say JSON->new->pretty->canonical->encode($results); __DATA__ bla,ABC,6.13.00,bla,bla,bla bla,XYZ,1234,bla,bla,bla bla,XYZ,tcsh,bla,bla,bla bla,WEA,6.13.00,bla,bla,bla bla,BAB,ASDAS,bla,bla,bla bla,BAB,12312321,bla,bla,bla bla,SADA,6.13.00,bla,bla,bla bla,SADA,12312321,bla,bla,bla [download] (^^ that's an SSCCE ...) `$ perl oved.pl { "root" : [ { "group" : "BAB", "values" : [ "ASDAS", "12312321" ] }, { "group" : "ABC", "values" : [ "6.13.00" ] }, { "group" : "XYZ", "values" : [ "1234", "tcsh" ] }, { "group" : "WEA", "values" : [ "6.13.00" ] }, { "group" : "SADA", "values" : [ "6.13.00", "12312321" ] } ] }` [download] Hope this helps! The way forward always starts with a minimal test.	[reply] [d/l] [select]