sroux has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, I would like to implement a group by feature in my program, this without having to export my array content in a file and process it in a database in order run a group by! I am not familiar with hashes yet as it looks like that it is the way to go. DBI could help also. Looking for a simple solution. This is a sample content of my array:
Account Entity Unit Jan Feb Mar Account01 Entity01 Unit01 1 2 3 Account01 Entity01 Unit01 4 5 6 Account01 Entity01 Unit01 7 8 9 Account02 Entity02 Unit02 10 11 12
And the expected result:
Account Entity Unit Jan Feb Mar Account01 Entity01 Unit01 12 15 18 Account02 Entity02 Unit02 10 11 12
This is what I run from my database:
CREATE TABLE concat_FRA3 AS SELECT Account, Entity, Unit, SUM(Jan) Jan, SUM(Feb) Feb, SUM(Mar) Mar FROM concat_FRA2 GROUP BY Account, Entity, Unit;
If found this piece of code but not clear on how to define the hash header/keys (Account to Mar) and columns to group (Jan to Mar)
# Store my %storage; foreach my $row (@$result_set) { my $hash_key = encode_hash_key(row); my $new_row = $row; if (exists $storage{$hash_key}) { $new_row = merge_rows($row, $storage{$hash_key}); } $storage{$hash_key} = $new_row; }
Thank you for any hint or advice. Sebastien

Replies are listed 'Best First'.
Re: Group by from array
by Corion (Patriarch) on Sep 01, 2022 at 08:13 UTC

    If you already feel comfortable with SQL, you can also use an SQLite virtual table to treat your Perl data:

    our $aref = $result_set; $dbh->do(<<''); CREATE VIRTUAL TABLE my_table USING perl(Account, Entity ,Unit, arrayrefs="main::aref") my $elements = $dbh->selectall_arrayref(<<'SQL', undef, {}); SELECT Account, Entity, Unit, SUM(Jan) Jan, SUM(Feb) Feb, SUM(Mar) Mar FROM my_table GROUP BY Account, Entity, Unit; SQL ...

Re: Group by from array
by tybalt89 (Monsignor) on Sep 01, 2022 at 09:48 UTC
    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11146593 use warnings; use List::AllUtils qw( zip_by ); my %group_by; my @titles = split ' ', <DATA>; while( <DATA> ) { my @parts = split; my %hash = zip_by { @_ } \@titles, \@parts; for ( qw( Jan Feb Mar ) ) { $group_by{ join ' ', @hash{ qw( Account Entity Unit ) } }{ $_ } += + $hash{ $_ }; } } use Data::Dump 'dd'; dd \%group_by; __DATA__ Account Entity Unit Jan Feb Mar Account01 Entity01 Unit01 1 2 3 Account01 Entity01 Unit01 4 5 6 Account01 Entity01 Unit01 7 8 9 Account02 Entity02 Unit02 10 11 12

    Outputs:

    { "Account01 Entity01 Unit01" => { Feb => 15, Jan => 12, Mar => 18 }, "Account02 Entity02 Unit02" => { Feb => 11, Jan => 10, Mar => 12 }, }
      Does

      • my %hash = zip_by { @_ } \@titles, \@parts;

      have any other effect than

      • my %hash; @hash{ @titles } = @parts;

      ?

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        It's also slower.

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
        Thank you both for your contribution, indeed if I can avoid using a specific module that could be much better. This is what I tried below without success yet considering the following result: { "  " => { Feb => 0, Jan => 0, Mar => 0 } }
        @mapped = ("Account Entity Unit Jan Feb Mar", "Account01 Entity01 Unit01 1 2 3", "Account01 Entity01 Unit01 4 5 6", "Account01 Entity01 Unit01 7 8 9", "Account02 Entity02 Unit02 10 11 12"); my %group_by; my @titles = split ' ', @mapped; foreach (@mapped) { #print $_."\n"; my @parts = split; #print @parts."\n"; my %hash; @hash{ @titles } = @parts; #don't get it for(qw(Jan Feb Mar)){ $group_by{join ' ', @hash{qw(Account Entity Unit)}}{ $_ } +=+ +$hash{ $_ }; #don't get it either } } use Data::Dump 'dd'; dd \%group_by;
Re: Group by from array
by erix (Prior) on Sep 02, 2022 at 12:44 UTC

    I would like to implement a group by feature in my program, this without having to export my array content in a file and process it in a database in order run a group by

    Another way to avoid the file copying/-naming/-reading , and still enjoy full SQL power, is to copy your csv/tsv data into the clipboard (either via xclip of even simply Ctrl-A/C) and have a perl program sniff that clipboard-data, and read out the (clipboarded) csv data into a (perhaps temporary or foreign) table in a database (first generating a CREATE TABLE from the header line). This works well generically, even for pretty large data chunks, and takes almost no time.

Re: Group by from array
by Marshall (Canon) on Sep 02, 2022 at 09:33 UTC
    This is one way to do it without a hash table. Your grouping constraints can be enforced by sorting the table. then roll through the table doing sums and printing every time the grouping changes.

    use strict; use warnings; use Data::Dump qw(pp); use constant { "AcctName" => 0, "EntityName" => 1, "UnitName" => 2}; my @array = ( ['Account01', 'Entity01', 'Unit01', 1, 2, +3], ['Account01', 'Entity01', 'Unit01', 4, 5, +6], ['Account02', 'Entity02', 'Unit02', 10, 11, + 12], ['Account01', 'Entity01', 'Unit01', 7, 8, +9] ); @array = sort {$a->[AcctName] cmp $b->[AcctName] or $a->[EntityName] cmp $b->[EntityName] or $a->[UnitName] cmp $b->[UnitName] }@array; pp \@array; die "input array too small!!" unless @array>=2; my $firstRowRef = shift @array; my @startRow = @$firstRowRef; foreach my $row_ref (@array) { my @curRow = @$row_ref; if ($curRow[0] ne $startRow[0] or $curRow[1] ne $startRow[1] or $curRow[2] ne $startRow[2] ) { print "@startRow\n"; # new grouping @startRow = @curRow; } else # group continuation { # add current month #'s to totals $startRow[3]+=$curRow[3]; $startRow[4]+=$curRow[4]; $startRow[5]+=$curRow[5]; } } print "@startRow\n"; # This is for the last row of array __END__ sorted array is: [ ["Account01", "Entity01", "Unit01", 1, 2, 3], ["Account01", "Entity01", "Unit01", 4, 5, 6], ["Account01", "Entity01", "Unit01", 7, 8, 9], ["Account02", "Entity02", "Unit02", 10, 11, 12], ] Account01 Entity01 Unit01 12 15 18 Account02 Entity02 Unit02 10 11 12
Re: Group by from array
by tybalt89 (Monsignor) on Sep 02, 2022 at 15:21 UTC
    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11146593 use warnings; use List::AllUtils qw( zip_by sum ); my @mapped = ("Account Entity Unit Jan Feb Mar", "Account01 Entity01 Unit01 1 2 3", "Account01 Entity01 Unit01 4 5 6", "Account01 Entity01 Unit01 7 8 9", "Account02 Entity02 Unit02 10 11 12"); local $_ = join '', map "$_\n", @mapped; 1 while s/^((?:\S+\s+){3})\b\K(.*\n)((?:.*\n)*)\1(.*\n)/ join(' ', zip_by { sum @_ } map [ split ], $2, $4) . "\n$3"/me; s/(.*\n)\K/ $1 =~ tr| \n|-|cr /e; printf scalar( ('%-11s' x 6 . "\n") x tr/\n// ), split;

    Outputs:

    Account Entity Unit Jan Feb Mar ------- ------ ---- --- --- --- Account01 Entity01 Unit01 12 15 18 Account02 Entity02 Unit02 10 11 12