Grouping an array of hashrefs by similar key values

perldj has asked for the wisdom of the Perl Monks concerning the following question:

I have created the following data structure by pushing the result of a database query as hashrefs into an array:

var1 = [
item_key => '1',
item_description => 'description 1',
item_summary => 'item summary 1',
action_key => '1',
action_summary => 'summary 1'
]

var2 = [
item_key => '1',
item_description => 'description 1',
item_summary => 'item summary 1',
action_key => '2',
action_summary => 'summary 2'
]

var3 = [
item_key => '2',
item_description => 'description 2',
item_summary => 'item summary 2',
action_key => '3',
action_summary => 'summary 3'
]
[download]

So essentially each row is the output of a query between a table of items and a table of actions, having a one to many relationship between them. What i'd like to do is group all items for each item into the same array index, approximately thus...

var1 = [
item_key => '1',
item_description => 'description 1',
item_summary => 'item summary 1',
action => [ key => '1',
            summary => 'summary 1'
          ],
          [ key => '2',
            summary => 'summary 2'
          ]
]

var2 = [
item_key => '2',
item_description => 'description 2',
item_summary => 'item summary 2',
action => [ key => '3',
            summary => 'summary 3'
          ]
]
[download]

How would this be possible?

Comment on Grouping an array of hashrefs by similar key values Select or Download Code

Replies are listed 'Best First'.
Re: Grouping an array of hashrefs by similar key values by ikegami (Patriarch) on Jan 28, 2009 at 23:52 UTC
First, your structure makes no sense. I think you mean `@items = ( { item_key => '1', item_description => 'description 1', item_summary => 'item summary 1', actions => [ { key => '1', summary => 'summary 1' }, { key => '2', summary => 'summary 2' } ] }, ... );` [download] You could do multiple queries. my $items_sth = $dbh->prepare('SELECT * FROM Items'); my $actions_sth = $dbh->prepare('SELECT * FROM Actions WHERE item_key += ?'); my @items; $items_sth->execute($item_row->{item_key}); while (my $item_row = $items_sth->fetch_hashref()) { my $item = { item_key => $item_row->{item_key}, item_description => $item_row->{item_description}, item_summary => $item_row->{item_summary}, actions => [], }; my $actions = $item->{actions}; $actions_sth->execute($item_row->{item_key}); while (my $action_row = $actions_sth->fetch_hashref()) { push @$actions, { key => $action_row->{action_key}, summary => $action_row->{action_summary}, }; } push @items, $item; } [download] You could order your query results by item_key and look at the last record key processed `my @items; my $last_key; while ($row = $sth->fetch_hashref()) { if (!@items \|\| $last_key != $row->{item_key}) { push @items, { item_key => $row->{item_key}, item_description => $row->{item_description}, item_summary => $row->{item_summary}, actions => [], }; $last_key = $row->{item_key}; } push @{ $results[-1]{action} }, { key => $row->{action_key}, summary => $row->{action_summary}, }; }` [download] You could use a lookup table `my %items; my @items; while ($row = $sth->fetch_hashref()) { our $item; local *item = \( $items{ $row->{item_key} } ); if (!$item) { $item = { item_key => $row->{item_key}, item_description => $row->{item_description}, item_summary => $row->{item_summary}, actions => [], }; push @items, $item; } push @{ $item->{action} }, { key => $row->{action_key}, summary => $row->{action_summary}, }; }` [download] This method requires more memory. Use the second method if ordering by item_key is acceptable. You could technically use the last two techniques to build the data structure you desire from the data structure you presented, if you can't build the data structure you desire directly from the database (as shown).	[reply] [d/l] [select]
Re^2: Grouping an array of hashrefs by similar key values by perldj (Initiate) on Jan 29, 2009 at 19:24 UTC
Thanks for the reply ikegami - very useful. I ended up using your third method in the end. May I ask what the 'local *item' actually means though? Other than that bit of code I understand what it does.	[reply]
Re^3: Grouping an array of hashrefs by similar key values by ikegami (Patriarch) on Jan 29, 2009 at 19:35 UTC
It aliases `$item` to `$items{ $row->{item_key} }`. Changes to one affect both. I could have done `$item_ref = \( $items{ $row->{item_key} } );`, but it made the subsequent code rather messy (but not as much as using neither).	[reply] [d/l] [select]
Re: Grouping an array of hashrefs by similar key values by kennethk (Abbot) on Jan 28, 2009 at 23:39 UTC
Your best bet would be to use a hash for your root element, keyed on item_key, and then populate it by crawling over your list, adding base elements if the key doesn't exist yet and pushing your actions onto the appropriate array if it does. Details on dealing with hashes of hashes (HoH) and their friends can be found in perllol (and if you need it, basic dereferencing is here). Happy to help debug once you've cobbled some code together.	[reply]
Re^2: Grouping an array of hashrefs by similar key values by perldj (Initiate) on Jan 28, 2009 at 23:49 UTC
Yes i already have some code together to perform this function and create a hash of hashes. The problem I have is when I return these results the records are sorted randomly, rather than in the sorted order as they arrived from the database (there's a date column I didn't include in my example, and each item has a date). Consequently I need to sort the hash of hashes in some way and have each record sorted by the value of the 'item_date' key's value.	[reply]
Re^3: Grouping an array of hashrefs by similar key values by kennethk (Abbot) on Jan 29, 2009 at 05:37 UTC
The result order isn't actually random - it follows a carefully designed algorithm to maximize the efficiency of hash look-ups. The solution to your problem is to use a sorting function to choose the output order. For example, assume your structure was named %hash and your item_date is in epoch time, thus numerically sortable. You could use the code `sub by_date { return $hash{$a}{item_date} <=> $hash{$b}{item_date}; } for my $key (sort by_date keys %hash) { print "$hash{$key}{item_key} $hash{$key}{item_description}\n" }` [download] to print out your database keys and descriptions in chronological order. I expect your dates are in a more complex format, and thus require a more complex sorting method. In order for your sorting function to work properly, it should return -1 if $a comes first, 1 if $b comes first and 0 if they are tied. perlop has a few more details.	[reply] [d/l]