package Array::GroupBy; $VERSION = '0.0.1'; use strict; use warnings; use Carp; use List::Util qw( min max sum ); use Moose; use Moose::Util::TypeConstraints; # These get old value first, then added value my %code_for; { no warnings 'uninitialized'; %code_for = ( sum => { itemproc => sub { shift() + shift() } }, min => { itemproc => sub { defined $_[0] ? min( shift, shift ) : $_[1] } }, max => { itemproc => sub { defined $_[0] ? max( shift, shift ) : $_[1] } }, avg => { listproc => sub { sum(@_) / scalar @_ } }, count => { itemproc => sub { 1 + shift } }, ); } subtype 'Array::GroupBy::TC::GrouperHow' => as 'HashRef' => where { 0 == grep { !defined || !( # Allow a bare code reference ref $_ eq 'CODE' # allow a hash ref with listproc/itemproc => code ref || ( ref $_ eq ref {} && ( ref $_->{listproc} eq 'CODE' || ref $_->{itemproc} eq 'CODE' ) ) # Allow the name of an item in %code_for || ( ref $_ eq q{} && exists $code_for{ lc $_ } ) ) } values %{$_}; }; has 'group_by' => ( is => 'rw', isa => 'ArrayRef', required => 1, trigger => sub { croak 'empty group_by' if ! @{ $_[1] } }, ); has 'group_how' => ( is => 'rw', isa => 'Array::GroupBy::TC::GrouperHow', trigger => \&_normalize_methods, ); # This is called when group_how is set. # It turns every value into a hash ref with listproc/itemproc sub _normalize_methods { my ( $self, $group_spec ) = @_; foreach my $method ( values %{$group_spec} ) { if ( ref $method eq q{} && exists $code_for{ lc $method } ) { $method = $code_for{ lc $method }; } elsif ( ref $method eq 'CODE' ) { $method = { itemproc => $method }; } } return; } sub group { my $self = shift; my @group_keys = @{ $self->group_by() }; # This is a H(oH)* where each level is a key from group_by # e.g., When group_by = [ 'first', 'last' ] # $grouped{'fred'}{'flintstone'} is a reference to a record with # those keys' aggregates my %grouped; # These are the output records. They're also in the leaves of %grouped my @out; foreach my $item (@_) { # walk out to the right leaf for this record my $leaf = \%grouped; foreach my $gk (@group_keys) { $leaf = ( $leaf->{$item->{$gk}} ||= {} ); } # if this is a new key set, put the new agg. record in @out if ( !keys %{$leaf} ) { push @out, $leaf; } $self->_summarize( $leaf, $item ); } # Apply each list processor to their respective fields my %group_how = %{ $self->group_how() }; FIELD: foreach my $field ( keys %group_how ) { my $listproc = $group_how{$field}->{listproc}; next FIELD if !$listproc; foreach my $group (@out) { $group->{$field} = $listproc->( @{ $group->{$field} || [] } ); } } return @out; } sub _summarize { my ( $self, $leaf, $item ) = @_; my %group_how = %{ $self->group_how() }; my %group_by = map { $_ => 1 } @{ $self->group_by() }; # fields in either the time or the %group_how my @fields = keys %{{ map { $_ => 1 } keys %{$item}, keys %group_how }}; foreach my $field ( @fields ) { if ( exists $group_by{$field} ) { $leaf->{$field} ||= $item->{$field}; } elsif ( !exists $group_how{$field} ) { $leaf->{$field}++; } elsif ( exists $group_how{$field}->{itemproc} ) { my $proc = $group_how{$field}->{itemproc}; $leaf->{$field} = $proc->( $leaf->{$field}, $item->{$field} ); } elsif ( exists $group_how{$field}->{listproc} ) { push @{ $leaf->{$field} }, $item->{$field}; } else { croak 'programmer error: impossible condition'; } } return; } 1; # Magic true value required at end of module __END__ =head1 NAME Array::GroupBy - Group an array of records by some key (as with SQL GROUP BY) =head1 VERSION This document describes Array::GroupBy version 0.0.1 =head1 SYNOPSIS use Array::GroupBy; my $grouper = Array::GroupBy->new( group_by => [ 'key1', 'key2' ], group_how => { field1 => { listproc => \&list_aggregator }, field2 => 'sum', field3 => \&item_aggregator, field4 => { itemproc => \&item_aggregator }, }, ); my @grouped = $grouper->group( @AoH ); =head1 DESCRIPTION Use Array::GroupBy to turn a list of records into records that have their values aggregated according to some function, grouped by some key. This is meant to be similar to how the "GROUP BY" feature in SQL operates. This documentation assumes some familiarity with that feature. =head1 METHODS =head2 group This is an instance method that takes a list of hashes and returns a list of hashes with the field data grouped according to the configuration in the object's attributes. =head2 group_how This is an accessor for the grouping definition. With no arguments, this returns the grouping definition (a hash reference). With a hash reference argument, this sets the grouping definition. B that the definition you set is normalized, so it may not come back out the same way it went in. =head3 Format The format of C is a hash where each key is a field name from the input and the keys values define how that field is to be aggregated. The aggregation value can be one of: =over =item A code reference This will be called for each record in the input. It's given two arguments, the old value of the field, and the value to be added into the aggregate. The return value will be stored as the aggregate value. The aggregator should be ready to get an undef value for either argument. The first call will always have undef as the first argument. The second argument could be undef if that's the value of some field in the input. As an example, this would select the longest field value: sub { no warnings 'uninitialized'; if ( defined $_[0] && length $_[0] > length $_[1] ) { return $_[0]; } else { return $_[1]; } } =item The name of a prewritten aggregator These aggregators are already written and can be referenced by name. =over =item B Sum of the input values. =item B Minimum numeric value of the field. =item B Maximum numeric value of the field. =item B The average (arithmetic mean) of input values. =item B The number of records with the group_by key fields. =back =item A hash reference with a code reference This aggregator definition looks like this: { listproc => \&code_reference } This is to allow an aggregator that runs over the whole list of field values found rather than aggregating them one at a time. In this case, Array::GroupBy will build up an array of all the field values and pass them to the aggregator for each group found after scanning the input records. You can also specify an item-by-item processor this way. { itemproc => \&item_processor } =back =head3 Defaults The default behavior for any field that's not in the C attribute and does not have any aggregator defined is 'count'. Every aggregator is called for every record. If you define an aggregator for a field that does not appear in any record, it will get C values on every call, and the field I appear in the output records. This can be a convenient way of getting a "count(*)" field. my $grouper = Array::GroupBy->new( group_by => [ 'name' ], group_how => { counter => 'count' } ); That aggregator would work similarly to this SQL query: SELECT name,count(*) AS counter FROM t GROUP BY name; =head2 group_by This is an accessor for the list of keys used for grouping. With no arguments, this returns a reference to an array of keys. Call it with a reference to an array of strings to set the attribute. =head2 meta This is a method added by Moose which provides access to the current class's metaclass. =head1 DIAGNOSTICS =over =item empty group_by This is the error you get if you try to set the group_by attribute to an empty array. =item Attribute (I) does not pass the type constraint This error is generated by Moose, so it may vary with different versions of Moose. This is the error you get when you try to set attribute I to a value that's not allowed by the type constraints on it. =back =head1 DEPENDENCIES =over =item Moose I used Moose 0.17 when writing this. I expect any later version should work. =back Every other module I used is a core module. =head1 BUGS AND LIMITATIONS No bugs have been reported. Please report any bugs or feature requests to C, or through the web interface at L. =over =item There's no way to aggregate one field multiple ways For example, if you want the min I max of some field, the interface doesn't allow a simple way to get them both. =item If you catch the exception thrown by a mutator, the object may be in an odd state. I'd expect attributes to hold their previous values when setting fails this way, but apparently they don't. This behavior is dependent on Moose, so maybe it's different in later versions. =back =head1 AUTHOR Kyle Hasselbacher C<< >> =head1 LICENCE AND COPYRIGHT Copyright (c) 2009, Kyle Hasselbacher C<< >>. All rights reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See L. =head1 DISCLAIMER OF WARRANTY BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.