pwagyi has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks!
Let's say I have an array of object. and also assume that there are some attributes (attr1, attr2, ...attrn) in each object.
I would like to have function/class that can count over any attributes' values. e.g if object is person, and person has attributes, sex ('Male','Female'), ethnic ('a','b','c',...), occupation ('programmer','hacker',...).
If I want to find count# on sex =>
count( array_of_objects, 'sex') => returns
( 'Male' => 45 #male count, 'Female' => 54)
But in some cases, I would also need to group,
count(array, ['sex','occupation']) =>
( 'Male' => ( 'programmer' => 10, 'blah' => 34),
'Female' => ('blah' => 3, 'programmer'=>4)
How could I have generic function/class to group and count over collection?
Re: SQL like query over iterable in perl (updated x2)
by haukex (Archbishop) on Mar 02, 2017 at 07:25 UTC
|
my @array_of_objects = (
{ sex=>'Male', occupation=>'foo' },
{ sex=>'Female', occupation=>'foo' },
{ sex=>'Male', occupation=>'bar' },
{ sex=>'Female', occupation=>'bar' },
{ sex=>'Female', occupation=>'foo' },
);
my %counts;
$counts{ $_->{sex} }++ for @array_of_objects;
my %grouped;
$grouped{ $_->{sex} }{ $_->{occupation} }++ for @array_of_objects;
use Data::Dumper;
print Dumper( \%counts, \%grouped );
__END__
$VAR1 = { 'Female' => 3, 'Male' => 2 };
$VAR2 = { 'Female' => { 'bar' => 1, 'foo' => 2 },
'Male' => { 'foo' => 1, 'bar' => 1 } };
Update: Personally, I'd prefer the above, but if you really want a generic function, here's one option. Probably not the most efficient solution because it's recursive, I don't think the morning caffeine has fully kicked in yet ;-) Input and output is the same as above. (Update: huck's solution, posted before the below, is the non-recursive variation of this.)
sub count {
my ($data, $fields) = @_;
$fields = [$fields] unless ref $fields;
my $count = {};
_dive( $count, $_, @$fields ) for @$data;
return $count;
}
sub _dive {
my ($ref, $obj, @path) = @_;
my $targ = \$ref->{ $obj->{ shift @path } };
if (!@path) { $$targ++; return $ref }
$$targ = _dive( $$targ, $obj, @path );
}
print Dumper( count(\@array_of_objects, 'sex') );
print Dumper( count(\@array_of_objects,
['sex','occupation']) );
Update 2: In the above, I'm working with hash references instead of objects. If you want to use real objects and method calls, then in the first example, replace { $_->{sex} }{ $_->{occupation} } with { $_->sex }{ $_->occupation }, and in the second example, replace $obj->{ shift @path } with $obj->${\shift @path}. | [reply] [Watch: Dir/Any] [d/l] [select] |
Re: SQL like query over iterable in perl
by Athanasius (Archbishop) on Mar 02, 2017 at 07:33 UTC
|
Hello pwagyi, and welcome to the Monastery!
Although this is no doubt doable in OO, why reinvent the wheel? If you want SQL-like queries, use a real relational database that supports real SQL! For example, the module DBD::SQLite:
...includes the entire thing in the distribution. So in order to get a fast transaction capable RDBMS working for your perl project you simply have to install this module, and nothing else.
Hope that helps,
| [reply] [Watch: Dir/Any] |
Re: SQL like query over iterable in perl
by huck (Prior) on Mar 02, 2017 at 07:45 UTC
|
More generic than haukex's, more pure-perl than Athanasius's, more functional than OO
#!/usr/bin/perl -w
use strict; use warnings;
my $ar=[
{sex=>'m',occupation=>'programmer',state=>'CA'},
{sex=>'m',occupation=>'blah1',state=>'CA'},
{sex=>'m',occupation=>'blah1',state=>'NY'},
{sex=>'m',occupation=>'blah2',state=>'NY'},
{sex=>'f',occupation=>'blah1',state=>'NY'},
{sex=>'f',occupation=>'blah2',state=>'NJ'},
];
printer(counter($ar,['sex']));
printer(counter($ar,['sex','state']));
printer(counter($ar,['sex','occupation']));
printer(counter($ar,[]));
exit;
sub printer{
use Data::Dumper;
my $ref=shift;
local $Data::Dumper::Deepcopy=1;
local $Data::Dumper::Purity=1;
local $Data::Dumper::Sortkeys=1;
local $Data::Dumper::Indent=2;
print Dumper($ref)."\n";
} # printer
sub counter {
my $arin=shift;
my $arparts=shift;
my $res={};
my $size1=scalar(@$arparts)-1;
return $res unless ($size1 >=0);
my $size2=$size1-1;
for my $hash (@$arin){
my @parts=();
for my $key (@$arparts){
my $val=$hash->{$key};
if (!defined ($val)) {$val='.'}
push @parts,$val;
}
my $reshead=$res;
for my $n (0..$size2) {
my $val=$parts[$n];
unless ($reshead->{$val}) {$reshead->{$val}={};}
$reshead=$reshead->{$val};
}
$reshead->{$parts[$size1]}++;
} # hash
return $res;
} # counter
Result
$VAR1 = {
'f' => 2,
'm' => 4
};
$VAR1 = {
'f' => {
'NJ' => 1,
'NY' => 1
},
'm' => {
'CA' => 2,
'NY' => 2
}
};
$VAR1 = {
'f' => {
'blah1' => 1,
'blah2' => 1
},
'm' => {
'blah1' => 2,
'blah2' => 1,
'programmer' => 1
}
};
$VAR1 = {};
Added:
printer(counter($ar,['state','sex','occupation']));
Result
$VAR1 = {
'CA' => {
'm' => {
'blah1' => 1,
'programmer' => 1
}
},
'NJ' => {
'f' => {
'blah2' => 1
}
},
'NY' => {
'f' => {
'blah1' => 1
},
'm' => {
'blah1' => 1,
'blah2' => 1
}
}
};
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
More generic than haukex's, more pure-perl than Athanasius's, more functional than OO
Shorter than huck's ;-P
sub count {
my ($data, $fields) = @_;
$fields = [$fields] unless ref $fields;
my $count = {};
for my $obj (@$data) {
my $ref = $count;
$ref = $ref->{ $obj->{ $fields->[$_] } } //= {}
for 0..$#$fields-1;
$ref->{ $obj->{ $fields->[-1] } }++;
}
return $count;
}
(For input, output and other context see my other node)
| [reply] [Watch: Dir/Any] [d/l] |
|
Cute
Made me think $fields = [$fields,@_] unless ref fields;
and not that it matters here, and i would have to look up funny debug commands to tell for sure, but are $#$fields-1 and $fields->[-1] optimized or does the subtraction happen every loop of $obj? Im never sure.
and in mine, besides $data,$fields,and $count as nicer variable names, i had already decided i should have $size1 be named as $nterminal and $size2 named as $nsubparts to make it clearer what they did. wasnt gun'a post as a "fix" but this gave me an excuse to mention it.
Edited to add: What about missing fields? $ref = $ref->{ $obj->{ $fields->[$_] // '.' } } //= {} where '.' is used like the SAS missing variable. That wasnt right, instead $ref = $ref->{ $obj->{ $fields->[$_] } // '.' } //= {}
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
|
|
|