Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

SQL like query over iterable in perl

by pwagyi (Monk)
on Mar 02, 2017 at 07:00 UTC ( [id://1183352]=perlquestion: print w/replies, xml ) Need Help??

pwagyi has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!

Let's say I have an array of object. and also assume that there are some attributes (attr1, attr2, ...attrn) in each object.

I would like to have function/class that can count over any attributes' values. e.g if object is person, and person has attributes, sex ('Male','Female'), ethnic ('a','b','c',...), occupation ('programmer','hacker',...).

If I want to find count# on sex =>
count( array_of_objects, 'sex') => returns ( 'Male' => 45 #male count, 'Female' => 54)
But in some cases, I would also need to group,
count(array, ['sex','occupation']) => ( 'Male' => ( 'programmer' => 10, 'blah' => 34), 'Female' => ('blah' => 3, 'programmer'=>4)
How could I have generic function/class to group and count over collection?

Replies are listed 'Best First'.
Re: SQL like query over iterable in perl (updated x2)
by haukex (Archbishop) on Mar 02, 2017 at 07:25 UTC

    Sounds to me like foreach (and perhaps map) and hashes will help you:

    my @array_of_objects = ( { sex=>'Male', occupation=>'foo' }, { sex=>'Female', occupation=>'foo' }, { sex=>'Male', occupation=>'bar' }, { sex=>'Female', occupation=>'bar' }, { sex=>'Female', occupation=>'foo' }, ); my %counts; $counts{ $_->{sex} }++ for @array_of_objects; my %grouped; $grouped{ $_->{sex} }{ $_->{occupation} }++ for @array_of_objects; use Data::Dumper; print Dumper( \%counts, \%grouped ); __END__ $VAR1 = { 'Female' => 3, 'Male' => 2 }; $VAR2 = { 'Female' => { 'bar' => 1, 'foo' => 2 }, 'Male' => { 'foo' => 1, 'bar' => 1 } };

    Update: Personally, I'd prefer the above, but if you really want a generic function, here's one option. Probably not the most efficient solution because it's recursive, I don't think the morning caffeine has fully kicked in yet ;-) Input and output is the same as above. (Update: huck's solution, posted before the below, is the non-recursive variation of this.)

    sub count { my ($data, $fields) = @_; $fields = [$fields] unless ref $fields; my $count = {}; _dive( $count, $_, @$fields ) for @$data; return $count; } sub _dive { my ($ref, $obj, @path) = @_; my $targ = \$ref->{ $obj->{ shift @path } }; if (!@path) { $$targ++; return $ref } $$targ = _dive( $$targ, $obj, @path ); } print Dumper( count(\@array_of_objects, 'sex') ); print Dumper( count(\@array_of_objects, ['sex','occupation']) );

    Update 2: In the above, I'm working with hash references instead of objects. If you want to use real objects and method calls, then in the first example, replace { $_->{sex} }{ $_->{occupation} } with { $_->sex }{ $_->occupation }, and in the second example, replace $obj->{ shift @path } with $obj->${\shift @path}.

Re: SQL like query over iterable in perl
by Athanasius (Archbishop) on Mar 02, 2017 at 07:33 UTC

    Hello pwagyi, and welcome to the Monastery!

    Although this is no doubt doable in OO, why reinvent the wheel? If you want SQL-like queries, use a real relational database that supports real SQL! For example, the module DBD::SQLite:

    ...includes the entire thing in the distribution. So in order to get a fast transaction capable RDBMS working for your perl project you simply have to install this module, and nothing else.

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: SQL like query over iterable in perl
by huck (Prior) on Mar 02, 2017 at 07:45 UTC

    More generic than haukex's, more pure-perl than Athanasius's, more functional than OO

    #!/usr/bin/perl -w use strict; use warnings; my $ar=[ {sex=>'m',occupation=>'programmer',state=>'CA'}, {sex=>'m',occupation=>'blah1',state=>'CA'}, {sex=>'m',occupation=>'blah1',state=>'NY'}, {sex=>'m',occupation=>'blah2',state=>'NY'}, {sex=>'f',occupation=>'blah1',state=>'NY'}, {sex=>'f',occupation=>'blah2',state=>'NJ'}, ]; printer(counter($ar,['sex'])); printer(counter($ar,['sex','state'])); printer(counter($ar,['sex','occupation'])); printer(counter($ar,[])); exit; sub printer{ use Data::Dumper; my $ref=shift; local $Data::Dumper::Deepcopy=1; local $Data::Dumper::Purity=1; local $Data::Dumper::Sortkeys=1; local $Data::Dumper::Indent=2; print Dumper($ref)."\n"; } # printer sub counter { my $arin=shift; my $arparts=shift; my $res={}; my $size1=scalar(@$arparts)-1; return $res unless ($size1 >=0); my $size2=$size1-1; for my $hash (@$arin){ my @parts=(); for my $key (@$arparts){ my $val=$hash->{$key}; if (!defined ($val)) {$val='.'} push @parts,$val; } my $reshead=$res; for my $n (0..$size2) { my $val=$parts[$n]; unless ($reshead->{$val}) {$reshead->{$val}={};} $reshead=$reshead->{$val}; } $reshead->{$parts[$size1]}++; } # hash return $res; } # counter
    Result
    $VAR1 = { 'f' => 2, 'm' => 4 }; $VAR1 = { 'f' => { 'NJ' => 1, 'NY' => 1 }, 'm' => { 'CA' => 2, 'NY' => 2 } }; $VAR1 = { 'f' => { 'blah1' => 1, 'blah2' => 1 }, 'm' => { 'blah1' => 2, 'blah2' => 1, 'programmer' => 1 } }; $VAR1 = {};

    Added:

    printer(counter($ar,['state','sex','occupation']));
    Result
    $VAR1 = { 'CA' => { 'm' => { 'blah1' => 1, 'programmer' => 1 } }, 'NJ' => { 'f' => { 'blah2' => 1 } }, 'NY' => { 'f' => { 'blah1' => 1 }, 'm' => { 'blah1' => 1, 'blah2' => 1 } } };

      More generic than haukex's, more pure-perl than Athanasius's, more functional than OO

      Shorter than huck's ;-P

      sub count { my ($data, $fields) = @_; $fields = [$fields] unless ref $fields; my $count = {}; for my $obj (@$data) { my $ref = $count; $ref = $ref->{ $obj->{ $fields->[$_] } } //= {} for 0..$#$fields-1; $ref->{ $obj->{ $fields->[-1] } }++; } return $count; }

      (For input, output and other context see my other node)

        Cute

        Made me think $fields = [$fields,@_] unless ref fields;

        and not that it matters here, and i would have to look up funny debug commands to tell for sure, but are $#$fields-1 and $fields->[-1] optimized or does the subtraction happen every loop of $obj? Im never sure.

        and in mine, besides $data,$fields,and $count as nicer variable names, i had already decided i should have $size1 be named as $nterminal and $size2 named as $nsubparts to make it clearer what they did. wasnt gun'a post as a "fix" but this gave me an excuse to mention it.

        Edited to add: What about missing fields? $ref = $ref->{ $obj->{ $fields->[$_] // '.' } } //= {} where '.' is used like the SAS missing variable.
        That wasnt right, instead $ref = $ref->{ $obj->{ $fields->[$_] } // '.'  } //= {}

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1183352]
Approved by Athanasius
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others lurking in the Monastery: (4)
As of 2024-03-29 14:30 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found