rsiedl has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a problem that I could probably solve with heaps of foreach loops, but I'm wondering if there is a quicker, easier and cleaner way to do it.

I would like to create a tally of matching arrays. The problem is probably easier to demonstrate than to describe...
I have two hashes of arrays that look thus:
# an array of terms used to create hash1
my @terms = ( 'male', 'female', 'child' );

# hash1 of term arrays
my %terms = ();
$terms{1} = [ 'male' ];
$terms{2} = [ 'female' ];
$terms{3} = [ 'child' ];
$terms{4} = [ 'male', 'female' ];
$terms{5} = [ 'male', 'child' ];
$terms{6} = [ 'female', 'child' ];
$terms{7} = [ 'male', 'female', 'child' ];

# hash2 of term occurrences
my %occurs = ();
$occurs{111} = [ 'male' ];
$occurs{112} = [ 'male' ];
$occurs{113} = [ 'male', 'child' ];
$occurs{114} = [ 'female' ];
$occurs{115} = [ 'child', 'female' ];

My objective is to end up with one hash that says how often each combination of terms from hash1 occurred in hash2. For example:
my %tally = ();
# Male on its own appeared twice
$tally{hash1_terms1} = 2;
# Female on its own appeared once
$tally{hash1_terms2} = 1;
# Child on its own never appeared
$tally{hash1_terms3} = 0;
# and so on...
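A minimal sketch of building that tally directly from the two hashes, assuming (as confirmed further down the thread) that term order doesn't matter. The helper combo_key() and the %id_for lookup are illustrative names, not from the thread, and the result is keyed by the numeric IDs of %terms rather than labels like hash1_terms1:

```perl
use strict;
use warnings;

# %terms and %occurs as given in the question above.
my %terms = (
    1 => ['male'],
    2 => ['female'],
    3 => ['child'],
    4 => [ 'male', 'female' ],
    5 => [ 'male', 'child' ],
    6 => [ 'female', 'child' ],
    7 => [ 'male', 'female', 'child' ],
);
my %occurs = (
    111 => ['male'],
    112 => ['male'],
    113 => [ 'male', 'child' ],
    114 => ['female'],
    115 => [ 'child', 'female' ],
);

# Hypothetical helper: an order-independent key (sort the terms, join with '+').
sub combo_key { return join '+', sort @_ }

# Map each combination's key back to its ID in %terms.
my %id_for;
$id_for{ combo_key( @{ $terms{$_} } ) } = $_ for keys %terms;

# Start every combination at zero, then count each occurrence.
my %tally = map { $_ => 0 } keys %terms;
for my $list ( values %occurs ) {
    my $id = $id_for{ combo_key(@$list) };
    $tally{$id}++ if defined $id;    # ignore combinations that are not in %terms
}

for my $id ( sort { $a <=> $b } keys %tally ) {
    printf "%s: %d\n", join( ' and ', @{ $terms{$id} } ), $tally{$id};
}
```

Starting every ID at zero means combinations that never occur (such as 'child' on its own) still show up in the result.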

I am hoping that someone out there has more logic than me and can see an easy way to do this. Any help would be appreciated.

Cheers,
Reagen

Re: matching arrays
by Enlil (Parson) on May 25, 2004 at 21:25 UTC
    This makes a whole heap of assumptions (e.g. that those three terms are the only ones that will appear, that each will appear at most once in a particular array, that arrays are the same regardless of order, etc.):
    use strict;
    use warnings;

    my @occurances = (
        [ 'male' ],
        [ 'male' ],
        [ 'male', 'child' ],
        [ 'female' ],
        [ 'child', 'female' ],
        [ 'female', 'child' ],
        [ 'female', 'male', 'child' ],
    );

    my @terms = (
        [ 'male' ],
        [ 'female' ],
        [ 'child' ],
        [ 'male', 'female' ],
        [ 'male', 'child' ],
        [ 'female', 'child' ],
        [ 'male', 'female', 'child' ],
    );

    my %tally;
    foreach my $array_ref ( @occurances ) {
        $tally{stringify(@$array_ref)}++;
    }

    foreach my $array_ref ( @terms ) {
        if ( defined $tally{stringify(@$array_ref)} ) {
            print "Terms:" . join (" and ", @$array_ref) . " were found "
                . $tally{stringify(@$array_ref)} . " times\n";
        }
        else {
            print "Terms:" . join (" and ", @$array_ref) . " were found 0 times\n";
        }
    }

    sub stringify {
        join ("+", sort {$b cmp $a} @_);
    }
    Note that I switched your hashes to arrays, as that made more sense to me.

    -enlil

      Thanks for the reply.

      The occurrences data is actually a hash of arrays, built like this:
      push(@{$occurrences{$ID}}, "$term1");
      push(@{$occurrences{$ID}}, "$term2");
      Do you know of an easier way than running your code inside a foreach loop like this?
      foreach my $id ( keys %occurrences ) {
          my @occurrences = @{ $occurrences{$id} };
          # ... do the code you posted ...
      } # end foreach
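      Regarding the outer loop: assuming the hash of arrays is built with push as above, values %occurrences already returns a list of array references, which is the same shape as @occurances in enlil's code, so it can stand in for that array without copying anything per key. A sketch with made-up IDs and terms:

```perl
use strict;
use warnings;

# A hash of arrays built the same way as above; these IDs and terms are made up.
my %occurrences;
push @{ $occurrences{111} }, 'male';
push @{ $occurrences{113} }, 'male', 'child';
push @{ $occurrences{115} }, 'child', 'female';

# Same canonical key as enlil's stringify(): terms sorted in descending order.
sub stringify { join '+', sort { $b cmp $a } @_ }

# values() yields one array reference per ID, so no outer per-key loop is needed.
my %tally;
$tally{ stringify(@$_) }++ for values %occurrences;

print "$_: $tally{$_}\n" for sort keys %tally;
```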
      Thanks, enlil. I've made the few changes needed to fit it into my script, and it works like a charm.

      Cheers,
      Reagen
Re: matching arrays
by Fletch (Bishop) on May 25, 2004 at 21:12 UTC

    Make a bit vector with one bit per search term (bit 0 means 'male', bit 1 'female', bit 2 'child'). Convert your occurrences into bit vectors as well and use those as keys in your tally ([ 'male', 'child' ] would have bits 0 and 2 set, i.e. 0b101). Then turn the bit vector keys back into the original terms on the way out.
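    A minimal sketch of that round trip using a plain integer as the bit vector. The %bit table and the decode() helper are illustrative names, and the sketch ORs the bits together (rather than adding them) so that a repeated term cannot change the key:

```perl
use strict;
use warnings;

my @terms = ( 'male', 'female', 'child' );

# Bit $i stands for $terms[$i]: male => 1, female => 2, child => 4.
my %bit;
$bit{ $terms[$_] } = 1 << $_ for 0 .. $#terms;

my @occurs = ( ['male'], ['male'], [ 'male', 'child' ], ['female'], [ 'child', 'female' ] );

# Encode each occurrence as a mask; OR makes order and repeats irrelevant.
my %tally;
for my $list (@occurs) {
    my $mask = 0;
    $mask |= $bit{$_} for @$list;
    $tally{$mask}++;
}

# Hypothetical helper: decode a mask back into its terms, in @terms order.
sub decode {
    my ($mask) = @_;
    return grep { $mask & $bit{$_} } @terms;
}

printf "%s: %d\n", join( ' and ', decode($_) ), $tally{$_}
    for sort { $a <=> $b } keys %tally;
```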

    Update: I was going to expand on this but connectivity on the train was kinda spotty. At any rate, see perldoc -f vec and search for Set modules on CPAN (since that's basically what you're asking about).
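    For comparison, a sketch of the same idea using Perl's built-in vec() with 1-bit elements, where the packed string itself serves as the hash key. The %index table is an illustrative name, and the keys are sorted only to give a stable print order:

```perl
use strict;
use warnings;

my @terms = ( 'male', 'female', 'child' );
my %index;
@index{@terms} = 0 .. $#terms;    # male => 0, female => 1, child => 2

my @occurs = ( ['male'], ['male'], [ 'male', 'child' ], ['female'], [ 'child', 'female' ] );

my %tally;
for my $list (@occurs) {
    my $vec = '';
    vec( $vec, $index{$_}, 1 ) = 1 for @$list;    # set one bit per term
    $tally{$vec}++;                               # the packed string is the key
}

# Read the bits back out with vec() to recover the term names.
for my $vec ( sort keys %tally ) {
    my @names = grep { vec( $vec, $index{$_}, 1 ) } @terms;
    print join( ' and ', @names ), ": $tally{$vec}\n";
}
```

    Since vec() numbers bits from the low end of each byte, [ 'male', 'child' ] packs to the single byte "\x05", matching the 0b101 above.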

      Bit vectors are something new to me. I'll do some research and check them out. Cheers.
        Here is a simple version of the bit vector approach:

        my %bits = ('male' => 1, 'female' => 2, 'child' => 4);
        my %tally;
        foreach my $array_ref ( @occurances ) {
            my $vector = 0;
            $vector += $bits{$_} for @$array_ref;
            $tally{$vector}++;
        }
        So, as long as each occurrence contains each of 'male', 'female', etc. at most once, the vector for each combination is unique, whatever the order of the terms. Then compute the vector for each possible combination and look up how many times it appeared in the %tally hash.
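        A sketch of that second step, assuming every value in %bits is a distinct power of two, so that every non-empty combination is a number from 1 up to the sum of all the bits. The %count_for hash is a hypothetical name for the final per-combination counts:

```perl
use strict;
use warnings;

my %bits = ( 'male' => 1, 'female' => 2, 'child' => 4 );

# Data from the original question, in enlil's array form.
my @occurances = ( ['male'], ['male'], [ 'male', 'child' ], ['female'], [ 'child', 'female' ] );

# Tally by vector, as in the snippet above.
my %tally;
foreach my $array_ref (@occurances) {
    my $vector = 0;
    $vector += $bits{$_} for @$array_ref;
    $tally{$vector}++;
}

# Term names in bit order, and the vector with every bit set (7 here).
my @names = sort { $bits{$a} <=> $bits{$b} } keys %bits;
my $max   = 0;
$max += $_ for values %bits;

# Walk every non-empty combination and look up its count.
my %count_for;
for my $vector ( 1 .. $max ) {
    my $label = join ' and ', grep { $vector & $bits{$_} } @names;
    $count_for{$label} = $tally{$vector} || 0;    # 0 if this combination never occurred
    print "$label: $count_for{$label}\n";
}
```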

        (++ to all the replies in this thread, because at this stage in the morning, I couldn't work out what the OP wanted at all ;) )
Re: matching arrays
by Plankton (Vicar) on May 25, 2004 at 21:09 UTC
    You probably have a good reason for what you are doing ... but why not use a database?

    Plankton: 1% Evil, 99% Hot Gas.
      I am pulling all of this out of a database of several hundred thousand records for a specific user. My thinking is that it will be a lot quicker to build some arrays of 100 or so elements than to run MySQL selects against the whole database :)
        A good SQL query can save you a lot of work.

Re: matching arrays
by BrowserUk (Patriarch) on May 25, 2004 at 21:17 UTC

    The first question is: why are you using a hash with apparently consecutive numeric keys instead of an array?

    The second: does [ 'child', 'male' ] match [ 'male', 'child' ]?

    What is the significance of the keys for %occurs?


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
      Good point. It's midnight in Vienna and I've been working all day :)

      Yep, they should match regardless of the order they appear in.
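      Given that order doesn't matter, one way to compare two arrays is to reduce each one to a canonical key first, much like enlil's stringify(). This hypothetical canonical_key() also drops repeated terms, which relaxes the "each term appears only once" assumption made earlier in the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: sorting makes order irrelevant, and dropping
# duplicates means [ 'male', 'male' ] is treated like [ 'male' ].
sub canonical_key {
    my %seen;
    my @unique = grep { !$seen{$_}++ } @_;
    return join '+', sort @unique;
}

my $k1 = canonical_key( 'child', 'male' );
my $k2 = canonical_key( 'male', 'child' );
print "match: $k1\n" if $k1 eq $k2;
```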