Re^3: Contextual/categorical Histogram

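Performing an intersection, xor (symmetric difference), or join (union) operation on two histograms is fairly straightforward. In the code below, the union sums the counts for each key, the intersection keeps the smaller of the two counts for keys present in both, and the xor keeps only those keys found in exactly one histogram: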

#! perl
use strict;
use warnings;
use Data::Dump 'pp';

my %hist1 = ( a => 2, b => 5, c => 7, );
my %hist2 = ( b => 3, c => 1, d => 4, );

# Union: every key from either histogram, with counts summed
my %union = %hist1;
$union{$_} += $hist2{$_} for keys %hist2;

# Intersection: keys common to both histograms, keeping the smaller count
my %inter;
for (keys %hist1)
{
    if (exists $hist2{$_})
    {
        my $val1 = $hist1{$_};
        my $val2 = $hist2{$_};
        $inter{$_} = ($val1 <= $val2) ? $val1 : $val2;
    }
}

# XOR: keys found in exactly one histogram, with their original counts
my %xor;
exists $hist2{$_} || ($xor{$_} = $hist1{$_}) for keys %hist1;
exists $hist1{$_} || ($xor{$_} = $hist2{$_}) for keys %hist2;

print "Histogram 1:  ", pp(\%hist1), "\n";
print "Histogram 2:  ", pp(\%hist2), "\n";
print "Union:        ", pp(\%union), "\n";
print "Intersection: ", pp(\%inter), "\n";
print "XOR:          ", pp(\%xor),   "\n";

Output:

14:17 >perl 958_SoPW.pl
Histogram 1:  { a => 2, b => 5, c => 7 }
Histogram 2:  { b => 3, c => 1, d => 4 }
Union:        { a => 2, b => 8, c => 8, d => 4 }
Intersection: { b => 3, c => 1 }
XOR:          { a => 2, d => 4 }

14:17 >

However, this approach holds both histograms entirely in memory, so it is doubtful that it will scale to hashes containing gigabytes of data. For that scenario, you should probably be looking to use a database.
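As a rough illustration of the database route, here is a minimal sketch of the same three operations done in SQL via DBI. It assumes DBD::SQLite is installed; the table names (hist1, hist2) and column names (k, n) are made up for the example, and an in-memory database is used only to keep the sketch self-contained. For real data you would point dbname at a file, or use whatever database you already have.

```perl
#! perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is available. ':memory:' keeps the demo
# self-contained; use a file name instead for large data sets.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# Hypothetical schema: one table per histogram, key k with count n
$dbh->do("CREATE TABLE $_ (k TEXT PRIMARY KEY, n INTEGER NOT NULL)")
    for qw(hist1 hist2);

my %data = (
    hist1 => { a => 2, b => 5, c => 7 },
    hist2 => { b => 3, c => 1, d => 4 },
);

for my $table (sort keys %data)
{
    my $sth = $dbh->prepare("INSERT INTO $table (k, n) VALUES (?, ?)");
    $sth->execute($_, $data{$table}{$_}) for sort keys %{ $data{$table} };
}

my %query = (
    # Union: sum the counts per key across both tables
    union => q{
        SELECT k, SUM(n) FROM
            (SELECT k, n FROM hist1 UNION ALL SELECT k, n FROM hist2)
        GROUP BY k ORDER BY k
    },
    # Intersection: keys in both tables, keeping the smaller count
    # (two-argument MIN is SQLite's scalar min function)
    inter => q{
        SELECT h1.k, MIN(h1.n, h2.n)
        FROM hist1 h1 JOIN hist2 h2 ON h1.k = h2.k
        ORDER BY h1.k
    },
    # XOR: keys present in exactly one table
    xor => q{
        SELECT k, n FROM hist1 WHERE k NOT IN (SELECT k FROM hist2)
        UNION ALL
        SELECT k, n FROM hist2 WHERE k NOT IN (SELECT k FROM hist1)
        ORDER BY k
    },
);

# Pull each (small) result set back into a hash for display
my %result;
for my $op (sort keys %query)
{
    my $rows = $dbh->selectall_arrayref($query{$op});
    $result{$op} = { map { $_->[0] => $_->[1] } @$rows };
}

for my $op (qw(union inter xor))
{
    my $h = $result{$op};
    print "$op: ", join(', ', map { "$_ => $h->{$_}" } sort keys %$h), "\n";
}
```

With the sample data this should give the same union, intersection and xor as the hash version. With a file-backed database, the tables need not fit in memory the way the hashes must. For genuinely large results you would fetch rows one at a time instead of slurping them with selectall_arrayref.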

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,
