Re: Contextual/categorical Histogram

Hello f77coder,

It seems to me that the code you’ve shown is already doing most of what you need. I’ve tweaked it a bit and added some (admittedly naïve¹) code to generate the histogram:

#! perl
use strict;
use warnings;
use Data::Dump;
use List::Util 'max';

# 1. Configuration

use constant UNIT => '* ';
my @required_keys = qw(5 a foo);

# 2. Read in and count the data

my %counts = map { $_ => 0 } @required_keys;

while (<DATA>)
{
    ++$counts{$_} for split;
}

dd \%counts;    # Verify hash contents

# 3. Generate the histogram

print "\nHistogram:\n";
my $max_len = max map { length } keys %counts;

for (sort keys %counts)
{
    printf "%*s: ", $max_len, $_;
    print  UNIT for 1 .. $counts{$_};
    print  "\n";
}

__DATA__
a -2 3 b 0xffff c 2 b a 4 a a 200
0xffff 17 a a c 3 200 201 b -2 b
a b c a a 2 c -2

Output:

14:14 >perl 958_SoPW.pl
{
  "-2"     => 3,
  "0xffff" => 2,
  "17"     => 1,
  "2"      => 2,
  "200"    => 2,
  "201"    => 1,
  "3"      => 2,
  "4"      => 1,
  "5"      => 0,
  "a"      => 9,
  "b"      => 5,
  "c"      => 4,
  "foo"    => 0,
}

Histogram:
    -2: * * *
0xffff: * *
    17: *
     2: * *
   200: * *
   201: *
     3: * *
     4: *
     5:
     a: * * * * * * * * *
     b: * * * * *
     c: * * * *
   foo:

14:14 >

Note: If keys are known in advance, they can be added to @required_keys; this allows zero-frequency keys to appear in the histogram.
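The seeding step can also be written with a hash slice. This is a small sketch with made-up sample tokens: seeded keys exist with a count of 0, while keys that were neither seeded nor seen are simply absent from %counts.

```perl
#! perl
use strict;
use warnings;

# Pre-seed every known key with a count of 0 using a hash slice; this is
# equivalent to the `map { $_ => 0 } @required_keys` line in the script above.
my @required_keys = qw(5 a foo);
my %counts;
@counts{@required_keys} = (0) x @required_keys;

# Made-up sample data, for illustration only.
++$counts{$_} for qw(a b a);

# Seeded keys exist even with a zero count; unseen, unseeded keys do not.
for my $key (qw(5 a b foo bar))
{
    printf "%-3s %s\n", $key,
        exists $counts{$key} ? $counts{$key} : '(absent)';
}
```

Because the histogram loop iterates over `keys %counts`, only keys that exist get a bar, which is why the seeding is needed for zero-frequency keys.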

OK, I’m fairly sure this isn’t what you wanted, but perhaps by explaining where it falls short you can clarify what you mean by a “contextual/categorical” histogram.

Anyway, hope it helps,

¹Because it doesn’t attempt to scale the output when the frequencies become too large.
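For large frequencies, one possible fix is to scale every bar so that the largest count fits a fixed width. The sketch below assumes a hypothetical helper `scaled_bars` and a width of 20 units. Each bar is rounded to the nearest unit, and any non-zero count still gets at least one unit so that rare keys remain visible.

```perl
#! perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: return one line per key, with bars scaled so that the
# largest count occupies $width units.
sub scaled_bars
{
    my ($counts, $width) = @_;
    my $max_count = max(values %$counts) || 1;    # avoid division by zero
    my $max_len   = max map { length } keys %$counts;
    my @lines;

    for my $key (sort keys %$counts)
    {
        my $n     = $counts->{$key};
        my $units = int($n * $width / $max_count + 0.5);    # round to nearest
        $units    = 1 if $n > 0 && $units == 0;             # keep rare keys visible
        push @lines, sprintf "%*s: %s", $max_len, $key, '*' x $units;
    }
    return @lines;
}

# Made-up counts, for illustration only.
my %counts = (a => 9000, b => 5000, c => 4, foo => 0);
print "$_\n" for scaled_bars(\%counts, 20);
```

With these counts, `a` gets the full 20 units, `b` gets 11, and `c` gets 1 even though 4/9000 of the width rounds to zero. The trade-off is that bar lengths become proportional rather than exact, so the raw counts (e.g. from `dd`) are still worth printing alongside.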

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,


Re^2: Contextual/categorical Histogram
by f77coder (Beadle) on Aug 03, 2014 at 18:35 UTC

Many thanks for the help. I'm looking to do a rolling contextual histogram as data arrives. This is like a poor man's data classifier.
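One way to read "rolling" is as a sliding window over the most recent N tokens; that interpretation is an assumption, as are the helper name `rolling_histogram` and the window size of 5. Each arriving token increments its bin, and once the window is full, the oldest token is evicted and its bin decremented, so every update costs O(1).

```perl
#! perl
use strict;
use warnings;

# Hypothetical sliding-window histogram: only the most recent $window tokens
# are counted. Returns a closure that adds tokens, plus a reference to the
# live counts hash that the closure updates.
sub rolling_histogram
{
    my ($window) = @_;
    my (@queue, %counts);

    my $add = sub
    {
        for my $token (@_)
        {
            push @queue, $token;
            ++$counts{$token};

            # Once the window is over capacity, evict the oldest token.
            if (@queue > $window)
            {
                my $old = shift @queue;
                delete $counts{$old} if --$counts{$old} == 0;    # drop empty bins
            }
        }
    };

    return ($add, \%counts);
}

my ($add, $counts) = rolling_histogram(5);
$add->(qw(a b a c a b b));    # window now holds: a c a b b
print "$_ => $counts->{$_}\n" for sort keys %$counts;
```

Deleting bins that fall to zero keeps the hash limited to keys currently in the window. Keys that must always be shown could instead be kept at 0, as with @required_keys above.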

For comparing two histograms, are there fast methods (there are gigabytes of lines) for computing the intersection, XOR, or join?


Re^3: Contextual/categorical Histogram

by Athanasius (Cardinal) on Aug 04, 2014 at 04:22 UTC

Performing an intersection, xor (symmetric difference), or join (union) operation on two histograms is fairly straightforward: