Sosi has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks! I would appreciate your enlightenment on the following. I have a tab-separated file as follows

A B n1
A B n2
A C n1
D E n2
D E n4
D F n1

and I want to count how many elements in the 3rd column each element in the other two columns has, i.e. A=>3, A=>B=>2, A=>C=>1, etc. It is important for me to keep the association between the first and second columns.

The easy way to solve this is just counting how many times A repeats, and how many times B repeats, etc. etc. But because I'm learning Perl, I'd like to know how to use a HoH to do this. In this regard, I built a HoH of the form

{
  A => {
    B => [ n1, n2, ],
    C => [ n1, ]
  },
  D => {
    E => [ n2, n4, ],
    F => [ n1, ]
  }
}

I know how to do this for counting the number of values in the third level, and the number of values in the second level, but how can I multiply them? Ideally, what I'd like to do is to create a key "count" with the number of "n" in the third column for each. For instance

{
  A => {
    count => 3,
    B => { count => 2, [ n1, n2, ], },
    C => { count => 1, [ n1, ] }
  },
etc.

Any tips on how to get this kind of value counting?

Replies are listed 'Best First'.
Re: Count number of elements in HoH and sum them for other keys
by smls (Friar) on Jun 03, 2014 at 11:54 UTC

    Your specification for the desired result hash is not quite valid: you cannot have a lone array reference in a hash, only key => value pairs. This can be fixed by giving the array references a key, for example "values". The result hash would then look like this:

    (
      A => {
        count => 3,
        B => { count => 2, values => ["n1", "n2"] },
        C => { count => 1, values => ["n1" ] }
      },
      D => {
        count => 3,
        E => { count => 2, values => ["n2", "n4"] },
        F => { count => 1, values => ["n1" ] }
      }
    )

    It can easily be generated....

    ...from the HoH that you already have:

    You can transform the existing HoH into the specified result hash, using two (nested) loops, and making use of the fact that scalar @array gives the number of elements in an array:

    my %hoh = (
      A => { B => [ "n1", "n2" ], C => [ "n1" ] },
      D => { E => [ "n2", "n4" ], F => [ "n1" ] },
    );

    foreach my $col1 (keys %hoh) {
        my $count1 = 0;

        foreach my $col2 (keys %{$hoh{$col1}}) {
            my $count2 = scalar @{$hoh{$col1}{$col2}};

            $hoh{$col1}{$col2} = { count  => $count2,
                                   values => $hoh{$col1}{$col2} };
            $count1 += $count2;
        }

        $hoh{$col1}{count} = $count1;
    }

    ...from the original data:

    If you do the counting directly in the code that generates the HoH in the first place, it's even easier - just increment the counters for both levels as you go along:

    my %hoh;

    while (<DATA>) {
        chomp;
        my ($c1, $c2, $c3) = split;

        $hoh{$c1}{count}++;
        $hoh{$c1}{$c2}{count}++;
        push @{$hoh{$c1}{$c2}{values}}, $c3;
    }

    __DATA__
    A B n1
    A B n2
    A C n1
    D E n2
    D E n4
    D F n1

    ---
    Edit: Refactored the answer to make it more structured.

      thank you so much. I was a bit confused at the beginning: I thought that you had to specify a starting value for $hoh{$c1}{$c2}{count}. By the way, and I know I'm going a bit astray of the initial question, but what if I wanted to start that count at 5? Would specifying

      $hoh{$c1}{count}=5;

      work if I specified it before incrementing in your while loop? Thanks!

        I thought that you had to specify a starting value for $hoh{$c1}{$c2}{count}.

        When you dereference or modify a non-existing array or hash element, it will automatically "spring to life", including all the necessary intermediate hashes/arrays. For example:

        my %test;
        $test{a}[2]{b} = 'Hello';

        # %test now contains:
        #   ( a => [ undef,
        #            undef,
        #            { b => "Hello" } ] )

        It's called autovivification, and it's one of the nice features that make Perl special... :)   See Wikipedia and perlreftut for more info.

        In addition, the ++ (auto-increment) operator silently treats undef as 0. So you don't need to specify an initial value.
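
        A minimal sketch of both points together, assuming nothing beyond core Perl. The @warnings array and the $plain variable are only there for illustration: the counters are created on first use and ++ stays silent, whereas an ordinary addition on an undefined value does trigger a warning.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my %hoh;

# None of these keys exist yet: the intermediate hashes are
# autovivified, and ++ counts up from undef as if it were 0.
$hoh{A}{count}++;
$hoh{A}{B}{count}++;
$hoh{A}{B}{count}++;

my $warnings_after_increment = @warnings;

# For contrast: plain addition on an undefined value does warn.
my $plain;
my $sum = $plain + 1;

print "A: $hoh{A}{count}, A/B: $hoh{A}{B}{count}\n";
print "warnings after ++: $warnings_after_increment\n";
print "warnings after +:  ", scalar(@warnings), "\n";
```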


        what if I wanted to start that count at 5?

        One solution would be to create the hash first, and then use another loop to add 5 to each counter.

        Alternatively, you can do a check inside the loop (before incrementing!) to see if the counter has already been incremented previously, and if not, initialize it with the number 5:

        if (!$hoh{$c1}{count}) { $hoh{$c1}{count} = 5; } # verbose form
        $hoh{$c1}{count} ||= 5; # shortcut

        (See C style Logical Or and Assignment Operators.)
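
        As a rough sketch of the second approach applied to the full loop: the version below uses //= (defined-or assignment, available from Perl 5.10) instead of ||=. That is a swapped-in variant, not what the reply above uses; it only initializes a counter that is still undefined, so it would also behave if a counter could legitimately reach 0. The $start variable and the in-memory @rows list are stand-ins for illustration.

```perl
use strict;
use warnings;

my $start = 5;    # hypothetical starting value for every counter

# Stand-in for the lines of the input file, already split into columns.
my @rows = (
    [qw(A B n1)], [qw(A B n2)], [qw(A C n1)],
    [qw(D E n2)], [qw(D E n4)], [qw(D F n1)],
);

my %hoh;
for my $row (@rows) {
    my ($c1, $c2, $c3) = @$row;

    # Initialize each counter once, before it is first incremented.
    $hoh{$c1}{count}      //= $start;
    $hoh{$c1}{$c2}{count} //= $start;

    $hoh{$c1}{count}++;
    $hoh{$c1}{$c2}{count}++;
    push @{ $hoh{$c1}{$c2}{values} }, $c3;
}

print "A: $hoh{A}{count}, A/B: $hoh{A}{B}{count}, D/F: $hoh{D}{F}{count}\n";
```

        With ||= and a start value of 5 the result is the same. The two only differ when a counter can become 0 (for example with a start value of -1), because ||= would then reset it on the next pass.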

Re: Count number of elements in HoH and sum them for other keys
by BillKSmith (Monsignor) on Jun 03, 2014 at 12:33 UTC
    A simple hash is all you need. Use the things you want to count as keys.
    use strict;
    use warnings;

    my %hash;
    while (<DATA>) {
        my @elements = split;
        $hash{join ' => ', @elements[0,1]}++;
        $hash{$elements[0]}++;
    }

    foreach my $key (sort keys %hash) {
        printf "%-6s => %d\n", $key, $hash{$key};
    }

    __DATA__
    A B n1
    A B n2
    A C n1
    D E n2
    D E n4
    D F n1
    OUTPUT:
    A      => 3
    A => B => 2
    A => C => 1
    D      => 3
    D => E => 2
    D => F => 1
    Bill
Re: Count number of elements in HoH and sum them for other keys
by kcott (Archbishop) on Jun 03, 2014 at 13:54 UTC

    G'day Sosi,

    Parsing tab-, comma-, whatever-separated files has various issues that have already been dealt with by Text::CSV [see also: Text::CSV_XS and Text::CSV_PP]. This is probably not a wheel you need to reinvent: I've shown usage of Text::CSV in the example code (below).

    Your output data structure is flawed. You have instances of this general code:

    X => { count => n, [ ... ] }

    You have three elements in the hashref, which is a problem: key/value pairs result in an even number of elements. That generates an "Odd number of elements in anonymous hash" warning.

    In the example code (below), I've added a "values" key for both the arrayref (i.e. the third element) and the top-level key.

    X => { count => n, values => [ ... ] }
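
    A small sketch that makes the difference visible, assuming only core Perl. The names $bad, $good and @warnings are illustrative: the first hashref triggers the warning and ends up with the stringified array reference as a key, while the second is a regular key/value structure.

```perl
use strict;
use warnings;

# Capture warnings so the message can be inspected afterwards.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Three elements: 'count', 2 and an array reference without a key.
my $bad = { count => 2, [ 'n1', 'n2' ] };

# Four elements: two proper key/value pairs.
my $good = { count => 2, values => [ 'n1', 'n2' ] };

print "warning: $_" for @warnings;
print "keys in \$bad:  ", join(', ', sort keys %$bad),  "\n";
print "keys in \$good: ", join(', ', sort keys %$good), "\n";
print "values in \$good: @{ $good->{values} }\n";
```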

    Here's the example code:

    #!/usr/bin/env perl -l

    use strict;
    use warnings;
    use autodie;

    use Text::CSV;

    my $csv = Text::CSV::->new({sep_char => "\t"});
    my %data;

    while (my $row = $csv->getline(\*DATA)) {
        ++$data{$row->[0]}{count};
        ++$data{$row->[0]}{values}{$row->[1]}{count};
        push @{$data{$row->[0]}{values}{$row->[1]}{values}}, $row->[2];
    }

    use Data::Dump;
    dd \%data;

    # NB: the fields in the DATA section must be separated by literal tabs.
    __DATA__
    A	B	n1
    A	B	n2
    A	C	n1
    D	E	n2
    D	E	n4
    D	F	n1

    Which outputs:

    {
      A => {
        count => 3,
        values => {
          B => { count => 2, values => ["n1", "n2"] },
          C => { count => 1, values => ["n1"] },
        },
      },
      D => {
        count => 3,
        values => {
          E => { count => 2, values => ["n2", "n4"] },
          F => { count => 1, values => ["n1"] },
        },
      },
    }

    -- Ken

      > Parsing tab-, comma-, whatever-separated files has various issues that have already been dealt with by Text::CSV

      Like what?

      That module deals with all the features and intricacies of the 'official' CSV format, such as quoting/escaping and embedded newlines/NULL bytes. But the OP never specified that the input data is in that complex format; from the looks of it, it's just simple ASCII strings delimited by tabs and newlines. No need to take a sledgehammer to crack a nut...

      > This is probably not a wheel you need to reinvent

      Applying a simple, tried-and-true Perl idiom is not much of an 'invention':

      while(<>) { my @fields = split /\t/; ... }
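
      A short sketch of how that idiom behaves on simple tab-delimited lines, assuming no quoting or embedded tabs. The sample lines are made up, and the chomp and the -1 limit are additions to the one-liner above. It also contrasts split /\t/ with a whitespace split (what a bare split does), which collapses empty fields.

```perl
use strict;
use warnings;

# Hypothetical sample lines: the second has an empty middle field,
# the third an empty last field.
my @lines = ("A\tB\tn1\n", "A\t\tn2\n", "D\tE\t\n");

my @counts;
for my $line (@lines) {
    chomp $line;                             # drop the newline first
    my @by_tab   = split /\t/, $line, -1;    # -1 keeps trailing empty fields
    my @by_space = split ' ', $line;         # whitespace split, like a bare split

    push @counts, [ scalar @by_tab, scalar @by_space ];
    printf "tab: %d fields, whitespace: %d fields\n",
        scalar @by_tab, scalar @by_space;
}
```

      For data like the OP's, where every field is non-empty, both forms give the same three fields per line.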
Re: Count number of elements in HoH and sum them for other keys
by Anonymous Monk on Jun 03, 2014 at 11:49 UTC

    Seems like an exercise in linked list data structure that accounts for multiple occurrences of a value.

    For your ideal hash reference structure, when you are populating the hash reference: (1) add a "count" key whose value is incremented (starting at 1), in the same way you would add the "B" key to the hash reference stored under the "A" key; (2) give the array reference a key as well, so that the hash reference is valid.

      And to multiply, iterate over the keys, via keys function, of each hash reference to find the count; store them; then multiply. See also "perlref" & "perlreftut" PODs.
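
      A minimal sketch of that iteration, assuming "multiply" here means adding up the per-pair counts (which is what the expected A=>3 suggests). It works on the OP's original HoH of array references and keeps the totals in a separate, hypothetical %total hash instead of modifying the structure.

```perl
use strict;
use warnings;

my %hoh = (
    A => { B => [ 'n1', 'n2' ], C => [ 'n1' ] },
    D => { E => [ 'n2', 'n4' ], F => [ 'n1' ] },
);

my %total;
for my $col1 (sort keys %hoh) {
    # An array in numeric context yields its element count.
    $total{$col1} += @{ $hoh{$col1}{$_} } for keys %{ $hoh{$col1} };
    print "$col1 => $total{$col1}\n";
}
```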

        Thanks, your comments were quite helpful. I read them before seeing the reply of smls below, and was implementing something similar to what he ended up doing. Thanks.