dmtelf has asked for the wisdom of the Perl Monks concerning the following question:

Given a hash like this:

    $file{$pos}{TITLE}   = $line;
    $file{$pos}{SIZE}    = $mrounded;
    $file{$pos}{VLENGTH} = $vlength;   # $vlength = "$d days $h hours $m mins $s secs";

How can I loop through the hash and get a summary of file lengths like this:

    There are n clips between 0-1 mins :
    There are n clips between 1-2 mins :
    There are n clips between 2-3 mins :
    There are n clips between 3-4 mins :
    There are n clips between 4-5 mins :
    There are n clips more than 5 mins :

    Names of clips between 0-1 mins: ......
    Names of clips between 1-2 mins: ......

The name of the clip would be stored in $file{$pos}{TITLE}

I've done this but using a really ugly & *huge* if-else loop (ugh) and think this summarisation could be done in a few lines of code.

Any & all suggestions gratefully received!

dmtelf

Replies are listed 'Best First'.
RE: Numeric summarisation from data in a hash?
by Corion (Patriarch) on Jul 07, 2000 at 16:24 UTC

    My idea of solving this is putting the intervals into an array and then selecting the "right" bucket for the length. I assume that you can somehow get at the playlength of a file in seconds. (untested code follows):

    my @playlengths = (
        0,
        1*60,
        2*60,
        3*60,
        4*60,
        5*60,        # this is just to show
        10*60,       # that we can have gaps in the array
        60*60,       # and anything longer than one hour
                     # goes into this bucket
        # ...
        2147483648,  # I use this as an easy end-of-array marker
    );

    # First we retrieve the playlength in seconds
    # from the current file for later reference
    my $playlength = &getPlaylength( $file{$pos} );

    # $slot will hold the position of our file
    # so $playlengths[$slot] <= $playlength < $playlengths[$slot+1]
    # always holds
    my $slot = 0;
    while ($slot < $#playlengths - 1 && $playlength >= $playlengths[$slot+1]) {
        $slot++;
    }

    # The last real bucket sits just before the end-of-array marker
    if ($slot == $#playlengths - 1) {
        print "The file playlength is longer than $playlengths[$slot] seconds.\n";
    } elsif ($slot == 0) {
        print "The file playlength is below ", $playlengths[1], " seconds.\n";
    } else {
        print "The file playlength is between $playlengths[$slot] and $playlengths[$slot+1] seconds.\n";
    }

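The lookup above handles one file at a time. A minimal sketch of how the same bucket search might be run over the whole hash to produce the summary follows. The SECONDS key, the sample data, and the helper name slot_for are assumptions made for illustration, not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical sample data; SECONDS is an assumed key holding the play length in seconds.
my %file = (
    1 => { TITLE => 'foo', SECONDS => 245 },
    2 => { TITLE => 'bar', SECONDS => 30 },
    3 => { TITLE => 'baz', SECONDS => 4000 },
);

# Lower bound of each bucket, in seconds; the last bucket is open-ended.
my @bounds = (0, 60, 120, 180, 240, 300);

# Return the index of the bucket whose range contains $seconds.
sub slot_for {
    my ($seconds) = @_;
    my $slot = 0;
    $slot++ while $slot < $#bounds && $seconds >= $bounds[$slot + 1];
    return $slot;
}

# One (initially empty) array of titles per bucket.
my @titles = map { [] } @bounds;
for my $pos (sort keys %file) {
    push @{ $titles[ slot_for($file{$pos}{SECONDS}) ] }, $file{$pos}{TITLE};
}

for my $slot (0 .. $#bounds) {
    my $range = $slot == $#bounds
        ? "more than " . $bounds[$slot] / 60 . " mins"
        : "between " . $bounds[$slot] / 60 . "-" . $bounds[$slot + 1] / 60 . " mins";
    printf "There are %d clips %s : %s\n",
        scalar @{ $titles[$slot] }, $range, join(', ', @{ $titles[$slot] });
}
```

Pre-filling @titles with empty array references means empty buckets print a count of 0 instead of tripping over an undefined reference.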
Re: Numeric summarisation from data in a hash?
by ahunter (Monk) on Jul 07, 2000 at 17:58 UTC
    Convert $vlength to seconds (easy), and use grep in its scalar context:

    sub btween (\%$$) {
        my ($hash, $from, $until) = @_;
        return scalar(grep {
            $hash->{$_}->{VLENGTH} >= $from
                && $hash->{$_}->{VLENGTH} < $until
        } keys(%{$hash}));
    }

    print "There are ", btween(%hash, 0, 60), " clips between 0-1 minutes\n";

    (And so on and so forth)
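
For the "convert $vlength to seconds" step, here is a minimal sketch. It assumes the string always follows the "$d days $h hours $m mins $s secs" layout from the question; the sub name vlength_to_seconds is made up for illustration.

```perl
use strict;
use warnings;

# Convert "D days H hours M mins S secs" to a number of seconds.
# Returns undef (in scalar context) if the string does not match that layout.
sub vlength_to_seconds {
    my ($vlength) = @_;
    my ($d, $h, $m, $s) = $vlength =~
        /(\d+)\s*days\s*(\d+)\s*hours\s*(\d+)\s*mins\s*(\d+)\s*secs/i
        or return;
    return (($d * 24 + $h) * 60 + $m) * 60 + $s;
}

print vlength_to_seconds('0 days 0 hours 4 mins 5 secs'), "\n";   # prints 245
```

Storing the result back into the hash (for example, overwriting VLENGTH) would let the grep comparisons work on plain numbers.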

    Update: Oops, forgot the titles. Grep again:

    sub btween (\%$$) {
        my ($hash, $from, $until) = @_;
        my @found = grep {
            $hash->{$_}->{VLENGTH} >= $from
                && $hash->{$_}->{VLENGTH} < $until
        } keys(%{$hash});
        my $titles = join(', ', map { $hash->{$_}->{TITLE} } @found);
        # scalar(@found) is the count; $#found would be one less
        return (scalar(@found), $titles);
    }

    my @result = btween(%hash, 0, 60);
    print "There are $result[0] clips between 0-1 minutes, and their names are: $result[1]\n";

    Update the second: Rewrote the update in the form of the original sub statement. This doesn't work for 'there are n clips >5 minutes long'-type things. Change the conditional in the grep statement to fix this.
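
One possible way to change the conditional for the "more than 5 minutes" case is to treat an undefined upper bound as open-ended. This is only a sketch: the sub name titles_between and the sample data are hypothetical, the hash is passed as an explicit reference rather than through a prototype, and VLENGTH is assumed to be in seconds already.

```perl
use strict;
use warnings;

# Hypothetical data; VLENGTH is assumed to already be in seconds.
my %clips = (
    a => { TITLE => 'foo', VLENGTH => 45 },
    b => { TITLE => 'bar', VLENGTH => 310 },
    c => { TITLE => 'baz', VLENGTH => 900 },
);

# Titles of clips with $from <= VLENGTH < $until.
# Pass undef as $until for a range with no upper limit.
sub titles_between {
    my ($hash, $from, $until) = @_;
    return map { $hash->{$_}{TITLE} }
           grep {
               $hash->{$_}{VLENGTH} >= $from
                   && (!defined $until || $hash->{$_}{VLENGTH} < $until)
           } sort keys %$hash;
}

my @short = titles_between(\%clips, 0, 60);
my @long  = titles_between(\%clips, 300, undef);
print "There are ", scalar(@short), " clips between 0-1 mins: @short\n";
print "There are ", scalar(@long), " clips more than 5 mins: @long\n";
```

Returning the list of titles lets the caller get the count with scalar(@list), so one call covers both lines of the summary.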

    Andrew.

Re: Numeric summarisation from data in a hash?
by davorg (Chancellor) on Jul 07, 2000 at 16:32 UTC

    Suggestion one would be to put the data in an RDBMS and extract the data you want using SQL.

    Suggestion two would be to use the DBD::RAM module to fake an RDBMS.

    Suggestion three would be to use Colin's suggestion. Here's some (untested) code that implements a possible solution.

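    # Assumes $file{$_}{VLENGTH} has already been converted to a number of minutes
    my @clips;
    foreach (keys %file) {
        push @{$clips[int($file{$_}{VLENGTH})]}, $file{$_}{TITLE};
    }

    foreach (0 .. $#clips) {
        # "|| []" guards against buckets that received no clips
        print "There are ", scalar @{$clips[$_] || []},
              " clips between $_ - ", $_ + 1, " mins\n";
    }

    foreach (0 .. $#clips) {
        print "Names of clips between $_ - ", $_ + 1, " mins: ",
              join(', ', @{$clips[$_] || []}), "\n";
    }
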
    --
    <http://www.dave.org.uk>

    European Perl Conference - Sept 22/24 2000, ICA, London
    <http://www.yapc.org/Europe/>
Re: Numeric summarisation from data in a hash?
by splinky (Hermit) on Jul 07, 2000 at 16:26 UTC
    Untested, but you get the idea:

    my @clips = map { [] } 0 .. 5;   # buckets 0-4 by minute, bucket 5 for everything longer

    foreach my $key (keys %file) {
        if ($file{$key}{VLENGTH} =~ /^0 days 0 hours (\d+) mins/ && $1 < 5) {
            push @{$clips[$1]}, $file{$key}{TITLE};
        } else {
            push @{$clips[5]}, $file{$key}{TITLE};
        }
    }

    my $names = '';
    foreach my $i (0 .. 4) {
        print 'There are ', scalar(@{$clips[$i]}), ' clips between ',
              $i, '-', $i+1, " mins :\n";
        $names .= 'Names of clips between ' . $i . '-' . ($i+1) . ' mins: '
                . join(', ', @{$clips[$i]}) . "\n";
    }
    print 'There are ', scalar(@{$clips[5]}), " clips more than 5 mins :\n\n";
    print $names;
    print 'Names of clips more than 5 mins: ', join(', ', @{$clips[5]}), "\n";

    *Woof*

Re: Numeric summarisation from data in a hash?
by lhoward (Vicar) on Jul 07, 2000 at 16:36 UTC
    The following code (or some derivation of it) should be what you want. Basically, it goes through the hash once and builds an array of arrays to store the indexes of the files, broken down by duration. It then iterates through that array to print out the results:

    #!/usr/bin/perl -w
    use strict;

    # maximal # of minutes
    my $max_bucket=5;

    my %file;
    $file{1}{TITLE}='foo';
    $file{1}{SIZE}='5';
    $file{1}{VLENGTH}='0 days 0 hours 4 mins 5 secs';
    $file{2}{TITLE}='bar';
    $file{2}{SIZE}='92';
    $file{2}{VLENGTH}='0 days 0 hours 4 mins 13 secs';
    $file{3}{TITLE}='baz';
    $file{3}{SIZE}='1';
    $file{3}{VLENGTH}='0 days 0 hours 2 mins 23 secs';

    # initialize stats array
    my @stat;
    for(my $c=0;$c<=$max_bucket;$c++){
      $stat[$c]=[];
    }

    # populate stats array
    my $k;
    foreach $k(keys %file){
      my ($d,$h,$m)=$file{$k}{VLENGTH}=~/(\d+)\s*days\s*(\d+)\s*hours\s*(\d+)\s*mins/i;
      my $duration=$m+$h*60+$d*60*24;
      $duration=$max_bucket if($duration>$max_bucket);
      push @{$stat[$duration]},$k;
    }

    # iterate through stats array to print out counts
    for (my $c=0;$c<=$max_bucket;$c++){
      print "there are ".(scalar @{$stat[$c]})." clips ";
      if($c==$max_bucket){
        print "greater than $c mins :\n";
      }else{
        print "between $c and ".($c+1)." mins :\n";
      }
    }

    # iterate through stats array to print out names
    for (my $c=0;$c<=$max_bucket;$c++){
      print "Names of clips ";
      if($c==$max_bucket){
        print "greater than $c mins : ";
      }else{
        print "between $c and ".($c+1)." mins : ";
      }
      print "".(join ', ',map {$file{$_}{TITLE}} @{$stat[$c]})."\n";
    }
RE: Numeric summarisation from data in a hash?
by cds (Sexton) on Jul 07, 2000 at 16:14 UTC

    This is a suggestion. It is only a suggestion. It may not work at all...

    Try rounding the length up to an integer. Have a single check for the bigger-than-x case. Use an array of arrays for the others, with the number as the offset into the outer array. Push all the filenames into the inner arrays. Then at the end you just have to grab the length of these arrays and iterate them all out. You can put the bigger-than cases into the outer array at the next offset after the highest value you want.

    I hope that makes sense. I'm not entirely sure what terms to use when referring to arrays of arrays.
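
One way to read the description above, as a rough sketch only: the data, the 5-minute cut-off, and the variable names are assumptions chosen for illustration. Lengths are taken to be in seconds and are rounded up to whole minutes with POSIX::ceil, so the bucket at offset 1 holds the 0-1 minute clips.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

my $max_mins = 5;   # assumed cut-off: longer clips share one overflow bucket

# Hypothetical data: title => length in seconds.
my %seconds_for = (foo => 245, bar => 30, baz => 4000, qux => 61);

# Outer array, indexed by minutes rounded up; each element is an inner array of titles.
my @outer;
for my $title (sort keys %seconds_for) {
    my $mins = ceil($seconds_for{$title} / 60);
    $mins = 1             if $mins < 1;           # a zero-length clip still counts as 0-1 mins
    $mins = $max_mins + 1 if $mins > $max_mins;   # single check for the bigger-than case
    push @{ $outer[$mins] }, $title;
}

for my $mins (1 .. $max_mins + 1) {
    my @inner = @{ $outer[$mins] || [] };         # empty buckets are never created by push
    my $label = $mins > $max_mins ? "more than $max_mins" : ($mins - 1) . "-$mins";
    print "There are ", scalar(@inner), " clips of $label mins: @inner\n";
}
```

Because of the rounding up, a clip of exactly 300 seconds lands in the 4-5 bucket here, which differs slightly from approaches that truncate the minutes.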

    Colin Scott
    If you build it, they will be dumb...
      I'm not entirely sure what terms to use when referring to arrays of arrays.

      The proper term is "array of arrays":-) Not kidding. See also "arrays of hashes", "hash of arrays", and "hash of hashes".

      *Woof*

        Seems I was insufficiently clear there. What I meant was the terms used to refer to what I was calling the "inner" and "outer" arrays.

        Colin Scott
        If you build it, they will be dumb...