WartHog369 has asked for the wisdom of the Perl Monks concerning the following question:

Greetings All,

This is my first posting/question. I am an ancient computer guy (punch cards and hand-wiring sort boards!) but relatively new to Perl.

Why does the line of code (65 or so lines below):
>>>>> foreach $hr (sort keys %{$ClubTotal ... <<<<<
not work? It does not sort, it does not appear to access the 'bottom key', it does not return anything to $hr (I understand/realize that a list should be returned from 'sort keys').

Can I, using only my own code, access/retrieve all the data from %ClubTotal? Or must I use Data::Walk, Data::Visitor, or something similar?

Many, many thanks for sharing your knowledge.

Thomas

* * * Begin program:
* * * Do some preprocessing of input data file, then process * * *
* * * 50,000+ lines of data that we are interested in: * * *
* * * Time of day by halfhour when a patron signs-in at club * * *

* * * This code builds the HASH: %ClubTotal * * *

$dataChunk = substr($_, 42, 17); # yields >06/05/2008 8:31a< from $_
if ($dataChunk =~ m|\d\d/\d\d/\d\d|) {
    $date   = substr($dataChunk, 3, 2);  # get date
    $hour   = substr($dataChunk, 11, 2); # get hour
    $minute = substr($dataChunk, 14, 2); # get minutes
    $ampm   = substr($dataChunk, 16, 1); # get am or pm
    if (($ampm eq "a") && ($hour <= 9)) {$hour = '0'.int($hour) ; }
    if (($ampm eq "p") && ($hour != 12)) {$hour = $hour + 12; }
    $ClubTotal{ 'DayOfMonth'=> $date }{'Date'} = $date ;
    $ClubTotal{ 'DayOfMonth'=> $date }{'TotalPerDay'} += 1 ;
    $ClubTotal{ 'DayOfMonth'=> $date }{ 'HourSignIn' => $hour }{'Hour'} += $hour ;
    $ClubTotal{ 'DayOfMonth'=> $date }{ 'HourSignIn' => $hour }{'TotPerHour'} += 1 ;
    if ($minute <= 29) {
        $ClubTotal{ 'DayOfMonth'=> $date }->{ 'HourSignIn' => $hour }->{'HalfHour00'} += 1 ;
    }
    else {
        $ClubTotal{ 'DayOfMonth'=> $date }->{ 'HourSignIn' => $hour }->{'HalfHour30'} += 1 ;
    }

#* * * Do some extra stuff here, finished with input file * * *
#* * * Call: sub SortBuildOutput{ * * *

sub SortBuildOutput{
    #%ClubTotal           hash
    #  DayOfMonthxx       hash
    #    HourSignInxx     hash
    #      TotPerHour     scalar
    #      HalfHour00     scalar
    #      HalfHour30     scalar
    #      Hour           scalar
    my $day;
    my $hr;
    my @hr;
    foreach $day (sort keys %ClubTotal) {
>>>>> foreach $hr (sort keys %{$ClubTotal{'$day'}{'HourSignIn'} } ) { <<<<<
            print $ClubTotal{$day}->{'Date'} . ": " . $ClubTotal{$day}->{'TotalPerDay'} . "\n";
            # This TASK: Aggregate Data for end of month summary:
            # %ClubTotal = (
            #     'DayOfMonth26' => {
            #         'HourSignIn13' => {
            #             'HalfHour30' => 7,
            #             'Hour' => 13,
            #             'HalfHour00' => 11,
            #             'TotPerHour' => 18
            #         }
            #     }
            # to build a record for input to a spreadsheet
        } # END: foreach my $hr (sort keys %{$ClubTotal{'$day'}{HourSignIn}}) {
    } # END: foreach my $day (sort keys %ClubTotal) {
} #END: sub SortBuildOutput{

Data::Dumper produces:
%ClubTotal = (
    'DayOfMonth26' => {
        'HourSignIn13' => {
            'HalfHour30' => 13,
            'Hour' => 13,
            'HalfHour00' => 10,
            'TotPerHour' => 23
        },
        'HourSignIn12' => {
            'HalfHour30' => 6,
            'Hour' => 12,
            'HalfHour00' => 8,
            'TotPerHour' => 14
        },
        'HourSignIn10' => {
            'HalfHour30' => 20,
            'Hour' => 10,
            'HalfHour00' => 19,
            'TotPerHour' => 39
        },
        'TotalPerDay' => 251,
        'Date' => '26',
    },
    'DayOfMonth11' => {
        'HourSignIn13' => {
            'HalfHour30' => 7,
            'Hour' => 13,
            'HalfHour00' => 11,
            'TotPerHour' => 18
        },
        'HourSignIn10' => {
            'HalfHour30' => 12,
            'Hour' => 10,
            'HalfHour00' => 11,
            'TotPerHour' => 23
        },

and so on for all 30 days of the month to produce cumulative totals of all data/scalar items.

Re: Brain muchly befuddled by nested hashes
by ysth (Canon) on Nov 24, 2008 at 04:57 UTC
    '$day'
    Variables don't interpolate in single quotes. That should be "$day", or better, just $day.
    $ClubTotal{ 'DayOfMonth'=> $date }
    What are the keys of the outermost hash supposed to be? That may not do what you intend. When you provide a list when referencing a hash element, perl assumes you are using old perl4-style emulated nested hashes and produces a single hash key like this: $ClubTotal{"DayOfMonth$;$date"}. See $;.
    Data::Dumper produces:
    Sadly, Data::Dumper doesn't have the best defaults, so it is producing literal \034 characters in the output there. Always setting $Data::Dumper::Useqq=1 when examining your data can be helpful.
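For what it's worth, both points above can be tried in a few lines. This is only a sketch: the hash %h and the value '26' are made up for illustration, and it assumes $; still has its default value of "\034".

```perl
use strict;
use warnings;

my $day = '26';

# Single quotes: no interpolation, so this is the literal text $day.
my $literal = '$day';

# Double quotes (or just the bare variable) give the value.
my $interpolated = "$day";

# A list inside a hash subscript is joined with $; to form ONE key.
my %h;
$h{ 'DayOfMonth' => $day } = 1;
my ($joined_key) = keys %h;

# Make the normally invisible separator visible as an octal escape.
( my $visible = $joined_key ) =~ s/\034/\\034/g;

print "literal:      $literal\n";
print "interpolated: $interpolated\n";
print "joined key:   $visible\n";
```

Printing the key with the separator escaped is roughly what $Data::Dumper::Useqq=1 does for you.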
      When you provide a list when referencing a hash element, perl assumes you are using old perl4-style emulated nested hashes and produces a single hash key like this: $ClubTotal{"DayOfMonth$;$date"}.

      Based on looking at the OP's output from Data::Dumper (and on trying it myself), it seems that perl is not adding any sort of field delimiter in the hash key:

      perl -le '$n=0; $h{"foo"=>$n++}=$n for (0..3); print "$_ => $h{$_}" for (sort keys %h)'
      foo0 => 1
      foo1 => 2
      foo2 => 3
      foo3 => 4

      That said, I would agree that presence of the "fat comma" (=>) as part of the hash-key expression looks like a misunderstanding (and/or could be misunderstood by less skilled readers), and is rather ugly as well. Something like simple concatenation or string interpolation would be clearer ($hash{'string'.$num} or $hash{"string$num"}).
        The delimiter is there, you just don't see it. Hence my comments about Data::Dumper's defaults (added shortly after I first posted, so you may have missed them).
        $ perl -le '$n=0; $h{"foo"=>$n++}=$n for (0..3); use Data::Dumper; print Dumper \%h; $Data::Dumper::Useqq=1; print Dumper \%h'
        $VAR1 = {
                  'foo1' => 2,
                  'foo0' => 1,
                  'foo2' => 3,
                  'foo3' => 4
                };

        $VAR1 = {
                  "foo\0341" => 2,
                  "foo\0340" => 1,
                  "foo\0342" => 3,
                  "foo\0343" => 4
                };
Re: Brain muchly befuddled by nested hashes
by ptoulis (Scribe) on Nov 24, 2008 at 08:28 UTC
    Your date format is not suitable for sorting. It should be in YYYY/MM/DD format and not MM/DD/YYYY. With a plain string sort, a date such as 09/09/2008 comes before 10/10/2000, even though it is eight years later. In addition, the sort function is not a numeric sort by default, which is what you want. If you try to sort the array (2,10) then you will get (10,2). You should also reconsider the way you build up the hash: the => seems irrelevant in the outer hash.
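To illustrate the remark about sort not being numeric by default, here is a small sketch with made-up hour values (the array @hours is not from the original program):

```perl
use strict;
use warnings;

my @hours = ( 2, 10, 9, 23 );

# Default sort compares as strings, so "10" sorts before "2".
my @as_strings = sort @hours;

# The <=> operator compares as numbers.
my @as_numbers = sort { $a <=> $b } @hours;

print "string sort:  @as_strings\n";
print "numeric sort: @as_numbers\n";
```

Zero-padding the values to a fixed width, as the original code does for single-digit hours, is the other common way to make a string sort agree with numeric order.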

      Thank you - ptoulis (Sexton) from Nov 24, 2008 at 08:28 UTC - for your quick reply to my inquiry and your most helpful suggestions.

      I recall now that I have previously read about sorting dates in year/mm/dd order instead of the traditional American date structure of mm/dd/yy.

      However, please note: after I parse the incoming date string into its individual components, I do not deal with either format anywhere in the program. Note also that in the line:
      if (($ampm eq "a") && ($hour <= 9)) {$hour = '0'.int($hour) ; }
      that I prepend a zero to the single digit hour to assist in sorting.

      Could you, please, expand on your comment: "You should reconsider also, the way you build up the hash: the => seems irrelevant in the outer hash."
      I have tried to mimic the example from the manpage: "perldsc", which shows:

      %HoH = (
          flintstones => {
              lead => "fred",
              pal  => "barney",
          },
          jetsons => {
              lead      => "george",
              wife      => "jane",
              "his boy" => "elroy",
          },
          simpsons => {
              lead => "homer",
              wife => "marge",
              kid  => "bart",
          },
      );

      What am I missing (doing wrong)?
      Again, thank you for your assistance, sometimes it's a chore trying to teach an old dog new tricks.
      Thomas

        Even if you prepend the '0' before values, it still isn't enough. For example 2000<2008 but 10/2000>08/2008, so it is right to check big things first.
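A minimal sketch of "checking big things first", assuming MM/DD/YYYY input strings. The helper name sortable_date and the sample dates are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: turn MM/DD/YYYY into YYYYMMDD so that a plain
# string comparison is also a chronological comparison.
sub sortable_date {
    my ($mdy) = @_;
    my ( $month, $day, $year ) = split m{/}, $mdy;
    return sprintf '%04d%02d%02d', $year, $month, $day;
}

my @dates  = ( '10/01/2000', '08/15/2008', '06/05/2008' );
my @sorted = sort { sortable_date($a) cmp sortable_date($b) } @dates;

print "$_\n" for @sorted;
```

If the program only ever handles one month of one year, sorting on the two-digit day alone is enough, which is what the original code effectively does.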

        About the hashes, my point was that the usual case is that you define a hash as %hash = (key1=>value1, key2=>value2...); and it is not frequent (if plausible) to use the => symbol inside the key definition. You say $ClubTotal{'DayOfMonth'=>$date}{..} =... which puts the '=>' in the key. Now, the hash concept is that a hash is just a table organized by distinct words. As such, there is no point in prefixing the 'DayOfMonth' (a constant string before the date) and the '=>' is really misleading because one expects a value after it. It would be much more readable to write something like this:
        %ClubTotal = ( $date => { TotalPerDay=>1, HourSignIn=> { Value=>$hour, ToPerlHour=>$perlHour } } );

        This is just an example. Things should be as simple as possible. For example, in your code you say: $ClubTotal{ 'DayOfMonth'=> $date }{'Date'} = $date ;, which means that your key and value are the same variable! There are three dates in this code, which is a waste of time (and space). Simply stating $ClubTotal{$date}=... is enough, since the date you want is the key of the hash.
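Along those lines, here is a rough sketch of a structure keyed directly by the day and then by the hour. The sign-in records and the 'Hours' sub-key are invented for illustration; only the counter names come from the original post.

```perl
use strict;
use warnings;

my %ClubTotal;

# Hypothetical sign-ins as [day, hour, minute] triples.
my @sign_ins = (
    [ '26', '13', 5 ],
    [ '26', '13', 40 ],
    [ '26', '10', 12 ],
    [ '11', '10', 59 ],
);

for my $rec (@sign_ins) {
    my ( $date, $hour, $minute ) = @$rec;
    my $half = $minute < 30 ? 'HalfHour00' : 'HalfHour30';

    # Intermediate hashes are created on first use (autovivification).
    $ClubTotal{$date}{TotalPerDay}++;
    $ClubTotal{$date}{Hours}{$hour}{TotPerHour}++;
    $ClubTotal{$date}{Hours}{$hour}{$half}++;
}

for my $date ( sort keys %ClubTotal ) {
    print "$date: $ClubTotal{$date}{TotalPerDay}\n";
    for my $hour ( sort keys %{ $ClubTotal{$date}{Hours} } ) {
        print "  $hour: $ClubTotal{$date}{Hours}{$hour}{TotPerHour}\n";
    }
}
```

Keeping the hours under their own sub-key means the inner loop never has to filter out the per-day counters.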
Re: Brain muchly befuddled by nested hashes
by Cristoforo (Curate) on Nov 25, 2008 at 00:19 UTC
    Not sure if this will get you any closer, but here is my take. I changed the way values are gotten from using 'substr' to captures from a regular expression (if that would work).

    I formatted $hour to have a leading '0', if necessary, using the sprintf function.

    There are 2 hashes: one for accumulating the daily totals and one for accumulating the hourly stats.

    my %daily_total;
    my %hourly_stats;

    while (<DATA>) {
        # looking for: 06/05/2008 8:31a
        if (m{\d\d/(\d\d)/\d{4}\s{1,2}(\d+):(\d\d)(a|p)}) {
            my $day    = $1;                 # get day
            my $hour   = sprintf "%02d", $2; # get hour
            my $minute = $3;                 # get minutes
            my $ampm   = $4;                 # get am or pm

            if ($ampm eq "p" && $hour != 12) {$hour += 12; }
            $hour = '00' if $ampm eq "a" && $hour == 12;

            $daily_total{$day}++;
            $hourly_stats{$day}{$hour}{total}++;
            if ($minute <= 29) {
                $hourly_stats{$day}{$hour}{HalfHour00}++;
            }
            else {
                $hourly_stats{$day}{$hour}{HalfHour30}++;
            }
        }
    }

    for my $day (sort keys %daily_total) {
        print "day: $day, total: $daily_total{$day}\n";
        for my $hour (sort keys %{ $hourly_stats{$day} }) {
            print "hour: $hour\n";
            print " first 1/2 hr: ", $hourly_stats{$day}{$hour}{HalfHour00}|| 0,"\n";
            print " second 1/2 hr: ",$hourly_stats{$day}{$hour}{HalfHour30}|| 0,"\n";
            print " total/hour: $hourly_stats{$day}{$hour}{total}\n";
        }
        print "\n";
    }

    Chris

    Update: A change to output routine to correct error found using dataset provided by johngg.

Re: Brain muchly befuddled by nested hashes
by johngg (Canon) on Nov 25, 2008 at 23:55 UTC

    I think there has been some confusion caused by one of the variable names you have chosen. You use substr to pull a date and time string from your data line. You then use substr again to isolate parts of that string. One of these you call $date but it is, in fact, only the day of the month; perhaps something like $mday would have been a more descriptive choice.

    Others have addressed the issue of the fat comma (=>) in your hash key and the resultant odd characters in your keys. I concatenate the text you use with the day or hour value to form the keys by using interpolation in a double-quoted string. Cristoforo thought that a regular expression would be a good choice for extracting the date and time information you require. I would agree but, without seeing your data, I do not know whether there is similarly formatted date & time information earlier in each line. Therefore, I use a combination of substr and a regular expression, using captures in the match to pull out fields of interest (see Extracting matches, perlretut and perlre). Cristoforo also advocated the use of sprintf to format your hours along with a solution using two hashes rather than your one. My code below uses just the one hash and, as ptoulis recommends, avoids some of the redundancy in your data structure. I also avoid some typing in each nested foreach by assigning the hash reference I'm interested in within the structure to a lexical scalar scoped to the loop, see perlreftut and perlref. When incrementing the hourly count for the bottom or top of the hour I use a ternary to decide which hash element to address, see Conditional Operator in perlop.

    use strict;
    use warnings;

    my %signIn = ();

    while( <DATA> )
    {
        my $dateStr = substr $_, 42, 17;
        next unless
            my( $mday, $hr, $min, $ampm ) =
                $dateStr =~ m{\d\d/(\d\d)/\d{4}\s+(\d{1,2}):(\d\d)(a|p)};
        $hr  = 0  if $hr == 12;
        $hr += 12 if $ampm eq q{p};
        $hr  = sprintf q{%02d}, $hr;

        my $mdayKey = qq{DayOfMonth$mday};
        my $hrKey   = qq{HourSignIn$hr};

        $signIn{ $mdayKey }->{ TotalPerDay } ++;
        $signIn{ $mdayKey }->{ $hrKey }->{ TotalPerHour } ++;
        $signIn{ $mdayKey }->{ $hrKey }->
            { $min < 30 ? q{HalfHour00} : q{HalfHour30} } ++;
    }

    foreach my $mdayKey ( sort keys %signIn )
    {
        my $rhDaily = $signIn{ $mdayKey };
        print
            qq{$mdayKey\n},
            qq{ TotalPerDay - $rhDaily->{ TotalPerDay }\n};
        foreach my $hrKey ( sort grep m{^Hour}, keys %{ $rhDaily } )
        {
            my $rhHrly = $rhDaily->{ $hrKey };
            print
                qq{ $hrKey\n},
                qq{ TotalPerHour - $rhHrly->{ TotalPerHour }\n},
                qq{ HalfHour00 - },
                exists $rhHrly->{ HalfHour00 }
                    ? qq{$rhHrly->{ HalfHour00 }\n}
                    : qq{0\n},
                qq{ HalfHour30 - },
                exists $rhHrly->{ HalfHour30 }
                    ? qq{$rhHrly->{ HalfHour30 }\n}
                    : qq{0\n};
        }
    }

    __END__
    0123456789012345678901234567890123456789  06/05/2008  8:31a&&&&&&&
    abcdefghijabcdefghijabcdefghijabcdefghij  06/07/2008 12:15p&&&&&&&
    0123456789012345678901234567890123456789  06/05/2007  1:46p&&&&&&&
    abcdefghijabcdefghijabcdefghijabcdefghij  06/05/2008 12:49p&&&&&&&
    This is a line of rubbish that doesn't match the data requirements
    0123456789012345678901234567890123456789  06/05/2008  2:24a&&&&&&&
    abcdefghijabcdefghijabcdefghijabcdefghij  06/05/2007 11:09a&&&&&&&
    0123456789012345678901234567890123456789  06/12/2007 12:17a&&&&&&&
    abcdefghijabcdefghijabcdefghijabcdefghij  06/05/2008 11:09p&&&&&&&
    0123456789012345678901234567890123456789  06/05/2008 11:42p&&&&&&&

    The output.

    DayOfMonth05
     TotalPerDay - 7
     HourSignIn02
     TotalPerHour - 1
     HalfHour00 - 1
     HalfHour30 - 0
     HourSignIn08
     TotalPerHour - 1
     HalfHour00 - 0
     HalfHour30 - 1
     HourSignIn11
     TotalPerHour - 1
     HalfHour00 - 1
     HalfHour30 - 0
     HourSignIn12
     TotalPerHour - 1
     HalfHour00 - 0
     HalfHour30 - 1
     HourSignIn13
     TotalPerHour - 1
     HalfHour00 - 0
     HalfHour30 - 1
     HourSignIn23
     TotalPerHour - 2
     HalfHour00 - 1
     HalfHour30 - 1
    DayOfMonth07
     TotalPerDay - 1
     HourSignIn12
     TotalPerHour - 1
     HalfHour00 - 1
     HalfHour30 - 0
    DayOfMonth12
     TotalPerDay - 1
     HourSignIn00
     TotalPerHour - 1
     HalfHour00 - 1
     HalfHour30 - 0

    I hope you find these ideas useful.

    Cheers,

    JohnGG

      
      To All who have contributed:
      *THE* 'aah-hhaa' moment is gathering steam in the hinterlands of my brain
      and I expect to encounter it soon.
      
      I believe my basic problem (area of misunderstanding) is that I am conflating
      Perl's keys and values for hashes with how I would work with (program/manipulate)
      datafields/sets/databases in another environment.  Perl's hashes, I perceive, are
      a slightly different animal than what I have used in the past.
      
      For now though, THANK YOU, for:
      ysth:   Variables don't interpolate in single quotes.
              setting $Data::Dumper::Useqq=1 when examining your data
      
      graff:   Something like simple concatenation or string interpolation would be clearer
              ($hash{'string'.$num} or $hash{"string$num"}). (ME: You share in the comment below
              to -johngg- with regard to 'interpolation')
      
      ptoulis: noted before (ME: but still having trouble with the 'fat comma' and its documentation)
      
      Cristoforo: "but here is my take...there are 2 hashes" - (ME: I was trying to be 'simple' and
                   stay with only one hash.  Also, regEx's are still a bane to me (apologies to Friedl
                   and Goyvaerts))
      
      wol:    avoid using a => in a hash key, Welcome to Perl :-) (ME: you have a wicked sense of humor - no?)
      
      johngg: I think there has been some confusion caused by one of the variable names you have chosen. (ME: Concur)
              the issue of the fat comma (=>) in your hash key (ME: Yes, I was trying to define the hash key
              "on the fly" with concatenation and should have been using "interpolation" as you have suggested)
              see perlreftut and perlref.(ME: references I dig (60's lingo))
              (ME: "The Output" that you present appears to be exactly what I was trying to accomplish!)
      
      For ALL: my sincerest thanks for your time and effort on my behalf, and now that I am not as "new to Perl" as
      I 'once' was, perhaps someone may present a thorny problem for which I may be of some assistance.  Thank You.
      Thomas
      
Re: Brain muchly befuddled by nested hashes
by wol (Hermit) on Nov 25, 2008 at 17:48 UTC
    One issue I can see is that in your problem foreach statement, you use a constant 'HourSignIn' to access your nested hash structure. However, Data::Dumper is indicating that there's no such hash key, but there are hash keys such as 'HourSignIn13', 'HourSignIn12', 'HourSignIn10', etc.

    Hey - what's going on - there are weird characters in there between the 'In' and the digits! What's with the funny stuff?!

    Ah - that's what the post about character \034 (0x1C) was on about... I think I've just caught up with the conversation here!

    So, to re-address the problems you're seeing, avoid using a => in a hash key, ie

    $ClubTotal{ 'DayOfMonth' => $date }...
    because Perl does give that some meaning, but it's almost certainly not the meaning you want! Instead I think you need:
    $ClubTotal{'DayOfMonth'}{$date}...
    This will allow you to iterate over the items in the nested hash:
    foreach my $date (keys %{ $ClubTotal{'DayOfMonth'} }) { ... }
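As a sketch of that iteration, assuming the $ClubTotal{'DayOfMonth'}{$date} layout suggested above (the two days and their totals are sample values for illustration only):

```perl
use strict;
use warnings;

# Sample data in the $ClubTotal{'DayOfMonth'}{$date} layout.
my %ClubTotal = (
    DayOfMonth => {
        '11' => { TotalPerDay => 41 },
        '26' => { TotalPerDay => 251 },
    },
);

# %{ ... } dereferences the inner hash; keys returns the list of its keys.
my @dates = sort keys %{ $ClubTotal{'DayOfMonth'} };

foreach my $date (@dates) {
    my $total = $ClubTotal{'DayOfMonth'}{$date}{'TotalPerDay'};
    print "$date: $total\n";
}
```

Without keys, the loop would walk over the values as well, and those are hash references rather than dates.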
    Hope that points you in a useful direction.

    Welcome to Perl :-)

    --
    .sig : File not found.