
Well, I recommend doing some reading! His/Her Grace, ikegami (Archbishop), used a hash (%flushes) to collect the results. Hashes are a wonderful thing, and well worth getting to grips with.

However, his/her grace did use a "two dimensional" hash, which is rather throwing you into the murkier part of the deep end.

So, let's simplify this a bit. Your first requirement was to count the number of whatever they were, on a monthly basis. The regular expression extracted the month name and year from the date into the string $month, for example 'Aug-2008'. You could use the hash %total_counts to count the totals for each month, thus: $total_counts{$month}++.
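As a minimal sketch of that counting step: the list @months_seen below is made-up sample data standing in for whatever the regular expression extracts from your real input, so treat it as an assumption rather than part of the original code.

```perl
use strict;
use warnings;

# Hypothetical sample data: month strings as the regex might have extracted them.
my @months_seen = ('Aug-2008', 'Sep-2008', 'Aug-2008', 'Aug-2008');

my %total_counts;
foreach my $month (@months_seen) {
    # A key that is not yet in the hash starts out undefined, which ++ treats as 0.
    $total_counts{$month}++;
}

print "Aug-2008: $total_counts{'Aug-2008'}\n";    # prints "Aug-2008: 3"
print "Sep-2008: $total_counts{'Sep-2008'}\n";    # prints "Sep-2008: 1"
```

There is no need to set each month to zero first: the entry is created the first time it is incremented.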

To extract the totals for each month you could:

while (my ($month, $total) = each(%total_counts)) { print "$month: $total\n" ; } ;
where each(%total_counts) walks the hash and returns key and value pairs -- but in some apparently random order. You could:
foreach my $month (sort keys %total_counts) { print "$month: $total_counts{$month}\n" ; } ;
where keys %total_counts returns a list of all of the keys in %total_counts (in some apparently random order), which is then sorted alphabetically -- so the output is the counts so sorted. You probably want:
foreach my $month (sort_months(keys %total_counts)) { ....
where the subroutine sort_months() is left as an exercise -- taking in a list of month strings and returning a list in the required order.
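If you get stuck on the exercise, here is one possible sketch, not the only way to do it. It assumes every string looks like 'Aug-2008' with an English three-letter month name; the lookup table %month_number and the helper hash %sort_key are names made up for this example.

```perl
use strict;
use warnings;

# Map three-letter month names to numbers (assumes English abbreviations).
my %month_number = (
    Jan => 1, Feb => 2, Mar => 3, Apr => 4,  May => 5,  Jun => 6,
    Jul => 7, Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12,
);

# Take a list of 'Mon-YYYY' strings and return them in chronological order.
sub sort_months {
    my @months = @_;
    # Turn each string into a number such as 200808, which sorts chronologically.
    my %sort_key;
    foreach my $month (@months) {
        my ($name, $year) = split /-/, $month;
        $sort_key{$month} = $year * 100 + $month_number{$name};
    }
    return sort { $sort_key{$a} <=> $sort_key{$b} } @months;
}

my @sorted = sort_months('Sep-2008', 'Jan-2009', 'Aug-2008');
print "@sorted\n";    # prints "Aug-2008 Sep-2008 Jan-2009"
```

A plain sort would have put 'Aug-2008' before 'Jan-2009' only by luck of the alphabet; building a numeric key makes the order depend on the date itself. A month name that is not in the table would trigger a warning, so real input may need some checking first.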

Your second requirement was to count for each user, on a monthly basis. So you want a hash, say %user_counts, with an entry for each user: $user_counts{$user}. Each entry needs to be a separate count for each month... now things are getting tricky. A hash entry is a single scalar value; you cannot have hash entries which are arrays or hashes. However, you can have scalars which are references to arrays or hashes. So, where the entries in %total_counts are simple (numeric) scalars, the entries in %user_counts are references to hashes, each hash being similar to %total_counts -- that is, the key is a month string (e.g. 'Aug-2008') and the value is the count for that month. The shorthand in Perl to refer to a count value in this structure is $user_counts{$user}{$month} -- meaning: (a) get the entry in %user_counts whose key is $user, (b) that entry refers to a hash, so get the entry in that hash whose key is $month.
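Here is a small sketch of filling both hashes in one pass. The @records list and the user names are invented sample data, and $r_alice is just a name chosen for this example; your real loop would take $user and $month from each input line instead.

```perl
use strict;
use warnings;

# Hypothetical records: [user, month] pairs as they might come out of the input.
my @records = (
    ['alice', 'Aug-2008'],
    ['bob',   'Aug-2008'],
    ['alice', 'Aug-2008'],
    ['alice', 'Sep-2008'],
);

my (%total_counts, %user_counts);
foreach my $record (@records) {
    my ($user, $month) = @$record;
    $total_counts{$month}++;
    # Perl creates the inner hash for $user the first time it is needed.
    $user_counts{$user}{$month}++;
}

# Step (a) on its own: the entry for 'alice' is a reference to a hash.
my $r_alice = $user_counts{'alice'};
# Step (b): follow the reference to reach the count for one month.
print "alice in Aug-2008: $r_alice->{'Aug-2008'}\n";               # prints 2
# The shorthand does both steps at once.
print "bob in Aug-2008: $user_counts{'bob'}{'Aug-2008'}\n";        # prints 1
```

Note that you never have to create the inner hashes yourself; using $user_counts{$user}{$month} is enough to bring them into being.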

To extract the per-user data:

foreach my $user (sort keys %user_counts) {
  my $r_counts = $user_counts{$user} ;
  print "$user:\n" ;
  foreach my $month (sort_months(keys %$r_counts)) {
    my $count = $user_counts{$user}{$month} ;
    print " $month $count\n" ;
  } ;
} ;
where $r_counts is a reference to a hash that gives the count per month. keys %$r_counts returns the keys in "the hash" ('%') "referred to by the scalar $r_counts".

Depending on how you want the results organised, you may want to change how it's printed or even how it's stored -- the essence is that you can extract the keys using keys and then use them in whatever order you want to look up the data you have collected.

His/Her Grace chose the $month as the primary key, and chose to hold the totals count under the conventional user 'TOTAL' -- you can make up your own mind which order the keys should be in, and whether you think there's a real chance of a real user called 'TOTAL'.
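To make the comparison concrete, here is a rough sketch of that month-first layout as I understand it. It is not the original code: the hash name %counts and the sample records are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical records: [user, month] pairs.
my @records = (
    ['alice', 'Aug-2008'],
    ['bob',   'Aug-2008'],
    ['alice', 'Sep-2008'],
);

my %counts;
foreach my $record (@records) {
    my ($user, $month) = @$record;
    $counts{$month}{$user}++;
    # The pseudo-user 'TOTAL' holds the monthly total; a real user with
    # that name would be mixed into it.
    $counts{$month}{'TOTAL'}++;
}

foreach my $month (sort keys %counts) {
    print "$month: $counts{$month}{'TOTAL'}\n";
    # Leave the pseudo-user out when listing the real users.
    my @users = grep { $_ ne 'TOTAL' } keys %{ $counts{$month} };
    foreach my $user (sort @users) {
        print "  $user $counts{$month}{$user}\n";
    }
}
```

With the month as the outer key, everything for one month sits together, which suits a month-by-month report; with the user as the outer key, as in %user_counts, a per-user report is the easier one to print.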