As you read through the file, construct a data structure like this:
%events = (
    1 => { '10/27/2002' => 1, '09/15/2004' => 2, '03/24/2003' => 1 },
    2 => { '02/17/2001' => 1, '12/05/2003' => 2 },
    3 => .....
);
That way, as you iterate through the flat file, you'll use state ID to get to the top-level of the hash of hashes, and dates will become the keys for the 2nd level. You don't really care about an event count per date, but as long as you're using hash keys to guarantee unique dates within each state, there's no reason not to use a ++ to increment the event count for that particular date.
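Here's a minimal sketch of building that hash while reading. The record layout is an assumption on my part (one event per line, "state_id|date"), since the actual file format isn't shown; adjust the split to match your data. The @lines array stands in for the lines you'd read from the file.

```perl
use strict;
use warnings;

# Hypothetical input: one event per line, "state_id|date".
# The real flat file's layout is not shown, so this is only an assumption.
my @lines = (
    '1|10/27/2002',
    '1|09/15/2004',
    '1|09/15/2004',
    '2|02/17/2001',
);

my %events;
for my $line (@lines) {
    my ( $state, $date ) = split /\|/, $line;

    # Both hash levels spring into existence on first use; the ++ keeps a
    # per-date event count while the keys keep the dates unique per state.
    $events{$state}{$date}++;
}
```

With a real file you'd replace the loop over @lines with a while loop over the filehandle, and chomp each line first.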
Now, you can count event dates in state 4 like this:
my $date_count = keys %{$events{4}};
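The same idea extends to every state at once. This is just a sketch using the sample data from above; %dates_per_state is a name I made up for illustration. The "// {}" guard needs Perl 5.10 or later and is there so that asking about a state with no events doesn't add an empty entry for it.

```perl
use strict;
use warnings;

my %events = (
    1 => { '10/27/2002' => 1, '09/15/2004' => 2, '03/24/2003' => 1 },
    2 => { '02/17/2001' => 1, '12/05/2003' => 2 },
);

# keys in scalar context returns the number of keys, i.e. unique dates.
my %dates_per_state;
for my $state ( sort { $a <=> $b } keys %events ) {
    $dates_per_state{$state} = keys %{ $events{$state} };
    print "State $state: $dates_per_state{$state} distinct date(s)\n";
}

# State 4 has no entry here; fall back to an empty hash instead of
# dereferencing a missing element.
my $date_count = keys %{ $events{4} // {} };
print "State 4: $date_count distinct date(s)\n";
```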
It's all about the right data structure. ...having said that, someone undoubtedly will come up with an even better one. ;)
On the other hand, if you plan to search by date more often than by state, reverse the order so that the top-level key is a date and the second level is the state.
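A rough sketch of the reversed layout, with made-up sample records (the [state, date] pairs and the name %by_date are mine, not from the original data):

```perl
use strict;
use warnings;

# Hypothetical sample records: [ state_id, date ].
my @records = (
    [ 1, '10/27/2002' ],
    [ 2, '10/27/2002' ],
    [ 1, '09/15/2004' ],
    [ 1, '09/15/2004' ],
);

my %by_date;
for my $rec (@records) {
    my ( $state, $date ) = @$rec;
    $by_date{$date}{$state}++;    # date on top, state underneath
}

# How many different states had an event on a given date?
my $state_count = keys %{ $by_date{'10/27/2002'} };
print "States with an event on 10/27/2002: $state_count\n";
```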
Update: Just saw the comment that "others will use a hash without thinking." I did think about it, and decided to use a hash because not all 50 states have tornados (i.e., a sparse array, which is a good use of a hash). I don't think you'll ever see one in Hawaii, nor in Washington or Vermont. ...at least maybe not within a narrow timeframe of a few years. Also, I chose the hash approach on the off chance that as the script gets refined, state numbers might end up becoming state names or state abbreviations, which lend themselves better to hash keys. After discussing it in the CB, I have to admit that an array will give a more efficient by-state lookup, while the hash will make it easier to sum up the number of states that had an event. Maybe it's a tossup. ;)
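To make the tradeoff concrete, here's a small side-by-side sketch with invented sample data. With the hash, the number of states that had an event is just the key count; with an array indexed by state number, you have to skip the empty slots.

```perl
use strict;
use warnings;

# Hash of hashes: only states with events have an entry.
my %events_hash = (
    1 => { '10/27/2002' => 1 },
    4 => { '02/17/2001' => 2 },
);
my $states_from_hash = keys %events_hash;

# Array of hashes: index is the state number, so slots 0, 2 and 3
# stay undefined when only states 1 and 4 have events.
my @events_array;
$events_array[1]{'10/27/2002'}++;
$events_array[4]{'02/17/2001'} += 2;
my $states_from_array = grep { defined $_ } @events_array;

print "hash: $states_from_hash, array: $states_from_array\n";
```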
Dave
In reply to Re: Help with calculating stats from a flat-file database
by davido
in thread Help with calculating stats from a flat-file database
by nadocane