
As you read through the file, construct a data structure like this:

%events = ( 1 => { '10/27/2002' => 1,
                   '09/15/2004' => 2,
                   '03/24/2003' => 1 },
            2 => { '02/17/2001' => 1,
                   '12/05/2003' => 2 },
            3 => .....                 );

That way, as you iterate through the flat file, you'll use the state ID to reach the top level of the hash of hashes, and the dates become the keys of the second level. You don't really care about an event count per date, but since you're already using hash keys to guarantee unique dates within each state, there's no reason not to use ++ to increment the event count for that particular date.
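
Here's a minimal sketch of the loop that builds that hash. The pipe-delimited "state|date" layout and the sample records are assumptions made up for illustration, since the real file format isn't shown here; adjust the split to match your data.

```perl
use strict;
use warnings;

# Hypothetical records in "state_id|date" form (assumed layout, not the real file).
my @lines = (
    '1|10/27/2002',
    '1|09/15/2004',
    '1|09/15/2004',
    '1|03/24/2003',
    '2|02/17/2001',
    '2|12/05/2003',
    '2|12/05/2003',
);

my %events;
for my $line (@lines) {
    my ($state, $date) = split /\|/, $line;
    next unless defined $state && defined $date;   # skip malformed lines
    $events{$state}{$date}++;   # inner hash springs into existence on first use
}

# Report the number of distinct event dates per state.
for my $state (sort { $a <=> $b } keys %events) {
    my $dates = scalar keys %{ $events{$state} };
    print "state $state: $dates distinct date(s)\n";
}
```

With a real file you'd swap the array for a while loop over a filehandle and chomp each line first.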

Now, you can count event dates in state 4 like this:

my $date_count = keys %{$events{4}};
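
One thing to watch: asking for the keys of a state that was never seen will, as far as I know, autovivify an empty entry for that state, which would throw off a later count of states that had an event. A guarded version might look like this sketch; date_count is a hypothetical helper name, not anything from the original script.

```perl
use strict;
use warnings;

my %events = (
    1 => { '10/27/2002' => 1, '09/15/2004' => 2, '03/24/2003' => 1 },
    2 => { '02/17/2001' => 1, '12/05/2003' => 2 },
);

# Hypothetical helper: number of distinct event dates for one state,
# returning 0 (without touching the hash) when the state is absent.
sub date_count {
    my ($events, $state) = @_;
    return 0 unless exists $events->{$state};
    return scalar keys %{ $events->{$state} };
}

print "state 1: ", date_count(\%events, 1), "\n";
print "state 4: ", date_count(\%events, 4), "\n";
```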

It's all about the right data structure. Having said that, someone will undoubtedly come up with an even better one. ;)

On the other hand, if you plan to search by date more often than by state, reverse the order so that the top-level key is the date and the second-level key is the state.
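
A rough sketch of that reversed layout, using made-up (state, date) pairs purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical (state_id, date) pairs; not taken from the real data.
my @records = (
    [ 1, '10/27/2002' ],
    [ 2, '10/27/2002' ],
    [ 1, '09/15/2004' ],
    [ 1, '09/15/2004' ],
);

# Top level keyed by date, second level keyed by state.
my %by_date;
$by_date{ $_->[1] }{ $_->[0] }++ for @records;

# How many states had an event on a given date?
my $states_hit = scalar keys %{ $by_date{'10/27/2002'} };
print "states with an event on 10/27/2002: $states_hit\n";
```

The counting idiom is the same as before; only the question it answers changes from "how many dates in this state" to "how many states on this date".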

Update: Just saw the comment that "others will use a hash without thinking." I did think about it, and decided to use a hash because not all 50 states have tornadoes (i.e., a sparse array, which is a good use of a hash). I don't think you'll ever see one in Hawaii, nor in Washington or Vermont, or at least maybe not within a narrow timeframe of a few years. I also chose the hash approach on the off chance that, as the script gets refined, state numbers might end up becoming state names or state abbreviations, which lend themselves better to hash keys. After discussing it in the CB, I have to admit that an array will give a more efficient by-state lookup, while the hash will make it easier to sum up the number of states that had an event. Maybe it's a tossup. ;)
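
To make the tradeoff concrete, here's a small side-by-side sketch. The sample pairs are invented, and it assumes state IDs are small non-negative integers so they can double as array indexes.

```perl
use strict;
use warnings;

# Hypothetical (state_id, date) pairs; states 3 and 4 have no events.
my @records = (
    [ 1, '10/27/2002' ],
    [ 2, '02/17/2001' ],
    [ 5, '12/05/2003' ],
);

my (%events_hash, @events_array);
for my $rec (@records) {
    my ($state, $date) = @$rec;
    $events_hash{$state}{$date}++;    # sparse: only states that were seen
    $events_array[$state]{$date}++;   # dense: unused slots stay undef
}

# Hash: the number of states with an event is just the key count.
my $states_via_hash = scalar keys %events_hash;

# Array: the empty slots have to be filtered out first.
my $states_via_array = grep { defined($_) } @events_array;

print "hash: $states_via_hash, array: $states_via_array\n";
```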

Dave

In reply to Re: Help with calculating stats from a flat-file database by davido
in thread Help with calculating stats from a flat-file database by nadocane
