When sorting data, there are often multiple 'keys' by which
you want to reference that data. For instance, you might
want to refer to data by the week in which it was produced,
but also by a grouped category, such
as a test pack name.
In these cases, one way to sort out a pile of data is to save
it to a set of files whose names are the keys to the data
itself.
For instance, one file might be named "rawdata.11.Aspen" and
contain all extracted data for the Aspen test pack for week 11.
The prefix 'rawdata' indicates that this data has simply been
sorted into file buckets and still needs to be worked on.
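The post doesn't show the bucketing step itself. Here is a minimal sketch, assuming each input record is a line of the form "WEEK PACK VALUE" (a made-up format). The hypothetical bucket_records() appends each value to the file named after its keys, so week 11 / Aspen lands in rawdata.11.Aspen. It works in a throwaway temporary directory so it leaves nothing behind.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a throwaway directory; File::Temp deletes it at exit.
my $dir = tempdir(CLEANUP => 1);
chdir $dir or die "chdir $dir: $!";
END { chdir '/' }   # step out of the directory so it can be removed

# Hypothetical input: one record per line, "WEEK PACK VALUE".
my @records = (
    "11 Aspen 3",
    "11 Birch 5",
    "12 Aspen 7",
    "11 Aspen 4",
);

# Append each record's value to the bucket file named rawdata.WEEK.PACK.
sub bucket_records {
    my @lines = @_;
    for my $line (@lines) {
        my ($week, $pack, $value) = split ' ', $line;
        my $name = "rawdata.$week.$pack";
        open my $fh, '>>', $name or die "open $name: $!";
        print {$fh} "$value\n";
        close $fh or die "close $name: $!";
    }
}

bucket_records(@records);
print "$_\n" for sort <rawdata.*>;
```

Opening in append mode ('>>') means records for the same week and pack accumulate in one file no matter what order they arrive in.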
There are lots of benefits to sorting this way. The main one is
that the filesystem becomes your hashtable.
For example, suppose you have sorted out all your data into
these raw files. To get a list of all the test packs run in week 11,
all you have to do is:
my @week11 = <rawdata.11.*>;
Or, if you want all of the Aspen test data for all weeks:
my @aspen = <rawdata.*.Aspen>;
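Those globs return file names. To get the keys themselves back, for example the bare list of pack names for week 11, you can split each name on its dots. This is a sketch assuming names always follow rawdata.WEEK.PACK and that the week contains no dots. week_of() and pack_of() are hypothetical helper names.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a throwaway directory; File::Temp deletes it at exit.
my $dir = tempdir(CLEANUP => 1);
chdir $dir or die "chdir $dir: $!";
END { chdir '/' }   # step out of the directory so it can be removed

# Create some empty bucket files to glob over (hypothetical data).
for my $name (qw(rawdata.11.Aspen rawdata.11.Birch rawdata.12.Aspen)) {
    open my $fh, '>', $name or die "open $name: $!";
    close $fh;
}

# Split "rawdata.WEEK.PACK" into at most three fields; the limit of 3
# keeps any dots inside the pack name intact.
sub week_of { (split /\./, $_[0], 3)[1] }
sub pack_of { (split /\./, $_[0], 3)[2] }

my @packs_week11 = map { pack_of($_) } <rawdata.11.*>;
my @weeks_aspen  = map { week_of($_) } <rawdata.*.Aspen>;

print "week 11 packs: @packs_week11\n";
print "Aspen weeks: @weeks_aspen\n";
```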
To process your raw data files, there's no reason you can't
take them as they come, one at a time:
process($_) foreach <rawdata.*>;
And your &process() subroutine can create processed-data
files, such as processed.Aspen.11 and totals.Aspen.11, and
so on for each test pack and each week. Then a &total()
subroutine can read in <totals.*> and sum up the totals.
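The post doesn't show &process() or &total(). This minimal sketch assumes each raw file holds one number per line and that a "total" is simply their sum. Both assumptions are for illustration only. process() writes totals.PACK.WEEK files, following the naming in the text (it skips the processed.* files), and total() globs <totals.*> and adds them up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a throwaway directory; File::Temp deletes it at exit.
my $dir = tempdir(CLEANUP => 1);
chdir $dir or die "chdir $dir: $!";
END { chdir '/' }   # step out of the directory so it can be removed

# Hypothetical raw buckets: one numeric value per line.
my %raw = (
    'rawdata.11.Aspen' => [3, 4],
    'rawdata.11.Birch' => [5],
    'rawdata.12.Aspen' => [7],
);
while (my ($name, $values) = each %raw) {
    open my $fh, '>', $name or die "open $name: $!";
    print {$fh} "$_\n" for @$values;
    close $fh or die "close $name: $!";
}

# Sum one raw bucket and write the result to totals.PACK.WEEK.
sub process {
    my ($file) = @_;
    my (undef, $week, $pack) = split /\./, $file, 3;
    open my $in, '<', $file or die "open $file: $!";
    my $sum = 0;
    while (my $line = <$in>) {
        chomp $line;
        $sum += $line;
    }
    close $in;
    my $out_name = "totals.$pack.$week";
    open my $out, '>', $out_name or die "open $out_name: $!";
    print {$out} "$sum\n";
    close $out or die "close $out_name: $!";
}

# Read every totals.* file and add up the per-bucket sums.
sub total {
    my $grand = 0;
    for my $file (<totals.*>) {
        open my $in, '<', $file or die "open $file: $!";
        my $line = <$in>;
        close $in;
        chomp $line;
        $grand += $line;
    }
    return $grand;
}

process($_) foreach <rawdata.*>;
print "grand total: ", total(), "\n";   # grand total: 19
```

Each stage reads and writes only files, so you can rerun total() on its own without reprocessing the raw buckets.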
Why bother using the filesystem? Why not use hash tables
and arrays?
Use hashtables when your key is simple or easy to fabricate,
or when you don't need to search for particular keys
or groups of keys. You can't glob on hashtable keys directly,
although filtering keys %hash with grep and a regex comes close.
Use hashtables and arrays when the data set is small.
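For comparison, this is the in-memory version of the same lookups. It is a sketch assuming a hypothetical %data hash keyed by "WEEK.PACK" strings, mirroring the file names. Perl has no glob operator for hash keys, so grep with a regex stands in and tests every key on each query.

```perl
use strict;
use warnings;

# Hypothetical in-memory layout: "WEEK.PACK" => list of values.
my %data = (
    '11.Aspen' => [3, 4],
    '11.Birch' => [5],
    '12.Aspen' => [7],
);

# Equivalent of <rawdata.11.*>: keys that start with "11."
my @week11 = grep { /^11\./ } sort keys %data;
# Equivalent of <rawdata.*.Aspen>: keys that end with ".Aspen"
my @aspen  = grep { /\.Aspen\z/ } sort keys %data;

print "week 11: @week11\n";   # week 11: 11.Aspen 11.Birch
print "Aspen: @aspen\n";      # Aspen: 11.Aspen 12.Aspen
```

This is fine while %data fits comfortably in memory. The whole hash has to be loaded before any query can run.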
Additionally, use the filesystem to avoid writing a series of
nested loops. Nested loops are the bane of any cron job, using
precious machine time instead of (perhaps) less-precious disk
space.
Don't use hashtables and arrays when your data set is huge.
50 megabytes is huge. 1 megabyte might not be huge. Your
mileage may vary. Perhaps the key is not to put too much
stress on your computer's RAM.
All the things above which are not facts are my opinions.
Rob