Thanks for your help everyone...
Here's where I am at now:
#1. Text::CSV_XS is extremely slow for some reason.
#2. SQLite etc. are not feasible here...
#3. I would like to see a 10-fold performance increase over the figures I describe below....
#4. So far Storable seems like the best option, but this leaves me with another problem... Please allow me to explain further:

This is essentially a "log file" containing something which might be represented this way:
hash->{KEY1}=VALUE1
hash->{KEY2}=VALUE2
hash->{KEY3}->{SUBKEY1}=VALUE3
hash->{KEY3}->{SUBKEY2}=VALUE4
hash->{KEY4}=VALUE5


So each "record" currently contains 16 keys (this is expected to grow).
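For concreteness, a record of that shape might look like the sketch below. The key names and values are placeholders rather than the real log fields, and the in-memory freeze/thaw round trip only stands in for however the records are actually stored and read back.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical record: key names and values are placeholders, not the real log fields.
my $record = {
    KEY1 => 10,
    KEY2 => 20,
    KEY3 => { SUBKEY1 => 30, SUBKEY2 => 40 },
    KEY4 => 50,
};

# Round-trip through Storable in memory to show that the nesting survives.
my $copy = thaw(freeze($record));

print "KEY3/SUBKEY2 = $copy->{KEY3}{SUBKEY2}\n";
```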

.. That being said, here's the 'rest of the story': I need to compute TOTAL, MAXIMUM, and LAST values for each element of the hash (including the SUBKEYs) over several periods of time. So for a time period of, say, an hour I would be looking for the total, maximum, and last value of each of those elements.... The way I did this before was to build a second hash, modeled after the first, with the keys changed to "TOT::(key)", "MAX::(key)", and "LAST::(key)"...
Benchmarks have shown me that doing the "split/map" method and computing these values takes somewhere around 0.0008 seconds (CPU time) per record. Using the Storable method, I am able to read the original hash in very quickly, but when iterating through it I am still seeing benchmarks of 0.0007 seconds per record. The number of records is large enough that at 0.0008 seconds each it ends up taking more than 60 seconds of machine time. While this may not seem enormous, it pushes me beyond the window I have to work in....
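As a rough sanity check on those figures: 60 seconds at 0.0008 seconds per record works out to about 75,000 records, so a 10-fold improvement would mean roughly 0.00008 seconds per record. The sketch below shows one way to measure per-record CPU time with Perl's built-in times function. The record count and the loop body are assumptions for illustration only; the real per-record work would go where the placeholder sum is.

```perl
use strict;
use warnings;

# Assumed record count: roughly 60 s / 0.0008 s per record.
my $n = 75_000;

# Hypothetical records, just to have something to walk over.
my @records = map { +{ KEY1 => $_, KEY3 => { SUBKEY1 => $_ } } } 1 .. $n;

# times() returns user and system CPU seconds used by this process so far.
my ($user0, $sys0) = times;

my $total = 0;
for my $rec (@records) {
    # Placeholder work; substitute the real per-record processing here.
    $total += $rec->{KEY1} + $rec->{KEY3}{SUBKEY1};
}

my ($user1, $sys1) = times;
my $cpu = ($user1 - $user0) + ($sys1 - $sys0);

printf "%d records, %.4f CPU s total, %.8f CPU s per record\n",
    $n, $cpu, $cpu / $n;
```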
Here is the snippet I am using to iterate through the hash for each record (note that I iterate through the array of records in reverse order, newest to oldest). I am also typing this from memory, so it may not be exactly correct:
my @keys = keys(%{$hash});
my $junk;   # running summary; it has to persist across records
foreach my $key (@keys) {
    if (ref($hash->{$key})) {
        # Nested hash: aggregate each subkey separately.
        my @keys2 = keys(%{$hash->{$key}});
        foreach my $key2 (@keys2) {
            $junk->{'TOT::'.$key}->{$key2} += $hash->{$key}->{$key2};
            if ($hash->{$key}->{$key2} > $junk->{'MAX::'.$key}->{$key2}) {
                $junk->{'MAX::'.$key}->{$key2} = $hash->{$key}->{$key2};
            }
            # Records come newest first, so the first value seen is the LAST one.
            if (!defined($junk->{'LAST::'.$key}->{$key2})) {
                $junk->{'LAST::'.$key}->{$key2} = $hash->{$key}->{$key2};
            }
        }
    } else {
        # Plain scalar value.
        $junk->{'TOT::'.$key} += $hash->{$key};
        if ($hash->{$key} > $junk->{'MAX::'.$key}) {
            $junk->{'MAX::'.$key} = $hash->{$key};
        }
        if (!defined($junk->{'LAST::'.$key})) {
            $junk->{'LAST::'.$key} = $hash->{$key};
        }
    }
}
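For anyone who wants to run it, here is a self-contained sketch of the same TOT/MAX/LAST bookkeeping. The accumulate sub, the sample records, and their values are made up for illustration. It fetches each value once and looks up the three summary sub-hashes once per key instead of once per subkey. Whether that is measurably faster is something I have not benchmarked. One difference from the snippet above: MAX is seeded from the first value seen, so all-negative data still gets a maximum.

```perl
use strict;
use warnings;

# Fold one record into the running summary.
# Records must be passed newest first, so the first value seen
# for a key is recorded as its LAST value.
sub accumulate {
    my ($sum, $rec) = @_;
    for my $key (keys %$rec) {
        my $val = $rec->{$key};
        if (ref $val eq 'HASH') {
            # Look up the three summary sub-hashes once per key.
            my $tot  = $sum->{'TOT::'  . $key} ||= {};
            my $max  = $sum->{'MAX::'  . $key} ||= {};
            my $last = $sum->{'LAST::' . $key} ||= {};
            for my $subkey (keys %$val) {
                my $v = $val->{$subkey};
                $tot->{$subkey} += $v;
                $max->{$subkey}  = $v
                    if !defined $max->{$subkey} || $v > $max->{$subkey};
                $last->{$subkey} = $v if !defined $last->{$subkey};
            }
        }
        else {
            my ($t, $m, $l) = ('TOT::' . $key, 'MAX::' . $key, 'LAST::' . $key);
            $sum->{$t} += $val;
            $sum->{$m}  = $val if !defined $sum->{$m} || $val > $sum->{$m};
            $sum->{$l}  = $val if !defined $sum->{$l};
        }
    }
    return $sum;
}

# Hypothetical sample data, newest record first.
my @records = (
    { KEY1 => 5, KEY3 => { SUBKEY1 => 2 } },   # newest
    { KEY1 => 9, KEY3 => { SUBKEY1 => 7 } },
    { KEY1 => 1, KEY3 => { SUBKEY1 => 4 } },   # oldest
);

my $summary = {};
accumulate($summary, $_) for @records;

for my $prefix (qw(TOT MAX LAST)) {
    printf "%s::KEY1 = %s, %s::KEY3/SUBKEY1 = %s\n",
        $prefix, $summary->{$prefix . '::KEY1'},
        $prefix, $summary->{$prefix . '::KEY3'}{SUBKEY1};
}
```

With the sample records above, this gives a total of 15, a maximum of 9, and a last value of 5 for KEY1, and 13, 7, and 2 for KEY3/SUBKEY1.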

Thanks, again, for all the helpful replies!

You folks are great!

- Greg

In reply to Re: Need to process a tab delimted file *FAST* by devnul
in thread Need to process a tab delimted file *FAST* by devnul
