in reply to Re: Re: Need to process a tab delimited file *FAST*
in thread Need to process a tab delimited file *FAST*
Now, you've still got two recalculations of $junk->{'MAX::'.$key} and $junk->{'LAST::'.$key} (each involving a string concatenation and a hash lookup) per loop pass, whereas it might be possible to have only one. But I'll let you look at this first, to see if it gives acceptable speed improvements (not likely, given that this is squeezing out the last few percent, and you want 90% of the time gone):

    my $junk;
    while (my ($key, $val) = each %$hash) {
        if (ref($val)) {
            while (my ($key2, $val2) = each %$val) {
                $junk->{'TOT::'.$key}->{$key2} += $val2;
                if ($val2 > $junk->{'MAX::'.$key}->{$key2}) {
                    $junk->{'MAX::'.$key}->{$key2} = $val2;
                }
                if (!defined($junk->{'LAST::'.$key}->{$key2})) {
                    $junk->{'LAST::'.$key}->{$key2} = $val2;
                }
            }
        }
        else {
            $junk->{'TOT::'.$key} += $val;
            if ($val > $junk->{'MAX::'.$key}) {
                $junk->{'MAX::'.$key} = $val;
            }
            if (!defined($junk->{'LAST::'.$key})) {
                $junk->{'LAST::'.$key} = $val;
            }
        }
    }
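As a sketch of that "only one lookup" idea (the %data sample here is made up, and the $tot/$max/$last names are mine, not from the code above): hoist the per-$key subhashes out of the inner loop, so each 'TOT::'/'MAX::'/'LAST::' key is concatenated and looked up once per outer iteration instead of once per inner iteration:

```perl
use strict;
use warnings;

# Made-up sample standing in for %$hash from the code above.
my %data = ( a => { x => 1, y => 5 }, b => 3 );

my $junk;
while ( my ($key, $val) = each %data ) {
    if ( ref $val ) {
        # One concatenation + hash lookup per outer key, cached in lexicals.
        my $tot  = $junk->{'TOT::'  . $key} ||= {};
        my $max  = $junk->{'MAX::'  . $key} ||= {};
        my $last = $junk->{'LAST::' . $key} ||= {};
        while ( my ($key2, $val2) = each %$val ) {
            $tot->{$key2} += $val2;
            $max->{$key2} = $val2
                if !defined $max->{$key2} or $val2 > $max->{$key2};
            $last->{$key2} = $val2 unless defined $last->{$key2};
        }
    }
    else {
        $junk->{'TOT::' . $key} += $val;
        $junk->{'MAX::' . $key} = $val
            if !defined $junk->{'MAX::' . $key}
            or $val > $junk->{'MAX::' . $key};
        $junk->{'LAST::' . $key} = $val
            unless defined $junk->{'LAST::' . $key};
    }
}
```

Whether this buys anything measurable is an empirical question; lexical access is cheap, but so are small hash lookups, so Benchmark it before committing.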
One test I'd do, if I were you, to make certain that the kind of speed you want is even vaguely possible: time the unix wc command against the input file. I often use wc as an absolute lower bound when looking at file-processing speed issues, and figure that I've done as well as I can if I can get my runtime down to within twice the time the wc executable takes.
Also, on your initial text processing question, is this a faster way to split up the file?
This assumes that you're discarding $tstamp, which you seem to be.

    my %hash;
    while ($line = <INPUTFILE>) {
        chomp($line);
        while ($line =~ /\t([^=]*)=([^\t]*)/g) {
            $hash{$1} = $2;
        }
    }