in reply to how to parse large files

Looking at the second example, it appears the regexps are being rebuilt from scratch on every use, despite there being only one call to build them all at the beginning.
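(As an aside, qr// is the usual way to compile each pattern exactly once and reuse it. A minimal sketch, with invented pattern strings:

    use strict;
    use warnings;

    # Compile each pattern a single time, up front,
    # rather than on every iteration of the read loop.
    my @patterns = map { qr/$_/ } ( 'ERROR', 'FATAL', 'timed out' );

    while ( my $line = <STDIN> ) {
        foreach my $re (@patterns) {
            if ( $line =~ $re ) {
                print $line;
                last;    # one hit is enough for this line
            }
        }
    }
)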

The first example therefore seems a better basis to optimise from. I would be inclined to blame its performance problems on the fact that you are constructing a new output filehandle for every line, which causes massive object proliferation. You only need to construct the output filehandle once, e.g.:

    use FileHandle;

    my %lookingFor;    # keys   => different name of one subset
                       # values => array of one subset

    my $fh      = new FileHandle "< largeLogFile.log";
    my $writeFh = new FileHandle ">> myout.log";

    while (<$fh>) {
        foreach my $subset (keys %lookingFor) {
            # NB: @{$subset} treats the key as an array *name*; see the update below
            foreach my $item (@{$subset}) {
                # match against the line already read by the while loop,
                # rather than pulling a fresh line from the handle
                if (m/$item/) {
                    print $writeFh $_;
                }
            }
        }
    }

    close $fh      or die $!;
    close $writeFh or die $!;
Update: your code will also fail under "use strict", which should be placed at the beginning. To solve that, construct the hash so that its values are array references rather than array names, and change the inner loop to...
foreach my $item ( @{$lookingFor{$subset}} ) {
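For instance, the hash could be built with anonymous array references, which passes strict. A sketch with made-up subset names and patterns:

    my %lookingFor = (
        errors   => [ 'ERROR', 'FATAL' ],
        timeouts => [ 'timed out', 'no response' ],
    );

    foreach my $subset ( keys %lookingFor ) {
        # dereference the value (an array ref), not the key
        foreach my $item ( @{ $lookingFor{$subset} } ) {
            print "checking $subset pattern: $item\n";
        }
    }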

-M

Free your mind