in reply to how to parse large files

domcyrus:

In addition to all the other notes you have, you might want to change the structure of your loop a bit. Rather than searching through all the values in the hash for every word, build an inverted hash and then do a single lookup per word. That way, you needn't execute nested loops.

Specifically:

#!/usr/bin/perl -w

use strict;
use warnings;

my %lookingFor = (
    'lazy'  => ['lazy', 'tired'],
    'entry' => ['entry', 'opening', 'ingress'],
    'file'  => ['file', 'files', 'filehandle'],
    'such'  => ['such'],
);

# Build an inverted hash with pointers from the individual items
# to the matching key in lookingFor
my %revLUP;
for my $k (keys %lookingFor) {
    for my $v (@{ $lookingFor{$k} }) {
        $revLUP{$v} = $k;
    }
}

while (my $buf = <DATA>) {
    # Print line if it has a 'magic word' in it
    print $buf if grep { defined $revLUP{$_} } split /\s+/, $buf;
}

__DATA__
Now is the time for all good men
to come to the aid of their party.
The quick red fox jumped
over the lazy brown dog.
[tye]: yes, on Window or on Unix,
the old file is still open so it is just its directory entry that gets clobbered
[bart]: On Linux, you can unlink a
file and the processes that have the file open, will still see the contents. I
suspect the same happens here.
[tye]: "busy" only seems to apply to executable files, talexb. no problem
deleting files that are open (though
Win32 C RTL /defaults/ to locking
the file such that this is prevented)
[blokhead]: in short, the filehandle
is tied to an inode, not a filename
[bart]: Meaning, the directory points to the new contents, and the old
contents is unlinked (but visible). Is that correct?
which yields:
$ ./bigfile.pl
over the lazy brown dog.
the old file is still open so it is just its directory entry that gets clobbered
file and the processes that have the file open, will still see the contents. I
deleting files that are open (though
the file such that this is prevented)
[blokhead]: in short, the filehandle
$
I don't know whether this method will save you any time, as I haven't done any benchmarking. In any case, it likely depends on the number of items in %lookingFor, the performance of grep, and so on. But if this helps at all, you can then look for further speedups.
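If you do want numbers, something along these lines would compare the two approaches with the core Benchmark module. This is only a sketch: the sample line and search terms are made up here, and you'd want to point it at your real data before drawing any conclusions.

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Example term list (assumed for illustration; use your own)
my %lookingFor = (
    'lazy'  => ['lazy', 'tired'],
    'entry' => ['entry', 'opening', 'ingress'],
    'file'  => ['file', 'files', 'filehandle'],
    'such'  => ['such'],
);

# Inverted hash, built once up front
my %revLUP;
for my $k (keys %lookingFor) {
    $revLUP{$_} = $k for @{ $lookingFor{$k} };
}

# One sample input line (assumed; substitute a representative line)
my $line = 'The quick red fox jumped over the lazy brown dog.';

cmpthese(-2, {
    # Nested loops: scan every search term for every word
    nested => sub {
        my $hit = 0;
        WORD: for my $w (split /\s+/, $line) {
            for my $k (keys %lookingFor) {
                for my $term (@{ $lookingFor{$k} }) {
                    if ($w eq $term) { $hit = 1; last WORD }
                }
            }
        }
    },
    # Inverted hash: one lookup per word
    inverted => sub {
        my $hit = grep { defined $revLUP{$_} } split /\s+/, $line;
    },
});

cmpthese(-2, ...) runs each sub for roughly two CPU-seconds and prints a comparison table, so you can see whether the single-lookup version actually wins for your particular term list and line lengths.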

--roboticus