in reply to Perl Optimization
Invert your search logic. Instead of treating your hash keys as an array to loop over, use the hash as a hash and benefit from its O(1) lookups:
    my %hash = ...;    # lookup hash of table names, built once (see the sketch below)

    while ( my $line = <fh_log> ) {
        # Test each word of the line with a single O(1) hash lookup
        # instead of running one regex per key.
        for ( split ' ', $line ) {
            if ( exists $hash{ $_ } ) {
                $stats{ $_ }++;
                $total++;
            }
        }
    }
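For completeness, here is a minimal sketch of how %hash could be populated; the filename tables.txt and the one-table-name-per-line format are assumptions, not details from your post:

    use strict;
    use warnings;

    # Hypothetical input file: one table name per line.
    open my $fh_keys, '<', 'tables.txt' or die "tables.txt: $!";

    my %hash;
    while ( my $name = <$fh_keys> ) {
        chomp $name;
        $hash{ $name } = 1;    # the value is irrelevant; only the key matters
    }
    close $fh_keys;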
Let's assume your 1 MB file yields 8,000 keys and your 300 MB file contains 2.5 million lines. Your way, you perform 8,000 × 2.5 million = 20 billion regex searches, each of which is O(n) in the length of the line being searched.
If those lines average 128 characters and split into, say, 16 words each, this way you perform 2.5 million × 16 = 40 million O(1) hash lookups. That is 1/500 the number of operations, and each hash lookup is far cheaper than an O(n) regex scan of a 128-character line, so the total work comes to (much) less than 1/5,000 of the original.
If you can construct a regex that extracts only likely table-name candidates from each line, reducing the 16 words to, say, 4 candidates, you cut the work by a further factor of four, down to roughly 1/20,000, or 0.005%, of the original.
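One possible shape for that, assuming the log lines are SQL-ish statements where table names follow keywords such as FROM, JOIN, INTO, or UPDATE (your format may differ, so treat the pattern as a starting point only):

    # Extract only likely table-name candidates instead of every word.
    # The keyword list and the \w+ identifier pattern are assumptions
    # about the log format and will need adjusting.
    while ( my $line = <fh_log> ) {
        for my $candidate ( $line =~ /\b(?:FROM|JOIN|INTO|UPDATE)\s+(\w+)/gi ) {
            if ( exists $hash{ $candidate } ) {
                $stats{ $candidate }++;
                $total++;
            }
        }
    }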
If the casing can vary between the table names in your hash keys and the table names in the big file, as your use of /i implies, lowercase (or uppercase) both sides before the lookup.
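In the sketch above that means lowercasing the keys as %hash is built and lowercasing each word before the exists test; a minimal illustration, assuming lowercase is the chosen canonical form:

    # Build the hash with lowercased keys ...
    $hash{ lc $name } = 1;

    # ... and lowercase each word before the lookup.
    for ( split ' ', $line ) {
        my $key = lc;
        if ( exists $hash{ $key } ) {
            $stats{ $key }++;
            $total++;
        }
    }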
As a rule, never iterate over the keys of a hash to search for something if a direct lookup can do the job.
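To make the contrast concrete, here is a tiny self-contained comparison of the two approaches for testing a single word (the sample data is made up):

    use strict;
    use warnings;

    my %hash = map { $_ => 1 } qw( customers orders invoices );
    my $word = 'orders';

    # Slow: runs a regex against every key, O(number of keys) per word.
    my $slow_hit = grep { $word =~ /\Q$_\E/i } keys %hash;

    # Fast: a single O(1) hash lookup.
    my $fast_hit = exists $hash{ $word };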
Re^2: Perl Optimization
by Chivalri (Initiate) on Aug 11, 2008 at 20:26 UTC