in reply to Hash Search is VERY slow
The general idea is presorting, for instance by iterating over the file multiple times and processing only one IP range per pass.
This is expensive in IO, but it keeps the data structures small.
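A minimal sketch of that multi-pass idea, assuming a hypothetical input.txt with one tab-separated "IP url" record per line (both the file name and the record layout are guesses, adjust them to your data):

    use strict;
    use warnings;

    # one pass per /8 range keeps only a small slice of the data in memory
    for my $first_octet (0 .. 255) {
        my %count;
        open my $fh, '<', 'input.txt' or die "input.txt: $!";
        while (my $line = <$fh>) {
            chomp $line;
            my ($ip, $url) = split /\t/, $line;
            defined $url or next;                        # skip malformed lines
            next unless $ip =~ /^\Q$first_octet\E\./;    # only this range
            $count{$ip}{$url}++;
        }
        close $fh;
        # ... report or write out the counts for this range here,
        # %count is freed before the next pass starts
    }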
If you really need all the data present in memory at once, consider breaking up the ranges into a tree of nested data structures and processing them in linear order,
like $hash->{'192'}{'168'}{'101'}{'208'} or $hash->{'192.168'}{'101.208'} instead of $hash->{'192.168.101.208'} °
If you now process all IPs in order, then Perl (well, the OS) will be able to swap out all memory pages holding unrelated sub-hashes. This will be cheap because the number of swaps is minimized by the sorting. (see also Re: Small Hash a Gateway to Large Hash?)
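A sketch of that nested layout and the sorted walk; the "IP<TAB>url" record layout and the file name are again only assumptions:

    use strict;
    use warnings;

    my %tree;
    open my $fh, '<', 'input.txt' or die "input.txt: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($ip, $url) = split /\t/, $line;
        defined $url or next;
        my ($o1, $o2, $o3, $o4) = split /\./, $ip;
        $tree{$o1}{$o2}{$o3}{$o4}{$url}++;    # one hash level per octet
    }
    close $fh;

    # walk the tree in numeric order, one subtree at a time, so pages
    # holding unrelated subtrees can stay swapped out
    for my $o1 (sort { $a <=> $b } keys %tree) {
        for my $o2 (sort { $a <=> $b } keys %{ $tree{$o1} }) {
            # ... descend into $tree{$o1}{$o2} and report counts here
        }
    }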
An additional approach is using more compact data structures: hashes are efficient for sparse data, but if your keys cover the whole 0-255 octet range, an array is certainly more efficient.
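A tiny sketch of the array variant; the IP and URL here are just placeholder values:

    use strict;
    use warnings;

    my @tree;    # nested arrays instead of nested hashes
    my ($ip, $url) = ('192.168.101.208', 'logmeinrescue.com');
    my ($o1, $o2, $o3, $o4) = split /\./, $ip;
    $tree[$o1][$o2][$o3][$o4]{$url}++;    # octets 0-255 become array indices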
Furthermore, there is no point in repeating URLs like "logmeinrescue.com" in your array; counting them is more memory efficient.
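For instance, a counting hash instead of a growing array, here with inline sample data:

    use strict;
    use warnings;

    my %seen;
    while (my $url = <DATA>) {
        chomp $url;
        $seen{$url}++;    # one key per distinct URL, plus a counter
    }
    printf "%-20s %d\n", $_, $seen{$_} for sort keys %seen;

    __DATA__
    logmeinrescue.com
    logmeinrescue.com
    example.com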
Anyway, 800k lines of input doesn't sound heavy; I'm not sure we have the full picture (???)
Like choroba already said, preloading the input completely into memory sounds like a waste of resources; you should check how much that actually costs.
OTOH, if you decide to implement my initial idea of processing one IP range after the other, preloading will reduce IO if (and only if) everything fits into memory.
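For comparison, reading line by line instead of slurping; input.txt is again a hypothetical file name:

    use strict;
    use warnings;

    open my $fh, '<', 'input.txt' or die "input.txt: $!";
    while (my $line = <$fh>) {    # only one line in memory at a time
        chomp $line;
        # ... do the lookup / counting here
    }
    close $fh;

    # versus slurping, which keeps all 800k lines in RAM at once:
    # my @all_lines = <$fh>;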
Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
°) I'm aware that 192.168.*.* is very common
Replies are listed 'Best First'.

Re^2: Hash Search is VERY slow
- by Tux (Canon) on Sep 30, 2021 at 07:59 UTC
- by LanX (Saint) on Sep 30, 2021 at 11:19 UTC
- by Tux (Canon) on Sep 30, 2021 at 11:54 UTC
- by LanX (Saint) on Sep 30, 2021 at 11:59 UTC
- by Tux (Canon) on Sep 30, 2021 at 13:07 UTC
- by soonix (Chancellor) on Sep 30, 2021 at 13:02 UTC