It turns out that the unix sort was exactly the missing prior step needed to speed this up. With the right choice of sort keys, the file is now in sequential order by "ID", so when a new query comes in it is easy to check whether the current "ID" matches the prior "ID", flush the accumulated hash entries when it changes, and continue. In testing, this keeps the hash to no more than 3-7 'extra' keys for each set of "ID"s in the file before the set is dumped.
Memory usage has stayed small, and processing now takes roughly 1/4 of the total time of the prior runs.
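For anyone following along, here is a minimal sketch of the flush-on-ID-change idea. The field layout, delimiter, and the flush() routine are assumptions for illustration, not the original code; the only requirement is that the input has already been sorted on the ID column (e.g. sort -t, -k1,1 input.csv).

#!/usr/bin/perl
use strict;
use warnings;

# Assumes the file is pre-sorted by ID, so all rows for one ID are adjacent
# and the hash only ever holds one ID's worth of entries at a time.

my $prev_id = '';
my %pending;    # accumulated key/value pairs for the current ID only

while ( my $line = <STDIN> ) {
    chomp $line;
    my ( $id, $key, $value ) = split /,/, $line;    # assumed 3-column CSV layout

    if ( $prev_id ne '' && $id ne $prev_id ) {
        flush( $prev_id, \%pending );               # emit and forget the finished ID
        %pending = ();
    }
    $pending{$key} = $value;
    $prev_id = $id;
}
flush( $prev_id, \%pending ) if $prev_id ne '';     # don't drop the last group

sub flush {
    my ( $id, $href ) = @_;
    print join( ',', $id, map { "$_=$href->{$_}" } sort keys %$href ), "\n";
}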