Sorry for the typos in the code; fixing them.
My actual data files range from several hundred MB to several hundred GB, so that sample data set is just a small example of the sort of thing I am processing.
The two queries and two answers in a row are what my real-world data contains. Specifically, there can be anywhere from 1 to n answers for each query, and the queries and answers occur in any order. The only guarantee is that an answer will follow (sometime later) the query it goes with.
Maximum rows in a file to process = 31291204; average rows per file = 8707186.
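To make the ordering guarantee concrete, here is a minimal sketch of a streaming pass that pairs each answer with its query as the answer arrives. The record layout is an assumption made up for illustration (tab-separated `Q`/`A` type, an id, and a text field), and `pair_answers` is a hypothetical helper name; neither reflects the real file format. Because a query can have any number of answers, the sketch cannot tell when a query is finished, so it keeps every query it has seen in the hash.

```perl
use strict;
use warnings;

# ASSUMPTION: hypothetical record format, not the real data layout:
#   "Q<TAB>id<TAB>text" for a query, "A<TAB>id<TAB>text" for an answer.
sub pair_answers {
    my ($fh, $emit) = @_;
    my %query_for;    # id => query text, for every query seen so far

    # Read one line at a time so the whole file is never held in memory.
    while (my $line = <$fh>) {
        chomp $line;
        my ($type, $id, $text) = split /\t/, $line, 3;
        next unless defined $text;    # skip malformed lines

        if ($type eq 'Q') {
            $query_for{$id} = $text;
        }
        elsif ($type eq 'A' && exists $query_for{$id}) {
            # An answer always follows its query, so the query is already stored.
            $emit->($id, $query_for{$id}, $text);
        }
    }
    return scalar keys %query_for;    # number of distinct queries seen
}

# Small demo on an in-memory filehandle: two queries in a row, answers in any order.
my $sample = join '', map { "$_\n" }
    "Q\t1\tfoo", "Q\t2\tbar", "A\t2\tbar-1", "A\t1\tfoo-1", "A\t1\tfoo-2";
open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
pair_answers($fh, sub { print join("\t", @_), "\n" });
close $fh;
```

With this approach the hash holds only query text, never the answers, so memory grows with the number of distinct queries rather than with the total number of rows.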
In reply to Re^2: Memory utilization and hashes by bfdi533, in thread Memory utilization and hashes by bfdi533