in reply to Re^2: Memory utilization and hashes
in thread Memory utilization and hashes

What does this sample of data you provided look like after the *nix sort ?

Query;1;host;www.example.com Answer;1;ip;1.2.3.4 Query;2;host;www.cnn.com Query;3;host;www.google.com Answer;2;ip;2.3.4.5 Answer;2;ip;2.3.4.5 Query;4;host;www.google.com Answer;4;ip;3.4.5.6 Answer;3;ip;3.4.5.6 Query;2;host;www.example2.com Answer;4;ip;1.2.4.5 Answer;2;ip;2.3.4.5
poj

Replies are listed 'Best First'.
Re^4: Memory utilization and hashes
by bfdi533 (Friar) on Jan 25, 2018 at 20:30 UTC

    There is actually missing data in the sample data. In the real data file, it includes the date and time of the entry.

    Once sorted by date and ID, then I can be sure that if the date changes and the ID changes as well, then there are no more answers to be had and I can dump the data, empty the hash and move on.

    The real file is more like this once sorted:

    2018-01-25 01:01:01;Query;1;host;www.example.com 2018-01-25 01:01:01;Answer;1;ip;1.2.3.4 2018-01-25 01:01:05;Query;2;host;www.cnn.com 2018-01-25 01:01:05;Answer;2;ip;2.3.4.5 2018-01-25 01:01:05;Answer;2;ip;2.3.4.5 2018-01-25 01:01:06;Query;3;host;www.google.com 2018-01-25 01:01:06;Answer;3;ip;3.4.5.6 2018-01-25 01:01:08;Query;4;host;www.google.com 2018-01-25 01:01:08;Answer;4;ip;3.4.5.6 2018-01-25 01:01:08;Answer;4;ip;1.2.4.5 2018-01-25 01:01:11;Query;2;host;www.example2.com 2018-01-25 01:01:11;Answer;2;ip;2.3.4.5