At the risk of stating the bleeping obvious, I would use a database for this. It's exactly the sort of task that most database engines are optimised for. Even if you don't want to keep the data from run to run, I would still create a temporary database, load the two files into it, and then use its query language to extract the data.
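
To make that concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database, which disappears when the connection closes. It assumes the two files are CSVs with a header row and a shared `id` column. The table names `a` and `b`, the `name` and `score` columns, and the helpers `load_csv` and `match_files` are all made up for illustration, so adjust them to your actual data.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, lines):
    """Create `table` from CSV lines (header row first) and bulk-insert the rows."""
    reader = csv.reader(lines)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    # Table and column names are interpolated, so only use trusted headers here.
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)


def match_files(file_a, file_b):
    """Load both inputs into a throwaway database and return the joined rows."""
    conn = sqlite3.connect(":memory:")  # temporary: nothing is written to disk
    try:
        load_csv(conn, "a", file_a)
        load_csv(conn, "b", file_b)
        # An index on the join column helps once the files get large.
        conn.execute("CREATE INDEX b_id ON b (id)")
        return conn.execute(
            "SELECT a.id, a.name, b.score "
            "FROM a JOIN b ON a.id = b.id "
            "ORDER BY a.id"
        ).fetchall()
    finally:
        conn.close()


# Stand-ins for the two files (hypothetical data); with real files you would
# pass open(path, newline="") instead of io.StringIO.
file_a = io.StringIO("id,name\n1,alice\n2,bob\n3,carol\n")
file_b = io.StringIO("id,score\n2,70\n3,85\n4,90\n")

rows = match_files(file_a, file_b)
print(rows)  # [('2', 'bob', '70'), ('3', 'carol', '85')]
```

Values come back as strings because the columns are created without declared types. If you need numeric comparisons, declare the column types in the `CREATE TABLE` statement or cast in the query.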