Use a database. This is exactly what a database is designed to do. Anything you build yourself would just replicate something that has literally had man-DECADES of work thrown at it.
- Load the files into database tables. At 2M and 8M rows, these are small-ish tables.
- Run SQL queries against those tables.
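As a rough sketch of those two steps, here is one way it could look with Python's built-in `sqlite3` module. The table names (`customers`, `orders`), the columns, and the join query are made up for illustration, since I don't know what your files actually contain; the in-memory CSV strings stand in for your real files, and any other database would work just as well.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, fileobj):
    """Create `table` from a CSV file object and bulk-insert its rows.

    The header row supplies the column names. `table` is interpolated
    into the SQL, so only pass names you control.
    """
    reader = csv.reader(fileobj)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE {table} ({columns})")
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", reader)
    conn.commit()


# Hypothetical stand-ins for the two input files.
customers_csv = io.StringIO("id,name\n1,alice\n2,bob\n3,carol\n")
orders_csv = io.StringIO("customer_id,amount\n1,10\n1,5\n3,7\n")

conn = sqlite3.connect(":memory:")  # pass a file path instead for real data
load_csv(conn, "customers", customers_csv)
load_csv(conn, "orders", orders_csv)

# An index on the join key helps once the tables hold millions of rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Step 2: let the database do the matching and aggregation.
query = """
    SELECT c.name,
           COUNT(*) AS n_orders,
           SUM(CAST(o.amount AS INTEGER)) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
"""
results = conn.execute(query).fetchall()
print(results)
```

With real files you would open them with `open(path, newline="")` and pass the file object to `load_csv`; the query is the only part that changes with the question you are trying to answer.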
My criteria for good software:
- Does it work?
- Can someone else come in, make a change, and be reasonably certain no bugs were introduced?