This is not directly related to your question, but if we knew more about your data (both the large file and the reference file), we could probably suggest a solution that reads the large file only once instead of many times, which should perform much better.
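
To give an idea of what I mean, here is a rough Python sketch of the usual single-pass pattern. It is only a guess at your situation: I am assuming both files are line-based text, that the reference file is small enough to fit in memory, and that the key to match is the first whitespace-separated field of each line. The function name `filter_large_file` and the sample data are made up for illustration.

```python
import io

def filter_large_file(reference_lines, large_lines):
    """Yield lines of the large input whose first field appears in the reference input."""
    # Load the (assumed smaller) reference data into a set once, for fast lookups.
    wanted = {line.split()[0] for line in reference_lines if line.strip()}
    # Stream the large input a single time; it is never held in memory as a whole.
    for line in large_lines:
        fields = line.split()
        if fields and fields[0] in wanted:
            yield line

# Tiny in-memory stand-ins for the two files (made-up data).
reference = io.StringIO("id2\nid4\n")
large = io.StringIO("id1 a\nid2 b\nid3 c\nid4 d\n")
matches = list(filter_large_file(reference, large))
```

With real files you would pass two open file objects instead of the `io.StringIO` stand-ins. Whether this fits depends on how your reference entries actually relate to the lines of the large file, which is why more detail about the data would help.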