It sounds to me like you would be better off writing code to slurp the files into a database with indexes. That would make exact-match lookups fast: most databases use B-tree indexes, which find a key in logarithmic time instead of scanning every row. The data import would probably be the biggest time hog, so I would suggest finding a way to bulk-load the data directly from the text files into the database, and then using DBI to query and manipulate it. That's just my 2 cents.
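To make that a bit more concrete, here is a rough sketch of the load-then-index idea. It is written in Python with the built-in sqlite3 module only so that it runs without extra setup; the same pattern should carry over to DBI. The table name `records`, the one-column layout, and the function names `build_index` and `count_exact` are all made up for illustration, since I don't know what your files actually look like.

```python
import sqlite3


def build_index(conn, lines):
    """Bulk-load lines into a table, then index them for exact-match lookups.

    `lines` can be any iterable of strings, e.g. an open file handle.
    The one-column `records` table is an assumption for this sketch.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS records (line TEXT NOT NULL)")
    # One transaction for the whole import avoids a commit per row,
    # which is usually where a naive import loses most of its time.
    with conn:
        conn.executemany(
            "INSERT INTO records (line) VALUES (?)",
            ((line.rstrip("\r\n"),) for line in lines),
        )
        # Building the index after the load is typically cheaper than
        # updating it on every insert.
        conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_records_line ON records (line)"
        )


def count_exact(conn, value):
    """Return how many stored lines equal `value` exactly (uses the index)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM records WHERE line = ?", (value,)
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file path for real data
    build_index(conn, ["alpha\n", "beta\n", "alpha\n"])
    print(count_exact(conn, "alpha"))
```

For real data you would pass an open file handle instead of the list, and for very large files the database's own bulk loader may well beat row-by-row inserts, so treat this as a starting point rather than the fastest option.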