This might be one of those cases where it's faster to have the human do the heavy lifting. Have you considered changing the script to display the file one line at a time and prompt the user to say what each line is? The script could take a guess using your existing code and offer that guess as the default. For a reasonable number of files (I'd say fewer than a thousand), it probably makes sense to make the human's job easier rather than try to replace the human entirely. (Some OCR programs work this way: they prompt for anything they can't recognize, with a 'best guess' displayed for consideration.)
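To make the idea concrete, here is a rough Python sketch of such a prompt loop. I don't know what your existing code looks like, so `guess_label` is a hypothetical stand-in for it, and the label names ("blank", "text") are made up for illustration. Pressing Enter accepts the guess; typing anything else overrides it.

```python
from typing import Callable, Iterable, List, Tuple


def guess_label(line: str) -> str:
    """Hypothetical stand-in for your existing guessing code."""
    return "blank" if not line.strip() else "text"


def review_lines(
    lines: Iterable[str],
    guess: Callable[[str], str] = guess_label,
    ask: Callable[[str], str] = input,
) -> List[Tuple[str, str]]:
    """Show each line with a best guess; an empty answer accepts the guess."""
    results = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        default = guess(line)
        # Display the line, then prompt with the guess shown as the default.
        print(f"{number}: {line}")
        answer = ask(f"  label [{default}]: ").strip()
        results.append((line, answer or default))
    return results
```

You would call it with an open file, for example `with open(path) as f: labelled = review_lines(f)`. The `ask` parameter is only there so the prompt can be swapped out (for testing, or for a nicer interface later); by default it uses the built-in `input`.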