in reply to Increase script speed

So you have a test.txt file of 10+ million lines of mailbox IDs that need to be mapped onto your directory structure so that an equally huge (or even larger) number of files can be deleted.

Assuming your test.txt contains no duplicates (removing duplicates would be a good first optimization step), the number of files in your mail file directory must be even larger, and walking those directories multiple times will indeed take forever, as directory traversal is a time-expensive operation.
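As a minimal sketch of that first optimization step (assuming test.txt holds one mailbox ID per line; the function name is mine, not from your script):

```python
def unique_ids(path):
    """Yield each mailbox ID from the file once, preserving order."""
    seen = set()
    with open(path) as fh:
        for line in fh:
            mailbox_id = line.strip()
            if mailbox_id and mailbox_id not in seen:
                seen.add(mailbox_id)
                yield mailbox_id
```

A set gives O(1) membership checks, so even 10+ million lines dedupe in a single linear pass.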

I'd suggest walking this directory structure only once and putting the information found into a database, with the full path as the key, the ID and size as non-key fields, and an index on the ID field.
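A rough sketch of that single pass, using SQLite as the database; the table layout and the assumption that the mailbox ID is the file's base name are mine, so adapt them to whatever your directory layout actually encodes:

```python
import os
import sqlite3

def build_index(root, db_path):
    """Walk the mail tree once and index path, ID and size in SQLite."""
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS mailfile (
                       path TEXT PRIMARY KEY,
                       id   TEXT,
                       size INTEGER)""")
    con.execute("CREATE INDEX IF NOT EXISTS idx_mailfile_id ON mailfile (id)")
    with con:  # single transaction: one fsync instead of millions
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                # Assumption: the mailbox ID is the file's base name.
                con.execute("INSERT OR REPLACE INTO mailfile VALUES (?, ?, ?)",
                            (full, name, os.path.getsize(full)))
    con.close()
```

Wrapping all the inserts in one transaction matters here; committing per row would make the database the new bottleneck.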

Building your "files_to_delete" list then becomes a simple SQL exercise, and since databases are optimized for handling large datasets, you will probably see a significant speed-up.
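That SQL exercise could look something like this (the `mailfile` table layout is the hypothetical one from the sketch above: path as key, indexed ID column); loading the IDs into a temporary table turns the whole lookup into one indexed join:

```python
import sqlite3

def files_to_delete(db_path, ids):
    """Return the full paths of all indexed files whose ID is in ids."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TEMP TABLE wanted (id TEXT PRIMARY KEY)")
    con.executemany("INSERT OR IGNORE INTO wanted VALUES (?)",
                    ((i,) for i in ids))
    rows = con.execute("""SELECT m.path
                            FROM mailfile AS m
                            JOIN wanted  AS w ON w.id = m.id""").fetchall()
    con.close()
    return [path for (path,) in rows]
```

`INSERT OR IGNORE` also takes care of any duplicates still left in the ID list.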

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics