From my point of view, your main problem is that you create a thread for every file. This is a language-independent issue.
I would suggest that your worker threads take the filenames from a shared queue and keep processing them until the queue is empty. That way you pay the overhead of creating a thread only 10 times instead of 20,000 times.
Check Thread::Queue for the shared queue.
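To make the idea concrete, here is a minimal sketch of such a worker pool. It assumes all filenames are known before the workers start, so a worker can stop as soon as a non-blocking dequeue returns nothing. The names process_file and run_pool, the file names, and the pool size are placeholders for illustration, not taken from your code.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical per-file work; replace with the real processing.
sub process_file {
    my ($filename) = @_;
    return length $filename;
}

# Start $worker_count threads that drain one shared queue of filenames.
# Returns the total number of filenames processed by all workers.
sub run_pool {
    my ($worker_count, @filenames) = @_;

    # Fill the queue completely before any worker starts.
    my $queue = Thread::Queue->new(@filenames);

    my @workers;
    for (1 .. $worker_count) {
        push @workers, threads->create(sub {
            my $done = 0;
            # dequeue_nb returns undef once the queue is empty,
            # which ends the loop and lets the thread finish.
            while (defined(my $filename = $queue->dequeue_nb())) {
                process_file($filename);
                $done++;
            }
            return $done;
        });
    }

    my $total = 0;
    for my $thr (@workers) {
        my ($count) = $thr->join();   # wait for the worker, collect its count
        $total += $count;
    }
    return $total;
}

# Placeholder input: 200 made-up file names handled by 10 workers.
my @files = map { "file_$_.txt" } 1 .. 200;
my $processed = run_pool(10, @files);
print "processed $processed files\n";
```

If filenames are added while the workers are already running, the non-blocking dequeue is no longer a safe stop condition; in that case a blocking dequeue plus an explicit "no more work" signal would be needed.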
I use threads heavily in real-time programs and have not seen any memory problems related to threads.