Answers by point:
- Yes, monitoring the processors with htop shows that CPU utilization is pretty low.
- You are probably right. I will create some test datasets to investigate.
- The vast majority of the files change with each iteration, so there would be limited benefit in this approach. In several parallel workflows, I do a lot of caching, originally with BerkeleyDB, but I moved to LMDB about a year ago and got a nice performance bump.
Thanks for your response, lbe