in reply to how to split huge file reading into multiple threads
First you need to determine where your bottleneck is. If it's disk I/O speed, splitting the problem into multiple threads doesn't help at all; it might even make things worse. In that case the only thing you can do is buy faster disks.
If CPU is the bottleneck, it might make sense to investigate threads or processes.
Use Devel::NYTProf (on a reduced data sample, but please make it big enough that it doesn't fit into the buffer cache) to find out which steps take the most time. If most of the time goes into readline() or similar I/O calls, don't even think of threads.
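Before setting up a full profiler run, a rough first check might look like the sketch below. It is only an illustration, not a replacement for Devel::NYTProf: it times the readline calls separately from the per-line work using Time::HiRes. The names process_line and time_read_vs_process are made up for this example, and process_line is just a placeholder for whatever your real code does with each line. Calling time() twice per line adds overhead of its own, so treat the numbers as a ratio, not as exact timings.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical stand-in for the real per-line work; replace with your own code.
sub process_line {
    my ($line) = @_;
    my @fields = split /\t/, $line;
    return scalar @fields;
}

# Returns (seconds spent in readline, seconds spent in process_line, lines read).
sub time_read_vs_process {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my ($read, $work, $lines) = (0, 0, 0);
    while (1) {
        my $t0   = time;
        my $line = <$fh>;          # the readline step
        my $t1   = time;
        $read += $t1 - $t0;
        last unless defined $line;
        process_line($line);       # the CPU step
        $work += time - $t1;
        $lines++;
    }
    close $fh;
    return ($read, $work, $lines);
}

# Only run the measurement when a file name is given on the command line.
if (@ARGV) {
    my ($read, $work, $lines) = time_read_vs_process($ARGV[0]);
    printf "%d lines: %.3fs reading, %.3fs processing\n", $lines, $read, $work;
}
```

If the reading time dominates, you are I/O bound and threads won't buy you anything. For the real picture, run the script under the profiler with `perl -d:NYTProf yourscript.pl` (the script name is a placeholder) and generate the report with `nytprofhtml`.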