in reply to Speeding a disk based hash
Beyond that, in Re: size on disk of tied hashes I explained some of the performance problems you hit when working with large datasets on disk, and briefly discussed some of the available options. Note that if you care to benchmark your application, you do not want to benchmark it with random data. Use a sample of real data instead. Disk performance is strongly affected by your access pattern, and real-world access patterns are far from random (otherwise caching would not be a good idea).
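The simulation below is a minimal sketch of why random keys mislead. It is written in Python rather than Perl, and it is not a disk benchmark. It replays two key sequences through an in-memory LRU cache sized at 1% of the keys, which stands in for the OS page cache or a DBM cache. The Zipf-like distribution in `skewed_accesses` is an assumption that only imitates the locality of real traffic. All function names are hypothetical.

```python
import random
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Replay a key sequence through an LRU cache and return the hit fraction."""
    cache = OrderedDict()
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / len(accesses) if accesses else 0.0


def uniform_accesses(n_keys, n, rng):
    """Every key equally likely: the 'random data' benchmark."""
    return [rng.randrange(n_keys) for _ in range(n)]


def skewed_accesses(n_keys, n, rng, s=1.0):
    """Assumed Zipf-like stand-in for real traffic: key k has weight 1/(k+1)**s."""
    weights = [1.0 / (k + 1) ** s for k in range(n_keys)]
    return rng.choices(range(n_keys), weights=weights, k=n)


def compare_patterns(n_keys=100_000, n=200_000, capacity=1_000, seed=42):
    """Return (uniform_hit_rate, skewed_hit_rate) for the same cache size."""
    rng = random.Random(seed)
    uniform = lru_hit_rate(uniform_accesses(n_keys, n, rng), capacity)
    skewed = lru_hit_rate(skewed_accesses(n_keys, n, rng), capacity)
    return uniform, skewed


if __name__ == "__main__":
    u, z = compare_patterns()
    print(f"uniform keys: {u:.1%} cache hits")
    print(f"skewed keys:  {z:.1%} cache hits")
```

Uniform keys should hit the cache only about capacity/n_keys of the time (roughly 1% here), while the skewed sequence hits far more often. As a result, a benchmark driven by random keys exaggerates how often you go to disk. Because `lru_hit_rate` accepts any sequence, you can pass it a sample of keys taken from your real access log in place of either synthetic generator.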