A more technical tip. Try sorting your data and using a btree format for your data. With a hash you do a lot of seeking to disk, and seeks to disk are slow. 1/200th of a second per seek may not sound like a lot, but try doing 100 million of them and you will take the better part of a week. But a btree loaded and accessed in close to sorted order does lots of streaming to/from disk and that is quite fast. (And a merge sort streams data very well.)
In reply to Re^3: Netflix (or on handling large amounts of data efficiently in perl)
by tilly
in thread Netflix (or on handling large amounts of data efficiently in perl)
by Garp
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |