in reply to Re^2: Biggest file? (Conclusion?)
in thread Biggest file?
> sounds like it would be massive overkill for "most people's everyday requirements".
Maybe, but once you go bigger than 4GB you have to start dealing with 64-bit integers anyway; and a full 64 bits of offset, at 16 million TB, really is overkill :)
So, since I also need to keep track of the length of each record/line, I figured that using the lower 48 bits for offsets (256 TB max) and the upper 16 bits for the length (64 KB max) means I can manipulate 'record descriptors' that are 64 bits each.
Not only are these easily manipulated as 'integers'; they are also a cache-friendly size, which might yield some additional performance benefits.
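A minimal sketch of that packing scheme (in Python for illustration; the function names are mine, not from the thread):

```python
# Pack a 48-bit file offset (max 256 TB) and a 16-bit record
# length (max 64 KB) into a single 64-bit 'record descriptor'.

OFFSET_BITS = 48
OFFSET_MASK = (1 << OFFSET_BITS) - 1  # 0x0000FFFFFFFFFFFF

def pack_descriptor(offset: int, length: int) -> int:
    assert 0 <= offset <= OFFSET_MASK, "offset exceeds 48 bits (256 TB)"
    assert 0 <= length < (1 << 16), "length exceeds 16 bits (64 KB)"
    return (length << OFFSET_BITS) | offset

def unpack_descriptor(desc: int) -> tuple[int, int]:
    # Returns (offset, length).
    return desc & OFFSET_MASK, desc >> OFFSET_BITS

# An offset well past the 4 GB boundary still fits comfortably.
d = pack_descriptor(5_000_000_000, 1024)
off, ln = unpack_descriptor(d)
print(off, ln)  # 5000000000 1024
```

Each descriptor fits in one machine word, so an index of millions of records is just a flat array of 64-bit integers.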
In an ideal world, the split point would be a runtime option, which might allow (say) dealing with genomic data, where individual sequences can be substantially bigger than 64 KB but overall file sizes tend to be much smaller. But I cannot see an easy way to make that decision at runtime.
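For illustration only, a runtime-configurable split could be parameterized like this (a hypothetical sketch, not the author's code; in compiled code, where masks and shift counts are usually constants, this flexibility is harder to get cheaply):

```python
class DescriptorCodec:
    """Pack/unpack 64-bit record descriptors with a configurable
    offset/length split point (hypothetical illustration)."""

    def __init__(self, offset_bits: int = 48):
        assert 1 <= offset_bits <= 63
        self.offset_bits = offset_bits
        self.offset_mask = (1 << offset_bits) - 1
        self.length_max = 1 << (64 - offset_bits)

    def pack(self, offset: int, length: int) -> int:
        assert 0 <= offset <= self.offset_mask
        assert 0 <= length < self.length_max
        return (length << self.offset_bits) | offset

    def unpack(self, desc: int) -> tuple[int, int]:
        return desc & self.offset_mask, desc >> self.offset_bits

# e.g. genomic-style data: 40-bit offsets (1 TB files),
# leaving 24 bits for lengths (records up to 16 MB).
codec = DescriptorCodec(offset_bits=40)
```

The trade-off is exactly the one described above: every bit moved from offset to length halves the maximum file size but doubles the maximum record size.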
Replies are listed 'Best First'.
- Re^4: Biggest file? (Conclusion?) by curiousmonk (Beadle) on Mar 28, 2013 at 10:30 UTC
- by BrowserUk (Patriarch) on Mar 28, 2013 at 16:58 UTC