Uh, sounds like it would be massive overkill for "most people's everyday requirements". :-)
sounds like it would be massive overkill for "most people's everyday requirements".
Maybe, but once you go bigger than 4GB, you have to start dealing with 64-bit integers; and at 16 million TB, a full 64-bit offset really is overkill :)
So, since I also need to keep track of the length of each record/line, I figured that using the lower 48 bits for offsets (256 TB max) and the upper 16 bits for the length (64 KB max) means I can manipulate 'record descriptors' which are 64 bits each.
Not only are these easily manipulated as 'integers', they are also a cache-friendly size, which might yield some performance benefits.
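A minimal sketch of that 48/16 packing scheme (in Python for brevity; the function names are hypothetical, not from any actual utility):

```python
OFFSET_BITS = 48
OFFSET_MASK = (1 << OFFSET_BITS) - 1          # lower 48 bits: byte offset, max 256 TB
LENGTH_MASK = (1 << (64 - OFFSET_BITS)) - 1   # upper 16 bits: record length, max 64 KB

def pack_descriptor(offset, length):
    """Combine a file offset and record length into one 64-bit descriptor."""
    assert 0 <= offset <= OFFSET_MASK and 0 <= length <= LENGTH_MASK
    return (length << OFFSET_BITS) | offset

def unpack_descriptor(desc):
    """Split a 64-bit descriptor back into (offset, length)."""
    return desc & OFFSET_MASK, desc >> OFFSET_BITS
```

Because a descriptor is a single 64-bit integer, an array of them packs eight records per cache line and can be sorted or binary-searched directly.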
In an ideal world, the split point would be a runtime option, which might allow (say) dealing with genomic data, where individual sequences can be substantially bigger than 64 KB but overall file sizes tend to be much smaller. But I cannot see an easy way to make that decision at runtime.
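For what it's worth, in a language where shift counts and masks can be ordinary runtime values, the split point can simply be a parameter. A hypothetical sketch (not the author's implementation, which presumably faces compile-time constraints):

```python
def make_descriptor_codec(offset_bits=48):
    """Build pack/unpack closures for a runtime-chosen offset/length split.

    offset_bits=48 gives the 48/16 split described above; offset_bits=40
    would instead allow records up to 16 MB in files up to 1 TB, better
    suited to long genomic sequences in smaller files.
    """
    length_bits = 64 - offset_bits
    offset_mask = (1 << offset_bits) - 1
    length_mask = (1 << length_bits) - 1

    def pack(offset, length):
        assert 0 <= offset <= offset_mask and 0 <= length <= length_mask
        return (length << offset_bits) | offset

    def unpack(desc):
        return desc & offset_mask, desc >> offset_bits

    return pack, unpack
```

The cost is an extra indirection (or, in C, variable shift counts instead of compile-time constants), which may be why it is not an easy fit for the utility described.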
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
Just asking out of curiosity: did you write this utility? If so, can you please share it with us?