in reply to Re^4: [OT] A measure of 'sortedness'?
in thread [OT] A measure of 'sortedness'?
Yeah, if the key distribution isn't uniform, that second idea flies right out the window. Since your records are so small, the overhead of pointers would suck. On the other hand, since the records *are* so small, you'd just swap the records around anyway, as you wouldn't gain anything to speak of by simply shuffling the pointers.
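To make that tradeoff concrete, here's a toy sketch of the two approaches: keep a separate index (the pointer-shuffling route) or just reorder the small records themselves.

    use strict;
    use warnings;

    my @recs = qw(cab abc bca);

    # Pointer/index shuffle: the records stay put, the ordering lives in @idx.
    my @idx      = sort { $recs[$a] cmp $recs[$b] } 0 .. $#recs;
    my @by_index = @recs[@idx];

    # Direct reorder: with records this small, just move them.
    my @direct = sort @recs;

    print "@by_index\n";   # abc bca cab
    print "@direct\n";     # abc bca cab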
With respect to the radix sort, though: since it's a stable sort, it has a nice feature. Instead of having to finish up with a merge, you just concatenate the buckets end-to-end. That won't help you in this case, as you don't want to make many passes over your file to sort it, but it's something you can take advantage of in some situations.
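For what it's worth, here's a minimal LSD radix sort sketch (assuming fixed-width ASCII keys) that shows the concatenation step: each pass distributes records into buckets and then just splices the buckets back together in order, with no merge anywhere.

    # Minimal LSD radix sort sketch, assuming fixed-width ASCII keys.
    # Each pass is stable, so the buckets are concatenated end-to-end;
    # there's never a merge step.
    sub radix_sort {
        my ($width, @recs) = @_;
        for my $pos (reverse 0 .. $width - 1) {
            my @bucket;
            push @{ $bucket[ ord( substr($_, $pos, 1) ) ] }, $_ for @recs;
            @recs = map { $_ ? @$_ : () } @bucket;   # concatenate, don't merge
        }
        return @recs;
    }

    my @sorted = radix_sort(3, qw(bca abc cab bab aac));
    print "@sorted\n";   # aac abc bab bca cab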
It sounds like you may have to go the route taken by the team that did that funky tree you talked about a few weeks ago (don't recall the name of it ... Judy tree, perhaps?): keep multiple algorithms around and pick among them adaptively based on the character of the keys, the record lengths, and other attributes of your file. I have no ideas for how you'd go about implementing such a thing, though.
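I don't know what the right criteria would be, but the rough shape of such a dispatcher might be something like this (the criteria below are entirely made up, and it reuses the radix_sort sub from the sketch above):

    # Purely speculative: peek at a sample of the keys and return a sort
    # strategy based on what you find. The criteria here are invented.
    sub pick_sorter {
        my @sample = @_;
        my %width;
        $width{ length $_ }++ for @sample;

        if (keys(%width) == 1) {
            my ($w) = keys %width;
            return sub { radix_sort($w, @_) };   # fixed-width keys: radix-friendly
        }
        return sub { sort @_ };                  # otherwise, plain comparison sort
    }

    my $sorter = pick_sorter(qw(bca abc cab));
    my @out    = $sorter->(qw(bca abc cab bab aac));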
Another thought--It's pretty whacky, though, so stop here if you have a weak stomach.
If you know the block size of your filesystem, and you can arrange your record rearrangement to target chunks of that size, you could swap 'em by editing the filesystem metadata rather than the data itself. (I wrote a FAT-12/FAT-16/FAT-32 filesystem at one time. While I didn't do any edits like that, it wouldn't be difficult in that case.) I don't know a lot about the internals of modern filesystems, but you might be able to find one you could run under Windows and modify to give you that ability.
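To make it a little more concrete: in a FAT filesystem the data blocks of a file are chained together through the FAT, so reordering a file is just relinking the chain (and pointing the directory entry at the new first cluster). A toy in-memory model, with the cluster numbers made up:

    # Toy in-memory model of a FAT cluster chain. Reordering a file is just
    # relinking the chain (and updating the directory entry's starting
    # cluster); no data block ever moves. Cluster numbers are made up.
    my $EOF = 0xFFFF;

    sub read_chain {
        my ($fat, $first) = @_;
        my @chain;
        for (my $c = $first; $c != $EOF; $c = $fat->{$c}) {
            push @chain, $c;
        }
        return @chain;
    }

    sub write_chain {
        my ($fat, @chain) = @_;
        $fat->{ $chain[$_] } = $chain[$_ + 1] for 0 .. $#chain - 1;
        $fat->{ $chain[-1] } = $EOF;
        return $chain[0];    # new starting cluster for the directory entry
    }

    # File occupies clusters 5 -> 9 -> 12; swap its first and third blocks.
    my %fat   = (5 => 9, 9 => 12, 12 => $EOF);
    my @chain = read_chain(\%fat, 5);
    @chain[0, 2] = @chain[2, 0];
    my $first = write_chain(\%fat, @chain);
    # The chain now reads 12 -> 9 -> 5: the file's contents appear reordered
    # without copying a single block of data.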
I *did* warn you that it was a whacky idea...
...roboticus
When your only tool is a hammer, all problems look like your thumb.