BrowserUk:

Yeah, if the key distribution isn't uniform, that second idea flies right out the window. Since your records are so small, overhead of pointers would suck. On the other hand, since the records *are* so small, you'd just swap the records around anyway, as you wouldn't gain anything to speak of by simply shuffling the pointers.

With respect to the radix sort, though: Since it's a stable sort, it has a nice feature: instead of having to finish up with a merge, you just concatenate the lists end-to-end. That won't help you in this case, as you don't want to make many passes over your file to sort it, but it's something you can take advantage of in some situations.

It sounds like you may have to go the route taken by the team that did that funky tree you talked about a few weeks ago (don't recall the name of it ... Judy tree, perhaps?), in that you may have to have multiple algorithms and adaptively use them based on the character of the keys/record lengths/other attributes of your file. I have, however, no ideas for how you'd go about implementing such a thing, though.

Another thought--It's pretty whacky, though, so stop here if you have a weak stomach.

If you know the page size of your filesystem, if you could arrange your block rearrangement to target those block sizes, you could swap 'em by editing the filesystem metadata. (I did a FAT-12/FAT-16/FAT-32 filesystem at one time. While I didn't do any edits like that, it wouldn't be difficult in that case.) I don't know a lot about the internals of modern filesystems, but you might be able to find a filesystem you could run under windows that you could modify to give you that ability.

I *did* warn you that it was a whacky idea...

...roboticus

When your only tool is a hammer, all problems look like your thumb.


In reply to Re^5: [OT] A measure of 'sortedness'? by roboticus
in thread [OT] A measure of 'sortedness'? by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.