Does the sequential pass of the buffers have to be done first?

Given that the merge sort runs through each list fully in the process of merging, could you not make the results of the scan a side effect of the merge?

I suppose you might be able to determine an upper limit during your sequential scans, to see how much buffer space would be required to do the merge without swapping.

On the other hand, even if you have to do some swapping in order to keep the buffer to a fixed value, you're still going to get linear worst case (starting with list A before list B, if all the Bs sort to the front, then you'll have to swap each A into the vacated B positions before moving them once more to their final position.

--AAAAABBBBB #B's sort before A's, - is empty buffer B-AAAAA-BBBB BBAAAAA--BBB BB-AAAAa-BBB #A moved to a to make room BBBAAAAa--BB BBB-AAAaa-BB BBBBAAAaa--B BBBB-AAaaa-B BBBBBAAaaa-- BBBBB-Aaaaa- BBBBBaA-aaa- #Each A ends up moving twice; O(1.5N), linear. BBBBBa-aaaa- BBBBBaa-aaa- BBBBBaaa-aa- BBBBBaaaa-a- BBBBBaaaaa-- # Plus 1.0N if you want to shift the set # back to the original address range: --BBBBBaaaaa

Perhaps the more important question is in the nature of the items. How much work does it take to determine their value/sort order, and how much work does it take to move them?

If they are complex objects with an expensive sort function, you could make the initial pass pre-calculate that. If the values of sequential items are correlated, you could possibly save time and/or brainpower doing it up front.

If the moves are particularly expensive, then the pre-calculations getting a measure of sortedness could provide an improved time to completion estimate by guessing how many items need to be swapped twice to do it in-place.

PseudoEdit:
sauoc's idea sounds like a way to optimize down the number random disk accesses by doing bunches of objects at a time. I got the impression your data was all in memory however.


In reply to Re: [OT] A measure of 'sortedness'? by SuicideJunkie
in thread [OT] A measure of 'sortedness'? by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.