Re: [OT] A measure of 'sortedness'?

Re-Update: I should have known BrowserUk would not have tossed out an easy problem, so of course the buffers are not half-filled. Double-buffer shuffle with little to no air in the combined buffer space. Now I really will be up all night, as this just became a truly fun problem.

Update: Shout-out to ++SuicideJunkie; seems like some of the same thoughts, only yours were more coherently assembled.

I'll probably be up all night thinking about this, so thanks in advance for giving me something to do when insomnia kicks in tonight.

The in-place requirement for the merge is interesting, and highlights some clarifying questions which might, in turn, inspire some pre-optimization. Please bear with me, I'm writing this off the cuff.

Inventory:

Two input buffers;
Both buffers pre-sorted;
Merge-in-place required;
- Which place?
  - Input buffers start only half-filled?
  - Result needs to be split between the two input buffers?
- Okay, stop interrupting yourself, marinersk. Get on with it.

Okay, I'm going to proceed on the assumption that the merge is to buffer1, which is currently only 50% or less consumed, making room for the whole merged buffer to fit into buffer1 without hassle.

<ramble>
Okay, small side buffer is allowed. For the moment, let's pretend we're using one, and see later if we can move that into the spare space in the buffer itself.

<lightbulb>
Hey, borrow an idea from WWII Pacific Theater combat airplane design -- fuel tank bladders. As data is migrated from buffer2 to buffer1, more and more free space opens up in buffer2 which could be used to house a potentially growing amount of useful clue data.
</lightbulb>

So the first thing that popped into my head on the first read: Is there any value in knowing a small number of key values in advance? If the buffers were unsorted, I'd go for "Where is the lowest" and "Where is the highest", but they're already sorted.

So how about the median? Is there anything about our merge operation that might be expedited by knowing key points like the median value offset? So our search and placement algorithm knows at what point to optimize to the second half of the buffer?

Hmm. Gut says no, you're likely to be incrementing through your buffer indices on the fly.

So here's a question: Do you have the privilege of choosing in which order the buffers are pre-scanned? If so, scanning buffer2 first would permit you to collect potentially useful information like lowest value, and whilst scanning buffer1 you could make note of where that first insert will take place. Can that be used to optimize the first merge pass? At the very least, it should optimize edge cases like data collusion/dispersion.

My mind now wants to wander into advanced sort/merge logic, like tossing values into the spare space in the buffers akin to the way a binary sort reads the data, later to be assembled in order in a quasi-linear fashion, or something similarly crazy. That would take some careful thinking, and so I am going to take that thought as an indication that my ramble timer just expired.
</ramble>

Back to you, BrowserUk.

Comment on Re: [OT] A measure of 'sortedness'? Select or Download Code