in reply to Re^3: [OT] A measure of 'sortedness'?
in thread [OT] A measure of 'sortedness'?

I think this has triggered a workable idea.

Given a 2GB buffer of (minimum) 8-byte records = 268,435,456 records; using 1-bit per record requires 32MB. Well within my 10%.

So, what can I record with 1-bit?

My first thought is that I can record whether this record is contiguous with the preceding one. I.e. is it equal to it, or exactly one greater? In effect, contiguous runs of 1-bits become chunks of the buffer(s) that can be translocated to their final positions en masse.

And (I think) by processing down those bitmaps in parallel (though not lock-step), and knowing the first value in each of the two buffers, it gives me a straightforward map of how things have to be moved into their final positions.
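As a sanity check of the idea, here is a minimal sketch (in Python rather than Perl, and with a list of integers standing in for the 8-byte records; the function names are mine):

```python
def contiguity_bitmap(records):
    """Build the 1-bit-per-record map described above: bit k is 1 when
    record k is contiguous with record k-1 (equal to it, or exactly one
    greater). Bit 0 is always 0, since the first record starts a new run."""
    bits = [0]
    for prev, cur in zip(records, records[1:]):
        bits.append(1 if cur in (prev, prev + 1) else 0)
    return bits

def contiguous_chunks(bits):
    """Convert the bitmap into (start, length) chunks: each chunk is a
    maximal contiguous run that can be translocated en masse."""
    chunks = []
    for i, b in enumerate(bits):
        if b == 0:
            chunks.append([i, 1])      # a 0-bit starts a new chunk
        else:
            chunks[-1][1] += 1         # a 1-bit extends the current chunk
    return [tuple(c) for c in chunks]
```

E.g. `contiguity_bitmap([5, 6, 6, 7, 20, 21])` gives `[0, 1, 1, 1, 0, 1]`, i.e. two translocatable chunks: records 0-3 and records 4-5.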

Does that convert the original O(N²) process into an O(2N) (or O(3N)?) process?


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Replies are listed 'Best First'.
Re^5: [OT] A measure of 'sortedness'?
by marinersk (Priest) on Mar 19, 2015 at 00:31 UTC

    Not sure -- it's been a long time since I've had anything detailed enough to warrant computational analysis of the execution cost (I miss the old days when I worked on fun stuff).

    A side trip, though: in case a full run-length index might have had value (it does not look like it here), here's a Perlized representation. Some of the code is sloppy, but I didn't go home like I said I would, so I optimized Wife-Deprivation Units at the cost of Code-I-Could-Be-Proud-Of Units.

    Okay, so here's a proof-of-concept. The assumption is that a Run-Length table will consume notably less space than the original data, such as might be true in a disk I/O operation. This is not a universally-applicable assumption but may yield thoughts toward optimization in your specific case.

    Results on my test run:

    D:\PerlMonks>perl merge1.pl
    Buffers: [AAAAAABBCCCDDDDEEEEE] [AAAABBBBBBCCCCCDDDDD]
    Buffers: [AAAAAAAAAABBBBBBBBCC] [CCCCCCDDDDDDDDDEEEEE]

    D:\PerlMonks>
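    The Perl proof-of-concept itself is not reproduced here, but the idea it demonstrates can be sketched compactly (Python rather than Perl, with single characters standing in for records; the names are mine, not the original script's):

    ```python
    def rle(buf):
        """Compress a sorted buffer into a run-length table: [[value, count], ...]."""
        runs = []
        for ch in buf:
            if runs and runs[-1][0] == ch:
                runs[-1][1] += 1
            else:
                runs.append([ch, 1])
        return runs

    def merge_by_runs(a, b):
        """Merge two sorted buffers by walking their run-length tables instead
        of the raw data, then split the result back into two buffers."""
        ra, rb = rle(a), rle(b)
        out, i, j = [], 0, 0
        while i < len(ra) and j < len(rb):
            if ra[i][0] <= rb[j][0]:
                out.append(ra[i]); i += 1
            else:
                out.append(rb[j]); j += 1
        out.extend(ra[i:])
        out.extend(rb[j:])
        merged = "".join(ch * n for ch, n in out)
        return merged[:len(a)], merged[len(a):]
    ```

    Fed the first pair of buffers from the test run above, `merge_by_runs("AAAAAABBCCCDDDDEEEEE", "AAAABBBBBBCCCCCDDDDD")` returns `("AAAAAAAAAABBBBBBBBCC", "CCCCCCDDDDDDDDDEEEEE")`, matching the second line of output. The merge loop touches one table entry per run rather than one comparison per record, which is where the hoped-for saving lives (or dies, when runs are short).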

      With a (typical) minimum keysize of 64 bits, whilst repeats are not impossible, they are unlikely.

      But also, the same problem is true of my "equal to or greater by one" runs; they simply won't happen often enough to be of value.

      What's needed are runs of adjacent values in one buffer that don't appear in the other buffer; and vice versa.

      That means tracking both buffers concurrently -- which as I indicated in the OP is quite possible -- but it requires significantly more code and comparisons. Sufficiently more that it probably doesn't save anything when doing the merge :(



        I'm not ready to give up but it seems all the low-hanging fruit doesn't fit into the salad.

        I suspect you are correct, however, that without a new, inspired algorithmic approach to identifying optimization points in the merge, the platform constraints hamper conventional efforts to pre-select merge points (and other optimization efforts of that ilk).

        In short, not enough hardware, too much data variance.

        I'm not saying it can't be done. I'm saying no path to solution is clear to me at this point.

        Solution to this one will require either a profound application of theoretical mathematics, or an invention -- likely one worthy of a patent.

        I think your bitmap idea still has merit, but employed slightly differently. It's not an algorithmic reduction, but it does reduce overhead by 50% (one buffer scan instead of two). Not going to win any awards, but a 50% overhead reduction isn't devoid of merit.

        If you build the bitmap during the mandatory pre-merge scans (this would require a parallel, non-lock-step scan as you have already proposed) and use it to merely indicate which source buffer the resulting buffer data will come from (essentially doing the merge decision during the pre-scan), then your merge pass doesn't have to think -- it just has to do.

        Buffers: [AABEFFHHKMOOQRRSSTUV] [ABCDEHIJKLLNPPRSVVWY] Bitmap: [0011011100000111101101001100011000001111]
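        A minimal sketch of that split (Python, with characters standing in for records; I'm assuming the convention that a 0-bit means "take the next element from the first buffer" and a 1-bit "from the second"):

        ```python
        def build_merge_bitmap(a, b):
            """The pre-scan: walk both sorted buffers and record each merge
            decision as one bit (0 = next output element comes from a,
            1 = from b). Ties go to a, making the merge stable."""
            bits, i, j = [], 0, 0
            while i < len(a) and j < len(b):
                if a[i] <= b[j]:
                    bits.append(0); i += 1
                else:
                    bits.append(1); j += 1
            bits.extend([0] * (len(a) - i))   # tail of a, if b ran out first
            bits.extend([1] * (len(b) - j))   # tail of b, if a ran out first
            return bits

        def merge_from_bitmap(a, b, bits):
            """The merge pass: no comparisons at all -- it just has to do."""
            out, i, j = [], 0, 0
            for bit in bits:
                if bit == 0:
                    out.append(a[i]); i += 1
                else:
                    out.append(b[j]); j += 1
            return "".join(out)
        ```

        All comparison work happens in `build_merge_bitmap`; `merge_from_bitmap` is pure data motion driven by the metadata.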

        If you can cheaply convert the length of the alternating runs into an integer
              <lightbulb>
              or just store them as (2- or 4-bit?) integers to begin with
              </lightbulb>

        Then you can just feed that into your block data move routine and achieve maximum blocking/minimum data move, all from metadata collected on that first mandatory linear pass.
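        A sketch of both halves of that (assumed encoding is mine: two run lengths per byte; runs longer than 15 would need splitting into 15 + remainder, which this sketch does not handle):

        ```python
        def run_lengths(bits):
            """Collapse the 0/1 bitmap into the lengths of its alternating
            runs. The source buffer alternates, starting with bits[0]."""
            runs = []
            for b in bits:
                if runs and b == runs[-1][0]:
                    runs[-1][1] += 1
                else:
                    runs.append([b, 1])
            return [n for _, n in runs]

        def pack_nibbles(lengths):
            """Store each run length as a 4-bit integer, two per byte."""
            assert all(1 <= n <= 15 for n in lengths)
            padded = lengths + [0] * (len(lengths) % 2)   # pad to even count
            return bytes(hi << 4 | lo for hi, lo in zip(padded[::2], padded[1::2]))

        def unpack_nibbles(data, count):
            """Recover `count` run lengths from the packed bytes."""
            flat = [n for byte in data for n in (byte >> 4, byte & 0x0F)]
            return flat[:count]
        ```

        The 16-entry runlength index from the example below would pack into 8 bytes, versus 40 bits for the raw bitmap -- comparable space, but pre-digested into exactly the block lengths the move routine wants.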

        I would presume you'd use the rest of that 10% OOL buffer to perform the actual data migration.
              <lightbulb>
              If you performed the actual data migration in reverse, you could shrink the bitmap/runlength data as you use it, and thus increase the amount of linear usable freespace in the buffer for the actual data migration, which you would naturally build from the back of the OOL buffer and stop before you step on your shrinking metadata. By the last pass you could be using almost the entire OOL buffer for data.
              </lightbulb>

        On each pass, the OOL buffer looks something like below. I say "something like" because, like a bad car mechanic, I wound up with spare parts at the end -- I clearly messed something up doing it by hand in a text editor LOL. (This is why I write programs to do the work for me. They don't miss. They don't get bored. They just do.)

        OOL Buffer:
        The front of the buffer contains the runlength index.
        The back of the buffer contains the data to be migrated.
        The dots represent freespace in the buffer.

        [2213541211223254...........................................]
        [221354121122325........................................VVWY]
        [22135412112232....................................SSTUVVVWY]
        [2213541211223...................................RSSSTUVVVWY]
        [221354121122.................................QRRRSSSTUVVVWY]
        [22135412112................................PPQRRRSSSTUVVVWY]
        [2213541211...............................OOPPQRRRSSSTUVVVWY]
        [221354121...............................NOOPPQRRRSSSTUVVVWY]
        [22135412..............................LMNOOPPQRRRSSSTUVVVWY]
        [2213541.............................KLLMNOOPPQRRRSSSTUVVVWY]
        [221354.............................KKLLMNOOPPQRRRSSSTUVVVWY]
        [22135..........................HHIJKKLLMNOOPPQRRRSSSTUVVVWY]
        [2213......................EEFFHHHIJKKLLMNOOPPQRRRSSSTUVVVWY]
        [221....................BCDEEFFHHHIJKKLLMNOOPPQRRRSSSTUVVVWY]
        [22....................BBCDEEFFHHHIJKKLLMNOOPPQRRRSSSTUVVVWY]
        [2...................AABBCDEEFFHHHIJKKLLMNOOPPQRRRSSSTUVVVWY]
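        A program-checked version of what the diagram does by hand (a simulation sketch, not the real in-place buffer management; it consumes the merge metadata back-to-front while filling the data area from the back, so the shrinking metadata and the growing data never collide):

        ```python
        def reverse_migrate(a, b, bits):
            """Simulate the reverse migration. `bits` is the pre-scan merge
            bitmap (0 = element from a, 1 = from b); it is first collapsed
            into alternating (source, length) runs, then consumed from its
            tail while the output fills from the back."""
            runs = []
            for bit in bits:
                if runs and bit == runs[-1][0]:
                    runs[-1][1] += 1
                else:
                    runs.append([bit, 1])
            out = [None] * (len(a) + len(b))   # the data end of the OOL buffer
            pos, ai, bi = len(out), len(a), len(b)
            while runs:                        # shrink the metadata as we go
                src, n = runs.pop()
                if src == 0:
                    out[pos - n:pos] = a[ai - n:ai]; ai -= n
                else:
                    out[pos - n:pos] = b[bi - n:bi]; bi -= n
                pos -= n
            return "".join(out)
        ```

        Because each run is consumed from the tail of the buffer it names, every step is one block move, and by the last pass the metadata has shrunk to nothing -- matching the bottom row of the diagram.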

        Of course you still have all the same other problems to solve: How do you minimize data motion and where do you put data which is going to be displaced by the eventual write from OOL to Main buffer, etc.

        I'd have to model it to get a feel for what the real savings would be, if any.

        To be of use, it depends on either an inexpensive bitrun-to-integer conversion, or the ability to store and use reasonably-sized tinyint incrementors. I would not expect Perl to be your tool of choice. :-)

        Update: Removed department of redundancy department.