In Optimising a search of several thousand files, McDarren presented an interesting search problem.

He found his solution--narrowing the start point through his knowledge of where the crash happened and then applying a linear search--quite quickly, before I had a chance to offer my solution.

However, if you didn't have that knowledge of the data, this becomes quite an interesting problem. As the data points are not strictly ascending, you cannot use a binary search.

The data points plotted (in reverse time order) would look something like:

71 --- 70 --- 69-- --- 68 ------- --- 68 --- 67 --- 66 --- .. 0 --------------------------------------------------------------

Note: I've shown the possibility that he continued playing after the crash and accumlated an extra level from the restored value of 68.

The problem is similar to finding a local maxima or minima, for which there is an algorithm call a Ternary search. But if the he'd managed to raise his level after the crash, that would no longer work as it requires strictly descending values either side of the maxima. It would also need adapting to handle duplicate values.

Playing around, I came up with what I'll call a Probe Search in the absence of my ability to find any prior art that looks similar.

The basic idea is that you probe the dataset at some regular interval, looking for a value greater than the $currlevel value (or could be 0). If you fail to find a higher value (quite likely), you half the step size, loop back and probe again.

If you succeed in finding a higher value, you then have a upper/rightmost boundary at the probe that found it. And the preceeding probe forms a lower/leftmost boundary. So you set your hi and lo limits to these, recalculate the step size; reset the limit to the higher value you found, and loop back to the start.

Eventually, you will decrease the boundaries and increase the limit to the point where you will fail to find another higher value. You then step back from the hi limit linearly to find the leftmost file with the highest limit and you're there.

Here is a subroutine to implement the algorithm. I've substituted an array for the files for testing.

sub probe { use integer; my( $aref, $lo, $hi, $limit, $istep ) = @_; my $step = ( $hi - $lo ) / $istep; { my $i = $lo; $i += $step while $i <= $hi and $aref->[ $i ] <= $limit; if( $i >= $hi ) { if( $aref->[ $hi ] == $limit ) { --$hi while $aref->[ $hi - 1 ] == $limit; return $hi; } $step /= 2; } else { $limit = $aref->[ $i ]; $lo = $hi - $step; ## Update: Swapped the order of this li +ne and the next $hi = $i; $step = int( ( $hi - $lo ) / $istep ); # $istep was wrongl +y '10' } redo; } }

To test this out I generated an array of values to simulate the level values in the files:

my $ramping = 57000 - 9029; my $step = int( $ramping / 72 ); my $leftovers = $ramping - ( $step * 72 ); my $tiedObj = tie my @levels, 'Tie::Array::CountReads'; @levels = ( ( 68 ) x 9029, map( (( $_ ) x $step ), reverse 0 .. 71 ), (0) x $leftovers );

I then tied that array to a custom tie, to allow me to count the number of read references to the array. This represents the number of files you would have to read.

The interesting thing is that the size of the initial step for the probing has a dramatic affect on the efficiency of the search. Unfortunately, the relationship between the initial step size and the number of files that need to be read is rather chaotic. For example, the number of files that it needs to inspect can be as low a 34 with initial step size of either 58 or 82. But get the choice wrong by just a few and it could take as many as 670 with initial step size of 53. That's still far less than the 9030 that a linear search from the top takes though.

Update: I just found an minor error in my implementation (marked above and below). I was setting the low limit after I adjusted the high limit which meant that I was re-probing a much wider baand than necessary. Correcting that mistake means that the algorithm is now even faster. For the original problem, it now requiers visting just 7 files with the initial step size set to 11. That's a huge improvement at the good end, and fortunately, the change seems to have improved the performance across the board, although the worst case reamins about the same (though it's moves).

Update2: Another minor correction. (# $istep was wrongly '10') No great change to results.

This shows the number of files that need to be visited for all the initial steps sizes from 10 to 100

C:\test>junk8 Initial step size: 10 located 'file' 9695 after probing 318 files Initial step size: 11 located 'file' 10361 after probing 7 files Initial step size: 12 located 'file' 9029 after probing 475 files Initial step size: 13 located 'file' 10361 after probing 623 files Initial step size: 14 located 'file' 9695 after probing 505 files Initial step size: 15 located 'file' 9029 after probing 492 files Initial step size: 16 located 'file' 10361 after probing 332 files Initial step size: 17 located 'file' 9695 after probing 368 files Initial step size: 18 located 'file' 9029 after probing 476 files Initial step size: 19 located 'file' 10361 after probing 164 files Initial step size: 20 located 'file' 9695 after probing 306 files Initial step size: 21 located 'file' 10361 after probing 503 files Initial step size: 22 located 'file' 9695 after probing 673 files Initial step size: 23 located 'file' 9695 after probing 225 files Initial step size: 24 located 'file' 9029 after probing 475 files Initial step size: 25 located 'file' 9029 after probing 95 files Initial step size: 26 located 'file' 10361 after probing 608 files Initial step size: 27 located 'file' 10361 after probing 203 files Initial step size: 28 located 'file' 9695 after probing 489 files Initial step size: 29 located 'file' 9695 after probing 139 files Initial step size: 30 located 'file' 9029 after probing 475 files Initial step size: 31 located 'file' 9029 after probing 170 files Initial step size: 32 located 'file' 10361 after probing 335 files Initial step size: 33 located 'file' 10361 after probing 11 files Initial step size: 34 located 'file' 9695 after probing 371 files Initial step size: 35 located 'file' 9695 after probing 83 files Initial step size: 36 located 'file' 9029 after probing 479 files Initial step size: 37 located 'file' 9029 after probing 221 files Initial step size: 38 located 'file' 10361 after probing 143 files Initial step size: 39 located 'file' 9695 after probing 543 files Initial step size: 40 located 'file' 9695 after probing 284 files Initial step size: 41 located 'file' 9695 after probing 46 files Initial step size: 42 located 'file' 9029 after probing 481 files Initial step size: 43 located 'file' 9029 after probing 257 files Initial step size: 44 located 'file' 9029 after probing 47 files Initial step size: 45 located 'file' 9695 after probing 445 files Initial step size: 46 located 'file' 9695 after probing 229 files Initial step size: 47 located 'file' 9695 after probing 13 files Initial step size: 48 located 'file' 9029 after probing 479 files Initial step size: 49 located 'file' 9029 after probing 287 files Initial step size: 50 located 'file' 9029 after probing 95 files Initial step size: 51 located 'file' 9695 after probing 371 files Initial step size: 52 located 'file' 9695 after probing 182 files Initial step size: 53 located 'file' 9029 after probing 659 files Initial step size: 54 located 'file' 9029 after probing 479 files Initial step size: 55 located 'file' 9029 after probing 308 files Initial step size: 56 located 'file' 9029 after probing 137 files Initial step size: 57 located 'file' 9695 after probing 309 files Initial step size: 58 located 'file' 9695 after probing 139 files Initial step size: 59 located 'file' 9029 after probing 645 files Initial step size: 60 located 'file' 9029 after probing 475 files Initial step size: 61 located 'file' 9029 after probing 325 files Initial step size: 62 located 'file' 9029 after probing 175 files Initial step size: 63 located 'file' 9029 after probing 25 files Initial step size: 64 located 'file' 9695 after probing 110 files Initial step size: 65 located 'file' 9029 after probing 622 files Initial step size: 66 located 'file' 9029 after probing 479 files Initial step size: 67 located 'file' 9029 after probing 336 files Initial step size: 68 located 'file' 9029 after probing 204 files Initial step size: 69 located 'file' 9029 after probing 72 files Initial step size: 70 located 'file' 9695 after probing 89 files Initial step size: 71 located 'file' 9029 after probing 611 files Initial step size: 72 located 'file' 9029 after probing 479 files Initial step size: 73 located 'file' 9029 after probing 347 files Initial step size: 74 located 'file' 9029 after probing 227 files Initial step size: 75 located 'file' 9029 after probing 95 files Initial step size: 76 located 'file' 9695 after probing 59 files Initial step size: 77 located 'file' 9029 after probing 608 files Initial step size: 78 located 'file' 9029 after probing 478 files Initial step size: 79 located 'file' 9029 after probing 361 files Initial step size: 80 located 'file' 9029 after probing 244 files Initial step size: 81 located 'file' 9029 after probing 127 files Initial step size: 82 located 'file' 9029 after probing 23 files Initial step size: 83 located 'file' 9029 after probing 593 files Initial step size: 84 located 'file' 9029 after probing 481 files Initial step size: 85 located 'file' 9029 after probing 369 files Initial step size: 86 located 'file' 9029 after probing 257 files Initial step size: 87 located 'file' 9029 after probing 159 files Initial step size: 88 located 'file' 9029 after probing 47 files Initial step size: 89 located 'file' 9029 after probing 590 files Initial step size: 90 located 'file' 9029 after probing 485 files Initial step size: 91 located 'file' 9029 after probing 380 files Initial step size: 92 located 'file' 9029 after probing 275 files Initial step size: 93 located 'file' 9029 after probing 170 files Initial step size: 94 located 'file' 9029 after probing 80 files Initial step size: 95 located 'file' 9029 after probing 575 files Initial step size: 96 located 'file' 9029 after probing 479 files Initial step size: 97 located 'file' 9029 after probing 383 files Initial step size: 98 located 'file' 9029 after probing 287 files Initial step size: 99 located 'file' 9029 after probing 191 files Initial step size: 100 located 'file' 9029 after probing 95 files

Horribly chaotic, but the chaos seems to even out (reaching less extreme values at either end) if you use larger values for the intial step sizes. Here are the results of using step sizes from 100 to 1000 in steps of 10:

Initial step size: 100 located 'file' 9029 after probing 95 files Initial step size: 110 located 'file' 9029 after probing 317 files Initial step size: 120 located 'file' 9029 after probing 475 files Initial step size: 130 located 'file' 9029 after probing 194 files Initial step size: 140 located 'file' 9029 after probing 359 files Initial step size: 150 located 'file' 9029 after probing 95 files Initial step size: 160 located 'file' 9029 after probing 257 files Initial step size: 170 located 'file' 9029 after probing 47 files Initial step size: 180 located 'file' 9029 after probing 168 files Initial step size: 190 located 'file' 9029 after probing 275 files Initial step size: 200 located 'file' 9029 after probing 95 files Initial step size: 210 located 'file' 9029 after probing 223 files Initial step size: 220 located 'file' 9029 after probing 75 files Initial step size: 230 located 'file' 9029 after probing 151 files Initial step size: 240 located 'file' 9029 after probing 257 files Initial step size: 250 located 'file' 9029 after probing 95 files Initial step size: 260 located 'file' 9029 after probing 215 files Initial step size: 270 located 'file' 9029 after probing 91 files Initial step size: 280 located 'file' 9029 after probing 155 files Initial step size: 290 located 'file' 9029 after probing 234 files Initial step size: 300 located 'file' 9029 after probing 95 files Initial step size: 310 located 'file' 9029 after probing 175 files Initial step size: 320 located 'file' 9029 after probing 104 files Initial step size: 330 located 'file' 9029 after probing 144 files Initial step size: 340 located 'file' 9029 after probing 215 files Initial step size: 350 located 'file' 9029 after probing 103 files Initial step size: 360 located 'file' 9029 after probing 197 files Initial step size: 370 located 'file' 9029 after probing 120 files Initial step size: 380 located 'file' 9029 after probing 125 files Initial step size: 390 located 'file' 9029 after probing 89 files Initial step size: 400 located 'file' 9029 after probing 127 files Initial step size: 410 located 'file' 9029 after probing 75 files Initial step size: 420 located 'file' 9029 after probing 87 files Initial step size: 430 located 'file' 9029 after probing 152 files Initial step size: 440 located 'file' 9029 after probing 75 files Initial step size: 450 located 'file' 9029 after probing 119 files Initial step size: 460 located 'file' 9029 after probing 151 files Initial step size: 470 located 'file' 9029 after probing 125 files Initial step size: 480 located 'file' 9029 after probing 138 files Initial step size: 490 located 'file' 9029 after probing 101 files Initial step size: 500 located 'file' 9029 after probing 95 files Initial step size: 510 located 'file' 9029 after probing 159 files Initial step size: 520 located 'file' 9029 after probing 105 files Initial step size: 530 located 'file' 9029 after probing 155 files Initial step size: 540 located 'file' 9029 after probing 91 files Initial step size: 550 located 'file' 9029 after probing 127 files Initial step size: 560 located 'file' 9029 after probing 155 files Initial step size: 570 located 'file' 9029 after probing 175 files Initial step size: 580 located 'file' 9029 after probing 182 files Initial step size: 590 located 'file' 9029 after probing 190 files Initial step size: 600 located 'file' 9029 after probing 190 files Initial step size: 610 located 'file' 9029 after probing 187 files Initial step size: 620 located 'file' 9029 after probing 175 files Initial step size: 630 located 'file' 9029 after probing 166 files Initial step size: 640 located 'file' 9029 after probing 155 files Initial step size: 650 located 'file' 9029 after probing 127 files Initial step size: 660 located 'file' 9029 after probing 110 files Initial step size: 670 located 'file' 9029 after probing 177 files Initial step size: 680 located 'file' 9029 after probing 131 files Initial step size: 690 located 'file' 9029 after probing 188 files Initial step size: 700 located 'file' 9029 after probing 159 files Initial step size: 710 located 'file' 9029 after probing 128 files Initial step size: 720 located 'file' 9029 after probing 175 files Initial step size: 730 located 'file' 9029 after probing 139 files Initial step size: 740 located 'file' 9029 after probing 179 files Initial step size: 750 located 'file' 9029 after probing 171 files Initial step size: 760 located 'file' 9029 after probing 200 files Initial step size: 770 located 'file' 9029 after probing 200 files Initial step size: 780 located 'file' 9029 after probing 151 files Initial step size: 790 located 'file' 9029 after probing 173 files Initial step size: 800 located 'file' 9029 after probing 191 files Initial step size: 810 located 'file' 9029 after probing 134 files Initial step size: 820 located 'file' 9029 after probing 145 files Initial step size: 830 located 'file' 9029 after probing 152 files Initial step size: 840 located 'file' 9029 after probing 155 files Initial step size: 850 located 'file' 9029 after probing 155 files Initial step size: 860 located 'file' 9029 after probing 154 files Initial step size: 870 located 'file' 9029 after probing 149 files Initial step size: 880 located 'file' 9029 after probing 205 files Initial step size: 890 located 'file' 9029 after probing 205 files Initial step size: 900 located 'file' 9029 after probing 191 files Initial step size: 910 located 'file' 9029 after probing 173 files Initial step size: 920 located 'file' 9029 after probing 213 files Initial step size: 930 located 'file' 9029 after probing 213 files Initial step size: 940 located 'file' 9029 after probing 186 files Initial step size: 950 located 'file' 9029 after probing 215 files Initial step size: 960 located 'file' 9029 after probing 215 files Initial step size: 970 located 'file' 9029 after probing 179 files Initial step size: 980 located 'file' 9029 after probing 179 files Initial step size: 990 located 'file' 9029 after probing 197 files Initial step size: 1000 located 'file' 9029 after probing 209 files

The reason for making this a meditation (besides that McDarren had his answer and I'd done this work :), is I wonder if:

  1. This generally useful enough to be worth developing?
  2. Does anyone know of (and have a pointer to), any prior art that is similar, or tackles a similar problem?
  3. Can anyone suggest a way of 'guessing' a good initial step size? Or an approach for determining the same?

The test code:

#! perl -slw use strict; package Tie::Array::CountReads; use Tie::Array; our @ISA = 'Tie::Array'; my %readCounts; sub TIEARRAY{ my $self = bless [], $_[ 0 ]; $readCounts{ $self } = 0; return $self; } sub FETCH { ++$readCounts{ $_[0] }; $_[0]->[ $_[1] ] } sub FETCHSIZE{ scalar @{ $_[0] } } sub STORESIZE{ $#{ $_[0] } = $_[1] } sub STORE{ $_[0]->[ $_[ 1 ] ] = $_[ 2 ] } sub readCount{ $readCounts{ $_[ 0 ] } } sub resetCount{ $readCounts{ $_[0] } = 0 } package main; my $ramping = 57000 - 9029; my $step = int( $ramping / 72 ); my $leftovers = $ramping - ( $step * 72 ); my $tiedObj = tie my @levels, 'Tie::Array::CountReads'; @levels = ( ( 68 ) x 9029, map( (( $_ ) x $step ), reverse 0 .. 71 ), (0) x $leftovers ); #print scalar @levels; sub probe { use integer; my( $aref, $lo, $hi, $limit, $istep ) = @_; my $step = ( $hi - $lo ) / $istep; { my $i = $lo; $i += $step while $i <= $hi and $aref->[ $i ] <= $limit; if( $i >= $hi ) { if( $aref->[ $hi ] == $limit ) { --$hi while $aref->[ $hi - 1 ] == $limit; return $hi; } $step /= 2; } else { $limit = $aref->[ $i ]; $lo = $hi - $step; ## Update: Swapped the order of this li +ne and the next $hi = $i; $step = int( ( $hi - $lo ) / $istep ); # $istep was wrongl +y '10' } redo; } } for my $istep ( 10 .. 100 ) { printf "Initial step size: $istep located 'file' %d after probing +%d files\n", probe( \@levels, 0, $#levels, 68, $istep ), $tiedObj->readCount; $tiedObj->resetCount; } for my $istep ( map $_ *10, 10 .. 100 ) { printf "Initial step size: $istep located 'file' %d after probing +%d files\n", probe( \@levels, 0, $#levels, 68, $istep ), $tiedObj->readCount; $tiedObj->resetCount; }

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

In reply to A different kind of search algorithm? by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.