Re: find closest element of array without going over
by starbolin (Hermit) on Jun 26, 2008 at 17:48 UTC
"...which just seems wasteful," Not as inefficient as you might think. Perl, and the modern processor, is highly optimized for this case. Does your application really require optimal performance here? I think you'd be really hard pressed to code something that was significantly faster. Consider that the whole thing, minus the array, could easily fit into one of the processor's ALU pipelines. The only thing that is going to slow this down is if the array exceeds the L2 cache. That would be how many millions of integers on your machine?
For those whose advocate a binary search, I think that if you were to code it you'd find very little improvement. ( Been there, done that, have the shirt. ) Considering the difficulty in coding a binary search correctly, the likelihood in having bugs therein, and the programmer effort required I would have a hard time recommending such.
s/millions/hundred thousand/
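For reference, the kind of straight scan under discussion might look like this (a sketch only; the OP's actual code isn't quoted in the thread, so the variable names here are assumed):
use strict;
use warnings;

my @array = ( 1, 4, 5, 6, 7, 9, 10, 23, 34 );
my $find  = 20;

# Find the largest element of @array that does not exceed $find.
my $closest;
for my $n (@array) {
    $closest = $n if $n <= $find and ( !defined $closest or $n > $closest );
}
print "$closest\n";    # prints 10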
s//----->\t/;$~="JAPH";s//\r<$~~/;{s|~$~-|-~$~|||s
|-$~~|$~~-|||s,<$~~,<~$~,,s,~$~>,$~~>,,
$|=1,select$,,$,,$,,1e-1;print;redo}
I took that as a challenge ;-). I didn't test my binary search with multiple identical values, thus I don't count it as a full implementation. But see the speed differences as a function of the array size:
100 items
           Rate brutal binary
brutal 186657/s     --   -16%
binary 223418/s    20%     --

1000 items
           Rate brutal binary
brutal  20574/s     --   -87%
binary 156038/s   658%     --

10000 items
           Rate brutal binary
brutal   2167/s     --   -98%
binary 129102/s  5859%     --

100000 items
           Rate brutal binary
brutal    169/s     --  -100%
binary  91542/s 54205%     --
And this is the code I used:
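(The posted listing isn't preserved in this copy. The following is a sketch reconstructed from the surrounding discussion: the $a[int($limit/3)] target, the list one larger than $limit, and the quirks pointed out further down the thread (strictly-greater comparison, float midpoint, empty return at the end of the array) are kept so that the replies still apply. The details are assumptions, not the original code.)
use strict;
use warnings;
use Benchmark qw( cmpthese );

my $limit = shift || 100;

my @a = ( 1 );
push @a, $a[-1] + 1 + int rand 3 for 1 .. $limit;    # sorted; note: $limit+1 items
print "Done generating list\n";

my $search_for = $a[ int( $limit / 3 ) ];

# Straight scan: index of the first element greater than $search_for.
sub brutal {
    for my $i ( 0 .. $#a ) {
        return $i if $a[$i] > $search_for;
    }
    return;
}

# Binary search for the same index.
sub binary {
    my ( $left, $right ) = ( 0, $#a );
    while ( $left < $right ) {
        my $tmp = ( $right + $left ) / 2;    # float midpoint; Perl truncates the index
        if ( $a[$tmp] > $search_for ) {
            $right = int $tmp;
        }
        else {
            $left = int( $tmp ) + 1;
        }
    }
    return $a[$left] > $search_for ? $left : ();    # empty list if nothing is greater
}

print brutal(), ' ', binary(), "\n";
cmpthese( -1, { brutal => \&brutal, binary => \&binary } );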
The index of $limit / 3 to search for actually favors the "brutal" search, since it means the search terminates earlier than in random conditions.
It's all a matter of perspective. In the OP's query, he had 18 items in his list. Taking your code, changing the $search_for to be $limit/2 to better approximate actual random values, and using an input of 17 (since you seem to add one), I get:
$ perl ./x.pl 17
Done generating list
9 9
           Rate brutal binary
brutal 449367/s     --   -17%
binary 544651/s    21%     --
So the difference is ~20%. But what are we really talking about here? On my system, that difference is really dropping from the brute-force scan's 2.23 µs to the binary search's 1.84 µs. A savings of 0.39 µs. Seriously, we're quibbling here? (Binary search should do REALLY well here, too, finding the target on nearly the first probe, I think.)
In your cases, I see at 100 items (101, really, but again, let's not quibble about small details) a drop from 5.4 µs to 4.5 µs, a savings of 0.88 µs. At 1000, it's 48.6 µs vs 6.41 µs, or a 42.2 µs savings. Much bigger numbers. But, seriously, is it a concern?
At 10,000, it's 461 µs vs 7.75 µs, or a savings of 454 µs. We're still under a thousandth of a second here, folks. Even at 100,000 items in the (pre-)sorted list, we're comparing 5.92 ms vs 10.9 µs. Sure, that's a savings of nearly the whole thing, but that's still only saving 5.91 ms. Really, do we care?
Now, granted, you ran these tests skewed (looking for something 33% of the way in instead of half-way; otherwise the binary search wouldn't even have to do anything), but the reality is (and I think starbolin's point is) that doing this search really doesn't take much time. Worrying about this before you've profiled anything is simply premature optimisation, and you'd get a bigger bang for your buck (that being programmer's time) if you spent it on something else. The reward for all the time spent coding, testing, and fixing a binary search just does not pay for itself, at least not if you're merely running it once, on a small list (i.e., less than a million items, it seems).
Changing $search_for to my $search_for = $a[int($limit * (1/2 - 1/9))]; and rerunning at 1_000_000 entries shows a savings of about 66.7 ms on my machine. It's simply not worth it. Running one more time for 10 million, the savings is 775 ms (775 ms for the scan minus the binary search's 14.1 µs). Now it's starting to be worth it. If it's being run interactively. If it's a long-running program that users expect to walk away from and come back to, then even that's no big deal.
Of course, this may all be moot if it's code for perlmonks or slashdot or some other active site, but you still should profile before worrying about such small improvements. Chances are, your bottlenecks still are elsewhere.
Update: I should point out that I realise that benchmarking a static index for an O(n) vs O(lg n) comparison is inherently unfair, and an equal distribution across the entire problem space would need to be concocted for truly accurate numbers. However, even if that resulted in numbers two or three times what I provide above (which would not be the case, since the upper limit is on the brute-force method, and that would not increase by much), it would not change the conclusion. 18 ms is not much more time than 6 ms (and the true number would probably still be under 8 ms at 100,000 on this machine).
Except I see at least five errors in your code.
- First, your code finds the next element 'greater' than the input, not 'ge'. Also, it returns the index, not the value (see 'Fourth' below).
- Second,
my $tmp = ($right+$left) / 2;
if ($a[$tmp] > $search_for){ ...
Very interesting: indexing into an array with a float. I know Perl will DWIM here, and while Perl rounds toward 0, I don't think that's guaranteed.
- Third, $right+$left will overflow silently for numbers greater than 1e31 on many systems. This is a rare occurrence in these days of 32-bit ints, but it's there nevertheless.
- Fourth, your code returns an empty list element if $search_for is at the end of the array. While this may be a useful error return, should the programmer attempt to use the index to retrieve a value from the array, the program would break.
- Fifth, in your example you define your array to be one larger than what you search over. While this is more an implementation problem than an algorithm problem, the example is nonetheless in error.
I'm being picky here because the code illustrates that a binary search is difficult to write correctly. Your code makes my point perfectly, thank you.
Update: Three is a red herring. When I first looked at your code I said "Ah, there is the binary-search overflow bug!" But of course that was a bigger problem in the days when a long spanned the memory space and an int was shorter than a long. Perl is written so that an int can address the total memory space, and so Three is no more of a bug than any other operation on large integers would be.
s//----->\t/;$~="JAPH";s//\r<$~~/;{s|~$~-|-~$~|||s
|-$~~|$~~-|||s,<$~~,<~$~,,s,~$~>,$~~>,,
$|=1,select$,,$,,$,,1e-1;print;redo}
Re: find closest element of array without going over
by moritz (Cardinal) on Jun 26, 2008 at 17:29 UTC
If your array is always sorted, you can use a binary search to look for a value. That way you have only O(log(scalar @array)) lookup operations.
Basically you start in the middle of the array, compare your search value with the middle element, and if larger, search in the second half of the array. If smaller, search in the first half of the array.
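A minimal sketch of that idea, adapted to the OP's "closest without going over" requirement (the sub name and the undef return when every element is too big are my choices, not from the thread):
use strict;
use warnings;

# Return the largest element of sorted @$aref that is <= $find,
# or undef if even the smallest element is larger. O(log n) compares.
sub closest_not_over {
    my ( $aref, $find ) = @_;
    my ( $left, $right ) = ( 0, $#$aref );
    my $best;
    while ( $left <= $right ) {
        my $mid = int( ( $left + $right ) / 2 );
        if ( $aref->[$mid] <= $find ) {
            $best = $aref->[$mid];    # candidate; look right for a bigger one
            $left = $mid + 1;
        }
        else {
            $right = $mid - 1;
        }
    }
    return $best;
}

my @array = ( 1, 4, 5, 6, 7, 9, 10, 23, 34, 44, 55 );
print closest_not_over( \@array, 43 ), "\n";    # prints 34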
Re: find closest element of array without going over
by NetWallah (Canon) on Jun 26, 2008 at 18:30 UTC
If the numbers in your @array have relatively small values (say less than 10**6), and are relatively close to each other, you could trade memory for speed thus:
# Lookup table indexed by value: $a2[$v] is defined only where
# $v occurs in @array ($find and @array as in the OP).
my ($i, @a2);
$a2[ $array[$_] ] = $_ for 0..$#array;
# Walk downward from $find to the nearest value that exists.
for ($i = $find; $i > 0; $i--) {
    last if defined $a2[$i];
}
print "Method 2: $i \n";
Have you been high today? I see the nuns are gay! My brother yelled to me...I love you inside Ed - Benny Lava, by Buffalax
Re: find closest element of array without going over
by kyle (Abbot) on Jun 26, 2008 at 18:32 UTC
Others have mentioned binary search in case your array is sorted. If your array is not sorted, you'll have to scan over the whole thing. Here's a short way to do this that makes the intent obvious:
use List::Util 'max';
my $selected = max grep { $find >= $_ } @array;
That's somewhat inefficient, however, because it actually loops twice (once for the grep and again over whatever grep produces).
You could write it in one loop this way:
# Find the largest number in @array
# that is <= $find and put it in $selected
my $selected;
foreach my $number ( @array ) {
    $selected = $number
        if $find >= $number
            and ( !defined $selected or $number > $selected );
    last if defined $selected and $selected == $find;
}
If you're not going to have a huge list, there's little downside to doing it the easy way. (In fact, for short lists, I wouldn't be surprised if the grep method is faster.) If you do decide to write your own loop, be sure to comment it, and maybe put it off in a sub with a good name.
Re: find closest element of array without going over ( O(1) )
by BrowserUk (Patriarch) on Jun 27, 2008 at 06:52 UTC
#! perl -slw
use strict;
use List::Util qw[ reduce ]; $a = $b;
my @array = (1,4,5,6,7,9,10,23,34,44,55,56,57,59,70,80,90,100);
my $yardstick = '';
# For each adjacent pair ($a, $b) of the sorted array, fill string
# positions $a .. $b-1 with chr($a); byte $n-1 then holds the largest
# element <= $n, so each lookup is a single substr/ord.
reduce {
    my $n = $a;
    $yardstick .= chr( $a ), $n++ while $n < $b;
    $b;
} @array;
print "$_ : ", ord( substr $yardstick, $_-1, 1 ) for map 10 * $_- 5, 1
+ .. 10;
__END__
c:\test>junk8
5 : 5
15 : 10
25 : 23
35 : 34
45 : 44
55 : 55
65 : 59
75 : 70
85 : 80
95 : 90
This can fairly easily be adapted to handle numbers up to 64K if you have 128KB of RAM to spare. Larger number ranges obviously require more RAM. Negative numbers can also be handled with minor adaptations.
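One way the 16-bit adaptation might look (an assumption, not BrowserUk's code): use vec with 16-bit slots instead of one byte per position, doubling the memory but raising the value ceiling to 65535:
use strict;
use warnings;
use List::Util qw[ reduce ];

my @array = ( 1, 4, 5, 6, 7, 9, 10, 23, 34, 44, 55 );

# Same trick with 16-bit slots: slot $n-1 holds the largest element
# <= $n, at two bytes per slot.
my $yardstick = '';
reduce {
    vec( $yardstick, $_ - 1, 16 ) = $a for $a .. $b - 1;
    $b;
} @array;

print "43 : ", vec( $yardstick, 42, 16 ), "\n";    # prints 34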
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: find closest element of array without going over
by leocharre (Priest) on Jun 26, 2008 at 18:39 UTC
Faster how?
For you or for the computer?
If what you want is for the computer to be faster, do this in C.
If you are thinking of the expense of coder/human time versus the expense of extra hardware... maybe it's fine the way it is.
I read some amazing stuff by Paul Graham about this kind of thing... At least that's one of the things covered... about convenience vs "speed".
Now, before I am stoned to death for seeming off topic here... I think this does apply. He said so right there... it works.
I think Perl is indeed a language about the coder, about convenience and what feels good to the individual coder. It's so promiscuously adaptive and optionally forgiving. I get the idea it is indeed for making things easier for the human being. As opposed to, say, assembly, or just plain ol' cards.
Re: find closest element of array without going over
by starbolin (Hermit) on Jun 27, 2008 at 05:57 UTC
sadfaceman writes: "....if I am going to repeatably perform this function." I don't know if any of us saw that little phrase on the first read through. I missed it totally, yet it could be important. If you are going to repeatedly search the same list, then storing the list as a binary tree would be a good optimization. You build it once and you get the benefit every time you use it. The tradeoff is that you have to write methods for creating the tree and traversing it.
NetWallah's method is similar in that he builds a sparse representation of the data to allow easy indexing. A binary tree takes the same approach, but its storage is more compact.
Update: Creating a binary tree has the added advantage of sorting the key array as you build the tree.
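A bare-bones sketch of that approach (my code, not starbolin's; hashref nodes with no balancing, so a pre-sorted insert order would degrade it to a list):
use strict;
use warnings;

# Insert $value into the subtree rooted at $node; returns the root.
sub insert {
    my ( $node, $value ) = @_;
    return { value => $value } unless $node;
    my $side = $value < $node->{value} ? 'left' : 'right';
    $node->{$side} = insert( $node->{$side}, $value );
    return $node;
}

# Largest value in the tree that is <= $find, walking O(depth) nodes.
sub floor_lookup {
    my ( $node, $find ) = @_;
    my $best;
    while ($node) {
        if ( $node->{value} <= $find ) {
            $best = $node->{value};    # candidate; try for a larger one
            $node = $node->{right};
        }
        else {
            $node = $node->{left};
        }
    }
    return $best;
}

my $root;
$root = insert( $root, $_ ) for 10, 4, 23, 1, 9, 44;
print floor_lookup( $root, 20 ), "\n";    # prints 10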
s//----->\t/;$~="JAPH";s//\r<$~~/;{s|~$~-|-~$~|||s
|-$~~|$~~-|||s,<$~~,<~$~,,s,~$~>,$~~>,,
$|=1,select$,,$,,$,,1e-1;print;redo}
Re: find closest element of array without going over
by jds17 (Pilgrim) on Jun 26, 2008 at 18:40 UTC
Hi, I am afraid I am reinventing the wheel here, but I had fun writing it, so here goes.
Comments:
1. It's just a binary search, as suggested by everyone else!
2. The performance should be quite good, but I did not test it against big arrays (and I would obviously need to write something to test against...)
3. In case two numbers in the array are equally close, the bigger number will be chosen as the winner.
Update: Of course I have to assume the target array is sorted.
use strict;
use warnings;

my @array = (1,4,5,6,7,9,10,23,34,44,55,56,57,59,70,80,90,100);
print "\nenter number\n";
chomp (my $find = <STDIN>);
my $nearest = @{nearest(\@array)}[0];
print "nearest to $find in array is: $nearest\n";

sub nearest {
    my ($a) = @_;
    my $size = @$a;
    return $a if $size == 1;
    my $mid  = int(($size-1) / 2);
    my $test = @$a[$mid];
    return $test <= $find
        ? (abs($test-$find) < abs(@$a[$mid+1]-$find) ? [$test]
           : $find <= @$a[$mid+1] ? [@$a[$mid+1]] : nearest([@$a[$mid+1 .. $size-1]]))
        : (abs($test-$find) < abs(@$a[$mid-1]-$find) ? [$test]
           : $find >= @$a[$mid-1] ? [@$a[$mid-1]] : nearest([@$a[0 .. $mid]]));
}
Re: find closest element of array without going over
by rir (Vicar) on Jun 27, 2008 at 19:42 UTC
and it works
It seems incorrect to print 100 when the input is 0. Beware of backing around the corner of Perl's arrays: a negative index counts from the end.
Be well,
rir