Re: find closest element of array without going over
by starbolin (Hermit) on Jun 26, 2008 at 17:48 UTC
"...which just seems wasteful," Not as inefficient as you might think. Perl, and the modern processor, is highly optimized for this case. Does your application really require optimal performance here? I think you'd be really hard pressed to code something that was significantly faster. Consider that the whole thing, minus the array, could easily fit into one of the processor's ALU pipelines. The only thing that is going to slow this down is if the array exceeds the L2 cache. That would be how many millions of integers on your machine?
For those whose advocate a binary search, I think that if you were to code it you'd find very little improvement. ( Been there, done that, have the shirt. ) Considering the difficulty in coding a binary search correctly, the likelihood in having bugs therein, and the programmer effort required I would have a hard time recommending such.
s/millions/hundred thousand/
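For reference, the kind of straight scan under discussion might look like this (a sketch only; the OP's actual code isn't quoted in the thread, so the variable names here are assumed):
use strict;
use warnings;

my @array = ( 1, 4, 5, 6, 7, 9, 10, 23, 34 );
my $find  = 20;

# Find the largest element of @array that does not exceed $find.
my $closest;
for my $n (@array) {
    $closest = $n if $n <= $find and ( !defined $closest or $n > $closest );
}
print "$closest\n";    # prints 10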
s//----->\t/;$~="JAPH";s//\r<$~~/;{s|~$~-|-~$~|||s
|-$~~|$~~-|||s,<$~~,<~$~,,s,~$~>,$~~>,,
$|=1,select$,,$,,$,,1e-1;print;redo}
I took that as a challenge ;-). I didn't test my binary search with multiple identical values, thus I don't count it as a full implementation. But see the speed differences as a function of the array size:
100 items
           Rate brutal binary
brutal 186657/s     --   -16%
binary 223418/s    20%     --

1000 items
           Rate brutal binary
brutal  20574/s     --   -87%
binary 156038/s   658%     --

10000 items
           Rate brutal binary
brutal   2167/s     --   -98%
binary 129102/s  5859%     --

100000 items
           Rate brutal binary
brutal    169/s     --  -100%
binary  91542/s 54205%     --
And this is the code I used:
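(The posted listing isn't preserved in this copy. The following is a sketch reconstructed from the surrounding discussion: the $a[int($limit/3)] target, the list one larger than $limit, and the quirks pointed out further down the thread (strictly-greater comparison, float midpoint, empty return at the end of the array) are kept so that the replies still apply. The details are assumptions, not the original code.)
use strict;
use warnings;
use Benchmark qw( cmpthese );

my $limit = shift || 100;

my @a = ( 1 );
push @a, $a[-1] + 1 + int rand 3 for 1 .. $limit;    # sorted; note: $limit+1 items
print "Done generating list\n";

my $search_for = $a[ int( $limit / 3 ) ];

# Straight scan: index of the first element greater than $search_for.
sub brutal {
    for my $i ( 0 .. $#a ) {
        return $i if $a[$i] > $search_for;
    }
    return;
}

# Binary search for the same index.
sub binary {
    my ( $left, $right ) = ( 0, $#a );
    while ( $left < $right ) {
        my $tmp = ( $right + $left ) / 2;    # float midpoint; Perl truncates the index
        if ( $a[$tmp] > $search_for ) {
            $right = int $tmp;
        }
        else {
            $left = int( $tmp ) + 1;
        }
    }
    return $a[$left] > $search_for ? $left : ();    # empty list if nothing is greater
}

print brutal(), ' ', binary(), "\n";
cmpthese( -1, { brutal => \&brutal, binary => \&binary } );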
The index of $limit / 3 to search for actually favors the "brutal" search, since it means the search terminates earlier than in random conditions.
It's all a matter of perspective. In the OP's query, he had 18 items in his list. Taking your code, changing the $search_for to be $limit/2 to better approximate actual random values, and using an input of 17 (since you seem to add one), I get:
$ perl ./x.pl 17
Done generating list
9 9
           Rate brutal binary
brutal 449367/s     --   -17%
binary 544651/s    21%     --
So the difference is ~20%. But what are we really talking about here? On my system, that difference is really dropping from the brute-force scan's 2.23 µs to the binary search's 1.84 µs. A savings of 0.39 µs. Seriously, we're quibbling here? (Binary search should do REALLY well here, too, finding the target on nearly the first probe, I think.)
In your cases, I see at 100 items (101, really, but again, let's not quibble about small details) a drop from 5.4 µs to 4.5 µs, a savings of 0.88 µs. At 1000, it's 48.6 µs vs 6.41 µs, or a 42.2 µs savings. Much bigger numbers. But, seriously, is it a concern?
At 10,000, it's 461 µs vs 7.75 µs, or a savings of 454 µs. We're still under a thousandth of a second here, folks. Even at 100,000 items in the (pre-)sorted list, we're comparing 5.92 ms vs 10.9 µs. Sure, that's a savings of nearly the whole thing, but that's still only saving 5.91 ms. Really, do we care?
Now, granted, you ran these tests skewed (looking for something 33% of the way in instead of half-way; otherwise the binary search wouldn't even have to do anything), but the reality is (and I think starbolin's point is) that doing this search really doesn't take much time. Worrying about this before you've profiled anything is simply premature optimisation, and you'd get a bigger bang for your buck (that being programmer's time) if you spent it on something else. The reward for all the time spent coding, testing, and fixing a binary search just does not pay for itself, at least not if you're merely running it once, on a small list (i.e., less than a million items, it seems).
Changing $search_for to my $search_for = $a[int($limit * (1/2 - 1/9))]; and rerunning at 1_000_000 entries shows a savings of about 66.7 ms on my machine. It's simply not worth it. Running one more time for 10 million, the savings is 775 ms (775 ms for the scan minus the binary search's 14.1 µs). Now it's starting to be worth it. If it's being run interactively. If it's a long-running program that users expect to walk away from and come back to, then even that's no big deal.
Of course, this may all be moot if it's code for perlmonks or slashdot or some other active site, but you still should profile before worrying about such small improvements. Chances are, your bottlenecks still are elsewhere.
Update: I should point out that I realise that benchmarking a static index for an O(n) vs O(lg n) comparison is inherently unfair, and an equal distribution across the entire problem space would need to be concocted for truly accurate numbers. However, even if that resulted in numbers two or three times what I provide above (which would not be the case, since the upper limit is on the brute-force method, and that would not increase by much), it would not change the conclusion. 18 ms is not much more time than 6 ms (and the true number would probably still be under 8 ms at 100,000 on this machine).
Except I see at least five errors in your code.
- First, your code finds the next element 'greater' than the input, not 'ge'. Also, it returns the index, not the value (see 'Fourth' below).
- Second,
my $tmp = ($right+$left) / 2;
if ($a[$tmp] > $search_for){ ...
Very interesting: indexing into an array with a float. I know Perl will DWIM here, and while Perl rounds toward 0, I don't think that's guaranteed.
- Third, $right+$left will overflow silently for numbers greater than 1e31 on many systems. This is a rare occurrence in these days of 32-bit ints, but it's there nevertheless.
- Fourth, your code returns an empty list element if $search_for is at the end of the array. While this may be a useful error return, should the programmer attempt to use the index to retrieve a value from the array, the program would break.
- Fifth, in your example you define your array to be one larger than what you search over. While this is more an implementation problem than an algorithm problem, the example is nonetheless in error.
I'm being picky here because the code illustrates that a binary search is difficult to write correctly. Your code makes my point perfectly, thank you.
Update: Three is a red herring. When I first looked at your code I said "Ah, there is the binary-search overflow bug!" But of course that was a bigger problem in the days when a long spanned the memory space and an int was shorter than a long. Perl is written so that an int can address the total memory space, and so Three is no more of a bug than any other operation on large integers would be.
s//----->\t/;$~="JAPH";s//\r<$~~/;{s|~$~-|-~$~|||s
|-$~~|$~~-|||s,<$~~,<~$~,,s,~$~>,$~~>,,
$|=1,select$,,$,,$,,1e-1;print;redo}
Re: find closest element of array without going over
by moritz (Cardinal) on Jun 26, 2008 at 17:29 UTC
If your array is always sorted, you can use a binary search to look for a value. That way you have only O(log(scalar @array)) lookup operations.
Basically you start in the middle of the array, compare your search value with the middle element, and if larger, search in the second half of the array. If smaller, search in the first half of the array.
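A minimal sketch of that idea, adapted to the OP's "closest without going over" requirement (the sub name and the undef return when every element is too big are my choices, not from the thread):
use strict;
use warnings;

# Return the largest element of sorted @$aref that is <= $find,
# or undef if even the smallest element is larger. O(log n) compares.
sub closest_not_over {
    my ( $aref, $find ) = @_;
    my ( $left, $right ) = ( 0, $#$aref );
    my $best;
    while ( $left <= $right ) {
        my $mid = int( ( $left + $right ) / 2 );
        if ( $aref->[$mid] <= $find ) {
            $best = $aref->[$mid];    # candidate; look right for a bigger one
            $left = $mid + 1;
        }
        else {
            $right = $mid - 1;
        }
    }
    return $best;
}

my @array = ( 1, 4, 5, 6, 7, 9, 10, 23, 34, 44, 55 );
print closest_not_over( \@array, 43 ), "\n";    # prints 34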
Re: find closest element of array without going over
by NetWallah (Canon) on Jun 26, 2008 at 18:30 UTC
If the numbers in your @array have relatively small values (say less than 10**6), and are relatively close to each other, you could trade memory for speed thus:
# Lookup table indexed by value: $a2[$v] is defined only where
# $v occurs in @array ($find and @array as in the OP).
my ($i, @a2);
$a2[ $array[$_] ] = $_ for 0..$#array;
# Walk downward from $find to the nearest value that exists.
for ($i = $find; $i > 0; $i--) {
    last if defined $a2[$i];
}
print "Method 2: $i \n";
Have you been high today? I see the nuns are gay! My brother yelled to me...I love you inside Ed - Benny Lava, by Buffalax
Re: find closest element of array without going over
by kyle (Abbot) on Jun 26, 2008 at 18:32 UTC
Others have mentioned binary search in case your array is sorted. If your array is not sorted, you'll have to scan over the whole thing. Here's a short way to do this that makes the intent obvious:
use List::Util 'max';
my $selected = max grep { $find >= $_ } @array;
That's somewhat inefficient, however, because it actually loops twice (once for the grep and again over whatever grep produces).
You could write it in one loop this way:
# Find the largest number in @array
# that is <= $find and put it in $selected
my $selected;
foreach my $number ( @array ) {
    $selected = $number
        if $find >= $number
            and ( !defined $selected or $number > $selected );
    last if defined $selected and $selected == $find;
}
If you're not going to have a huge list, there's little downside to doing it the easy way. (In fact, for short lists, I wouldn't be surprised if the grep method is faster.) If you do decide to write your own loop, be sure to comment it, and maybe put it off in a sub with a good name.
Re: find closest element of array without going over ( O(1) )
by BrowserUk (Patriarch) on Jun 27, 2008 at 06:52 UTC
#! perl -slw
use strict;
use List::Util qw[ reduce ]; $a = $b;
my @array = (1,4,5,6,7,9,10,23,34,44,55,56,57,59,70,80,90,100);
my $yardstick = '';
# For each adjacent pair ($a, $b) of the sorted array, fill string
# positions $a .. $b-1 with chr($a); byte $n-1 then holds the largest
# element <= $n, so each lookup is a single substr/ord.
reduce {
    my $n = $a;
    $yardstick .= chr( $a ), $n++ while $n < $b;
    $b;
} @array;
print "$_ : ", ord( substr $yardstick, $_-1, 1 ) for map 10 * $_- 5, 1
+ .. 10;
__END__
c:\test>junk8
5 : 5
15 : 10
25 : 23
35 : 34
45 : 44
55 : 55
65 : 59
75 : 70
85 : 80
95 : 90
This can fairly easily be adapted to handle numbers up to 64K if you have 128KB of RAM to spare. Larger number ranges obviously require more RAM. Negative numbers can also be handled with minor adaptations.
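One way the 16-bit adaptation might look (an assumption, not BrowserUk's code): use vec with 16-bit slots instead of one byte per position, doubling the memory but raising the value ceiling to 65535:
use strict;
use warnings;
use List::Util qw[ reduce ];

my @array = ( 1, 4, 5, 6, 7, 9, 10, 23, 34, 44, 55 );

# Same trick with 16-bit slots: slot $n-1 holds the largest element
# <= $n, at two bytes per slot.
my $yardstick = '';
reduce {
    vec( $yardstick, $_ - 1, 16 ) = $a for $a .. $b - 1;
    $b;
} @array;

print "43 : ", vec( $yardstick, 42, 16 ), "\n";    # prints 34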
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: find closest element of array without going over
by leocharre (Priest) on Jun 26, 2008 at 18:39 UTC
Faster how?
For you or for the computer?
If what you want is for the computer to be faster, do this in C.
If you are thinking of the expense of coder/human time versus the expense of extra hardware... maybe it's fine the way it is.
I read some amazing stuff by Paul Graham about this kind of thing... At least that's one of the things covered... about convenience vs "speed".
Now, before I am stoned to death for seeming off topic here... I think this does apply. He said so right there... it works.
I think Perl is indeed a language about the coder, about convenience and what feels good to the individual coder. It's so promiscuously adaptive and optionally forgiving. I get the idea it is indeed for making things easier for the human being. As opposed to, say, assembly, or just plain ol' cards.
Re: find closest element of array without going over
by starbolin (Hermit) on Jun 27, 2008 at 05:57 UTC
sadfaceman writes: "....if I am going to repeatably perform this function." I don't know if any of us saw that little phrase on the first read through. I missed it totally, yet it could be important. If you are going to repeatedly search the same list, then storing the list as a binary tree would be a good optimization. You build it once and you get the benefit every time you use it. The tradeoff is that you have to write methods for creating the tree and traversing it.
NetWallah's method is similar in that he builds a sparse representation of the data to allow easy indexing. A binary tree takes the same approach, but its storage is more compact.
Update: Creating a binary tree has the added advantage of sorting the key array as you build the tree.
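A bare-bones sketch of that approach (my code, not starbolin's; hashref nodes with no balancing, so a pre-sorted insert order would degrade it to a list):
use strict;
use warnings;

# Insert $value into the subtree rooted at $node; returns the root.
sub insert {
    my ( $node, $value ) = @_;
    return { value => $value } unless $node;
    my $side = $value < $node->{value} ? 'left' : 'right';
    $node->{$side} = insert( $node->{$side}, $value );
    return $node;
}

# Largest value in the tree that is <= $find, walking O(depth) nodes.
sub floor_lookup {
    my ( $node, $find ) = @_;
    my $best;
    while ($node) {
        if ( $node->{value} <= $find ) {
            $best = $node->{value};    # candidate; try for a larger one
            $node = $node->{right};
        }
        else {
            $node = $node->{left};
        }
    }
    return $best;
}

my $root;
$root = insert( $root, $_ ) for 10, 4, 23, 1, 9, 44;
print floor_lookup( $root, 20 ), "\n";    # prints 10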
s//----->\t/;$~="JAPH";s//\r<$~~/;{s|~$~-|-~$~|||s
|-$~~|$~~-|||s,<$~~,<~$~,,s,~$~>,$~~>,,
$|=1,select$,,$,,$,,1e-1;print;redo}
Re: find closest element of array without going over
by jds17 (Pilgrim) on Jun 26, 2008 at 18:40 UTC
Hi, I am afraid I am reinventing the wheel here, but I had fun writing it, so here goes.
Comments:
1. It's just a binary search, as suggested by everyone else!
2. The performance should be quite good, but I did not test it against big arrays (and I would obviously need to write something to test against...)
3. In case two numbers in the array are equally close, the bigger number will be chosen as the winner.
Update: Of course I have to assume the target array is sorted.
use strict;
use warnings;

my @array = (1,4,5,6,7,9,10,23,34,44,55,56,57,59,70,80,90,100);
print "\nenter number\n";
chomp (my $find = <STDIN>);
my $nearest = @{nearest(\@array)}[0];
print "nearest to $find in array is: $nearest\n";

sub nearest {
    my ($a) = @_;
    my $size = @$a;
    return $a if $size == 1;
    my $mid  = int(($size-1) / 2);
    my $test = @$a[$mid];
    return $test <= $find
        ? (abs($test-$find) < abs(@$a[$mid+1]-$find) ? [$test]
           : $find <= @$a[$mid+1] ? [@$a[$mid+1]] : nearest([@$a[$mid+1 .. $size-1]]))
        : (abs($test-$find) < abs(@$a[$mid-1]-$find) ? [$test]
           : $find >= @$a[$mid-1] ? [@$a[$mid-1]] : nearest([@$a[0 .. $mid]]));
}
Re: find closest element of array without going over
by rir (Vicar) on Jun 27, 2008 at 19:42 UTC
and it works
It seems incorrect to print 100 when the input is 0. Beware of backing around the corner of Perl's arrays: a negative index counts from the end.
Be well,
rir