Re: “A meeting at the Liquor-Vodka Factory”, or… same ARRAY questions again?!!

in reply to “A meeting at the Liquor-Vodka Factory”, or… same ARRAY questions again?!!

Gentlemen,

1st of all - huge thank you to everybody. It’s as always a pleasure to be here and have your questions answered. I *love* this community of wise, intelligent and professional hackers:-), (-- although I am not too much into Perl:-)- (sorry))..

A few general remarks:
1. I am sorry if my initial post caused a bit of frustration for some of you. It wasn’t intended to be. My point was:
I expect a higher quality of answers in FAQ than in general posts. And the more F is a Q, the higher quality the A (I assume) should be. That’s all.
It’s just my opinion of course, but I think many people treat “FAQ” as the only source of truth...

2. Thanks a lot for the two _binary search_ answers. Both seem to work, I’ll do some corner-case and performance testing and compare them to hash implementation as well, -- and probably post the results here (if nobody minds).
2-a). I’ve made two changes in find_int_in_array :
A)

    
 #my ( $arref, $targ ) = @_; # args are array ref, int value
  my ( $targ, $arref ) = @_; # args are: (int, array ref)
[download]

this is just a “style” thing, easier to remember (int in arr, not arr in int), plus - easier to test together with other impl-s by the same driver;
B)
Changed

    my $nextidx = $asize / 2;
    my $nextinc = $nextidx / 2;
[download]

to:

    my $nextidx = int($asize / 2);
    my $nextinc = int($nextidx / 2);
[download]

(Otherwise, for an array of 5 elements, it’d return an index = 2.5 ;-...)).

3. Re: using <code> v.s <pre>:
I do know the rules and tried not to, but,,, on my screen, it’s either <code> is too small (cannot read), or with the font size increase, everything else’s too big and bold..... Maybe smb. can review this policy? (Minor thing of course).

4. ($#array + 1) vs. scalar(@array) :
You’re going to laugh, but I did a performance test..:). The results is: $#array is ~10-15% faster than the other one. Intuitively, that’s what one would expect (my wild guess is that $#array is probably an internal counter that is always kept up-to-date, and the scalar(@array) gets calculated).
....This is a pure academical question of course:-)... It’s hard to imagine an application that would suffer from using the 2nd approach.

5. Using a hash vs. Binary search.
Again, hash works just fine, except for three implications:

memory;
slows down for huge amounts of data (vs. bisearch which is always linear);
complex:-)

Enough said about ## 1 & 2; let me explain the 3d one:
... Imagine an application that updates an array rarely and looks up often. Say you have 10,000,000 customers / books / whatever that get added / published only once a minute, but information is requested 1000 per second.
So, re-building the hash for lookup 1000 times a second is clearly not an option. Right?
A solution to that might be (I’ll think in some pseudo-language now):

class Array_with_Hash:

   @the_array = ()
   % the_hash = ();
   init(@a)
 {
    put (a =>> the_array);
   recreate_hash (the_array, the_hash);
 }

  pop(elem):
 {
     array.pop(el)
     update_hash(el);
 }

...
[download]

Well, if you know for sure that push and pop is *all* that you do, that might be possible. However, creating a generic class like this might be really tricky.

array must be locked (no access to it directly, -- only by this class’s methods;
hash values probably have to be arrays_of_indexes...
implementing sort(), slice-and-dice etc. might take some time;
it’s quite easy to “forget” about something...
...

This only means that the hash approach only works if

cost(recreating the hash every time) == nothing
OR
cost(programming time) == nothing..:)

In other words, -- thanks again for the binary search in Perl! (And it should be in FAQ, too...:-)...

In Section Seekers of Perl Wisdom