PerlMonks  

Re: “A meeting at the Liquor-Vodka Factory”, or… same ARRAY questions again?!!

by Mabooka-Mabooka (Sexton)
on Sep 05, 2005 at 20:26 UTC ( #489285=note )


in reply to “A meeting at the Liquor-Vodka Factory”, or… same ARRAY questions again?!!

Gentlemen,

First of all, a huge thank you to everybody. It’s always a pleasure to be here and have my questions answered. I *love* this community of wise, intelligent and professional hackers :-) (although I am not too much into Perl myself, sorry :-)).

A few general remarks:
1. I am sorry if my initial post caused a bit of frustration for some of you. That wasn’t my intention. My point was:
I expect a higher quality of answers in FAQ than in general posts. And the more F is a Q, the higher quality the A (I assume) should be. That’s all.
It’s just my opinion of course, but I think many people treat “FAQ” as the only source of truth...

2. Thanks a lot for the two _binary search_ answers. Both seem to work. I’ll do some corner-case and performance testing, compare them to the hash implementation as well, and probably post the results here (if nobody minds).
2-a). I’ve made two changes in find_int_in_array :
A)

#my ( $arref, $targ ) = @_;    # args are array ref, int value
my ( $targ, $arref ) = @_;     # args are: (int, array ref)
This is just a “style” thing: it is easier to remember (int in arr, not arr in int), and easier to test together with other implementations using the same driver;
B)
Changed
my $nextidx = $asize / 2; my $nextinc = $nextidx / 2;

to:
my $nextidx = int($asize / 2); my $nextinc = int($nextidx / 2);
(Otherwise, for an array of 5 elements, it’d return an index = 2.5 ;-...)).
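
For comparison, here is a minimal binary-search sketch that uses the (int, array ref) argument order and applies int() to the midpoint. It is not the find_int_in_array from the thread (which works with $nextidx and $nextinc); the name find_int_sorted and the lo/hi formulation are assumptions for illustration, and the array is assumed to be sorted in ascending numeric order.

```perl
use strict;
use warnings;

# Hypothetical helper, not the thread's find_int_in_array.
# Args: (int, array ref); the array must be sorted ascending.
# Returns the index of $targ, or -1 if it is not present.
sub find_int_sorted {
    my ( $targ, $arref ) = @_;
    my ( $lo, $hi ) = ( 0, $#$arref );
    while ( $lo <= $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );    # int() keeps the index whole
        if    ( $arref->[$mid] < $targ ) { $lo = $mid + 1 }
        elsif ( $arref->[$mid] > $targ ) { $hi = $mid - 1 }
        else                             { return $mid }
    }
    return -1;
}

my @sorted = ( 1, 3, 5, 7, 9 );
print find_int_sorted( 5, \@sorted ), "\n";    # prints 2
print find_int_sorted( 4, \@sorted ), "\n";    # prints -1
```

Returning -1 for "not found" is a choice made for this sketch; returning undef would work equally well.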

3. Re: using <code> vs. <pre>:
I do know the rules and tried to follow them, but on my screen either <code> is too small to read or, with the font size increased, everything else is too big and bold. Maybe somebody can review this policy? (A minor thing, of course.)

4. ($#array + 1) vs. scalar(@array) :
You’re going to laugh, but I did a performance test :). The result: $#array is ~10-15% faster than the other one. Intuitively, that’s what one would expect (my wild guess is that $#array is an internal counter that is always kept up to date, whereas scalar(@array) gets calculated).
This is a purely academic question, of course :-). It’s hard to imagine an application that would suffer from using the second approach.
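
A comparison like this can be reproduced with the core Benchmark module. The sketch below is not the test that produced the figure above; the array size and labels are arbitrary assumptions, and the timings will differ between Perl versions and machines.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @array = ( 1 .. 1000 );

# Both expressions yield the element count.
my $by_last_index = $#array + 1;
my $by_scalar     = scalar(@array);
print "$by_last_index $by_scalar\n";

# Run each snippet for about 1 CPU second and print a comparison table.
cmpthese( -1, {
    'last_index'   => sub { my $n = $#array + 1 },
    'scalar_array' => sub { my $n = scalar(@array) },
} );
```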

5. Using a hash vs. Binary search.
Again, a hash works just fine, except for three drawbacks:

  • memory;
  • slows down for huge amounts of data (vs. binary search, whose cost always grows only logarithmically);
  • complex:-)

Enough said about #1 and #2; let me explain the third one:
... Imagine an application that updates an array rarely and looks up often. Say you have 10,000,000 customers / books / whatever that get added / published only once a minute, but information is requested 1000 times per second.
So, re-building the hash for lookup 1000 times a second is clearly not an option. Right?
A solution to that might be (I’ll think in some pseudo-language now):
class Array_with_Hash:
    @the_array = ();
    %the_hash  = ();
    init(@a) {
        put (a =>> the_array);
        recreate_hash (the_array, the_hash);
    }
    pop(el): {
        the_array.pop(el);
        update_hash(el);
    }
    ...

Well, if you know for sure that push and pop are *all* that you do, that might be possible. However, creating a generic class like this might be really tricky.
  • array must be locked (no direct access to it, only through this class’s methods);
  • hash values probably have to be arrays_of_indexes...
  • implementing sort(), slice-and-dice etc. might take some time;
  • it’s quite easy to “forget” about something...
  • ...
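
For the push/pop-only case, a rough Perl sketch might look like the following. The package name ArrayWithHash, the method names, and the choice to store per-value counts instead of arrays of indexes are all assumptions made to keep the example short; it does not address sort, slices, or direct access to the array.

```perl
use strict;
use warnings;

# Hypothetical class: an array plus a value => count hash kept in sync.
package ArrayWithHash;

sub new {
    my ( $class, @items ) = @_;
    my $self = bless { array => [], count => {} }, $class;
    $self->push_item($_) for @items;
    return $self;
}

# Append one element and record it in the lookup hash.
sub push_item {
    my ( $self, $item ) = @_;
    push @{ $self->{array} }, $item;
    $self->{count}{$item}++;
    return;
}

# Remove the last element; drop its hash entry when no copies remain.
sub pop_item {
    my ($self) = @_;
    return unless @{ $self->{array} };
    my $item = pop @{ $self->{array} };
    delete $self->{count}{$item} unless --$self->{count}{$item};
    return $item;
}

# Membership test by hash lookup; nothing is rebuilt per query.
sub contains {
    my ( $self, $item ) = @_;
    return exists $self->{count}{$item} ? 1 : 0;
}

package main;

my $aw = ArrayWithHash->new( 3, 7, 7 );
$aw->pop_item;
print $aw->contains(7) ? "yes" : "no", "\n";    # prints yes
$aw->pop_item;
print $aw->contains(7) ? "yes" : "no", "\n";    # prints no
```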

This only means that the hash approach only works if
  • cost(recreating the hash every time) == nothing, OR
  • cost(programming time) == nothing..:)
In other words, thanks again for the binary search in Perl! (And it should be in the FAQ, too... :-))

Replies are listed 'Best First'.
Re^2: “A meeting at the Liquor-Vodka Factory”, or… same ARRAY questions again?!!
by graff (Chancellor) on Sep 06, 2006 at 06:46 UTC
    On another topic...
    Imagine an application that updates an array rarely and looks up often. Say you have 10,000,000 customers / books / whatever that get added / published only once a minute, but information is requested 1000 times per second. So, re-building the hash for lookup 1000 times a second is clearly not an option. Right?

    Um, if the application is as you describe, why would you want to use an array at all? With that sort of ratio between updates and searches, it would be better just to maintain a hash instead of an array. Surely you do not want to "recreate the hash every time" in order to implement the search; create the hash once and maintain it, assuming it fits in memory -- and if not, consider a DBM_File approach (cf. AnyDBM_File), or just use a real database backend.

    If your issue is "using a (temporary) hash to store a copy of an array so that the array can be searched for specific values", yes, that's a bad approach for really big arrays and the kind of update/search ratio you're suggesting. But you're not explaining why the primary data storage needs to be an array.

    If the issue is really "coming up with a viable app to support lots of searches for specific values in a large set", the answer is more likely to be: start by using a hash as the primary storage, and use the values to be searched for as hash keys.
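
A minimal sketch of that suggestion, assuming the records can be keyed by the value being searched for. The %customer hash, its IDs, and the name field are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical data: the searched-for value (an ID) is the hash key.
my %customer = (
    1001 => { name => 'Alice' },
    1002 => { name => 'Bob' },
);

# Rare update: add one record; nothing is rebuilt.
$customer{1003} = { name => 'Carol' };

# Frequent lookup: a single hash access per request.
for my $id ( 1003, 9999 ) {
    my $found = exists $customer{$id} ? $customer{$id}{name} : 'not found';
    print "$id: $found\n";
}
```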

Re^2: “A meeting at the Liquor-Vodka Factory”, or… same ARRAY questions again?!!
by graff (Chancellor) on Sep 06, 2006 at 06:19 UTC
    Changed
    my $nextidx = $asize / 2; my $nextinc = $nextidx / 2;
    to:
    my $nextidx = int($asize / 2); my $nextinc = int($nextidx / 2);
    (Otherwise, for an array of 5 elements, it’d return an index = 2.5 ;-...)).

    Actually, that change is unnecessary. Whenever a floating point value is used as an array index, Perl automatically uses just the integer part of the value, since that would be the only sensible thing to do.
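
A small sketch of the truncation behaviour, using an arbitrary five-element array. Note that only the subscript is truncated: a fractional value stored in a variable stays fractional, so int() still matters if the index itself is returned to the caller.

```perl
use strict;
use warnings;

my @arr = ( 10, 20, 30, 40, 50 );

# A fractional subscript is truncated to its integer part.
print $arr[2.5], "\n";           # prints 30, same as $arr[2]
print $arr[ @arr / 2 ], "\n";    # 5 / 2 = 2.5, so this is also $arr[2]

# The stored value itself is not truncated.
my $idx = @arr / 2;
print "$idx\n";                  # prints 2.5
```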
