Array or Hash

palanisamy has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Array or Hash by ikegami (Patriarch) on Jan 21, 2011 at 06:59 UTC
At doing what? Both scale the same for insertions (amortised O(1)). Both scale the same for fetches (amortised O(1)). A fetch should take a little longer with a hash, but I suspect this wouldn't be the deciding factor in most cases. You can really only delete from the start and from the end of an array — anything else can be expensive since it requires shifting the elements — but deleting from a hash should be cheap. For tiny hashes or arrays, this shouldn't be significant. You can really only insert at the start and the end of an array — anything else can be expensive since it requires shifting the elements — but inserting into a hash should be cheap. For tiny hashes or arrays, this shouldn't be significant. In short, I doubt you'll notice a difference.	[reply]
Re^2: Array or Hash by BrowserUk (Patriarch) on Jan 21, 2011 at 17:26 UTC
Both scale the same for insertions (amortised O(1)) Care to demonstrate how insertions into an array amortise to O(1)? Both scale the same for fetches (amortised O(1)). A fetch should take a little longer with a hash, but I suspect this wouldn't be the deciding factor in most cases. You wouldn't notice if your programs ran 5 times slower? `perl -MTime::HiRes=time -E"@a=1..1e6;$t=time; ++$a[$_] for 1..1e6; printf qq[%f\n], time()-$t" 0.146912 perl -MTime::HiRes=time -E"%h=map{($_)x2}1..1e6;$t=time; ++$h{$_} for 1..1e6; printf qq[%f\n], + time()-$t" 0.525000` [download] Or an order of magnitude more slowly? `perl -MTime::HiRes=time -E"$t=time;my @a; $a[$_]=1 for 1..1e6; printf qq[%f\n], time()-$t" 0.220181 perl -MTime::HiRes=time -E"$t=time;my %h; $h{$_}=1 for 1..1e6; printf qq[%f\n], time()-$t" 1.805000` [download] Concealing reality behind grand sounding theoretical BS is all the worse as you obviously know better. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]
Re^3: Array or Hash by ikegami (Patriarch) on Jan 21, 2011 at 18:35 UTC
Care to demonstrate how insertions into an array amortise to O(1)? Oops, that's limited to insertions at the start or end. Concealing reality behind grand sounding theoretical BS is all the worse as you obviously know better. You have it backwards. I avoided the theoretical BS, sticking to real world cases. You wouldn't notice if your programs ran 5 times slower? Some yes, some no, but what's that got to do with my comment? You didn't demonstrate any correlation between data structure access time and run time of my programs, much less 1:1. as you obviously know better. Enough with the lies and personal attacks.	[reply]
Re^4: Array or Hash by BrowserUk (Patriarch) on Jan 21, 2011 at 18:51 UTC
Re^5: Array or Hash by ikegami (Patriarch) on Jan 21, 2011 at 19:24 UTC
Some notes below your chosen depth have not been shown here
Re: Array or Hash by Ratazong (Monsignor) on Jan 21, 2011 at 06:59 UTC
It depends on your use-case which one is faster If you want to loop through all elements, arrays are faster If you want to look up a specific element (where you know the index), arrays are faster If you want to know if a specific element is in a list, hashes are much faster There are many posts on this, e.g. the discussion in this thread Why are hashes so much faster?, or the mailing-list entry here. HTH, Rata	[reply]
Re: Array or Hash by BrowserUk (Patriarch) on Jan 21, 2011 at 06:49 UTC
Arrays are faster to access because the supplied index number is used directly to calculate the memory address of the value. Whereas with a hash, the key must be converted (hashed) to a number before the address calculation of the value can be performed.	[reply]
Re^2: Array or Hash by chrestomanci (Priest) on Jan 21, 2011 at 09:29 UTC
Are you sure about that? Unlike C, perl can do sparse arrays without using lots of memory for the empty slots, so you can write: `my @test = (); $test[ 0 ] = 'first'; $test[ 10_000_000 ] = 'last';` [download] If the array index was used to directly calculate the RAM location of the data, then perl would have to allocate lots of it for the table. The fact that the code above works suggests that perl keeps a lookup table somewhere of valid values. That lookup table will be similar to the one needed for a hash, and potentially no faster.	[reply] [d/l]
Re^3: Array or Hash by BrowserUk (Patriarch) on Jan 21, 2011 at 09:47 UTC
Are you sure about that? Yes. I am very sure. Unlike C, perl can do sparse arrays without using lots of memory for the empty slots, No. It cannot. The second assignment in your example (on a 64-bit system) results is the allocation of 134 megabytes of ram: `$a[0]=1; print total_size( \@a );; 232 $a[1]=1; print total_size( \@a );; 256 $a[2]=1; print total_size( \@a );; 280 $a[10_000_000]=1; print total_size( \@a );; 134217984` [download] To gain a better appreciation for how Perl uses memory, see PerlGuts Illustrated and scan down to the AV heading. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l]
Re^4: Array or Hash by chrestomanci (Priest) on Jan 21, 2011 at 16:14 UTC
Re^5: Array or Hash by BrowserUk (Patriarch) on Jan 21, 2011 at 16:54 UTC
Re^4: Array or Hash by ahmad (Hermit) on Jan 21, 2011 at 15:41 UTC
Re^5: Array or Hash by BrowserUk (Patriarch) on Jan 21, 2011 at 15:43 UTC
Re^3: Array or Hash by ikegami (Patriarch) on Jan 21, 2011 at 09:46 UTC
That's not a sparse array. It has 10,000,001 allocated slots, all of which are in use. You make "keeping track of valid values" sound very complicated but it's just 3 numbers. `>perl -MDevel::Peek -e"$a[6]=1; $a[7]=1; shift(@a); Dump(\@a,1)" SV = IV(0x1acf60) at 0x1acf68 REFCNT = 1 FLAGS = (TEMP,ROK) RV = 0x336af0 SV = PVAV(0x1adc30) at 0x336af0 REFCNT = 2 FLAGS = () ARRAY = 0x330590 (offset=1) -- Address of first element ALLOC = 0x330588 FILL = 6 -- Index of last element MAX = 12 -- Index of last allocated element ARYLEN = 0x0 FLAGS = (REAL)` [download] The array has 7 elements and 14 allocated elements (-1..12 relative to ARRAY). Update: Replaced description with an example.	[reply] [d/l]
Re: Array or Hash by dHarry (Abbot) on Jan 21, 2011 at 06:49 UTC
What do you what to do with them? What is the intended use? You could of course do a proof of concept, i.e. implement both and benchmark.	[reply]
Re: Array or Hash by JavaFan (Canon) on Jan 21, 2011 at 11:30 UTC
Array or Hash which is faster ? Why do you care? One stores different data in a hash than in an array. They are both keyed data, but for an array, the key is a number (either used as a cardinal or an ordinal). For a hash, the key is a string.	[reply]
Re: Array or Hash by locked_user sundialsvc4 (Abbot) on Jan 21, 2011 at 18:15 UTC
The underlying implementation is opaque, and you should always expect it to be. Furthermore, you are basically barking up the wrong tree here. In either case, the variable is an access-method that, given a “key,” conjures up a “value” from some place in memory. No matter how that mapping is done, that won’t be where any “speed loss” comes from. Loss of speed in memory-access can only come from one source: page faults. Perl’s implementors are already acutely aware of this, and have cleverly designed their systems with this in mind. But, you must also be aware of this fundamental issue with regard to how you use their two nifty inventions. Design your algorithms so that the working-set is of a reasonable size, and remains comparatively stable across time.
Re^2: Array or Hash by BrowserUk (Patriarch) on Jan 21, 2011 at 18:33 UTC
Loss of speed in memory-access can only come from one source: page faults. Utter drivel.	[reply]
Re^2: Array or Hash by ikegami (Patriarch) on Jan 21, 2011 at 19:06 UTC
The underlying implementation is opaque, and you should always expect it to be. Not really. "Hash", short for "hash table", is a specific algorithm. "Associative array" is the implementation-independent term. Loss of speed in memory-access can only come from one source: page faults. I think you mean cache misses. p5p is indeed aware that cache misses can mess up theoretical performance analysis, but it's hardly the only source of loss of speed. I'm not well versed in this area, but I suspect that it's quite the opposite, that it takes second seat to other macro factors (like quantum physics to newtonian physics). http://queue.acm.org/detail.cfm?id=1814327 is a story where changing the algorithm to reduce cache misses made a big difference. If what you said was true, a linear search would be as fast as the algorithm to which he changed since they're both cache-oblivious.	[reply]