in reply to Slow at sorting?

I have been thinking about the pros and cons of using a hash versus an array.

So I made myself some data with a script approximating something like orbital's example, using the rand() function for the CD#, the pdf name, the unknown digits in parentheses and the different strings in brackets.
I know it's not the real thing, but for what I'm trying to find out it seems OK to me.
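For the curious, a minimal sketch of the kind of generator I mean; the field widths, the "doc*.pdf" name pattern and the bracket strings here are made up for illustration, not the real data:

```perl
use strict;
use warnings;

my @tags = qw(foo bar baz quux);    # stand-ins for the bracketed strings

# Build one fake record: CD#, pdf-name, digits in parens, string in brackets.
sub gen_line {
    my $cd     = int rand 100;                          # CD#
    my $pdf    = sprintf "doc%05d.pdf", int rand 99_999; # pdf-name
    my $digits = int rand 10_000;                        # unknown digits
    my $tag    = $tags[ rand @tags ];
    return sprintf "%02d %s (%d) [%s]", $cd, $pdf, $digits, $tag;
}

print gen_line(), "\n" for 1 .. ( $ARGV[0] || 10 );
```

Run it with the wanted line count as the argument and redirect to a file.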

The two approaches I've tested are: using a hash, straightforward and easy on the programmer (see my previous post), or using an array and tucking the filepos onto the right side (see petral's post).
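Roughly, the two approaches look like this. This is only a sketch under my own assumptions (sorting on the first whitespace-separated field, a hypothetical file name "data.txt", keys at most 20 characters), not the exact code from either post:

```perl
use strict;
use warnings;

my $file = "data.txt";    # hypothetical input file

# 1) Hash: key => file position. Easy on the programmer,
#    but every key is stored as a separate hash key.
sub sort_with_hash {
    open my $fh, '<', $file or die "$file: $!";
    my ( %pos, $tell ) = ();
    $tell = 0;
    while ( my $line = <$fh> ) {
        my ($key) = split ' ', $line;
        $pos{$key} = $tell;          # where this line starts
        $tell = tell $fh;
    }
    return map { $pos{$_} } sort keys %pos;  # fileposes in key order
}

# 2) Array: pack key and filepos into one string, sort, unpack.
#    Only the packed strings live in memory -- no per-key hash overhead.
sub sort_with_array {
    open my $fh, '<', $file or die "$file: $!";
    my ( @packed, $tell ) = ();
    $tell = 0;
    while ( my $line = <$fh> ) {
        my ($key) = split ' ', $line;
        # A20 = space-padded key (truncated past 20 chars),
        # N   = 32-bit big-endian filepos, so a plain sort does the work.
        push @packed, pack "A20 N", $key, $tell;
        $tell = tell $fh;
    }
    return map { ( unpack "A20 N", $_ )[1] } sort @packed;
}
```

One behavioural difference worth knowing: with duplicate keys the hash keeps only the last line per key, while the packed array keeps them all (ties break on the big-endian filepos, i.e. file order).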

The tests were run on a measly Pentium 133 with 64MB RAM, and the results are as in the table below (YMWV).

The empty cells indicate heavy use of virtual memory.
Impatience being a virtue I only gave it double the expected time before I hit Ctrl-C.

kLines  Filesize (MB)  Hash (sec)  Array (sec)
   100           6.16          11           12
   150           9.24          16           18
   175          10.78          19           22
   190         11.704          22           24
   200          12.32          23           25
   210         12.936          24           26
   220         13.552          25           27
   230         14.168          62           29
   250           15.4                       31
   300          18.48                       37
   350          21.56                       44
   400          24.64                      107
   500           30.8

My interpretation is that if you've got the memory, the hash method is slightly faster, but the array method will take you about twice as far in terms of possible file sizes.

Thinking about it, that seems rather logical, considering that the hash stores both key and value whilst the array stores the value only.

It's also nice to see that both variants scale linearly with increasing volume until VM sets in, just as described in A Fresh Look at Efficient Perl Sorting by Uri Guttman and Larry Rosler.