So I made myself some data with a script approximating something like orbital's example, using the rand() function for the CD#, the pdf-name, the unknown digits in parentheses, and the different strings in brackets.
I know it's not the real thing, but for what I'm trying to find out it seems OK to me.
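A minimal sketch of such a generator (not the actual script — the field layout, widths, and the bracketed sample strings are stand-ins I made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-ins for the "different strings in brackets" -- invented here.
my @strings = qw(alpha beta gamma delta);

# Build one fake record: CD#, pdf-name, digits in parentheses,
# and one bracketed string, all driven by rand().
sub make_line {
    sprintf "CD%02d doc%05d.pdf (%04d) [%s]",
        int(rand(50)),                      # CD#
        int(rand(99999)),                   # pdf-name
        int(rand(10000)),                   # unknown digits in parentheses
        $strings[ int(rand(@strings)) ];    # one of the bracketed strings
}

my $lines = shift // 5;                     # how many lines to emit
print make_line(), "\n" for 1 .. $lines;
```

Piping that to a file with a large enough line count gives a test file of roughly the sizes in the table below.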
The two approaches I've tested are either using a hash, which is straightforward and easy on the programmer (see my previous post), or using an array and tucking the filepos onto the right side (see petral's post).
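Roughly, the two approaches look like this (a sketch, not the actual code from either post — the key extraction, the 20-byte key width, and the file name are my assumptions):

```perl
use strict;
use warnings;

# Approach 1: hash -- both the key and the whole line stay in memory.
sub sort_with_hash {
    my ($file) = @_;
    open my $fh, '<', $file or die "open $file: $!";
    my %h;
    while (<$fh>) {
        my ($key) = /^(\S+)/;   # assumed: first field is the sort key
        $h{$key} = $_;
    }
    close $fh;
    return map { $h{$_} } sort keys %h;
}

# Approach 2: array of "key + filepos" -- only the key stays in memory;
# each line is re-read with seek() after the sort.
sub sort_with_array {
    my ($file) = @_;
    open my $fh, '<', $file or die "open $file: $!";
    my (@a, @out);
    my $pos = 0;
    while (<$fh>) {
        my ($key) = /^(\S+)/;
        push @a, pack 'A20 N', $key, $pos;  # key padded to 20 bytes, then offset
        $pos = tell $fh;                    # start of the next line
    }
    for (sort @a) {
        seek $fh, unpack('x20 N', $_), 0;   # skip the key, jump to the line
        push @out, scalar <$fh>;
    }
    close $fh;
    return @out;
}
```

Both return the records in key order; the array version trades a second pass of seeks for a much smaller in-memory footprint.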
The tests were run on a measly Pentium 133 with 64MB RAM, and the results are as in the table below (YMMV).
The empty cells indicate heavy use of virtual memory.
Impatience being a virtue, I only gave each run double the expected time before hitting Ctrl-C.
| kLines | Filesize (MB) | Hash (sec) | Array (sec) |
|--------|---------------|------------|-------------|
| 100    | 6.16          | 11         | 12          |
| 150    | 9.24          | 16         | 18          |
| 175    | 10.78         | 19         | 22          |
| 190    | 11.704        | 22         | 24          |
| 200    | 12.32         | 23         | 25          |
| 210    | 12.936        | 24         | 26          |
| 220    | 13.552        | 25         | 27          |
| 230    | 14.168        | 62         | 29          |
| 250    | 15.4          |            | 31          |
| 300    | 18.48         |            | 37          |
| 350    | 21.56         |            | 44          |
| 400    | 24.64         |            | 107         |
| 500    | 30.8          |            |             |
My interpretation is that if you've got the memory, the hash method is slightly faster, but the array method will take you about twice as far in terms of possible file sizes.
Thinking about it, that seems rather logical, considering that the hash holds both key and value whilst the array holds the value only.
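A back-of-envelope sketch of that difference, counting string bytes only and ignoring Perl's considerable per-scalar overhead (the key and record widths here are made up):

```perl
use strict;
use warnings;

my ($hash_bytes, $array_bytes) = (0, 0);
for my $i (1 .. 100_000) {
    my $key  = sprintf "key%06d", $i;                     # hypothetical sort key
    my $line = "$key and the rest of the record line\n";  # the full stored line
    $hash_bytes  += length($key) + length($line);  # hash keeps key AND line
    $array_bytes += length($key) + 4;              # array keeps key + 4-byte filepos
}
printf "hash ~%d bytes, array ~%d bytes of string data\n",
    $hash_bytes, $array_bytes;
```

The exact ratio depends on how big the records are relative to the keys, but the array side only ever pays for the key plus a fixed-size offset.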
It's also nice to see that both variants behave linearly with increasing volume until VM sets in, just as described in A Fresh Look at Efficient Perl Sorting by Uri Guttman and Larry Rosler.
In reply to Re: Slow at sorting? by guha, in thread Slow at sorting? by orbital