You're right, Bart.
When I did my testing, for simplicity I used an array rather than a hash (or rather, three arrays), so as to get a feel for both average and worst-case performance.
The numbers of sorts and comparisons I gave above were for the worst-case scenario, where the 1_000_000 values are pre-ordered ascending and every new value requires the keep array to be re-sorted. This is slow, as I indicated.
The bit I missed with your code was that, as the values are coming from a hash, they are processed in "hash order", which of course is undefined but does a pretty good job of approximating random.
This means that the keep array only ends up getting sorted an average of around 900 times instead of the worst case of 999_000 times in the 100-from-1_000_000 case, with the obvious benefit to performance.
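To make that concrete, here's a rough sketch of the kind of keep-array selection we're talking about (my own illustrative rendering, not the exact code from either of our tests; the key range, the random sample values and the $sorts counter are only there to show the shape of it):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $N = 100;

    # Sample data: random values keyed by short strings.
    my %hash;
    $hash{$_} = int rand 1_000_000 for 'AAAAA' .. 'AAZZZ';

    my @keep;        # [ key, value ] pairs, kept sorted descending by value
    my $sorts = 0;   # how often the keep array gets re-sorted

    while ( my ( $key, $value ) = each %hash ) {
        # Cheap rejection: skip unless the value beats the current minimum.
        next if @keep == $N and $value <= $keep[-1][1];

        push @keep, [ $key, $value ];
        @keep = sort { $b->[1] <=> $a->[1] } @keep;   # re-sort the small array
        $sorts++;
        pop @keep if @keep > $N;                      # drop the new minimum
    }

    print "$sorts sorts for ", scalar keys %hash, " keys\n";
    print "$_->[0] => $_->[1]\n" for @keep;

Because the hash feeds the loop in effectively random order, the cheap rejection test discards almost everything once the keep array has filled, which is where the ~900 sorts figure comes from.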
Whilst the worst-case scenario still exists, the chances of it happening naturally are astronomically small and it can therefore be discounted. A quick test using your code on 'AAAAA' .. 'BAAAA' (913_952 keys) used 908 sorts and took around 45 seconds on my 233MHz machine. That compares pretty well with my code taking 15 seconds on a 1_000_000 element array, given that mine doesn't have the extra level of indirection in the comparator that it would need in order to retain the keys.
I'd like to see a comparison of the two run on 10 million records, as I don't think they would scale linearly, but I agree that this makes the benefits of my algorithm look much less compelling. Leastwise while it is coded in pure Perl.
An XS or C implementation of bsearch() would improve things dramatically. Maybe I'll have a go. I think that it would be a useful addition to List::Util. What d'ya think?
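For reference, a pure-Perl version of the idea might look something like this (the name bsearch_insert and its interface are mine, purely for illustration; an XS version would do the same search in C):

    use strict;
    use warnings;

    # Find the insertion point for $value in an ascending @$aref
    # by binary search, and splice it in so the array stays sorted.
    sub bsearch_insert {
        my ( $aref, $value ) = @_;
        my ( $lo, $hi ) = ( 0, scalar @$aref );
        while ( $lo < $hi ) {
            my $mid = int( ( $lo + $hi ) / 2 );
            if ( $aref->[$mid] < $value ) {
                $lo = $mid + 1;
            }
            else {
                $hi = $mid;
            }
        }
        splice @$aref, $lo, 0, $value;
        return $lo;
    }

    # Keep the top 100 of a stream of random values without ever
    # re-sorting: insert in place, drop the smallest when over-full.
    my @keep;
    for my $v ( map { int rand 1_000_000 } 1 .. 10_000 ) {
        next if @keep == 100 and $v <= $keep[0];   # ascending, so [0] is the smallest
        bsearch_insert( \@keep, $v );
        shift @keep if @keep > 100;
    }
    print scalar @keep, " values kept, largest is $keep[-1]\n";

That replaces the repeated sort with an O(log N) search plus a splice.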