in reply to Re^23: Why is the execution order of subexpressions undefined?
in thread Why is the execution order of subexpressions undefined?
Here is a Perl program that does a fair shuffle of a 1-million-line file in 13 seconds.
```perl
#! perl -sw

use strict;
$| = 1;

sub shuffle {
    my $ref = @_ == 1 ? $_[ 0 ] : \@_;
    my $n = @$ref;
    for( 0 .. $#$ref ) {
        my $p = $_ + rand( $n-- );
        my $t = $ref->[ $p ];
        $ref->[ $p ] = $ref->[ $_ ];
        $ref->[ $_ ] = $t;
    }
    return unless defined wantarray;
    return wantarray ? @{ $ref } : $ref;
}

warn 'Start   : ' . localtime() . $/;
my @data = <>;
warn 'Read    : ' . localtime() . $/;

shuffle \@data;
warn 'Shuffled: ' . localtime() . $/;

print for @data;
warn 'Written : ' . localtime() . $/;

__END__
P:\test>shuffleFile.pl data\1millionlines.dat >nul
Start   : Fri Apr 15 19:41:44 2005
Read    : Fri Apr 15 19:41:53 2005
Shuffled: Fri Apr 15 19:41:55 2005
Written : Fri Apr 15 19:41:57 2005
```
Can Haskell do this? I believe the answer is no. The 1-million-line file in my example was chosen because, when loaded, it pushes memory consumption (on my machine) very close to the physical limit.
The hand-coded shuffle routine is used in preference to List::Util::shuffle because it operates in place rather than duplicating the list. It could equally well be an in-place qsort.
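To make the contrast concrete, here is a small standalone sketch (mine, not part of the program above): List::Util::shuffle hands back a new list, while an in-place Fisher-Yates swap, the same idea as the sub above, rearranges the existing array.

```perl
use strict;
use warnings;
use List::Util ();

my @data = ( 'a' .. 'e' );

# List::Util::shuffle returns a freshly built, shuffled copy of its
# arguments, so both @data and @copy are live in memory at once:
my @copy = List::Util::shuffle( @data );

# An in-place Fisher-Yates swap rearranges @data itself, so no second
# copy of the list is ever built:
for my $i ( 0 .. $#data - 1 ) {
    my $j = $i + int rand( @data - $i );
    @data[ $i, $j ] = @data[ $j, $i ];
}

print "copy    : @copy\n";
print "in place: @data\n";
```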
The equivalent Haskell program would be less efficient because its standard lists are immutable, so the shuffle cannot be done in place. It would therefore need to duplicate the data at least once, which, given the parameters of the program above, would push my machine into swapping.
Equally, the popular and celebrated Haskell qsort implementation uses huge amounts of RAM as it divides and subdivides the input list into a zillion smaller lists during recursion. It's an extremely neat implementation and really quite fast on smallish lists, but efficiency isn't just about operations that run well within the limits; it's also about what happens when those limits are breached, and where those limits are.
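For comparison, here is a rough Perl transliteration of that style (my sketch, not the Haskell code itself): each level of recursion builds two brand-new lists with grep, much as the Haskell version builds them with list comprehensions, so the intermediate lists pile up until the recursion unwinds.

```perl
use strict;
use warnings;

# Copying quicksort in the style of the celebrated two-line Haskell version:
# every call creates two fresh lists rather than partitioning in place.
sub qsort_copies {
    my ( $pivot, @rest ) = @_;
    return () unless defined $pivot;
    return (
        qsort_copies( grep { $_ lt $pivot } @rest ),
        $pivot,
        qsort_copies( grep { $_ ge $pivot } @rest ),
    );
}

print join( ' ', qsort_copies( qw( pear apple plum fig cherry ) ) ), "\n";
```

Neat and short, but every grep allocates a fresh list, which is exactly where the memory goes on large inputs.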
RAM may be cheap, but disk space is even cheaper, and the volumes of data being routinely processed are growing much faster than typical RAM.
Haskell (and other FP languages) that categorically deny the possibility of programming with side effects--anything that operates in place--will always be at a disadvantage when dealing with large volumes of data that must be in memory for an algorithm to work.
Perl is memory-hungry, but it also provides the tools that let that hunger be worked around with alternative techniques. Haskell doesn't provide this fallback position.
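As one illustration of such a technique (my sketch, not something from the original program): keep only the byte offset of each line in memory, shuffle the offsets, then seek back and print the lines in shuffled order -- trading RAM for seeks.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw( shuffle );

my $file = shift or die "usage: $0 <file>\n";
open my $fh, '<', $file or die "open '$file': $!\n";

# Record the byte offset at which each line starts; an array of integers
# is far smaller than an array holding the lines themselves.
my @offsets = ( 0 );
push @offsets, tell $fh while <$fh>;
pop @offsets;    # the final tell() is end-of-file, not a line start

# The offsets array is small, so List::Util::shuffle's copying is harmless here.
for my $off ( shuffle @offsets ) {
    seek $fh, $off, 0 or die "seek: $!\n";
    print scalar <$fh>;
}
```

The price is one seek per output line, so it is much slower than the all-in-memory version, but it only ever holds one integer per line plus a single line of text in RAM.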