#! perl -sw
use strict;
$| = 1;
# In-place Fisher-Yates shuffle. Takes an array reference (or a flat list),
# reorders the elements in place, and returns nothing in void context.
sub shuffle {
    my $ref = @_ == 1 ? $_[ 0 ] : \@_;
    my $n   = @$ref;
    for( 0 .. $#$ref ) {
        my $p = $_ + rand( $n-- );    # fractional index; truncated when used
        my $t = $ref->[ $p ];         # swap element $_ with element $p
        $ref->[ $p ] = $ref->[ $_ ];
        $ref->[ $_ ] = $t;
    }
    return unless defined wantarray;
    return wantarray ? @{ $ref } : $ref;
}
warn 'Start : '.localtime() .$/;
my @data = <>;
warn 'Read : '.localtime() .$/;
shuffle \@data;
warn 'Shuffled: '.localtime().$/;
print for @data;
warn 'Written : '.localtime().$/;
__END__
P:\test>shuffleFile.pl data\1millionlines.dat >nul
Start : Fri Apr 15 19:41:44 2005
Read : Fri Apr 15 19:41:53 2005
Shuffled: Fri Apr 15 19:41:55 2005
Written : Fri Apr 15 19:41:57 2005
Can Haskell do this? I believe the answer is no. The 1 million line file in my example was chosen deliberately: when loaded, it pushes memory consumption (on my machine) very close to the physical limit.
The hand-coded shuffle routine is used in preference to List::Util::shuffle because it operates in-place rather than duplicating the list. It could equally well be an in-place qsort.
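For contrast, this is roughly what the copying alternative looks like (a minimal sketch, reusing the @data array from the script above). List::Util::shuffle returns a new list rather than reordering its argument, so for a moment the original array and the shuffled copy exist side by side, which is exactly the duplication the in-place routine avoids:

use List::Util qw( shuffle );

# Copying approach: @data and @shuffled coexist until @data is freed,
# roughly doubling peak memory for large inputs.
my @shuffled = shuffle( @data );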
The equivalent Haskell program would be less efficient because:
- It could not do the shuffle/sort in-place, as that involves side effects. It would therefore need to duplicate the data at least once which, given the parameters of the program above, would push my machine into swapping.
- A Haskell equivalent would probably require twice as much memory or more, as the only fair shuffle mechanism I have seen that doesn't operate in-place uses a tree structure, which must take up more storage than an array or list (I think).
Equally, the popular and celebrated Haskell qsort implementation uses huge amounts of RAM as it divides and subdivides the input list into a zillion smaller lists during recursion. It's an extremely neat implementation, and really quite fast on smallish lists, but efficiency isn't just about operations that run well within the limits; it is also about what happens when those limits are breached, and about where those limits are.
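To make that allocation pattern concrete, here is a rough Perl transcription of that copying, divide-and-subdivide style. The sub name copying_qsort is mine, not from any library, and this is only a sketch of the technique being criticised, not anyone's production code. Every level of recursion greps out fresh sublists, so intermediate storage piles up instead of the sort happening within one array:

# Hypothetical sketch of the copying quicksort style: each call builds
# two new sublists via grep, so memory use grows with recursion depth
# rather than the data being rearranged in place.
sub copying_qsort {
    my @list = @_;
    return @list if @list <= 1;
    my $pivot = shift @list;
    return (
        copying_qsort( grep { $_ lt $pivot } @list ),
        $pivot,
        copying_qsort( grep { $_ ge $pivot } @list ),
    );
}

Called as my @sorted = copying_qsort( @data ), this allocates many short-lived lists on the way to the result; an in-place sort works within a single array instead.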
RAM may be cheap, but disk space is even cheaper, and the volumes of data being routinely processed are growing much faster than typical RAM.
Haskell, and other FP languages that categorically deny the possibility of programming with side effects (anything that operates in-place), will always be at a disadvantage when dealing with large volumes of data that must be in memory for an algorithm to work.
Perl is memory-hungry, but it also provides the tools to bypass that hunger through alternative techniques. Haskell doesn't provide this fallback position.
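As a minimal sketch of one such alternative technique (not part of the benchmark above, and making no claims about its speed): the core Tie::File module presents the file as an array without slurping it, so only the list of line indices, not the lines themselves, has to live in RAM:

#! perl -sw
# Hedged sketch: shuffle a file by index using Tie::File, so the lines
# themselves stay on disk and only the index list is held in memory.
use strict;
use Tie::File;
use List::Util qw( shuffle );

tie my @lines, 'Tie::File', $ARGV[ 0 ]
    or die "Cannot tie $ARGV[ 0 ]: $!";

# Shuffle the line numbers, then emit the records in that order.
# (Tie::File strips the record separator, so it is re-appended here.)
print $lines[ $_ ], $/ for shuffle( 0 .. $#lines );

untie @lines;

Random access through the tie is slower than an in-memory array, so this trades speed for footprint; the point is only that the fallback exists.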