Re: Sorting based on filesize.
by davido (Cardinal) on Jul 20, 2004 at 02:58 UTC
my @sorted = sort { -s $a <=> -s $b } @files;
Or for a more complete example...
use strict;
use warnings;
{
    opendir( my $dir, '.' ) or die "Can't open the directory.\n$!";
    local $, = "\n";
    print sort { -s $a <=> -s $b } grep -f, readdir($dir);
    closedir $dir;
}
See perlfunc under the -X functions to understand what -s is.
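In short, -s returns the file's size in bytes; a tiny illustration (the filename here is just a placeholder):
# -s gives the size in bytes; it is undef if the file can't be stat'ed at all.
my $size = -s 'some_file.txt';
print defined $size ? "some_file.txt is $size bytes\n"
                    : "couldn't stat some_file.txt\n";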
Re: Sorting based on filesize.
by pbeckingham (Parson) on Jul 20, 2004 at 04:08 UTC
You need the Schwartzian Transform. It precomputes the size of each file just once, sorts on those precomputed values, and so avoids re-reading file sizes from disk on every comparison.
my @sorted =
    map  { $_->[0] }                # 3. keep just the filename
    sort { $a->[1] <=> $b->[1] }    # 2. sort on the cached size
    map  { [ $_, -s ] }             # 1. pair each name with its size (one stat per file)
    @files;
You need the Schwartzian Transform
I'd say that's doubtful. stat is not free, but it's no slouch either, and the Schwartzian Transform adds its own overhead. I suspect the file count at which the ST pulls ahead is higher than you think.
If we want to pursue this approach, an improvement would be to employ the Guttman-Rosler Transform:
my @sorted =
    map  substr( $_, 4 ) =>
    sort
    map  pack( 'L', -s $_ ) . $_ => @files;
    # but see eyepopslikeamosquito's comment below
Being able to dip down and call sort without a comparison block, so that it uses the default C-level string comparison, is where the bigger win comes from. The proviso is that you might have to pack 8-byte 'Q' quads for systems with file sizes larger than 2GB.
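A minimal sketch of that 64-bit variant, assuming a perl built with 64-bit integers (and 5.10+ for the '>' byte-order modifier); note the packed prefix grows to 8 bytes, so the substr offset changes to match:
# GRT keyed on an 8-byte big-endian size -- handles files over 4GB too.
# 'Q>' needs perl 5.10 or later and 64-bit integer support.
my @sorted =
    map  substr( $_, 8 ) =>             # strip the 8-byte packed size
    sort                                # default string sort on the packed keys
    map  pack( 'Q>', -s $_ ) . $_ => @files;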
With that in mind, it's probably all premature optimisation. I would wait until the snippet showed itself to be a performance pig before going to all this bother:
sort { -s $a <=> -s $b } @files
... does have the distinct advantage of being eminently readable.
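If you do want to measure before bothering, a rough sketch with the core Benchmark module will settle it (the glob pattern is just a stand-in for whatever file list you actually have):
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @files = grep { -f } glob '*';   # placeholder file list

cmpthese( -3, {                     # run each for at least 3 CPU seconds
    naive => sub { my @s = sort { -s $a <=> -s $b } @files },
    st    => sub {
        my @s = map  { $_->[0] }
                sort { $a->[1] <=> $b->[1] }
                map  { [ $_, -s ] } @files;
    },
    grt   => sub {
        my @s = map  substr( $_, 4 ),
                sort
                map  pack( 'N', -s $_ ) . $_, @files;
    },
} );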
- another intruder with the mooring of the heat of the Perl
my @sorted =
    map  substr( $_, 4 ) =>
    sort
    map  pack( 'L', -s $_ ) . $_ => @files;
For correctness on little-endian machines (and portability in general), that pack('L', -s $_) above should read pack('N', -s $_): 'L' packs in native byte order while 'N' is network (big-endian) order, and only the big-endian layout makes the default string comparison agree with the numeric order of the sizes.
See also Re: Advanced Sorting - GRT - Guttman Rosler Transform.
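A quick demonstration of why the byte order matters for the packed keys (the result of the 'L' line depends on your machine's endianness):
# The default sort compares the packed keys as strings, byte by byte.
# Big-endian 'N' keeps string order in step with numeric order:
print "N: ", ( pack( 'N', 1 ) lt pack( 'N', 256 ) ? "1 sorts before 256" : "wrong order" ), "\n";
# Native 'L' on a little-endian box packs 256 as "\0\1\0\0" and 1 as "\1\0\0\0",
# so 256 would sort *before* 1:
print "L: ", ( pack( 'L', 1 ) lt pack( 'L', 256 ) ? "1 sorts before 256" : "wrong order" ), "\n";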
Re: Sorting based on filesize.
by keszler (Priest) on Jul 20, 2004 at 03:24 UTC
Someone correct me if I'm wrong, but won't sort { -s $a <=> -s $b } @files; stat each file multiple times, depending on the size and order in the array?
If so, and if RAM is cheaper than stat calls, then
@files = grep { -f } glob '* .*';
@hash{ (map { -s $_ } @files) } = @files;
@files = @hash{ sort { $a <=> $b } keys %hash };
print map {$_,$/} @files;
a hash of filenames keyed on sizes could be used instead.
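To put a number on the stat question above, a throwaway count along these lines works (assuming the current directory):
# Count how many times the comparison block (and its two stat calls) runs.
my @files = grep { -f } glob '*';
my $comparisons = 0;
my @sorted = sort { $comparisons++; -s $a <=> -s $b } @files;
printf "%d files, %d comparisons, %d stat calls\n",
    scalar @files, $comparisons, 2 * $comparisons;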
a hash of filenames keyed on sizes could be used instead.
Well, when two or more files happen to have the same size, the hash will only keep one file name from that set. If that's okay, then yes, a simple hash like this could be used.
If that's not okay, then you'd need a hash of arrays:
my %hash;
opendir( D, "." ) or die "Can't open '.': $!";
while ( defined( $_ = readdir D ) ) {   # defined() so a file named "0" doesn't end the loop
    next unless -f;
    push @{ $hash{ -s _ } }, $_;        # uses previous stat data (from -f)
}
for my $size ( sort { $a <=> $b } keys %hash ) {
    for my $file ( sort @{ $hash{$size} } ) {
        print "$size\t$file\n";
    }
}
update: added comment about how many times stat is actually called on each file (i.e. just once, not twice), and wanted to point out that the Schwartzian Transform as demonstrated by pbeckingham is likely to be the best solution overall.
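For completeness, here is a sketch of that, combining the opendir loop above with pbeckingham's Schwartzian Transform (the directory is hard-coded to '.' just as an example):
use strict;
use warnings;

# List plain files in the current directory, sorted by size, one stat per file.
opendir my $dh, '.' or die "Can't open '.': $!";
my @files = grep { -f } readdir $dh;
closedir $dh;

my @sorted = map  { $_->[0] }
             sort { $a->[1] <=> $b->[1] }
             map  { [ $_, -s ] } @files;

print "$_\n" for @sorted;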
Good point. This fixes it:
@files = grep { -f } glob '* .*';
@hash{ (map { (-s $_) . "\0$_" } @files) } = @files;   # "\0" keeps each key unique but still numifying to just the size
@files = @hash{ sort { $a <=> $b } keys %hash };
print map {$_,$/} @files;
but it's rather a moot point - the Schwartzian Transform goes a step further and eliminates the hash.
Re: Sorting based on filesize.
by eyepopslikeamosquito (Archbishop) on Jul 20, 2004 at 03:03 UTC
my @files = glob("*");
my @sorted = sort { -s $a <=> -s $b } @files;
Re: Sorting based on filesize.
by cLive ;-) (Prior) on Jul 20, 2004 at 02:44 UTC