Re: Sorting based on filesize.

Replies are listed 'Best First'.
Re^2: Sorting based on filesize. by grinder (Bishop) on Jul 20, 2004 at 06:28 UTC
You need the Schwartzian Transform I'd say it's doubtful. Clearly, stat is slow, but it's not a slouch, and the Schwartz Transform adds its own overhead. I suspect the sweet spot where the ST pulls ahead is higher than you think. If we want to pursue this approach, an improvement will be to employ a Guttman-Rosler Transform: my @sorted = map substr($_, 4) => sort map pack(`L',$_) . $_ => @files; # but see eyepopslikeamosquito's comment below [download] By being able to dip down and call the sort function without a code block will show greater benefits. With the proviso that one might have to pack with 8 byte 'Q' quads, for systems with filesizes larger than 2Gb. With that in mind, it's probably all premature optimisation. I would wait until the snippet showed itself as being a performance pig before going to all this bother: `sort { -s $a <=> -s $b } @files` [download] ... does have the distinct advantage of being eminently readable. - another intruder with the mooring of the heat of the Perl	[reply] [d/l] [select]
Re^3: Sorting based on filesize. by eyepopslikeamosquito (Archbishop) on Jul 20, 2004 at 08:41 UTC
my @sorted = map substr($_, 4) => sort map pack(`L',$_) . $_ => @files; For improved portability, that pack('L',$_) above should read pack('N',$_) because 'L' is native while 'N' is network (big-endian). See also Re: Advanced Sorting - GRT - Guttman Rosler Transform.	[reply]
•Re^3: Sorting based on filesize. by merlyn (Sage) on Jul 20, 2004 at 13:20 UTC
I'd say it's doubtful. Clearly, stat is slow, but it's not a slouch, and the Schwartz Transform adds its own overhead. I suspect the sweet spot where the ST pulls ahead is higher than you think. Actually, this is the core of one of the alpaca exercises, and I forget the exact numbers, but the ST turns out to be faster after only 10 or 15 files in our investigation. Stat may not be doggy, but it's still a context switch and a lot of bit wrangling. Much better to do that only once per file. And (ahem) it's Schwartzian Transform not Schwartz Transform. If you're going to poo-poo it in favor of the more complex GRT, at least spell it right. {grin} -- Randal L. Schwartz, Perl hacker Be sure to read my standard disclaimer if this is a reply.	[reply]
Re^4: Sorting based on filesize. by pbeckingham (Parson) on Jul 21, 2004 at 00:46 UTC
Talking of ST versus GRT comparisons, Uri's new Sort::Maker can do ST, GRT, Orcish (OR-Cache) and plain old sort, and makes things very easy to benchmark and change between different algorithms. It builds sorting code on the fly. Very cool.	[reply]