in reply to Re: Sorting based on filesize.
in thread Sorting based on filesize.

You need the Schwartzian Transform

I'd say it's doubtful. Clearly, stat is slow, but it's not a slouch, and the Schwartz Transform adds its own overhead. I suspect the sweet spot where the ST pulls ahead is higher than you think.

If we want to pursue this approach, an improvement will be to employ a Guttman-Rosler Transform:

my @sorted = map substr($_, 4) => sort map pack(`L',$_) . $_ => @files; # but see eyepopslikeamosquito's comment below

By being able to dip down and call the sort function without a code block will show greater benefits. With the proviso that one might have to pack with 8 byte 'Q' quads, for systems with filesizes larger than 2Gb.

With that in mind, it's probably all premature optimisation. I would wait until the snippet showed itself as being a performance pig before going to all this bother:

sort { -s $a <=> -s $b } @files

... does have the distinct advantage of being eminently readable.

- another intruder with the mooring of the heat of the Perl

Replies are listed 'Best First'.
Re^3: Sorting based on filesize.
by eyepopslikeamosquito (Archbishop) on Jul 20, 2004 at 08:41 UTC
    my @sorted = map substr($_, 4) => sort map pack(`L',$_) . $_ => @files;

    For improved portability, that pack('L',$_) above should read pack('N',$_) because 'L' is native while 'N' is network (big-endian). See also Re: Advanced Sorting - GRT - Guttman Rosler Transform.

•Re^3: Sorting based on filesize.
by merlyn (Sage) on Jul 20, 2004 at 13:20 UTC
    I'd say it's doubtful. Clearly, stat is slow, but it's not a slouch, and the Schwartz Transform adds its own overhead. I suspect the sweet spot where the ST pulls ahead is higher than you think.
    Actually, this is the core of one of the alpaca exercises, and I forget the exact numbers, but the ST turns out to be faster after only 10 or 15 files in our investigation. Stat may not be doggy, but it's still a context switch and a lot of bit wrangling. Much better to do that only once per file.

    And (ahem) it's Schwartzian Transform not Schwartz Transform. If you're going to poo-poo it in favor of the more complex GRT, at least spell it right. {grin}

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Talking of ST versus GRT comparisons, Uri's new Sort::Maker can do ST, GRT, Orcish (OR-Cache) and plain old sort, and makes things very easy to benchmark and change between different algorithms. It builds sorting code on the fly. Very cool.