in reply to Re: Sort the file names a list
in thread Sort the file names a list

A better way (that someone named something fancy which I cannot remember):

Memoizing is caching the result of a function based on its inputs. You're memoizing split.


Less of a tradeoff:

my @sorted = map substr($_, 19), sort map substr($_, -(19+20), 19) . $_, @LogAnalyser::LogFileList;

For the data posted in the OP:

Rate memoized st naive grt memoized 12517/s -- -30% -64% -66% st 17883/s 43% -- -48% -52% naive 34555/s 176% 93% -- -6% grt 36872/s 195% 106% 7% --
use strict; use warnings; use Benchmark qw( cmpthese ); my %cache; sub memoized { my $left = ( $cache{$a} ||= substr($a, -(19+20), 19) ); my $right = ( $cache{$b} ||= substr($b, -(19+20), 19) ); return $left cmp $right; } @::LogFileList if 0; @::LogFileList = qw( /devLog/devid234/term_logs/devid234.2009-08-27-23-55-11.terminal.log.t +ar.gz /devLog/devid234/term_logs/devid234.2009-08-27-15-05-06.terminal.log.t +ar.gz /devLog/devid234/term_logs/devid234.2009-08-27-01-45-03.terminal.log.t +ar.gz /devLog/devid234/term_logs/devid234.2009-08-28-00-00-01.terminal.log.t +ar.gz /devLog/devid234/term_logs/devid234.2009-08-28-18-25-04.terminal.log.t +ar.gz /devLog/devid168/term_logs/devid168.2009-08-28-01-35-02.terminal.log.t +ar.gz /devLog/devid168/term_logs/devid168.2009-08-27-04-02-01.terminal.log.t +ar.gz /devLog/devid168/term_logs/devid168.2009-08-28-20-25-01.terminal.log.t +ar.gz /devLog/devid168/term_logs/devid168.2009-08-27-17-55-01.terminal.log.t +ar.gz /devLog/devid918/term_logs/devid918.2009-08-27-21-15-01.terminal.log.t +ar.gz /devLog/devid918/term_logs/devid918.2009-08-27-13-25-01.terminal.log.t +ar.gz /devLog/devid918/term_logs/devid918.2009-08-27-00-00-01.terminal.log.t +ar.gz /devLog/devid918/term_logs/devid918.2009-08-28-00-00-02.terminal.log.t +ar.gz /devLog/devid918/term_logs/devid918.2009-08-28-09-45-01.terminal.log.t +ar.gz /devLog/devid918/term_logs/devid918.2009-08-28-19-25-01.terminal.log.t +ar.gz /devLog/devid167/term_logs/devid167.2009-08-28-02-45-01.terminal.log.t +ar.gz /devLog/devid167/term_logs/devid167.2009-08-27-01-45-02.terminal.log.t +ar.gz /devLog/devid167/term_logs/devid167.2009-08-27-10-55-01.terminal.log.t +ar.gz ); cmpthese(-3, { naive => ' use strict; use warnings; my @sorted = sort { substr($a, -(19+20), 19) cmp substr($b, -(19+20), 19) } @::LogFileList; ', memoized => ' use strict; use warnings; my @sorted = sort memoized @::LogFileList; ', st => ' use strict; use warnings; my @sorted = map $_->[0], sort { $a->[1] cmp $b->[1] } map [ $_, substr($_, -(19+20), 19) ], @::LogFileList; ', grt => ' use strict; use warnings; my @sorted = map substr($_, 19), sort map substr($_, -(19+20), 19) . $_, @::LogFileList; ', });

Update: Added ST.
Update: Fixed bug in substr indexes.

Replies are listed 'Best First'.
Re^3: Sort the file names a list
by bv (Friar) on Aug 31, 2009 at 16:42 UTC

    ikegami++! I'd seen the GRT discussed, but never tried it myself. I guess I never thought enough about it, and my first-glance assessment was that memoization was faster. That'll teach me to make assumptions without testing!

    Oh, and I saw the memoization technique called the "Orcish (or-cache) maneuver" in Perl Underground 2 (credit japhy).

    $,=' ';$\=',';$_=[qw,Just another Perl hacker,];print@$_;
      I had a bug. Fixing it affected the times. You'll note that your solution actually slows down the sorting.

        I thought that it might, for a small set of inputs. The overhead of creating and populating the hash, as well as assignments to two new lexicals would only be overcome with a significantly larger data set.

        I wonder, though. Yep. Changing it to this:

        { my %cache; sub memoized { ( $cache{$a} ||= substr($a, -(19+20), 19) ) cmp ( $cache{$b} ||= substr($b, -(19+20), 19) ); } }

        gives a slight increase in speed (but my benchmarks are varying wildly right now. Maintaining ranking, but % differences are all over the board.)

        Rate st memoized naive grt st 10281/s -- -6% -51% -54% memoized 10930/s 6% -- -48% -51% naive 20942/s 104% 92% -- -5% grt 22143/s 115% 103% 6% --
        $,=' ';$\=',';$_=[qw,Just another Perl hacker,];print@$_;
Re^3: Sort the file names a list
by salva (Canon) on Sep 01, 2009 at 08:41 UTC
    ikegami, I know you already know it, but reading your post somebody could thing that using the your GRT code is a good idea because it is the faster solution when actually it is not!

    It is not good because using substr() to extract the sorting keys is very weak. If for instance, the file names extensions are changed, it will extract incorrect keys and, what is worst, without reporting any warning or error to the user!

    Even if slower, using a regular expression to extract the keys is probably the best solution.

    In any case, there is a faster (not recommendable either) method...

    ... use Sort::Key qw(keysort); ... cmpthese(-3, { ... sk => ' use strict; use warnings; my @sorted = keysort { substr($_, -(19+20), 19) } @::LogFileLi +st; ', });
    that on my computer runs as...
    Rate memoized st grt naive sk memoized 8784/s -- -26% -51% -54% -58% st 11894/s 35% -- -33% -38% -43% grt 17794/s 103% 50% -- -7% -14% naive 19207/s 119% 61% 8% -- -7% sk 20714/s 136% 74% 16% 8% --
    Note also that on my hardware, naive is actually faster than grt.

      What?

      First of all, do you have a problem with GRT or substr? You said GRT, but you talk of the substr I used for key extraction in all methods.

      Secondly, if the spec changes, of course the code will need to be changed. I don't see that as a problem. Switching to using a regex match is not going to help this.

      So, no, I don't "already know it".