Re^2: Sort the file names a list

A better way (that someone named something fancy which I cannot remember):

Memoizing is caching the result of a function based on its inputs. You're memoizing split.

Less of a tradeoff:

my @sorted =
    map substr($_, 19),
    sort
    map substr($_, -(19+20), 19) . $_,
    @LogAnalyser::LogFileList;
[download]

For the data posted in the OP:

            Rate memoized       st    naive      grt
memoized 12517/s       --     -30%     -64%     -66%
st       17883/s      43%       --     -48%     -52%
naive    34555/s     176%      93%       --      -6%
grt      36872/s     195%     106%       7%       --
[download]

use strict;
use warnings;

use Benchmark qw( cmpthese );

my %cache;
sub memoized {
    my $left  = ( $cache{$a} ||= substr($a, -(19+20), 19) );
    my $right = ( $cache{$b} ||= substr($b, -(19+20), 19) );
    return $left cmp $right;
}

@::LogFileList if 0;
@::LogFileList = qw(
/devLog/devid234/term_logs/devid234.2009-08-27-23-55-11.terminal.log.t
+ar.gz
/devLog/devid234/term_logs/devid234.2009-08-27-15-05-06.terminal.log.t
+ar.gz
/devLog/devid234/term_logs/devid234.2009-08-27-01-45-03.terminal.log.t
+ar.gz
/devLog/devid234/term_logs/devid234.2009-08-28-00-00-01.terminal.log.t
+ar.gz
/devLog/devid234/term_logs/devid234.2009-08-28-18-25-04.terminal.log.t
+ar.gz
/devLog/devid168/term_logs/devid168.2009-08-28-01-35-02.terminal.log.t
+ar.gz
/devLog/devid168/term_logs/devid168.2009-08-27-04-02-01.terminal.log.t
+ar.gz
/devLog/devid168/term_logs/devid168.2009-08-28-20-25-01.terminal.log.t
+ar.gz
/devLog/devid168/term_logs/devid168.2009-08-27-17-55-01.terminal.log.t
+ar.gz
/devLog/devid918/term_logs/devid918.2009-08-27-21-15-01.terminal.log.t
+ar.gz
/devLog/devid918/term_logs/devid918.2009-08-27-13-25-01.terminal.log.t
+ar.gz
/devLog/devid918/term_logs/devid918.2009-08-27-00-00-01.terminal.log.t
+ar.gz
/devLog/devid918/term_logs/devid918.2009-08-28-00-00-02.terminal.log.t
+ar.gz
/devLog/devid918/term_logs/devid918.2009-08-28-09-45-01.terminal.log.t
+ar.gz
/devLog/devid918/term_logs/devid918.2009-08-28-19-25-01.terminal.log.t
+ar.gz
/devLog/devid167/term_logs/devid167.2009-08-28-02-45-01.terminal.log.t
+ar.gz
/devLog/devid167/term_logs/devid167.2009-08-27-01-45-02.terminal.log.t
+ar.gz
/devLog/devid167/term_logs/devid167.2009-08-27-10-55-01.terminal.log.t
+ar.gz
);

cmpthese(-3, {
    naive => '
        use strict; use warnings;
        my @sorted =
            sort {
                substr($a, -(19+20), 19)
                   cmp
                substr($b, -(19+20), 19)
            }
            @::LogFileList;
    ',

    memoized => '
        use strict; use warnings;
        my @sorted = sort memoized @::LogFileList;
    ',

    st => '
        use strict; use warnings;
        my @sorted =
            map $_->[0],
            sort { $a->[1] cmp $b->[1] }
            map [ $_, substr($_, -(19+20), 19) ],
            @::LogFileList;
    ',

    grt => '
        use strict; use warnings;
        my @sorted =
            map substr($_, 19),
            sort
            map substr($_, -(19+20), 19) . $_,
            @::LogFileList;
    ',
});
[download]

Update: Added ST.
Update: Fixed bug in substr indexes.

Comment on Re^2: Sort the file names a list Select or Download Code

Replies are listed 'Best First'.
Re^3: Sort the file names a list by bv (Friar) on Aug 31, 2009 at 16:42 UTC
ikegami++! I'd seen the GRT discussed, but never tried it myself. I guess I never thought enough about it, and my first-glance assessment was that memoization was faster. That'll teach me to make assumptions without testing! Oh, and I saw the memoization technique called the "Orcish (or-cache) maneuver" in Perl Underground 2 (credit japhy). `$,=' ';$\=',';$_=[qw,Just another Perl hacker,];print@$_;`	[reply] [d/l]
Re^4: Sort the file names a list by ikegami (Patriarch) on Aug 31, 2009 at 16:50 UTC
I had a bug. Fixing it affected the times. You'll note that your solution actually slows down the sorting.	[reply]
Re^5: Sort the file names a list by bv (Friar) on Aug 31, 2009 at 17:11 UTC
I thought that it might, for a small set of inputs. The overhead of creating and populating the hash, as well as assignments to two new lexicals would only be overcome with a significantly larger data set. I wonder, though. Yep. Changing it to this: `{ my %cache; sub memoized { ( $cache{$a} \|\|= substr($a, -(19+20), 19) ) cmp ( $cache{$b} \|\|= substr($b, -(19+20), 19) ); } }` [download] gives a slight increase in speed (but my benchmarks are varying wildly right now. Maintaining ranking, but % differences are all over the board.) `Rate st memoized naive grt st 10281/s -- -6% -51% -54% memoized 10930/s 6% -- -48% -51% naive 20942/s 104% 92% -- -5% grt 22143/s 115% 103% 6% --` [download] `$,=' ';$\=',';$_=[qw,Just another Perl hacker,];print@$_;`	[reply] [d/l] [select]
Re^3: Sort the file names a list by salva (Canon) on Sep 01, 2009 at 08:41 UTC
ikegami, I know you already know it, but reading your post somebody could thing that using the your GRT code is a good idea because it is the faster solution when actually it is not! It is not good because using `substr()` to extract the sorting keys is very weak. If for instance, the file names extensions are changed, it will extract incorrect keys and, what is worst, without reporting any warning or error to the user! Even if slower, using a regular expression to extract the keys is probably the best solution. In any case, there is a faster (not recommendable either) method... `... use Sort::Key qw(keysort); ... cmpthese(-3, { ... sk => ' use strict; use warnings; my @sorted = keysort { substr($_, -(19+20), 19) } @::LogFileLi +st; ', });` [download] that on my computer runs as... `Rate memoized st grt naive sk memoized 8784/s -- -26% -51% -54% -58% st 11894/s 35% -- -33% -38% -43% grt 17794/s 103% 50% -- -7% -14% naive 19207/s 119% 61% 8% -- -7% sk 20714/s 136% 74% 16% 8% --` [download] Note also that on my hardware, `naive` is actually faster than `grt`.	[reply] [d/l] [select]
Re^4: Sort the file names a list by ikegami (Patriarch) on Sep 01, 2009 at 13:58 UTC
What? First of all, do you have a problem with GRT or `substr`? You said GRT, but you talk of the `substr` I used for key extraction in all methods. Secondly, if the spec changes, of course the code will need to be changed. I don't see that as a problem. Switching to using a regex match is not going to help this. So, no, I don't "already know it".	[reply] [d/l] [select]