in reply to sorting an array with an array
The first thing I noticed was that there isn't any need to use a hash for the lookup. Instead of an array, you can just index into a string containing the three-character month abbreviations in the appropriate order:
# The spaces aren't really necessary.
my $months = "jan feb mar apr may jun jul aug sep oct nov dec";

my @sorted = sort{
    index( $months, lc(substr($a,0,3)), 0)
    <=>
    index( $months, lc(substr($b,0,3)), 0)
    || $a cmp $b
} @files;
The nice thing is that misspelt filenames (for which index returns -1) just get sorted to the top (or bottom, in a reversed sort) rather than breaking the sort.
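A minimal sketch of that behaviour, using made-up filenames rather than the OP's real ones:

use strict;
use warnings;

my $months = "jan feb mar apr may jun jul aug sep oct nov dec";

# 'jna' isn't in $months, so index() returns -1 and that name compares
# lowest of all; it ends up at the front instead of scrambling the order.
my @names = qw( mar03_report jna01_report feb14_report );

my @sorted = sort {
    index( $months, lc substr( $a, 0, 3 ) )
    <=>
    index( $months, lc substr( $b, 0, 3 ) )
    || $a cmp $b
} @names;

print "@sorted\n";    # jna01_report feb14_report mar03_report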
Given that the OP said there was one file per day, a maximum of 366 files, I decided to compare this simple sort against an ST sort (I stole bart's implementation). The results showed that for 366 filenames, a higher-resolution timer would be needed to tell them apart.
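If anyone wants to resolve those sub-second timings, the core Time::HiRes module will do it; a minimal sketch of the timing pattern, on made-up data rather than the real filenames:

# Minimal sketch: wall-clock timing with microsecond resolution via the
# core Time::HiRes module, instead of the 1-second granularity of time().
use strict;
use warnings;
use Time::HiRes qw( gettimeofday tv_interval );

my @files = map { sprintf 'file%05d', rand 100_000 } 1 .. 366;   # made-up data

my $start   = [ gettimeofday ];
my @sorted  = sort @files;                    # whatever sort is being timed
my $elapsed = tv_interval( $start );

printf "Sorting %d filenames took %.6f seconds.\n", scalar @sorted, $elapsed;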
That got me wondering where the breakpoint would be if there were more than one file per day, and the results are surprising. For up to 100 files per day, a total of 36,600 filenames, the simple sort outperformed the ST by a substantial margin.
#! perl -sw
use strict;

sub shufl {
    $a = $_ + rand @_ - $_ and @_[$_, $a] = @_[$a, $_] for (0..$#_);
    return @_;
}

sub rndtxt{ my $s=''; $s .= chr(65+rand 26) for 1 .. $_[0]; $s }

# Gen some test data - 1 filename for every day of the year
my @test = grep{ -1 == index('feb30feb31apr31jun31sep31nov31', $_) }   # remove funnies
           map{ local $.=$_; (map{$. . sprintf('%02d',$_)}1..31) }
           qw(jan feb mar apr may jun jul aug sep oct nov dec);

my @files;
push @files, @test for 1 .. $ARGV[0]||1;    # duplicate to give n filenames/day

# and mix them up and add random stuff
@files = map{ $_ . rndtxt(8) } shufl @files;

my $months = "jan feb mar apr may jun jul aug sep oct nov dec";

my $start = time;                           # Time it
my @sorted = sort{
    index( $months, lc(substr($a,0,3)), 0)
    <=>
    index( $months, lc(substr($b,0,3)), 0)
    || $a cmp $b
} @files;                                   # sort them

print 'Sorting ', scalar @sorted, ' filenames took ', time-$start, ' seconds.', $/;

my @months = qw(jan feb mar apr may jun jul aug sep oct nov dec);
my %monthindex;
@monthindex{@months} = 0 .. $#months;

$start = time;
my @sorted2 = map  $_->[0],
              sort { $a->[1] <=> $b->[1] || $a->[0] cmp $b->[0] }
              map  [ $_, $monthindex{lc substr $_, 0, 3} ],     # no point in lc the whole lot
              @files;

print 'Sorting ST ', scalar @sorted, ' filenames took ', time-$start, ' seconds.', $/;

print 'The results of both sorts verify as ',
      ("@sorted" eq "@sorted2") ? "the same\n" : "different\n";

__DATA__
c:\test>202022
Sorting 366 filenames took 0 seconds.
Sorting ST 366 filenames took 0 seconds.
The results of both sorts verify as the same

c:\test>202022 10
Sorting 3660 filenames took 1 seconds.
Sorting ST 3660 filenames took 3 seconds.
The results of both sorts verify as the same

c:\test>202022 20
Sorting 7320 filenames took 5 seconds.
Sorting ST 7320 filenames took 7 seconds.
The results of both sorts verify as the same

c:\test>202022 30
Sorting 10980 filenames took 11 seconds.
Sorting ST 10980 filenames took 17 seconds.
The results of both sorts verify as the same

c:\test>202022 50
Sorting 18300 filenames took 29 seconds.
Sorting ST 18300 filenames took 43 seconds.
The results of both sorts verify as the same

c:\test>202022 100
Sorting 36600 filenames took 110 seconds.
Sorting ST 36600 filenames took 170 seconds.
The results of both sorts verify as the same

c:\test>
Then I thought, maybe the hash lookup was the significant factor, so I modified the ST to use index instead of a hash to ensure I was comparing apples with apples.
Even then, the simple sort outperforms the ST up to 100 files per day.
#! perl -sw
use strict;

sub shufl {
    $a = $_ + rand @_ - $_ and @_[$_, $a] = @_[$a, $_] for (0..$#_);
    return @_;
}

sub rndtxt{ my $s=''; $s .= chr(65+rand 26) for 1 .. $_[0]; $s }

# Gen some test data - 1 filename for every day of the year
my @test = grep{ -1 == index('feb30feb31apr31jun31sep31nov31', $_) }   # remove funnies
           map{ local $.=$_; (map{$. . sprintf('%02d',$_)}1..31) }
           qw(jan feb mar apr may jun jul aug sep oct nov dec);

my @files;
push @files, @test for 1 .. $ARGV[0]||1;    # duplicate to give n filenames/day

# and mix them up and add random stuff
@files = map{ $_ . rndtxt(8) } shufl @files;

my $months = "jan feb mar apr may jun jul aug sep oct nov dec";

my $start = time;                           # Time it
my @sorted = sort{
    index( $months, lc(substr($a,0,3)), 0)
    <=>
    index( $months, lc(substr($b,0,3)), 0)
    || $a cmp $b
} @files;                                   # sort them

print 'Sorting ', scalar @sorted, ' filenames took ', time-$start, ' seconds.', $/;

my @months = qw(jan feb mar apr may jun jul aug sep oct nov dec);
my %monthindex;
@monthindex{@months} = 0 .. $#months;

$start = time;
my @sorted2 = map  $_->[0],
              sort { $a->[1] <=> $b->[1] || $a->[0] cmp $b->[0] }
              map  [ $_, index( $months, lc(substr($_,0,3)), 0) ],
              @files;

print 'Sorting mST ', scalar @sorted, ' filenames took ', time-$start, ' seconds.', $/;

print 'The results of both sorts verify as ',
      ("@sorted" eq "@sorted2") ? "the same\n" : "different\n";

__DATA__
c:\test>202022-3
Sorting 366 filenames took 0 seconds.
Sorting mST 366 filenames took 0 seconds.
The results of both sorts verify as the same

c:\test>202022-3 10
Sorting 3660 filenames took 1 seconds.
Sorting mST 3660 filenames took 2 seconds.
The results of both sorts verify as the same

c:\test>202022-3 100
Sorting 36600 filenames took 108 seconds.
Sorting mST 36600 filenames took 171 seconds.
The results of both sorts verify as the same

c:\test>
However, the results are nearly identical to those of the hash-based ST. It seems that the overhead of creating all those little anonymous arrays is significant enough that it is worth checking whether the cost of the repeated key extraction you are caching actually exceeds it. The simple sort, of course, carries no extra memory overhead at all.
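For a less crude comparison than whole wall-clock seconds, the core Benchmark module can run both approaches repeatedly and report their relative rates; a minimal sketch along the same lines, on made-up filenames rather than the OP's:

# Minimal sketch: compare the plain index()-based sort against the ST
# using the core Benchmark module and some made-up test data.
use strict;
use warnings;
use Benchmark qw( cmpthese );

my @months = qw( jan feb mar apr may jun jul aug sep oct nov dec );
my $months = join ' ', @months;

# Made-up test data: month prefix plus some digits, several thousand names.
my @files = map { $months[ rand @months ] . sprintf( '%02d_%04d', 1 + rand 28, $_ ) }
            1 .. 5000;

cmpthese( -5, {
    plain => sub {
        my @s = sort {
            index( $months, lc substr( $a, 0, 3 ) )
            <=>
            index( $months, lc substr( $b, 0, 3 ) )
            || $a cmp $b
        } @files;
    },
    st => sub {
        my @s = map  { $_->[0] }
                sort { $a->[1] <=> $b->[1] || $a->[0] cmp $b->[0] }
                map  { [ $_, index( $months, lc substr( $_, 0, 3 ) ) ] }
                @files;
    },
} );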