Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have an array of file names which begin with the first 3 letters of the 12 months. 1 file for each day of the year. I would like to sort them in chronological order:

Jan->Feb->Mar->May->Jun-> .... ->Dec

I seem to remember away to sort based on an array, so i created an array @months=("Jan", ..., "Dec"); How do I use this with the sort function?

Replies are listed 'Best First'.
•Re: sorting an array with an array
by merlyn (Sage) on Oct 01, 2002 at 15:37 UTC
    my %order; @order{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (1..12); ... my @sorted_filenames = sort { $order{substr($a,0,3)} <=> $order{substr($b,0,3)} } @unsorted_filenames;
    This'll be ok up to 100 filenames or so. If you have more, you'll probably want to cache the substr-naming mapping using a Schwartzian Transform or other device.

    Or, you could even go for a merge sort, which will scale for 1000's of filenames:

    my %piles; for (@unsorted_filenames) { push @{$piles{substr($_, 0, 3)}), $_; } my @result; for (qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)) { push @result, sort @{$piles{$_}}; delete $piles{$_}; } die if keys %piles; # unexpected prefix!

    -- Randal L. Schwartz, Perl hacker

      1 file for each day of the year

      Your first case assumes one file per month (well, it doesn't sort within each month, which was requested). Your second case doesn't deal with the possibility of a lack of leading zeros in days of the month.

              - tye (a leading zero himself)
(tye)Re: sorting an array with an array
by tye (Sage) on Oct 01, 2002 at 16:03 UTC

    Simply replace the month name with a string that sorts as you like (using the default sort), then undo that replacement after the sort:

    my @files= <*>; @files= do { my @mo= ( qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec), 'a'..'l' ); my %mo; @mo{@mo}= reverse @mo; grep { s/^(.)/$mo{$1}/ } sort grep { s/^(...)/$mo{$1}/ } @files; };
    So we set up %mo so that $mo{Jan} is 'a' and $mo{a} is 'Jan'. Then we read the "grep sort grep" from the bottom up: Take the unsorted list of file names, take the first 3 letters of each file name and replace it with appropriate letter, sort the list of modified file names, replace the first letter of the modified file names with the month abbreviations, return the sorted list.

    Note that this assumes that we have a leading zero in front of single-digit days of the month and that all of the month names are capitalized exactly as we expect. If not, then we need to account for that as well. (It also doesn't deal with invalid file names.)

    my @files= <*>; @files= do { my @mo= qw(jan feb mar apr may jun jul aug sep oct nov dec); my %mo; @mo{@mo}= 0..$#mo; grep { s/^[^-]*-// } sort grep { # Replace "JAN1" with "a01-JAN1": s/^(...)(\d\d?)/sprintf "%s%02d-%s",$mo{lc$1},$2,$1.$2/e or $_= "?-$_"; 1; } @files; };
    Update: You see, this is why Perl needs a 'filter' primitive. Neither map nor grep is quite the right tool for this job and map seems almost seductively right for it which results in people making mistakes like I just did.

    In the first snippet, I originally had 'map' instead of 'grep'. 'map' will give back the return value from the s/// operator, which is not the modified string.

    The second snippet had the same mistake but when I fixed it I also decided to deal with filenames that aren't named as we expected.

    It is easy to characterize this as an abuse of grep. I tend to agree. (:

            - tye (just that sort of guy... sort of)
      sub filter (&@) {map {$_ [0] -> (); $_} @_ [1 .. $#_]}

      Abigail

        Yes, it isn't that hard to write one. But you missed one aspect in your version: filter { s/a/b/ } @list shouldn't modify the original @list.

        Let me go dig up my version...

        Update: Make that two features. (: I also think that 'filter' in a scalar context should return the concatenated values (a count is pretty useless):

        package filter; use strict; require Exporter; { my $nowarn= *import } *import= \&Exporter::import; use vars qw( @EXPORT ); @EXPORT= qw( filter ); sub filter(&@) { my( $code, @vals )= @_; # local( $_ ); # Done by the loop. for( @vals ) { $code->(); } wantarray ? @vals : join "", @vals; } 1;
        Though I've only tested this a little.

                - tye (filtering shouldn't be straining)
      In the first snippet, I originally had 'map' instead of 'grep'. 'map' will give back the return value from the s/// operator, which is not the modified string.
      (from another follow-up:)
      But you missed one aspect in your version: filter { s/a/b/ } @list shouldn't modify the original @list.
      In that case, make a copy. And return $_.
      map { local $_ = $_; s/^[^-]*-//; $_ } LIST
      Per Abigail-II's trail of thought with context sensitivity added:
      sub filter (&@) { (wantarray ? sub { @_ } : sub { join '', @_ } )->( map { local $_ = $_; $_[0]->(); $_ } @_[1 .. $#_] ); }
      However, I really find this silly, as you can simply do the following:
      sub filter (&@) { my ($func, @out) = @_; $func->() for @out; wantarray ? @out : join '', @out; }

      Makeshifts last the longest.

Re: sorting an array with an array
by bart (Canon) on Oct 01, 2002 at 22:14 UTC
    First of all, invert that array into a hash.
    my %monthindex; @monthindex{@months} = 0 .. $#months;
    So now, for any 3 character month string with the proper case, $months[$monthindex{$monthname}] will return the original string. That's why I called it inversion: it inverts the function that is the array lookup by index.

    With this, you can do a plain stupid direct sort on hash value:

    my @sorted = sort { $monthindex{substr $a, 0, 3} <=> $monthindex{subst +r $b, 0, 3} || $a cmp $b } @files;
    Or you can do a more sofisticated version with a Schwartzian Transform, caching the substr or better still, the monthindex for the file:
    my @sorted = map $_->[0], sort { $a->[1] <=> $b->[1] || $a->[0] cmp $b->[0] } map [ $_, $monthindex{substr $_, 0, 3}], @files;

    In general, I think it would be smarter to do the month lookup for unified case, like all lower case, so you can just as well sort "sep", "Sep", or "SEP".

    my @months = qw(jan feb mar apr may jun jul aug sep oct nov dec); my %monthindex; @monthindex{@months} = 0 .. $#months; my @sorted = map $_->[0], sort { $a->[1] <=> $b->[1] || $a->[0] cmp $b->[0] } map [ $_, $monthindex{substr lc, 0, 3}], @files;
Re: sorting an array with an array
by BrowserUk (Patriarch) on Oct 02, 2002 at 06:49 UTC

    The first thing I noticed was that there isn't any need to use a hash for the lookup function. Instead of an array, you can just use index into a string containing the three character days in the appropriate order

    # The spaces aren't really necessary. my $months = "jan feb mar apr may jun jul aug sep oct nov dec"; my @sorted = sort{ index( $months, lc(substr($a,0,3)), 0) <=> index( $months, lc(substr($b,0,3)), 0) || $a cmp $b } @files;

    The nice thing is that misspelt filenames will just get sorted to the top (or bottom) rather than breaking the sort.

    Given that the OP said that there was one file per day, a max of 366 files, I decided to compare this simple sort against an ST sort (I stole bart's implementation). The results showed that for 366 filenames, a higher resolution timer was needed to distinguish them apart.

    That got me to wondering what the breakpoint would be if there were more than 1 file per day, and the results are surprising. For up to a 100/day, for a total of 36600 filenames, the simple sort outperformed the ST by a substantial margin.

    #! perl -sw use strict; sub shufl { $a = $_ + rand @_ - $_ and @_[$_, $a] = @_[$a, $_] for (0. +.$#_); return @_; } sub rndtxt{ my $s=''; $s .= chr(65+rand 26) for 1 .. $_[0]; $s } # Gen some test data - 1 filename for every day of the year my @test = grep{ -1 == index('feb30feb31apr31jun31sep31nov31', $_) } # remove + funnies map{ local $.=$_; (map{$. . sprintf('%02d',$_)}1..31) } qw(jan feb mar apr may jun jul aug sep oct nov dec); my @files; push@files,@test for 1 .. $ARGV[0]||1; # duplicate to give n filenames +/day # and mix them up and add random stuff @files = map{ $_ . rndtxt(8) } shufl @files; my $months = "jan feb mar apr may jun jul aug sep oct nov dec"; my $start = time; # Time it my @sorted = sort{ index( $months, lc(substr($a,0,3)), 0) <=> index( $months, lc(substr($b,0,3)), 0) || $a cmp $b } @files; # sort them print 'Sorting ', scalar @sorted, ' filenames took ', time-$start, +' seconds.', $/; my @months = qw(jan feb mar apr may jun jul aug sep oct nov dec); my %monthindex; @monthindex{@months} = 0 .. $#months; $start = time; my @sorted2 = map $_->[0], sort { $a->[1] <=> $b->[1] || $a->[0] cmp $b->[0] } map [ $_, $monthindex{lc substr $_, 0, 3}], # no point in lc the + whole lot @files; print 'Sorting ST ', scalar @sorted, ' filenames took ', time-$start, +' seconds.', $/; print 'The results of both sorts verify as ', ("@sorted" eq "@sorted2" +) ? "the same\n" : "different\n"; __DATA__ c:\test>202022 Sorting 366 filenames took 0 seconds. Sorting ST 366 filenames took 0 seconds. The results of both sorts verify as the same c:\test>202022 10 Sorting 3660 filenames took 1 seconds. Sorting ST 3660 filenames took 3 seconds. The results of both sorts verify as the same c:\test>202022 20 Sorting 7320 filenames took 5 seconds. Sorting ST 7320 filenames took 7 seconds. The results of both sorts verify as the same c:\test>202022 30 Sorting 10980 filenames took 11 seconds. Sorting ST 10980 filenames took 17 seconds. The results of both sorts verify as the same c:\test>202022 50 Sorting 18300 filenames took 29 seconds. Sorting ST 18300 filenames took 43 seconds. The results of both sorts verify as the same c:\test>202022 100 Sorting 36600 filenames took 110 seconds. Sorting ST 36600 filenames took 170 seconds. The results of both sorts verify as the same c:\test>

    Then I thought, maybe the hash lookup was the significant factor, so I modified the ST to use index instead of a hash to ensure I was comparing apples with apples.

    Even then, the simple sort out performs the ST up to 100/day.

    #! perl -sw use strict; sub shufl { $a = $_ + rand @_ - $_ and @_[$_, $a] = @_[$a, $_] for (0. +.$#_); return @_; } sub rndtxt{ my $s=''; $s .= chr(65+rand 26) for 1 .. $_[0]; $s } # Gen some test data - 1 filename for every day of the year my @test = grep{ -1 == index('feb30feb31apr31jun31sep31nov31', $_) } # remove + funnies map{ local $.=$_; (map{$. . sprintf('%02d',$_)}1..31) } qw(jan feb mar apr may jun jul aug sep oct nov dec); my @files; push@files,@test for 1 .. $ARGV[0]||1; # duplicate to give n filenames +/day # and mix them up and add random stuff @files = map{ $_ . rndtxt(8) } shufl @files; my $months = "jan feb mar apr may jun jul aug sep oct nov dec"; my $start = time; # Time it my @sorted = sort{ index( $months, lc(substr($a,0,3)), 0) <=> index( $months, lc(substr($b,0,3)), 0) || $a cmp $b } @files; # sort them print 'Sorting ', scalar @sorted, ' filenames took ', time-$start, + ' seconds.', $/; my @months = qw(jan feb mar apr may jun jul aug sep oct nov dec); my %monthindex; @monthindex{@months} = 0 .. $#months; $start = time; my @sorted2 = map $_->[0], sort { $a->[1] <=> $b->[1] || $a->[0] cmp $b->[0] } map [ $_, index( $months, lc(substr($_,0,3)), 0)], @files; print 'Sorting mST ', scalar @sorted, ' filenames took ', time-$start, + ' seconds.', $/; print 'The results of both sorts verify as ', ("@sorted" eq "@sorted2" +) ? "the same\n" : "different\n"; __DATA__ c:\test>202022-3 Sorting 366 filenames took 0 seconds. Sorting mST 366 filenames took 0 seconds. The results of both sorts verify as the same c:\test>202022-3 10 Sorting 3660 filenames took 1 seconds. Sorting mST 3660 filenames took 2 seconds. The results of both sorts verify as the same c:\test>202022-3 100 Sorting 36600 filenames took 108 seconds. Sorting mST 36600 filenames took 171 seconds. The results of both sorts verify as the same c:\test>

    However, the results are nearly identical. It seems that the overhead of creating all those little arrays is significant enough to require care to check that the expense of the repeatative function being cached is greater. The simple sort obviously has no memory overhead at all.


    Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!