hotel has asked for the wisdom of the Perl Monks concerning the following question:

Dear all, I am trying to sort an array of file names that I got from a directory with readdir(). The names of the files in the directory (in order) look like this:

9.10.force.0.5.1LGY.pdb
29.30.force.0.5.1LGY.pdb
30.31.force.0.5.1LGY.pdb
31.32.force.0.5.1LGY.pdb
....
....
100.101.force.0.5.1LGY.pdb
101.102.force.0.5.1LGY.pdb
....
....
200.201.force.0.5.1LGY.pdb
201.200.force.0.5.1LGY.pdb

As far as I can figure out, the sort() function doesn't sort the way I want it to, since it sorts alphabetically or numerically. It starts with

100.101.force.0.5.1LGY.pdb
101.102.force.0.5.1LGY.pdb

and continues in this manner, and then places the file 9.10.force.0.5.1LGY.pdb according to its alphabetical order. However, I want to sort these files with respect to their first digit. If any of you can advise a way to do it, I'd appreciate it. Thanks a lot.

Replies are listed 'Best First'.
Re: sorting an array of file names
by shmem (Chancellor) on Dec 19, 2009 at 16:43 UTC
    However, I want to sort these files with respect to their first digit.

    First digit or first number? If you mean the first digit, then 201.200.force.0.5.1LGY.pdb comes before 9.10.force.0.5.1LGY.pdb, since 2 (from 201) is lower than 9. If you want to sort by the first number, do a Schwartzian transform:

    no warnings "numeric";
    @sorted = map  $_->[0],
              sort { $a->[1] <=> $b->[1] }
              map  { [ $_, int $_ ] } @unsorted;
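To make the digit-versus-number distinction concrete, here is a minimal sketch. The sample list is a subset of the names from the question, and the helpers first_digit and first_number are made up for illustration; they are not part of shmem's code.

```perl
use strict;
use warnings;

# Sample data: a few of the names from the question, in no particular order.
my @unsorted = qw(
    201.200.force.0.5.1LGY.pdb
    9.10.force.0.5.1LGY.pdb
    100.101.force.0.5.1LGY.pdb
    31.32.force.0.5.1LGY.pdb
);

# Hypothetical helpers: extract the sort key explicitly with a regex.
sub first_digit  { my ($name) = @_; return $name =~ /(\d)/     ? $1 : 0 }
sub first_number { my ($name) = @_; return $name =~ /^(\d+)\./ ? $1 : 0 }

# Same list, two different keys.
my @by_digit  = sort { first_digit($a)  <=> first_digit($b)  } @unsorted;
my @by_number = sort { first_number($a) <=> first_number($b) } @unsorted;

print "By first digit:\n";
print "  $_\n" for @by_digit;    # 100..., 201..., 31..., 9...
print "By first number:\n";
print "  $_\n" for @by_number;   # 9..., 31..., 100..., 201...
```

Sorting by first digit puts 100.101 first and 9.10 last, which is probably not what the question is after; sorting by first number gives 9, 31, 100, 201.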
      First of all, thanks for the help, and forgive my messed-up English. I wanted to sort the array with respect to the "number" before the first dot, not the first digit, of course. Anyway, it turned out to be a matter of using ST really, and I noticed that implementing an ST by myself is beyond my programming skills for now. However, when I used the code shmem proposed, it worked. Thanks to all of you for your time. I hope the OP (at least its topic) will be useful for future googlers who are looking for a solution to this particular, ehm, matter.
        Anyway, it turned out to be a matter of using ST

        I don't think you need a Schwartzian Transform here.  An ST makes sense if the individual comparison operation is computationally expensive. This is not the case with interpreting a string as a number, in particular as the conversion is done only once for each string and then "cached" in the NV/IV fields of the scalar variable(*).  In other words, the simple approach (not using ST) is even faster in this case:

        #!/usr/bin/perl
        use strict;
        use warnings;
        no warnings 'numeric';

        use Benchmark 'cmpthese';

        for my $e (2..5) {
            my $n = 10**$e;
            print "\nNumber of file names: $n\n";
            my @data;
            push @data, join(".", int(rand($n)), int(rand($n)), 'force.0.5.1LGY.pdb') for 1..$n;

            cmpthese( 10**(6-$e), {
                'simple' => sub {
                    my @unsorted = @data;
                    my @sorted = sort { $a <=> $b } @unsorted;
                },
                'ST' => sub {
                    my @unsorted = @data;
                    my @sorted = map  $_->[0],
                                 sort { $a->[1] <=> $b->[1] }
                                 map  { [ $_, int $_ ] } @unsorted;
                },
            } );
        }

        __END__

        Number of file names: 100
                  Rate     ST simple
        ST      3247/s     --   -75%
        simple 12987/s   300%     --

        Number of file names: 1000
                 Rate     ST simple
        ST      248/s     --   -79%
        simple 1176/s   375%     --

        Number of file names: 10000
                 Rate     ST simple
        ST     10.3/s     --   -74%
        simple 39.2/s   280%     --

        Number of file names: 100000
               s/iter     ST simple
        ST       1.87     --   -50%
        simple  0.943    99%     --

        Another beneficial side effect of the simple approach is that if you happen to have two names like this

        30.31.force.0.5.1LGY.pdb
        30.32.force.0.5.1LGY.pdb

        they would be ordered in some useful way, because the fractional part of the number is automatically taken into consideration when just treating the name as a number.
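One caveat may be worth checking before relying on this: the leading "N.M" is read as a decimal fraction, so a second number with fewer digits can sort after one with more digits. The sketch below uses two hypothetical names (30.9 and 30.10, which do not appear in the thread) alongside the two from above to show the numeric value Perl derives and the resulting order.

```perl
use strict;
use warnings;
no warnings 'numeric';   # the names are not clean numbers, as in the thread

# 30.9 and 30.10 are hypothetical names added for illustration.
my @names = qw(
    30.32.force.0.5.1LGY.pdb
    30.9.force.0.5.1LGY.pdb
    30.31.force.0.5.1LGY.pdb
    30.10.force.0.5.1LGY.pdb
);

# Show the number Perl derives from the leading part of each name.
printf "%-28s => %s\n", $_, 0 + $_ for @names;

# Plain numeric sort: 30.10 is treated as 30.1, so it sorts before 30.31,
# while 30.9 sorts after 30.32.
my @sorted = sort { $a <=> $b } @names;
print "$_\n" for @sorted;
```

If the second number always has the same number of digits within a group, the plain numeric sort orders it as expected; otherwise an explicit secondary key is the safer choice.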


        (*)

        use Devel::Peek;

        my $s = "30.31.force.0.5.1LGY.pdb";
        Dump $s;
        print 0+$s, "\n";   # treat as number
        Dump $s;

        __END__
        SV = PV(0x605150) at 0x604fa0
          REFCNT = 1
          FLAGS = (PADBUSY,PADMY,POK,pPOK)
          PV = 0x6370d0 "30.31.force.0.5.1LGY.pdb"\0
          CUR = 24
          LEN = 32
        30.31
        SV = PVNV(0x607880) at 0x604fa0
          REFCNT = 1
          FLAGS = (PADBUSY,PADMY,NOK,POK,pIOK,pNOK,pPOK)
          IV = 30      <---
          NV = 30.31   <---
          PV = 0x6370d0 "30.31.force.0.5.1LGY.pdb"\0
          CUR = 24
          LEN = 32
Re: sorting an array of file names
by Xilman (Hermit) on Dec 19, 2009 at 16:35 UTC

    I suggest you read up on the sort() function, whether chewed trees in the Camel book or with "perldoc -f sort".

    sort() can be given a function to define the sorting order. Your specification, "sort these files with respect to their first digit" is somewhat ambiguous. What do you want to do if two files have the same first digit? What do you want to happen if the file doesn't have a first digit?

    If I assume that you don't care what happens when a file doesn't contain a digit and when two files contain the same first digit, something like this might be what you require:

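    sub mysort {
        # Capture the first digit found in each name
        my ($a_digit) = ($a =~ /(\d)/);
        my ($b_digit) = ($b =~ /(\d)/);
        # Names without any digit: the resulting order is unspecified
        return 1 unless defined $a_digit && defined $b_digit;
        return $a_digit <=> $b_digit;
    }

    # Code to put file names into @files
    my @sorted = sort mysort @files;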

    Paul

Re: sorting an array of file names
by bobf (Monsignor) on Dec 19, 2009 at 16:41 UTC

    The default sort order is lexical. If you want a numeric sort, you need to specify a sort routine using the <=> operator (see perlop). See sort for more information.

    Since your filenames have both numeric and alpha components, you will probably need to split the pieces apart before doing the comparison. The Schwartzian Transform is good for this kind of thing.

    Here is one approach. Extend the sort routine as needed if you want to specify a secondary/etc sort order in the event two filenames start with the same numeric component.

    use strict;
    use warnings;

    my @files = qw(
        9.10.force.0.5.1LGY.pdb
        29.30.force.0.5.1LGY.pdb
        30.31.force.0.5.1LGY.pdb
        31.32.force.0.5.1LGY.pdb
        100.101.force.0.5.1LGY.pdb
        101.102.force.0.5.1LGY.pdb
    );

    my @sorted = map  { $_->[0] }
                 sort { $a->[1] <=> $b->[1] }
                 map  { [ $_, split( /\./, $_ ) ] } @files;

    print "$_\n" for @sorted;
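As a sketch of the "extend the sort routine" remark, the comparison can fall through to the second dot-separated field when the first fields are equal. The sample names below are hypothetical (two of them share the first number on purpose); the structure is otherwise the same as the code above.

```perl
use strict;
use warnings;

# Hypothetical names: two share the first number, so the tie-break matters.
my @files = qw(
    30.9.force.0.5.1LGY.pdb
    9.10.force.0.5.1LGY.pdb
    30.10.force.0.5.1LGY.pdb
    100.101.force.0.5.1LGY.pdb
);

my @sorted =
    map  { $_->[0] }                       # recover the original name
    sort {
        $a->[1] <=> $b->[1]                # primary key: first number
            or
        $a->[2] <=> $b->[2]                # secondary key: second number
    }
    map  { [ $_, split( /\./, $_ ) ] }     # [ name, field1, field2, ... ]
    @files;

print "$_\n" for @sorted;
```

Because `<=>` returns 0 for equal keys, `or` only evaluates the secondary comparison on a tie. With these names the result is 9.10, 30.9, 30.10, 100.101.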

    Love those pdb files. :-) Lipase, huh? Interesting stuff.

Re: sorting an array of file names
by almut (Canon) on Dec 19, 2009 at 16:34 UTC

    What's wrong with sorting numerically?

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @data = <DATA>;
    {
        no warnings 'numeric';
        print sort { $a <=> $b } @data;
    }

    __DATA__
    29.30.force.0.5.1LGY.pdb
    30.31.force.0.5.1LGY.pdb
    100.101.force.0.5.1LGY.pdb
    9.10.force.0.5.1LGY.pdb
    31.32.force.0.5.1LGY.pdb
    200.201.force.0.5.1LGY.pdb
    201.200.force.0.5.1LGY.pdb
    101.102.force.0.5.1LGY.pdb

    Output:

    9.10.force.0.5.1LGY.pdb
    29.30.force.0.5.1LGY.pdb
    30.31.force.0.5.1LGY.pdb
    31.32.force.0.5.1LGY.pdb
    100.101.force.0.5.1LGY.pdb
    101.102.force.0.5.1LGY.pdb
    200.201.force.0.5.1LGY.pdb
    201.200.force.0.5.1LGY.pdb

    (if that's not the ordering you want, please post how exactly you want the sample list to be ordered)

Re: sorting an array of file names
by Anonymous Monk on Dec 19, 2009 at 20:15 UTC
    In case you want more than the first digit, I've found it useful to pad the numbers:

    print "$$_[0]\n" for
        sort { $$a[1] cmp $$b[1] }
        map  {
            my $padd = $_;
            # zero-pad every run of digits to six places
            $padd =~ s/(\d+)/sprintf '%06d', $1/ge;
            [ $_, $padd ];
        } @list;

    You get a lexical sort and avoid 1,10,2,3,4...
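To see why padding works, it can help to print the padded keys themselves. This is a minimal sketch with a hypothetical helper, padded_key, and three of the names from the question; it assumes no number in a name has more than six digits, since longer numbers would not be padded to a common width.

```perl
use strict;
use warnings;

my @list = qw(
    100.101.force.0.5.1LGY.pdb
    9.10.force.0.5.1LGY.pdb
    29.30.force.0.5.1LGY.pdb
);

# Hypothetical helper: zero-pad every run of digits to six places.
sub padded_key {
    my ($name) = @_;
    ( my $key = $name ) =~ s/(\d+)/sprintf '%06d', $1/ge;
    return $key;
}

# The keys that the string comparison actually sees.
print padded_key($_), "\n" for @list;

# Comparing the padded keys as strings gives the numeric order.
my @sorted = sort { padded_key($a) cmp padded_key($b) } @list;
print "$_\n" for @sorted;
```

Here 9.10.force.0.5.1LGY.pdb becomes 000009.000010.force.000000.000005.000001LGY.pdb, so a plain `cmp` places it before the keys for 29 and 100. Calling the helper inside the comparator recomputes keys on every comparison; the map/sort/map form above computes each key once.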