merrymonk has asked for the wisdom of the Perl Monks concerning the following question:

I want to sort data in which each value starts with one or more letters, followed by a number.
I want the number part to be ordered by its numeric value.
Hopefully the example below explains what I mean.
The Perl code is
use strict "vars";
my (@a, @b, $ja);
$a[1] = 'E1180';
$a[2] = 'D250';
$a[3] = 'A1180';
$a[4] = 'D130';
$a[5] = 'E855';
$a[6] = 'E975';
$a[7] = 'A130';
$a[8] = 'A250';
$a[9] = 'B1105';
$a[10] = 'B1225';
$a[11] = 'B2480';
$a[12] = 'C1180';
$a[13] = 'C1600';
$a[14] = 'D1180';
$b[1] = 'E1180';
$b[2] = 'D0250';
$b[3] = 'A1180';
$b[4] = 'D0130';
$b[5] = 'E0855';
$b[6] = 'E0975';
$b[7] = 'A0130';
$b[8] = 'A0250';
$b[9] = 'B1105';
$b[10] = 'B1225';
$b[11] = 'B2480';
$b[12] = 'C1180';
$b[13] = 'C1600';
$b[14] = 'D1180';
print "\ncmp sort of \@a\n";
foreach $ja (sort{$a cmp $b} @a) {
    print "$ja\n";
}
print "\ncmp sort of \@b\n";
foreach $ja (sort{$a cmp $b} @b) {
    print "$ja\n";
}
This code gives the following output.
cmp sort of @a
A1180
A130
A250
B1105
B1225
B2480
C1180
C1600
D1180
D130
D250
E1180
E855
E975

cmp sort of @b
A0130
A0250
A1180
B1105
B1225
B2480
C1180
C1600
D0130
D0250
D1180
E0855
E0975
E1180
Data in @a is sorted so that the number part is treated as letters. Therefore, for the values starting with A, the order of the number part is 1180, 130, 250.
What I really want for these values is the order 130, 250, 1180.
I have kind of achieved this by adding a zero in front of numbers whose value is less than 1000. This is the data in @b.
However, the data then becomes A0130, A0250, A1180 which is not ideal.
Is there a way of sorting the data in @a so that I get the order shown for @b but without adding the extra and unwanted zeros?

Replies are listed 'Best First'.
Re: Data with Letter(s) & Number sort query
by Corion (Patriarch) on Nov 19, 2016 at 09:44 UTC

    What you want is a natural sort then. The usual approach is to create for each element to be sorted a string that sorts lexically but from which you can find the original string again. For example, you could build your string by having the padded string, a \0 and then the original string:

    my @elements = map { join "\0", padded($_), $_ } @original;
    @elements = sort { $a cmp $b } @elements;
    my @sorted = map { ( split /\0/, $_ )[1] } @elements;

    You can even chain it all together like the following:

    my @elements =
        map { ( split /\0/, $_ )[1] }
        sort { $a cmp $b }
        map { join "\0", padded($_), $_ }
        @original;
      sort { $a cmp $b }

      Note that the $a cmp $b "lexicographic ascending" sorting order is the default ordering of sort, so this expression can be replaced with just sort (i.e., no { $a cmp $b } code block) wherever it appears here for a significant speed increase for sufficiently large data arrays, but they may have to be quite large!


      Give a man a fish:  <%-{-{-{-<

      Are you importing padded() from a module? I don't immediately see where that comes from.

        No, padded is supposed to be implemented by merrymonk; as I understood it, they had the padding already but wanted to undo it after sorting.

        Yes, same question to Corion, which is why I just added another complete solution below.
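For readers who want something runnable, here is a minimal sketch of what padded() could look like. It is not Corion's code: the helper body, the error handling, and the fixed width of 10 digits are all assumptions made for illustration, and the width must be at least as large as the longest number in the data.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not from the thread): zero-pad the
# numeric part so that a plain string comparison orders values numerically.
sub padded {
    my ($value) = @_;
    my ($letters, $digits) = $value =~ /^([A-Za-z]+)(\d+)$/
        or die "Unexpected value '$value'";
    return sprintf '%s%010d', $letters, $digits;
}

my @original = qw(E1180 D250 A1180 D130 E855 E975 A130 A250
                  B1105 B1225 B2480 C1180 C1600 D1180);

# Build "padded\0original" strings, sort them as plain strings,
# then keep only the original part after the \0 separator.
my @sorted =
    map  { ( split /\0/, $_ )[1] }
    sort
    map  { join "\0", padded($_), $_ }
    @original;

print "$_\n" for @sorted;
```

With the data from the question this should print the values in the order shown for @b, but without the extra zeros.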
Re: Data with Letter(s) & Number sort query
by Laurent_R (Canon) on Nov 19, 2016 at 10:11 UTC
    Using a regex to separate the letters from the digits, and then sorting numerically on the digits, and finally putting back the pieces together:
    $ perl -e 'use strict;
    use warnings;
    use Data::Dumper;

    my @a = qw / E1180 D250 A1180 E855 E975 A130 A250 B1105 B1255 B2480 C1180 C1600 D1180 /;

    print "$_\n"
        for map "$_->[0]$_->[1]",
        sort { $a->[1] <=> $b->[1]}
        map { /([A-Z]+)(\d+)/; [$1, $2]} @a;'
    A130
    D250
    A250
    E855
    E975
    B1105
    E1180
    A1180
    C1180
    D1180
    B1255
    C1600
    B2480
    Note that you did not specify any sort order for the letters when the digits are the same, so the items are kept in the original order in that case (see e.g. A1180, C1180 and D1180).

    Update: removed a piece of code that got accidentally pasted twice.

      Thank you for the quick suggestions. I too will be interested to hear the answer about padded.
      I want the Letter part to be sorted as normal alphabetical order (possibly from z to a as an option) - also as another option it could be good to ignore the case of the letters.
      The letter part sort comes before the number part.
        If you want the letters to be a secondary sort key, you can try this:
        $ perl -e 'use strict;
        use warnings;
        use Data::Dumper;

        my @a = qw / E1180 D250 A1180 E855 E975 A130 A250 B1105 B1255 B2480 C1180 C1600 D1180 /;

        print "$_\n"
            for map "$_->[0]$_->[1]",
            sort { $a->[1] <=> $b->[1] || $a->[0] cmp $b->[0]}
            map { /([A-Z]+)(\d+)/; [$1, $2]} @a;
        '
        A130
        A250
        D250
        E855
        E975
        B1105
        A1180
        C1180
        D1180
        E1180
        B1255
        C1600
        B2480
        Notice that A250 and D250 are now sorted alphabetically on the initial letter. For a reverse alphabetical order on the letters, change the sort line to this:
        sort { $a->[1] <=> $b->[1] || $b->[0] cmp $a->[0]}
        I did not have time before to give a complete answer to your questions, but if you want upper and lower case letters and ignore case for the sort:
        $ perl -e 'use strict;
        use warnings;

        my @a = qw / d1180 a1180 E1180 D250 A1180 E855 E975 A130 A250 B1105 b1255 b2480 c1180 c1600 e855 e975 a130 A250 B1105 B1255 B2480/;

        print "$_\n"
            for map "$_->[0]$_->[1]",
            sort { $a->[1] <=> $b->[1] || uc $a->[0] cmp uc $b->[0]}
            map { /([a-zA-Z]+)(\d+)/; [$1, $2]} @a;'
        A130
        a130
        A250
        A250
        D250
        E855
        e855
        E975
        e975
        B1105
        B1105
        a1180
        A1180
        c1180
        d1180
        E1180
        b1255
        B1255
        c1600
        b2480
        B2480
        The letter part sort comes before the number part.
        Not sure what you mean.
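One possible reading of "the letter part sort comes before the number part" is that the letters are the primary sort key and the number is only the tie-breaker, which is the order shown for @b in the question. The sketch below assumes that reading. The sub name and its two options (reverse_letters, ignore_case) are hypothetical and were invented here to cover the z-to-a and case-insensitive variants that were asked for.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: letters are the primary key,
# the numeric value is the secondary key.
#   reverse_letters => 1   sorts the letter part from z to a
#   ignore_case     => 1   compares the letter part case-insensitively
sub sort_letters_then_numbers {
    my ($values, %opt) = @_;
    my $dir = $opt{reverse_letters} ? -1 : 1;
    return
        map  { $_->[0] }
        sort {
            $dir * ( $a->[1] cmp $b->[1] )
                || $a->[2] <=> $b->[2]
        }
        map  {
            my ($letters, $digits) = /^([A-Za-z]+)(\d+)$/
                or die "Unexpected value '$_'";
            # [ original value, letter key, numeric key ]
            [ $_, ( $opt{ignore_case} ? uc($letters) : $letters ), $digits ];
        }
        @$values;
}

my @data = qw(E1180 D250 A1180 D130 E855 E975 A130 A250
              B1105 B1225 B2480 C1180 C1600 D1180);

print join( ' ', sort_letters_then_numbers( \@data ) ), "\n";
print join( ' ', sort_letters_then_numbers( \@data, reverse_letters => 1 ) ), "\n";
```

Note that reverse_letters only flips the letter comparison; within each letter group the numbers still ascend.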
Re: Data with Letter(s) & Number sort query
by johngg (Canon) on Nov 19, 2016 at 12:06 UTC

    Firstly, note that when initialising your data array, subscripts, as standard, start at zero rather than one. This can be changed but doing so is strongly discouraged.

    Here is a GRT sort that achieves your goal.

    johngg@shiraz:~/perl/Monks > perl -Mstrict -Mwarnings -E '
    my @data = qw{
        E1180 D250 A1180 D130 E855 E975 A130
        A250 B1105 B1225 B2480 C1180 C1600 D1180
    };
    say for
        map { substr $_, 5 }
        sort
        map { pack q{ANA*}, substr( $_, 0, 1 ), substr( $_, 1 ), $_ }
        @data;'
    A130
    A250
    A1180
    B1105
    B1225
    B2480
    C1180
    C1600
    D130
    D250
    D1180
    E855
    E975
    E1180

    I hope this is useful.

    Cheers,

    JohnGG
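The GRT above relies on every value having exactly one leading letter, so the packed key is always 5 bytes long (1 for the letter, 4 for the N-packed number). A possible variation for prefixes of more than one letter is sketched below. It is not johngg's code: the sub name, the maximum prefix width of 8 characters, and the assumption that the numbers fit in 32 bits are all choices made for this example.

```perl
use strict;
use warnings;

# Assumptions for this sketch: the letter prefix is at most 8 characters
# and the number fits in an unsigned 32-bit integer.
my $PREFIX_WIDTH = 8;
my $KEY_LENGTH   = $PREFIX_WIDTH + 4;    # 'N' always packs to 4 bytes

sub grt_sort {
    my @values = @_;
    # Prepend a fixed-length sort key, sort as plain strings,
    # then strip the key again with substr.
    return
        map  { substr $_, $KEY_LENGTH }
        sort
        map  {
            my ($letters, $digits) = /^([A-Za-z]+)(\d+)$/
                or die "Unexpected value '$_'";
            # A8: letters padded with spaces to a fixed width
            # N : big-endian 32-bit integer, so byte order = numeric order
            # A*: the original value, recovered after the sort
            pack "A$PREFIX_WIDTH N A*", $letters, $digits, $_;
        }
        @values;
}

print "$_\n" for grt_sort(qw(
    E1180 D250 A1180 D130 E855 E975 A130
    A250 B1105 B1225 B2480 C1180 C1600 D1180
));
```

Because the prefix is padded with spaces, which sort before any letter, a shorter prefix such as A sorts ahead of a longer one such as AB.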

Re: Data with Letter(s) & Number sort query
by salva (Canon) on Nov 20, 2016 at 15:04 UTC
    Using Sort::Key:
    use Sort::Key::Multi qw(si_keysort); # "si_" means a string key and an integer key!
    my @sorted = si_keysort { /^([a-z]+)(\d+)/i } @a;
Re: Data with Letter(s) & Number sort query
by poj (Abbot) on Nov 19, 2016 at 11:54 UTC

    Split the alpha and numeric parts using a regex and build a hash of arrays using the alpha part as the keys:

    #!perl
    use strict;
    my @in = qw(
      CC1180 B130 A250 Z9 B1105 B1225 B2480 C1180 C1600 D1180 D130 D250
      e1180 eF855 EF855 Ef855 E975 ERR 123);
    my %out = ();
    for (@in){
      if (/^([A-Za-z]+)(\d+)$/){
        # build hash of arrays
        # with alpha part uppercase:original as key
        push @{$out{join ':',uc $1,$1}},$2;
      } else {
        warn "Input data error $_";
      }
    }
    for my $x (sort keys %out){
      my @x = split ':',$x; # split uc from original
      for my $y (sort {$a <=> $b}@{$out{$x}}){
        print "$x[1]$y\n";
      }
    }
    poj
    update : moved split
Re: Data with Letter(s) & Number sort query
by tybalt89 (Monsignor) on Nov 19, 2016 at 20:23 UTC
    #!/usr/bin/perl
    # http://perlmonks.org/?node_id=1176132
    use strict;
    use warnings;

    my @a;
    $a[1] = 'E1180';
    $a[2] = 'D250';
    $a[3] = 'A1180';
    $a[4] = 'D130';
    $a[5] = 'E855';
    $a[6] = 'E975';
    $a[7] = 'A130';
    $a[8] = 'A250';
    $a[9] = 'B1105';
    $a[10] = 'B1225';
    $a[11] = 'B2480';
    $a[12] = 'C1180';
    $a[13] = 'C1600';
    $a[0] = 'D1180';

    print "\nsort of \@a\n";
    for my $ja (sort{$a =~ tr/0-9//cdr <=> $b =~ tr/0-9//cdr} sort @a) {
        print "$ja\n";
    }

    produces

    sort of @a
    A130
    D130
    A250
    D250
    E855
    E975
    B1105
    A1180
    C1180
    D1180
    E1180
    B1225
    C1600
    B2480

    I haven't seen this version here yet. It sorts first by alpha, then by comparisons after stripping off all non-numbers. Why two sorts? Why not? How does the second (numeric) sort keep from mixing up the first sort? Because recent perl sorts are stable.

    So is it faster or slower than the other sorts given here? Who knows? Who cares? If you really care, then benchmark...
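For anyone who would rather not depend on sort stability, the same ordering can probably be had from one sort with a two-part comparison. This is only a sketch of that alternative, not tybalt89's code; like the original it needs Perl 5.14 or later for the /r flag on tr, and the array name @data is just a placeholder.

```perl
use strict;
use warnings;

my @data = qw(E1180 D250 A1180 D130 E855 E975 A130 A250
              B1105 B1225 B2480 C1180 C1600 D1180);

# Compare on the digits first; fall back to a plain string comparison
# when the numbers are equal. tr/0-9//cdr returns a copy of the value
# with every non-digit character deleted.
my @sorted = sort {
    $a =~ tr/0-9//cdr <=> $b =~ tr/0-9//cdr
        or $a cmp $b
} @data;

print "$_\n" for @sorted;
```

The trade-off is that tr runs on every comparison, as it does in the two-pass version, so for large inputs a precomputed key may be worth benchmarking.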

Re: Data with Letter(s) & Number sort query
by AnomalousMonk (Archbishop) on Nov 19, 2016 at 18:40 UTC