erez_ez has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I know I already nagged you all about this sort issue, but I just can't get it to work and I think I'm starting to lose it... Basically, I need to sort a long list both alphabetically and numerically. The length of each word is not fixed and there is no constant syntax. The only thing that repeats itself is the brackets at the end of each word. For example, this is how the following list should look after sorting:
a1_1[1] a1_1[2] a1_1[10] a1_2[1] a1_2[2] a1_2[10] a1_10[1] a1_10[2] a1_10[10] a2_1[1] a2_1[2] a2_1[10] a2_2[1] a2_2[2] a2_2[10] a2_10[1] a2_10[2] a2_10[10] a10_1[1] a10_1[2] a10_1[10] a10_2[1] a10_2[2] a10_2[10] a10_10[1] a10_10[2] a10_10[10] b1_1[1] b1_1[2] b1_1[10] b1_2[1] b1_2[2] b1_2[10] b1_10[1] b1_10[2] b1_10[10] b2_1[1] b2_1[2] b2_1[10] b2_2[1] b2_2[2] b2_2[10] b2_10[1] b2_10[2] b2_10[10] b10_1[1] b10_1[2] b10_1[10] b10_2[1] b10_2[2] b10_2[10] b10_10[1] b10_10[2] b10_10[10]
Again, this is not necessarily the structure of each word. For example, it can also be abc_25_pur5llt[3]. I already tried the following code (in various versions), but it works on certain lists (such as the one I showed above) and doesn't on others:
my @new_list = sort { ($a =~ /(\w+)(\w+)(\d+)/)[0] cmp ($b =~ /(\w+)(\w+)(\d+)/)[0] || ($a =~ /\[(\d+)/)[0] <=> ($b =~ /\[(\d+)/)[0] } @split_list;
I would accept any code that gets the job done. I just can't get mine to work, probably because I don't fully understand it (I got it from one of the Monks). Thank you!
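To see why the comparator above behaves inconsistently, it helps to print what each part of it extracts. The sketch below applies the same two regexes to a few words from the list (`keys_for` is a hypothetical helper added only for illustration). Because `\w` matches letters, digits and underscores, the greedy first group swallows most of the word and then gives back only as much as the other two groups need, so the "alphabetical" key is `a1` for `a1_1[2]` but `a1_` for `a1_10[2]`, and comparing it with `cmp` puts `a10` before `a2`.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the two keys the original comparator uses.
sub keys_for {
    my ($word) = @_;
    # First key: first capture of /(\w+)(\w+)(\d+)/, compared with cmp
    my $alpha = ( $word =~ /(\w+)(\w+)(\d+)/ )[0];
    # Second key: the number inside the brackets, compared with <=>
    my $bracket = ( $word =~ /\[(\d+)/ )[0];
    return ( $alpha, $bracket );
}

for my $word (qw( a1_1[2] a1_10[2] a2_1[1] a10_1[1] )) {
    my ( $alpha, $bracket ) = keys_for($word);
    printf "%-10s alpha=%-5s bracket=%s\n", $word, $alpha, $bracket;
}
```

Running it shows the first key changing shape from word to word, which is why the sort only appears to work on some lists.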

Re: Sort problems
by Corion (Patriarch) on Dec 11, 2008 at 14:49 UTC
Re: Sort problems
by ccn (Vicar) on Dec 11, 2008 at 14:47 UTC
    Schwartzian transform
    my @new_list =
        map  { $_->[0] }
        sort { $a->[1] cmp $b->[1] }
        map  {
            /(a|b)(\d+)_(\d+)\[(\d+)\]/ or die "Can't parse $_";
            [ $_, sprintf "$1%05d%05d%05d", $2, $3, $4 ];
        } @split_list;

    Make a sortable pair for each element of the array: concatenate the first letter with all the numbers in the element, expanding every number to 5 digits with leading zeros. Then sort the pairs as strings by their second element and take the original elements from the sorted pairs.
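    To make the key concrete, here is a small sketch that builds the same key for two elements (`key_for` is a hypothetical helper, not part of the original post). It passes the letter to `sprintf` as an argument instead of interpolating `$1` into the format string, which gives the same result here but would stay safe if the captured text ever contained a `%`.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the key from the transform above for one element.
sub key_for {
    my ($s) = @_;
    $s =~ /(a|b)(\d+)_(\d+)\[(\d+)\]/ or die "Can't parse $s";
    # letter, then each of the three numbers zero-padded to 5 digits
    return sprintf '%s%05d%05d%05d', $1, $2, $3, $4;
}

print key_for('a2_10[1]'), "\n";    # a000020001000001
print key_for('a10_1[2]'), "\n";    # a000100000100002
```

    Because every number occupies exactly five characters, a plain `cmp` on these keys puts `a2_10[1]` before `a10_1[2]`, as desired.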

      More robust:

      my @new_list = map { $_->[0] } sort { $a->[1] cmp $b->[1] } map { ( my $s = $_ ) =~ s/(\d+)/0$1/g; [ $_, $s ] } @split_list;
        Thanks, but it still doesn't sort correctly. Take this list, for example; the result is:
        buff1_100u[1] buff1_12p5u[0] buff1_25u[6] buff1_50u[4] buff2_100u[3] buff2_12p5u[2] buff2_25u[3] buff2_50u[1]
        buff1_100u[1] can't come before buff1_25u[6]...
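      The cause is the substitution `s/(\d+)/0$1/g`: it prepends a single `0` to every number, so numbers of different lengths still end up with different widths. `100` becomes `0100` and `25` becomes `025`, and a string comparison sees `1` before `2`. A sketch of one possible fix, assuming no run of digits in the data is longer than 5 characters (widen the `%05d` if it can be), pads every number to the same width with `sprintf` and the `/e` modifier:

```perl
use strict;
use warnings;

my @split_list = qw( buff1_100u[1] buff1_12p5u[0] buff1_25u[6] buff1_50u[4]
                     buff2_100u[3] buff2_12p5u[2] buff2_25u[3] buff2_50u[1] );

my @new_list =
    map  { $_->[0] }                             # 3. keep the original strings
    sort { $a->[1] cmp $b->[1] }                 # 2. plain string sort on the keys
    map  { my $s = $_;                           # 1. build [original, key] pairs,
           $s =~ s/(\d+)/sprintf '%05d', $1/ge;  #    padding every digit run to 5 digits
           [ $_, $s ] }
    @split_list;

print "@new_list\n";
```

      On this list it prints `buff1_12p5u[0] buff1_25u[6] buff1_50u[4] buff1_100u[1] buff2_12p5u[2] buff2_25u[3] buff2_50u[1] buff2_100u[3]`. Since it pads every digit run wherever it occurs, it does not depend on a fixed word pattern.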
      First of all, thanks! It looks like it can't parse certain words, for example buff1_12p5u[0].
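      That failure comes from the fixed pattern in the first version: `/(a|b)(\d+)_(\d+)\[(\d+)\]/` only matches where an `a` or `b` is immediately followed by digits, an underscore, more digits and a bracketed number. A name such as `buff1_12p5u[0]` has no such spot, so the `or die` fires. Before committing to any fixed pattern, a quick check like the following sketch lists every entry it would reject (the list is just sample data from this thread):

```perl
use strict;
use warnings;

my @split_list = qw( a1_1[1] b10_2[10] buff1_12p5u[0] a2_10[2] );

# Entries the fixed pattern from the first transform cannot parse
my @unparsed = grep { !/(a|b)(\d+)_(\d+)\[(\d+)\]/ } @split_list;

print "Can't parse: $_\n" for @unparsed;
```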
Re: Sort problems
by Herkum (Parson) on Dec 11, 2008 at 14:59 UTC

    It might help if you focus on how you are breaking apart your words instead of on the sort (which you seem to understand). I suggest you take your list of "words" and put them into a hash to see what you are getting from your pattern match.

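    use Data::Dumper;

    my %parsed_list_for;
    for my $word (@list) {
        # Match against $word itself; "my $word" here would declare a new, empty variable
        if ( $word =~ /(\w+)(\w+)(\d+)/ ) {
            $parsed_list_for{ $word }{ 'alpha' } = $1;
        }
        if ( $word =~ /\[(\d+)/ ) {
            $parsed_list_for{ $word }{ 'numeric' } = $1;
        }
    }

    # Dump the hash to see what each pattern actually captured
    print Dumper( \%parsed_list_for );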

    That may go a long way to help solve your sorting problem.

Re: Sort problems
by GrandFather (Saint) on Dec 11, 2008 at 22:08 UTC

    Finding a good way to normalize the string being sorted makes things easier:

    use strict;
    use warnings;

    my @values = map {chomp; /(\S+)/g} <DATA>;
    my @sorted = map {tr/\0//d; $_} sort map {normalize ($_)} @values;

    (my $str = "@sorted") =~ s/(.{5,70})\s/$1\n/g;
    print $str;

    sub normalize {
        my $str = shift;
        my @parts = map {/\D/ ? $_ : ("\0" x (10 - length $_)) . $_}
            split /(?<=\d)(?=\D)|(?<=\D)(?=\d)/, $str;
        return join '', @parts;
    }

    __DATA__
    a10_10[10] b10_10[10]
    b2_1[1] b2_1[2] b2_1[10] b2_2[1] b2_2[2] b2_2[10] b2_10[1] b2_10[2] b2_10[10]
    a1_1[1] a1_1[2] a1_1[10] a1_2[1] a1_2[2] a1_2[10] a1_10[1] a1_10[2] a1_10[10]
    a10_1[1] a10_1[2] a10_1[10] a10_2[1] a10_2[2] a10_2[10] a10_10[1] a10_10[2]
    b1_1[1] b1_1[2] b1_1[10] b1_2[1] b1_2[2] b1_2[10] b1_10[1] b1_10[2] b1_10[10]
    b10_1[1] b10_1[2] b10_1[10] b10_2[1] b10_2[2] b10_2[10] b10_10[1] b10_10[2]
    a2_1[1] a2_1[2] a2_1[10] a2_2[1] a2_2[2] a2_2[10] a2_10[1] a2_10[2] a2_10[10]

    Prints:

    a1_1[1] a1_1[2] a1_1[10] a1_2[1] a1_2[2] a1_2[10] a1_10[1] a1_10[2] a1_10[10] a2_1[1] a2_1[2] a2_1[10] a2_2[1] a2_2[2] a2_2[10] a2_10[1] a2_10[2] a2_10[10] a10_1[1] a10_1[2] a10_1[10] a10_2[1] a10_2[2] a10_2[10] a10_10[1] a10_10[2] a10_10[10] b1_1[1] b1_1[2] b1_1[10] b1_2[1] b1_2[2] b1_2[10] b1_10[1] b1_10[2] b1_10[10] b2_1[1] b2_1[2] b2_1[10] b2_2[1] b2_2[2] b2_2[10] b2_10[1] b2_10[2] b2_10[10] b10_1[1] b10_1[2] b10_1[10] b10_2[1] b10_2[2] b10_2[10] b10_10[1] b10_10[2] b10_10[10]
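    The key idea is in `normalize`: it splits each string at every digit/non-digit boundary and left-pads each run of digits to 10 characters with `\0`, which sorts before every printable character, so a plain string `sort` ends up comparing the numbers by value. The sketch below reuses the same `normalize` on one of the trickier names from this thread to show the split and the padding (the `\0` bytes are turned into dots only for display). It assumes no digit run is longer than 10 characters and that the data contains no `\0` bytes of its own.

```perl
use strict;
use warnings;

# Same logic as normalize() above: pad each digit run to 10 chars with "\0"
sub normalize {
    my $str   = shift;
    my @parts = map { /\D/ ? $_ : ( "\0" x ( 10 - length $_ ) ) . $_ }
        split /(?<=\d)(?=\D)|(?<=\D)(?=\d)/, $str;
    return join '', @parts;
}

# Runs produced by the split at digit/non-digit boundaries
my @runs = split /(?<=\d)(?=\D)|(?<=\D)(?=\d)/, 'buff1_12p5u[0]';
print join( '|', @runs ), "\n";    # buff|1|_|12|p|5|u[|0|]

# The normalized key, with the "\0" padding shown as dots
( my $shown = normalize('buff1_12p5u[0]') ) =~ tr/\0/./;
print "$shown\n";

# Sort the normalized keys as plain strings, then strip the padding again
my @keys   = map { normalize($_) } qw( buff1_100u[1] buff1_25u[6] buff1_12p5u[0] );
my @sorted = map { my $s = $_; $s =~ tr/\0//d; $s } sort @keys;
print "@sorted\n";
```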

    Perl's payment curve coincides with its learning curve.
Re: Sort problems
by hangon (Deacon) on Dec 12, 2008 at 04:50 UTC

    ... sort a long list both alphabetically and numerically. The size of each word is not given and there is not a constant syntax ... The only thing that does repeat itself is the brackets at the end of each word. For example ... a1_1[1] a1_1[2] a1_1[10] a1_2[1] a1_2[2]

    ... For example, it can be also abc_25_pur5llt[3]

    ... looks like he can't parse certain words... buff1_12p5u[0]

    buff1_100u[1] can't come before buff1_25u[6]

    The problem is that you want to combine different types of sorting on different parts of a string, when the format of that string can vary. Until you can clearly define what you need to do, you will not be able to code it. So stop writing code and take a hard look at your data. Write a specification that defines the format of the *words* in all cases. From there, write a set of rules that defines how you want the data sorted.

    This exercise should give you the clarity to either write the sorting algorithm you need, or to give us a better explanation of what you're looking for.
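    As an illustration of that approach, suppose the rules turned out to be: split each name into runs of digits and runs of non-digits, compare two digit runs by numeric value, compare anything else as a plain string, and let a name that runs out of pieces first sort first. That rule set is an assumption, not something the question states, but if it matches the data it translates almost directly into a comparator (`runs` and `natural_cmp` are hypothetical helper names):

```perl
use strict;
use warnings;

# Split into alternating runs of digits and non-digits,
# e.g. 'a10_2[1]' -> ('a', '10', '_', '2', '[', '1', ']')
sub runs { return $_[0] =~ /(\d+|\D+)/g }

# Assumed rules: digit runs compare numerically, everything else as strings
sub natural_cmp {
    my ( $x, $y ) = @_;
    my @x = runs($x);
    my @y = runs($y);
    while ( @x && @y ) {
        my ( $p, $q ) = ( shift @x, shift @y );
        my $c = ( $p =~ /^\d/ && $q =~ /^\d/ )
            ? ( $p <=> $q )     # both runs are numbers: compare by value
            : ( $p cmp $q );    # otherwise: plain string comparison
        return $c if $c;
    }
    # Fewer runs sorts first; final tie-break on the raw strings
    return @x <=> @y || $x cmp $y;
}

my @split_list = qw( buff1_100u[1] buff1_12p5u[0] buff1_25u[6] buff1_50u[4]
                     buff2_100u[3] buff2_12p5u[2] buff2_25u[3] buff2_50u[1] );
my @new_list = sort { natural_cmp( $a, $b ) } @split_list;
print "@new_list\n";
```

    Unlike the padded-key approaches, this needs no maximum number width, at the cost of re-splitting both strings on every comparison; for long lists the runs could be computed once per element in a Schwartzian transform.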