Sort problems

erez_ez has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Sort problems by moritz (Cardinal) on Dec 02, 2008 at 09:46 UTC
`($a =~ /(\d+)/)[0] <=> ($b =~ /(\d+)/)[0] || ($a =~ /(\w+)/)[0] cmp ($b =~ /(\w+)/)[0]` Do you actually understand what you wrote here? You are sorting first on the first number (correct) and then lexicographically on the whole string (`\w+` matches each of your input strings in its entirety). What you need to do is write down the rules by which to sort: first in plain English, then translated into a program. When you do a regex match, put a `print "$1\n"` statement after it to verify that it matched what you think it did.
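A minimal sketch of that verification step. The sample items are assumed from the `a1_2`-style data used in the other replies, and `captures` is a hypothetical helper name, not something from the original post.

```perl
use strict;
use warnings;

# Return what each pattern captures from one item (hypothetical helper).
sub captures {
    my ($item) = @_;
    my ($first_num) = $item =~ /(\d+)/;   # first run of digits
    my ($word)      = $item =~ /(\w+)/;   # \w also matches digits and '_'
    return ($first_num, $word);
}

# Sample data assumed from the other replies in this thread.
for my $item (qw(a1_2 a10_1 a2_10)) {
    my ($first_num, $word) = captures($item);
    print "$item: first number '$first_num', word '$word'\n";
}
```

For items of this shape the second capture is always the whole string, which is why the `cmp` part of the comparison falls back to a plain string sort.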
Re: Sort problems by salva (Canon) on Dec 02, 2008 at 09:51 UTC
Use Sort::Naturally or Sort::Key::Natural.
Re: Sort problems by ccn (Vicar) on Dec 02, 2008 at 10:25 UTC
Schwartzian transform: `my @array = qw(a1_2 a1_1 a10_10 a2_10 a2_1 a2_2 a10_1 a10_2 a1_10); my @sorted = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { (my $key = $_) =~ tr/a_/0./; [$_, $key] } @array;` Make a float pair for each element of the array: substitute `'a'` with `0` and `'_'` with a decimal point `'.'`. Then sort the pairs numerically and take the original elements from the sorted pairs.
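One caveat with the float key: `a1_1` and `a1_10` become `01.1` and `01.10`, which are the same number, and `a1_2` (1.2) compares greater than `a1_10` (1.1). A hedged variant of the same Schwartzian transform that keeps the two numbers as separate keys is sketched below; it assumes every item has the form letters, digits, underscore, digits.

```perl
use strict;
use warnings;

my @array = qw(a1_2 a1_1 a10_10 a2_10 a2_1 a2_2 a10_1 a10_2 a1_10);

# The float keys for a1_1 and a1_10 are numerically equal.
print "float keys tie\n" if '01.1' == '01.10';

# Same transform, but each pair becomes [original, first number, second number].
my @sorted =
    map  { $_->[0] }
    sort { $a->[1] <=> $b->[1] or $a->[2] <=> $b->[2] }
    map  { [ $_, /(\d+)_(\d+)/ ] }
    @array;

print "@sorted\n";
```

The regex runs once per element rather than once per comparison, which is the point of the transform.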
Re: Sort problems by cdarke (Prior) on Dec 02, 2008 at 10:10 UTC
Here is my version (not sure what all that split stuff was about): `use strict; use warnings; sub sortit { my ($alpha_a, $num1_a, $num2_a) = $a =~ /([a-z]+)(\d+)_(\d+)/; my ($alpha_b, $num1_b, $num2_b) = $b =~ /([a-z]+)(\d+)_(\d+)/; my $retn = $alpha_a cmp $alpha_b; return $retn if $retn != 0; $retn = $num1_a <=> $num1_b; return $retn if $retn != 0; return $num2_a <=> $num2_b; } my @list = qw(a1_2 a1_1 a10_10 a2_10 a2_1 a2_2 a10_1 a10_2 a1_10); my @new_list = sort sortit @list; print "@new_list\n";`
Re^2: Sort problems by moritz (Cardinal) on Dec 02, 2008 at 10:22 UTC
`my $retn = $alpha_a cmp $alpha_b; return $retn if $retn != 0; $retn = $num1_a <=> $num1_b; return $retn if $retn != 0; return $num2_a <=> $num2_b;` That can be written more compactly as: `return $alpha_a cmp $alpha_b || $num1_a <=> $num1_b || $num2_a <=> $num2_b;`
Re: Sort problems by johngg (Canon) on Dec 02, 2008 at 13:56 UTC
You could also try a Guttman Rosler transform. This melds the fields to sort and the whole item into a single string that can be sorted lexically. The original item can then be pulled from the sorted string afterwards. pack or sprintf are often used to construct the string. Care has to be taken to make sure the fields pack to a consistent length across all of the list to be sorted! `use strict; use warnings; my @list = qw{ a1_2 a1_1 a10_10 a2_10 a2_1 a2_2 a10_1 a10_2 a1_10 }; my @sorted = map { substr $_, 8 } sort map { pack q{NNa*}, m{(\d+)_(\d+)}, $_ } @list; print qq{$_\n} for @sorted;` The output: `a1_1 a1_2 a1_10 a2_1 a2_2 a2_10 a10_1 a10_2 a10_10` I hope this is of interest. Cheers, JohnGG
Re^2: Sort problems by AnomalousMonk (Archbishop) on Dec 03, 2008 at 03:08 UTC
`pack q{NNa*}, m{(\d+)_(\d+)}, $_` Note that packing with an 'N' specifier fails if either of the numeric fields exceeds 4294967295. An approach using sprintf (as mentioned in johngg's reply) can get around this. `>perl -wMstrict -le "my @list = qw{ a1_2 a1_1 a10_10 a2_10 a2_1 a2_2 a10_1 a10_2 a1_10 }; use constant WIDTH => 20; my @sorted = map { substr $_, WIDTH * 2 } sort map { sprintf '%0*4$d%0*4$d%s', m{(\d+)_(\d+)}, $_, WIDTH } @list; print for @sorted; " a1_1 a1_2 a1_10 a2_1 a2_2 a2_10 a10_1 a10_2 a10_10`
Re^3: Sort problems by johngg (Canon) on Dec 03, 2008 at 10:22 UTC
Nice ++ I had never seen the `*4$` construct, used to grab arguments by position for printf/sprintf, before. Very useful. Thanks for pointing it out. Cheers, JohnGG
Re^3: Sort problems by monarch (Priest) on Dec 04, 2008 at 03:38 UTC
Where can I find an explanation for the extra formatting details that resulted in `'%0*4$d%0*4$d%s'`? (I'd like to learn more about this myself.) Has such support been around for a long time in Perl? Is it Perl-specific, or does the GNU C library now support such syntax?
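The `sprintf` entry in perlfunc (`perldoc -f sprintf`) covers this under its notes on minimum width. A small sketch of what the construct does, using made-up values: the `0` flag pads with zeros, and `*3$` takes the field width from the third argument, while the value to print is still the next unused argument.

```perl
use strict;
use warnings;

# '%0*3$d': zero-pad ('0'), width read from argument 3 ('*3$').
# The values 7 and 42 and the width 5 are illustrative only.
my $padded = sprintf '%0*3$d|%0*3$d', 7, 42, 5;
print "$padded\n";   # prints 00007|00042
```

In the one-liner above, the width is the fourth argument because two captures and `$_` come first.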
Re: Sort problems by pobocks (Chaplain) on Dec 02, 2008 at 10:39 UTC
How, with this many solutions, did I immediately gravitate toward the stupidest? My plan was to make a copy of the list, and then pad out all the numbers with leading zeroes to make them compare naturally under regular alphabetization. I felt that it was a stupid plan the moment it came up, but the stupidest? Woe is me... `for(split(" ","tsuJ rehtonA lreP rekcaH")){print reverse . " "}print "\b.\n";`
Re^2: Sort problems by AnomalousMonk (Archbishop) on Dec 02, 2008 at 13:07 UTC
Your "stupid plan" is the heart of the Guttman-Rosler Transform (GRT; see Advanced Sorting - GRT - Guttman Rosler Transform). This approach is based on the fact that using the default lexicographic sorting of sort is faster than using a sort subroutine block, so multi-key GRT sorts will tend to outstrip ST sorts as the array size grows beyond some threshold. Just think, if you had come up with this stupid idea a few years ago, your own name might now be immortalized in the annals of Perl! The original paper is A Fresh Look at Efficient Perl Sorting, also linked by demerphq's link above.
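The padding plan can be sketched roughly as follows. This is only an illustration: `WIDTH` is an assumed upper bound on the number of digits in any field, and the `"\0"` separator assumes the items themselves never contain a NUL character.

```perl
use strict;
use warnings;

# Assumed maximum number of digits in any numeric field.
use constant WIDTH => 6;

my @list = qw(a1_2 a1_1 a10_10 a2_10 a2_1 a2_2 a10_1 a10_2 a1_10);

my @sorted =
    map  { (split /\0/, $_, 2)[1] }   # recover the original item
    sort                              # plain lexical sort, no sort block
    map  {
        # Zero-pad every run of digits in a copy, then append the original.
        (my $key = $_) =~ s/(\d+)/sprintf '%0*d', WIDTH, $1/ge;
        "$key\0$_";
    }
    @list;

print "@sorted\n";
```

Because every number is padded to the same width, comparing the keys as strings gives the same order as comparing the numbers.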
Re^3: Sort problems by jdporter (Paladin) on Dec 02, 2008 at 14:58 UTC
"Just think, if you had come up with this stupid idea a few years ago, your own name might now be immortalized in the annals of Perl!" Maybe, but I'd say not likely. As it happened, the technique was named for the two fellows who extensively analyzed it and wrote a paper about it, but not for the guy who first (as far as can be discerned from the Historical Records) came up with the idea, Michal Rutka. Here's the Usenet message. Between the mind which plans and the hands which build, there must be a mediator... and this mediator must be the heart.
Re: Sort problems by poolpi (Hermit) on Dec 02, 2008 at 10:45 UTC
`#!/usr/bin/perl use strict; use warnings; my @list = qw(a1_2 a1_1 a10_10 a2_10 a2_1 a2_2 a10_1 a10_2 a1_10); my $re = qr/\w(\d+)_(\d+)/; my @sorted = sort { my @A = $a =~ $re; my @B = $b =~ $re; $A[0] <=> $B[0] or $A[1] <=> $B[1]; } @list;` hth, PooLpi 'Ebry haffa hoe hab im tik a bush'. Jamaican proverb
Re: Sort problems by JavaFan (Canon) on Dec 02, 2008 at 10:10 UTC
Seems to me you first want to sort on the leading letters; if they are equal, numerically on the first number; and if that's equal again, numerically on the second number. So your first step would be to figure out how to split the elements into their parts, then do the three comparisons. Afterwards, you can optimize using an ST or a GRT.