Custom Sort Array

omegaweaponZ has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Custom Sort Array by 2teez (Vicar) on Aug 10, 2013 at 20:00 UTC
What about trying Schwartzian transform: `use strict; use warnings; print join $/ => map{$_->[0]} sort{$a->[1] <=> $b->[1]} map{[$_,/.+?-(\d+)\..+$/]}<DATA>; __DATA__ abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz` [download] Produces... `abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz` [download] If you tell me, I'll forget. If you show me, I'll remember. if you involve me, I'll understand. --- Author unknown to me	[reply] [d/l] [select]
Re^2: Custom Sort Array by omegaweaponZ (Beadle) on Aug 10, 2013 at 20:10 UTC
Yes, I believe that is the concept I was looking for, but your example I am still only getting the same array I got before. Am I missing something simplistic? Sorry also, there is a 1, it doesn't just go 0,2,etc. There's a 1, so the 1 shows up before the 10, and 11, then the 2 shows up. This goes well beyond just 1, however. Logs can go as far as 99, so I need a solution that will organize them up to the point of this final -##.gz in the name from 0-99+ in numeric order `foreach (@files) { my $file = $_; print join $/ => map{$_->[0]} sort{$a->[1] <=> $b->[1]} map{[$_,/.+?-(\d+)\..+$/]}$file; print "\n";` [download]	[reply] [d/l]
Re^3: Custom Sort Array by 2teez (Vicar) on Aug 10, 2013 at 20:25 UTC
...There's a 1, so the 1 shows up before the 10, and 11, then the 2 shows up. This goes well beyond just 1, however. Logs can go as far as 99, so I need a solution that will organize them up to the point of this final -##.gz in the name from 0-99+ in numeric order. Yes, even with the 1 file been included. It would sort it for you. I think you are getting it wrong the way you apply the solution. You might want to do it like this: `use strict; use warnings; my @files = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, /.+?-(\d+)\..+$/ ] } <DATA>; print $_,$/ for @files; __DATA__ abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz` [download] you will have: `abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz` [download] If you tell me, I'll forget. If you show me, I'll remember. if you involve me, I'll understand. --- Author unknown to me	[reply] [d/l] [select]
Re^4: Custom Sort Array by omegaweaponZ (Beadle) on Aug 10, 2013 at 20:41 UTC
Re^5: Custom Sort Array by AnomalousMonk (Archbishop) on Aug 10, 2013 at 21:39 UTC
Re^5: Custom Sort Array by 2teez (Vicar) on Aug 10, 2013 at 21:28 UTC
Re: Custom Sort Array by arkturuz (Curate) on Aug 10, 2013 at 19:39 UTC
Oh, and you can declare your variable in loop. No need for `my $file = $_` later: `sub by_id { # extract numbers my ($A) = $a =~ /(\d+)/; my ($B) = $b =~ /(\d+)/; # return comparison return $A <=> $B; } for my $f (sort by_id @files) { say $f; }` [download]	[reply] [d/l] [select]
Re^2: Custom Sort Array by 2teez (Vicar) on Aug 10, 2013 at 20:42 UTC
Hi, arkturuz, will even work if the regex matches the values to be sorted like this: chomp(my @files = <DATA>); ## added sub by_id { # extract numbers my ($A) = $a =~ /.+?-(\d+)\..+$/; # changed my ($B) = $b =~ /.+?-(\d+)\..+$/; # changed # return comparison return $A <=> $B; } for my $f (sort by_id @files) { print $f,$/; } __DATA__ abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz [download] If you tell me, I'll forget. If you show me, I'll remember. if you involve me, I'll understand. --- Author unknown to me	[reply] [d/l]
Re^2: Custom Sort Array by omegaweaponZ (Beadle) on Aug 10, 2013 at 19:43 UTC
Thanks, however this seems to return the same array I had previously. 10 is still preceded by 2. I should clarify that the logs have numbers prior as well, so the log looks something like this: abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz	[reply]
Re^3: Custom Sort Array by arkturuz (Curate) on Aug 10, 2013 at 20:44 UTC
You changed the requirements but I think you see where is the solution going: extract relevant fields and compare that.	[reply]
Re^2: Custom Sort Array by Anonymous Monk on Aug 12, 2013 at 16:46 UTC
I think you meant `my ($A) = $a =~ /(\d+)\.gz$/; my ($B) = $b =~ /(\d+)\.gz$/;` [download]	[reply] [d/l]
Re: Custom Sort Array by Laurent_R (Canon) on Aug 10, 2013 at 21:36 UTC
You are not saying everything at once, this is the cause of misunderstandings. You only need to amend slightly the Schwartzian Transform solution by 2teez to accomodate the new rules. use strict; use warnings; my @files = map { $_->[0] } sort { $a->[1] <=> $b->[1] \|\| $a->[2] <=> $b->[2]} map { [ $_, /\.(\d{12})\-/, /.+?-(\d+)\..+$/ ] } <DATA>; print $_,$/ for @files; __DATA__ abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz abcd1_abc_123456.abc1a_A.201306290800-0900-0.gz abcd1_abc_123456.abc1a_A.201306290800-0900-1.gz abcd1_abc_123456.abc1a_A.201306290800-0900-10.gz abcd1_abc_123456.abc1a_A.201305290800-0900-11.gz abcd1_abc_123456.abc1a_A.201308290800-0900-2.gz abcd1_abc_123456.abc1a_A.201302290800-0900-3.gz [download] Read more... (822 Bytes) I guess that's what you need. Update 21:42 UTC: 2teez was faster than me by about 8 minutes producing a new solution in accordance with the new rules. Update 2 22:15 UTC: you had a third additional requirement (also use the 4-digit group after the date for the sort) that I had not seen. I've just posted a newly amended version in my answer below.	[reply] [d/l] [select]
Re^2: Custom Sort Array by Laurent_R (Canon) on Aug 10, 2013 at 22:14 UTC
I had not read carefully enough your post where you were also asking the files to be sorted according to the group of four digits after the date. It would be far better if you gave all your sorting rules from the outset. This a new amendment to take this rule into account: use strict; use warnings; my @files = map { $_->[0] } sort { $a->[1] cmp $b->[1] \|\| $a->[2] <=> $b->[2] } map { [ $_, /\.(\d{12}\-\d{4})/, /.+?-(\d+)\..+$/ ] } <DAT +A>; print $_,$/ for @files; __DATA__ abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz abcd1_abc_123456.abc1a_A.201306290800-0900-0.gz abcd1_abc_123456.abc1a_A.201306290800-0900-1.gz abcd1_abc_123456.abc1a_A.201306290800-0900-10.gz abcd1_abc_123456.abc1a_A.201305290800-0900-11.gz abcd1_abc_123456.abc1a_A.201308290800-0900-2.gz abcd1_abc_123456.abc1a_A.201302290800-0900-3.gz abcd1_abc_123456.abc1a_A.201306290800-1000-1.gz abcd1_abc_123456.abc1a_A.201306290800-1000-10.gz abcd1_abc_123456.abc1a_A.201305290800-1000-11.gz abcd1_abc_123456.abc1a_A.201308290800-1000-2.gz abcd1_abc_123456.abc1a_A.201302290800-1000-3.gz [download] Read more... (1062 Bytes)	[reply] [d/l] [select]
Re^3: Custom Sort Array by omegaweaponZ (Beadle) on Aug 13, 2013 at 13:02 UTC
This is exactly what I was looking for, thank you!	[reply]
Re: Custom Sort Array by Anonymous Monk on Aug 11, 2013 at 17:17 UTC
Sort::Key::Natural perhaps? Anyway, here's my try. I did not check thoroughly, but it seems to work. sub natural_sort { my $idx = shift \|\| 0; return -1 if !defined $a->[$idx]; return 1 if !defined $b->[$idx]; if ($a->[$idx] =~ /^[0-9]/ and $b->[$idx] =~ /^[0-9]/) { $a->[$idx] <=> $b->[$idx] or natural_sort($idx + 1); } else { $a->[$idx] cmp $b->[$idx] or natural_sort($idx + 1); } } my @files = <DATA>; print for map { join "", @$_ } sort natural_sort map { [ split /(\d+)/, $_ ] } @files; __DATA__ abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz abcd1_abc_123456.abc1a_A.201306290800-0900-0.gz abcd1_abc_123456.abc1a_A.201306290800-0900-1.gz abcd1_abc_123456.abc1a_A.201306290800-0900-10.gz abcd1_abc_123456.abc1a_A.201305290800-0900-11.gz abcd1_abc_123456.abc1a_A.201308290800-0900-2.gz abcd1_abc_123456.abc1a_A.201302290800-0900-3.gz abcd1_abc_123456.abc1a_A.201306290800-1000-1.gz abcd1_abc_123456.abc1a_A.201306290800-1000-10.gz abcd1_abc_123456.abc1a_A.201305290800-1000-11.gz abcd1_abc_123456.abc1a_A.201308290800-1000-2.gz abcd1_abc_123456.abc1a_A.201302290800-1000-3.gz [download]	[reply] [d/l]
Re: Custom Sort Array by poj (Abbot) on Aug 11, 2013 at 07:04 UTC
If you have a maximum sequence number of `-99` then try this ; #!perl use strict; chomp( my @files = <DATA>) ; # add leading zero for 0 to 9 s/-(\d)(\.gz)$/-0$1$2/ for @files; # sort my @sorted = sort @files; # remove leading zero for 00 to 09 s/-0(\d)(\.gz)$/-$1$2/ for @sorted; print "$_\n" for @sorted; __DATA__ abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz abcd1_abc_123456.abc1a_A.201307280800-0900-0.gz abcd1_abc_123456.abc1a_A.201307280800-0900-10.gz abcd1_abc_123456.abc1a_A.201307280800-0900-11.gz abcd1_abc_123456.abc1a_A.201307280800-0900-2.gz abcd1_abc_123456.abc1a_A.201307280800-0900-3.gz [download] poj	[reply] [d/l] [select]
Re^2: Custom Sort Array by Laurent_R (Canon) on Aug 11, 2013 at 09:05 UTC
poj's idea of changing the words to enable lexicographical sort can be expressed as a variant of the Schwartzian Transform (ST) called the Guttman Rosler Transform (GRT), which is deemed to be faster than the ST because the sort phase is entirely C code and it avoids one level of indirection. The idea of poj expressed with the GRT construct: use strict; use warnings; my @files = map {s/-0(\d)(\.gz)$/-$1$2/; $_} sort map {s/-(\d)(\.gz)$/-0$1$2/; $_} <DATA>; print $_ for @files; __DATA__ abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz abcd1_abc_123456.abc1a_A.201306290800-0900-0.gz abcd1_abc_123456.abc1a_A.201306290800-0900-1.gz abcd1_abc_123456.abc1a_A.201306290800-0900-10.gz abcd1_abc_123456.abc1a_A.201305290800-0900-11.gz abcd1_abc_123456.abc1a_A.201308290800-0900-2.gz abcd1_abc_123456.abc1a_A.201302290800-0900-3.gz abcd1_abc_123456.abc1a_A.201306290800-1000-1.gz abcd1_abc_123456.abc1a_A.201306290800-1000-10.gz abcd1_abc_123456.abc1a_A.201305290800-1000-11.gz abcd1_abc_123456.abc1a_A.201308290800-1000-2.gz abcd1_abc_123456.abc1a_A.201302290800-1000-3.gz [download] Read more... (1080 Bytes) For more information on the GRT, see e.g. Advanced Sorting - GRT - Guttman Rosler Transform (this is just one I picked up at random, you have several links in this post to other nodes, and there are many other nodes on the subject)	[reply] [d/l] [select]