Re: Custom Sort Array
by arkturuz (Curate) on Aug 10, 2013 at 19:39 UTC
|
Oh, and you can declare your variable in loop. No need for my $file = $_ later:
sub by_id {
# extract numbers
my ($A) = $a =~ /(\d+)/;
my ($B) = $b =~ /(\d+)/;
# return comparison
return $A <=> $B;
}
for my $f (sort by_id @files) {
say $f;
}
| [reply] [d/l] [select] |
|
chomp(my @files = <DATA>); ## added
sub by_id {
# extract numbers
my ($A) = $a =~ /.+?-(\d+)\..+$/; # changed
my ($B) = $b =~ /.+?-(\d+)\..+$/; # changed
# return comparison
return $A <=> $B;
}
for my $f (sort by_id @files) {
print $f,$/;
}
__DATA__
abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz
If you tell me, I'll forget.
If you show me, I'll remember.
if you involve me, I'll understand.
--- Author unknown to me
| [reply] [d/l] |
|
Thanks, however this seems to return the same array I had previously. 10 is still preceded by 2.
I should clarify that the logs have numbers prior as well, so the log looks something like this:
abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz
| [reply] |
|
You changed the requirements but I think you see where is the solution going: extract relevant fields and compare that.
| [reply] |
|
my ($A) = $a =~ /(\d+)\.gz$/;
my ($B) = $b =~ /(\d+)\.gz$/;
| [reply] [d/l] |
Re: Custom Sort Array
by 2teez (Vicar) on Aug 10, 2013 at 20:00 UTC
|
use strict;
use warnings;
print join $/ => map{$_->[0]}
sort{$a->[1] <=> $b->[1]}
map{[$_,/.+?-(\d+)\..+$/]}<DATA>;
__DATA__
abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz
Produces...
abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz
If you tell me, I'll forget.
If you show me, I'll remember.
if you involve me, I'll understand.
--- Author unknown to me
| [reply] [d/l] [select] |
|
Yes, I believe that is the concept I was looking for, but your example I am still only getting the same array I got before. Am I missing something simplistic? Sorry also, there is a 1, it doesn't just go 0,2,etc. There's a 1, so the 1 shows up before the 10, and 11, then the 2 shows up. This goes well beyond just 1, however. Logs can go as far as 99, so I need a solution that will organize them up to the point of this final -##.gz in the name from 0-99+ in numeric order
foreach (@files) {
my $file = $_;
print join $/ => map{$_->[0]}
sort{$a->[1] <=> $b->[1]}
map{[$_,/.+?-(\d+)\..+$/]}$file;
print "\n";
| [reply] [d/l] |
|
...There's a 1, so the 1 shows up before the 10, and 11, then the 2 shows up. This goes well beyond just 1, however. Logs can go as far as 99, so I need a solution that will organize them up to the point of this final -##.gz in the name from 0-99+ in numeric order.
Yes, even with the 1 file been included. It would sort it for you. I think you are getting it wrong the way you apply the solution. You might want to do it like this:
use strict;
use warnings;
my @files = map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [ $_, /.+?-(\d+)\..+$/ ] } <DATA>;
print $_,$/ for @files;
__DATA__
abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz
you will have:
abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz
If you tell me, I'll forget.
If you show me, I'll remember.
if you involve me, I'll understand.
--- Author unknown to me
| [reply] [d/l] [select] |
|
|
|
Re: Custom Sort Array
by Laurent_R (Canon) on Aug 10, 2013 at 21:36 UTC
|
You are not saying everything at once, this is the cause of misunderstandings.
You only need to amend slightly the Schwartzian Transform solution by 2teez to accomodate the new rules.
use strict;
use warnings;
my @files = map { $_->[0] }
sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2]}
map { [ $_, /\.(\d{12})\-/, /.+?-(\d+)\..+$/ ] } <DATA>;
print $_,$/ for @files;
__DATA__
abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz
abcd1_abc_123456.abc1a_A.201306290800-0900-0.gz
abcd1_abc_123456.abc1a_A.201306290800-0900-1.gz
abcd1_abc_123456.abc1a_A.201306290800-0900-10.gz
abcd1_abc_123456.abc1a_A.201305290800-0900-11.gz
abcd1_abc_123456.abc1a_A.201308290800-0900-2.gz
abcd1_abc_123456.abc1a_A.201302290800-0900-3.gz
I guess that's what you need.
Update 21:42 UTC: 2teez was faster than me by about 8 minutes producing a new solution in accordance with the new rules.
Update 2 22:15 UTC: you had a third additional requirement (also use the 4-digit group after the date for the sort) that I had not seen. I've just posted a newly amended version in my answer below. | [reply] [d/l] [select] |
|
I had not read carefully enough your post where you were also asking the files to be sorted according to the group of four digits after the date. It would be far better if you gave all your sorting rules from the outset.
This a new amendment to take this rule into account:
use strict;
use warnings;
my @files = map { $_->[0] }
sort { $a->[1] cmp $b->[1] || $a->[2] <=> $b->[2] }
map { [ $_, /\.(\d{12}\-\d{4})/, /.+?-(\d+)\..+$/ ] } <DAT
+A>;
print $_,$/ for @files;
__DATA__
abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz
abcd1_abc_123456.abc1a_A.201306290800-0900-0.gz
abcd1_abc_123456.abc1a_A.201306290800-0900-1.gz
abcd1_abc_123456.abc1a_A.201306290800-0900-10.gz
abcd1_abc_123456.abc1a_A.201305290800-0900-11.gz
abcd1_abc_123456.abc1a_A.201308290800-0900-2.gz
abcd1_abc_123456.abc1a_A.201302290800-0900-3.gz
abcd1_abc_123456.abc1a_A.201306290800-1000-1.gz
abcd1_abc_123456.abc1a_A.201306290800-1000-10.gz
abcd1_abc_123456.abc1a_A.201305290800-1000-11.gz
abcd1_abc_123456.abc1a_A.201308290800-1000-2.gz
abcd1_abc_123456.abc1a_A.201302290800-1000-3.gz
| [reply] [d/l] [select] |
|
This is exactly what I was looking for, thank you!
| [reply] |
Re: Custom Sort Array
by Anonymous Monk on Aug 11, 2013 at 17:17 UTC
|
Sort::Key::Natural perhaps?
Anyway, here's my try. I did not check thoroughly, but it seems to work.
sub natural_sort {
my $idx = shift || 0;
return -1 if !defined $a->[$idx];
return 1 if !defined $b->[$idx];
if ($a->[$idx] =~ /^[0-9]/ and $b->[$idx] =~ /^[0-9]/) {
$a->[$idx] <=> $b->[$idx] or natural_sort($idx + 1);
} else {
$a->[$idx] cmp $b->[$idx] or natural_sort($idx + 1);
}
}
my @files = <DATA>;
print for
map { join "", @$_ }
sort natural_sort
map { [ split /(\d+)/, $_ ] } @files;
__DATA__
abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz
abcd1_abc_123456.abc1a_A.201306290800-0900-0.gz
abcd1_abc_123456.abc1a_A.201306290800-0900-1.gz
abcd1_abc_123456.abc1a_A.201306290800-0900-10.gz
abcd1_abc_123456.abc1a_A.201305290800-0900-11.gz
abcd1_abc_123456.abc1a_A.201308290800-0900-2.gz
abcd1_abc_123456.abc1a_A.201302290800-0900-3.gz
abcd1_abc_123456.abc1a_A.201306290800-1000-1.gz
abcd1_abc_123456.abc1a_A.201306290800-1000-10.gz
abcd1_abc_123456.abc1a_A.201305290800-1000-11.gz
abcd1_abc_123456.abc1a_A.201308290800-1000-2.gz
abcd1_abc_123456.abc1a_A.201302290800-1000-3.gz
| [reply] [d/l] |
Re: Custom Sort Array
by poj (Abbot) on Aug 11, 2013 at 07:04 UTC
|
If you have a maximum sequence number of -99 then try this ;
#!perl
use strict;
chomp( my @files = <DATA>) ;
# add leading zero for 0 to 9
s/-(\d)(\.gz)$/-0$1$2/ for @files;
# sort
my @sorted = sort @files;
# remove leading zero for 00 to 09
s/-0(\d)(\.gz)$/-$1$2/ for @sorted;
print "$_\n" for @sorted;
__DATA__
abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz
abcd1_abc_123456.abc1a_A.201307280800-0900-0.gz
abcd1_abc_123456.abc1a_A.201307280800-0900-10.gz
abcd1_abc_123456.abc1a_A.201307280800-0900-11.gz
abcd1_abc_123456.abc1a_A.201307280800-0900-2.gz
abcd1_abc_123456.abc1a_A.201307280800-0900-3.gz
poj | [reply] [d/l] [select] |
|
poj's idea of changing the words to enable lexicographical sort can be expressed as a variant of the Schwartzian Transform (ST) called the Guttman Rosler Transform (GRT), which is deemed to be faster than the ST because the sort phase is entirely C code and it avoids one level of indirection. The idea of poj expressed with the GRT construct:
use strict;
use warnings;
my @files = map {s/-0(\d)(\.gz)$/-$1$2/; $_}
sort
map {s/-(\d)(\.gz)$/-0$1$2/; $_} <DATA>;
print $_ for @files;
__DATA__
abcd1_abc_123456.abc1a_A.201307290800-0900-0.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-1.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-10.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-11.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-2.gz
abcd1_abc_123456.abc1a_A.201307290800-0900-3.gz
abcd1_abc_123456.abc1a_A.201306290800-0900-0.gz
abcd1_abc_123456.abc1a_A.201306290800-0900-1.gz
abcd1_abc_123456.abc1a_A.201306290800-0900-10.gz
abcd1_abc_123456.abc1a_A.201305290800-0900-11.gz
abcd1_abc_123456.abc1a_A.201308290800-0900-2.gz
abcd1_abc_123456.abc1a_A.201302290800-0900-3.gz
abcd1_abc_123456.abc1a_A.201306290800-1000-1.gz
abcd1_abc_123456.abc1a_A.201306290800-1000-10.gz
abcd1_abc_123456.abc1a_A.201305290800-1000-11.gz
abcd1_abc_123456.abc1a_A.201308290800-1000-2.gz
abcd1_abc_123456.abc1a_A.201302290800-1000-3.gz
For more information on the GRT, see e.g. Advanced Sorting - GRT - Guttman Rosler Transform (this is just one I picked up at random, you have several links in this post to other nodes, and there are many other nodes on the subject)
| [reply] [d/l] [select] |