numeric sort on substring

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to sort on the first two columns of data using the sub below:


sub RowSort {

    my($aa) = $a =~ /(\d+)\,(\d+)/;         
    my($bb) = $b =~ /(\d+)\,(\d+)/;         
        
    $aa <=> $bb;       
}

Here is my current data:

1,64,1.4.5,1.4.6,44642850,44642850,0,27348,10028,59188,1488095,761904.64
1,128,1.4.5,1.4.6,25337850,25337850,0,19236,10276,28196,844595,864865.28
1,256,1.4.5,1.4.6,13489200,13489200,0,17792,11372,17832,449640,920862.72
1,512,1.4.5,1.4.6,6996270,6996270,0,18084,16744,19124,233209,955224.064
1,1024,1.4.5,1.4.6,3557880,3557880,0,31528,20488,35188,118596,971538.432
2,64,1.4.5,1.4.6,44642850,44642850,0,25828,9548,40128,1488095,761904.64
2,128,1.4.5,1.4.6,25337850,25337850,0,27936,10796,28696,844595,864865.28
2,256,1.4.5,1.4.6,13489200,13489200,0,12852,10692,13332,449640,920862.72
2,512,1.4.5,1.4.6,6996270,6996270,0,17184,15904,18844,233209,955224.064
2,1024,1.4.5,1.4.6,3557880,3557880,0,34068,17948,36628,118596,971538.432

And here is my expected output:

# result should be sorted:
1,64,1.4.5,1.4.6,44642850,44642850,0,27348,10028,59188,1488095,761904.64
2,64,1.4.5,1.4.6,44642850,44642850,0,25828,9548,40128,1488095,761904.64
1,128,1.4.5,1.4.6,25337850,25337850,0,19236,10276,28196,844595,864865.28
2,128,1.4.5,1.4.6,25337850,25337850,0,27936,10796,28696,844595,864865.28
1,256,1.4.5,1.4.6,13489200,13489200,0,17792,11372,17832,449640,920862.72
2,256,1.4.5,1.4.6,13489200,13489200,0,12852,10692,13332,449640,920862.72
1,512,1.4.5,1.4.6,6996270,6996270,0,18084,16744,19124,233209,955224.064
2,512,1.4.5,1.4.6,6996270,6996270,0,17184,15904,18844,233209,955224.064
1,1024,1.4.5,1.4.6,3557880,3557880,0,31528,20488,35188,118596,971538.432
2,1024,1.4.5,1.4.6,3557880,3557880,0,34068,17948,36628,118596,971538.432

Any thoughts?


Replies are listed 'Best First'.
Re: numeric sort on substring by kennethk (Abbot) on Jan 06, 2011 at 16:36 UTC
The issue with your regular expression is that the first number, not the second, ends up in `$aa`: both numbers are captured, but the one-element list assignment keeps only the first capture. You could get your expected result by modifying your regular expression so that it does not capture the first digits:

#!/usr/bin/perl

use strict;
use warnings;

my @data = grep $_, <DATA>;

print sort RowSort @data;

sub RowSort {
    my($aa) = $a =~ /\d+,(\d+)/;
    my($bb) = $b =~ /\d+,(\d+)/;

    $aa <=> $bb;
}

__DATA__
1,64,1.4.5,1.4.6,44642850,44642850,0,27348,10028,59188,1488095,761904.64
1,128,1.4.5,1.4.6,25337850,25337850,0,19236,10276,28196,844595,864865.28
1,256,1.4.5,1.4.6,13489200,13489200,0,17792,11372,17832,449640,920862.72
1,512,1.4.5,1.4.6,6996270,6996270,0,18084,16744,19124,233209,955224.064
1,1024,1.4.5,1.4.6,3557880,3557880,0,31528,20488,35188,118596,971538.432
2,64,1.4.5,1.4.6,44642850,44642850,0,25828,9548,40128,1488095,761904.64
2,128,1.4.5,1.4.6,25337850,25337850,0,27936,10796,28696,844595,864865.28
2,256,1.4.5,1.4.6,13489200,13489200,0,12852,10692,13332,449640,920862.72
2,512,1.4.5,1.4.6,6996270,6996270,0,17184,15904,18844,233209,955224.064
2,1024,1.4.5,1.4.6,3557880,3557880,0,34068,17948,36628,118596,971538.432

Using YAPE::Regex::Explain to parse the regex:

The regular expression:

(?-imsx:\d+,(\d+))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
----------------------------------------------------------------------
  ,                        ','
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

See perlretut.
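As a small illustration of the capture behaviour described above, the sketch below matches one shortened sample row (the variable names `$row`, `@both`, `$first`, and `$second` are made up for this example) and shows what each form of the assignment keeps.

```perl
use strict;
use warnings;

# Shortened sample row; only the first two columns matter here.
my $row = '1,64,1.4.5,1.4.6';

# A match in list context returns every capture group, in order.
my @both = $row =~ /(\d+),(\d+)/;

# With a single variable on the left, only the first capture is kept.
my ($first) = $row =~ /(\d+),(\d+)/;

# Without the first pair of parentheses, the only group is column two.
my ($second) = $row =~ /\d+,(\d+)/;

print "both:   @both\n";     # both:   1 64
print "first:  $first\n";    # first:  1
print "second: $second\n";   # second: 64
```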
Re^2: numeric sort on substring by mikeraz (Friar) on Jan 06, 2011 at 17:40 UTC
Or do you need:

sub RowSort {
    my ($a1, $a2) = $a =~ /(\d+)\,(\d+)/;
    my ($b1, $b2) = $b =~ /(\d+)\,(\d+)/;
    my $am = $a1.$a2;
    my $bm = $b1.$b2;
    $am <=> $bm;
}

Be Appropriate && Follow Your Curiosity
Re: numeric sort on substring by Anonyrnous Monk (Hermit) on Jan 06, 2011 at 16:41 UTC
"I'm trying to sort on the first two columns"

sub RowSort {
    my($a1, $a2) = $a =~ /(\d+),(\d+)/;
    my($b1, $b2) = $b =~ /(\d+),(\d+)/;

    $a2 <=> $b2 or $a1 <=> $b1;
}

Chaining comparisons with `or` has the effect that if the first one says 'equal' (`<=>` yields `0`), the next comparison is evaluated, and so on.
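The fall-through can be seen in isolation with a minimal sketch. The rows below are made-up `[column1, column2]` pairs, not the original data, and `@rows` and `@sorted` are names chosen for this example.

```perl
use strict;
use warnings;

# <=> returns -1, 0 or 1; 0 is false, so "or" moves on to the next test.
my $tie      = ( 64 <=> 64 );             # 0
my $resolved = ( 64 <=> 64 or 1 <=> 2 );  # -1, taken from the second test

# Hypothetical rows as [column1, column2] pairs.
my @rows = ( [ 2, 64 ], [ 1, 128 ], [ 1, 64 ], [ 2, 128 ] );

my @sorted = sort {
    $a->[1] <=> $b->[1]    # primary key: second column
        or
    $a->[0] <=> $b->[0]    # tie-breaker: first column
} @rows;

print "tie=$tie resolved=$resolved\n";
print join( ',', @$_ ), "\n" for @sorted;
# 1,64
# 2,64
# 1,128
# 2,128
```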
Re: numeric sort on substring by moritz (Cardinal) on Jan 06, 2011 at 16:58 UTC
Maybe you want something like `(split /,/, $a)[1] <=> (split /,/, $b)[1]` for comparison.

Perl 6 - second systems done right
Re^2: numeric sort on substring by Anonyrnous Monk (Hermit) on Jan 06, 2011 at 17:10 UTC
As that would sort by the second column only, it would fail to yield the desired output if the input were ordered differently. For example, if all the rows with "2" in the first column came first in the input, the output would be

2,64,1.4.5,1.4.6,44642850,44642850,0,25828,9548,40128,1488095,761904.64
1,64,1.4.5,1.4.6,44642850,44642850,0,27348,10028,59188,1488095,761904.64
2,128,1.4.5,1.4.6,25337850,25337850,0,27936,10796,28696,844595,864865.28
1,128,1.4.5,1.4.6,25337850,25337850,0,19236,10276,28196,844595,864865.28
...
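To make the dependence on input order concrete, here is a rough sketch using shortened, made-up rows (the third field is just a label). The sub names `by_col2_only` and `by_col2_then_col1` are invented for this example. With one key, rows that tie on the second column are left in whatever relative order the sort gives them; with two keys, the result is the same for either input order.

```perl
use strict;
use warnings;

# Shortened, hypothetical rows: only the first two columns matter here.
my @rows     = ( "1,64,a\n", "1,128,b\n", "2,64,c\n", "2,128,d\n" );
my @reversed = reverse @rows;

# One key: rows with equal second columns compare as 0 (a tie).
sub by_col2_only {
    ( split /,/, $a )[1] <=> ( split /,/, $b )[1];
}

# Two keys: ties on the second column are broken by the first column.
sub by_col2_then_col1 {
    my ( $a1, $a2 ) = $a =~ /^(\d+),(\d+)/;
    my ( $b1, $b2 ) = $b =~ /^(\d+),(\d+)/;
    $a2 <=> $b2 or $a1 <=> $b1;
}

my @one_key       = sort by_col2_only @reversed;
my @two_keys      = sort by_col2_then_col1 @reversed;
my @two_keys_orig = sort by_col2_then_col1 @rows;

print "one key, reversed input:\n",  @one_key;
print "two keys, reversed input:\n", @two_keys;
print "two keys, original input:\n", @two_keys_orig;
```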
Re: numeric sort on substring by Jim (Curate) on Jan 07, 2011 at 00:24 UTC
Here's a way to do it using `split` within a Schwartzian Transform:

#!/usr/bin/perl

use strict;
use warnings;

my @data = <DATA>;

# Schwartzian Transform
print
    map  { $_->[0] }
    sort { $a->[1][1] <=> $b->[1][1] or $a->[1][0] <=> $b->[1][0] }
    map  { [ $_, [ (split m/,/, $_, 3)[0, 1] ] ] }
    @data;

__DATA__
1,64,1.4.5,1.4.6,44642850,44642850,0,27348,10028,59188,1488095,761904.64
1,128,1.4.5,1.4.6,25337850,25337850,0,19236,10276,28196,844595,864865.28
1,256,1.4.5,1.4.6,13489200,13489200,0,17792,11372,17832,449640,920862.72
1,512,1.4.5,1.4.6,6996270,6996270,0,18084,16744,19124,233209,955224.064
1,1024,1.4.5,1.4.6,3557880,3557880,0,31528,20488,35188,118596,971538.432
2,64,1.4.5,1.4.6,44642850,44642850,0,25828,9548,40128,1488095,761904.64
2,128,1.4.5,1.4.6,25337850,25337850,0,27936,10796,28696,844595,864865.28
2,256,1.4.5,1.4.6,13489200,13489200,0,12852,10692,13332,449640,920862.72
2,512,1.4.5,1.4.6,6996270,6996270,0,17184,15904,18844,233209,955224.064
2,1024,1.4.5,1.4.6,3557880,3557880,0,34068,17948,36628,118596,971538.432

UPDATE: If you prefer regular expression pattern matching to `split`-ting in this case, just replace the initial `map` with this: `map { [ $_, [ m/^(\d+),(\d+)/ ] ] }`
Re^2: numeric sort on substring by johngg (Canon) on Jan 07, 2011 at 09:26 UTC
I'm wondering why you add the complication of an inner anonymous array and a three-argument split. I think neither is necessary and, since `split` defaults to operating on `$_`, one argument suffices.

print for
    map  { $_->[ 0 ] }
    sort { $a->[ 1 ] <=> $b->[ 1 ] || $a->[ 2 ] <=> $b->[ 2 ] }
    map  { [ $_ , ( split m{,} )[ 1, 0 ] ] }
    <DATA>;

You could also use a Guttman Rosler transform.

print for
    map  { substr $_, 8 }
    sort
    map  { pack q{NNA*}, ( split m{,} )[ 1, 0 ], $_ }
    <DATA>;

I hope this is of interest.

Cheers,

JohnGG
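A brief sketch of why the Guttman Rosler version works, on the assumption that both columns hold non-negative integers below 2**32. The `N` template packs a 32-bit unsigned integer in big-endian order, so two of them form an 8-byte prefix whose plain string order matches the numeric order of (column 2, column 1), and `substr $_, 8` strips that prefix again. The rows and the names `@rows`, `$key`, and `@sorted` are made up for this example.

```perl
use strict;
use warnings;

# Shortened, hypothetical rows in no particular order.
my @rows = ( "2,1024,x\n", "1,64,x\n", "2,64,x\n", "1,1024,x\n", "1,128,x\n" );

# Two 32-bit big-endian integers always occupy exactly 8 bytes.
my $key = pack q{NN}, 64, 1;
print 'key length: ', length($key), "\n";    # key length: 8

my @sorted =
    map  { substr $_, 8 }                                # drop the 8-byte key
    sort                                                 # default string sort
    map  { pack q{NNA*}, ( split m{,} )[ 1, 0 ], $_ }    # key: col 2, then col 1
    @rows;

print @sorted;
# 1,64,x
# 2,64,x
# 1,128,x
# 1,1024,x
# 2,1024,x
```

Because the comparison happens inside the default string sort, no Perl-level comparison block is called per pair, which is the usual motivation for this transform.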
Re^3: numeric sort on substring by Jim (Curate) on Jan 08, 2011 at 00:14 UTC
In hindsight, the complication of the inner anonymous array is needless. It reflects how my mind reckoned the data structure at the moment I wrote the transform. The three-argument `split` is just a habit. The habit is based on the documentation, which states: "In time critical applications it behooves you not to split into more fields than you really need." I don't know if the OP's application is time-critical or not. I went with the more conservative assumption. Like I said: habit. I like the regular expression pattern matching version better anyway.
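For reference, a minimal sketch of what the third argument to `split` does, using a shortened sample row (`$row`, `@all`, and `@limited` are names chosen for this example). With a limit of 3, splitting stops after two separators and the rest of the line is returned unsplit as the last field.

```perl
use strict;
use warnings;

# Shortened sample row.
my $row = '1,64,1.4.5,1.4.6,44642850';

my @all     = split m/,/, $row;       # every field
my @limited = split m/,/, $row, 3;    # at most three fields

print scalar(@all),     "\n";    # 5
print scalar(@limited), "\n";    # 3
print "$limited[2]\n";           # 1.4.5,1.4.6,44642850
```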
Re^4: numeric sort on substring by johngg (Canon) on Jan 27, 2011 at 13:39 UTC
Re^5: numeric sort on substring by salva (Canon) on Jan 27, 2011 at 14:48 UTC