Re: numeric sort on substring

Here's a way to do it using split within a Schwartzian Transform:

#!/usr/bin/perl

use strict;
use warnings;

my @data = <DATA>;

# Schwartzian Transform
print map  { $_->[0] }
      sort { $a->[1][1] <=> $b->[1][1] or $a->[1][0] <=> $b->[1][0] }
      map  { [ $_, [ (split m/,/, $_, 3)[0, 1] ] ] }
      @data;

__DATA__
1,64,1.4.5,1.4.6,44642850,44642850,0,27348,10028,59188,1488095,761904.64
1,128,1.4.5,1.4.6,25337850,25337850,0,19236,10276,28196,844595,864865.28
1,256,1.4.5,1.4.6,13489200,13489200,0,17792,11372,17832,449640,920862.72
1,512,1.4.5,1.4.6,6996270,6996270,0,18084,16744,19124,233209,955224.064
1,1024,1.4.5,1.4.6,3557880,3557880,0,31528,20488,35188,118596,971538.432
2,64,1.4.5,1.4.6,44642850,44642850,0,25828,9548,40128,1488095,761904.64
2,128,1.4.5,1.4.6,25337850,25337850,0,27936,10796,28696,844595,864865.28
2,256,1.4.5,1.4.6,13489200,13489200,0,12852,10692,13332,449640,920862.72
2,512,1.4.5,1.4.6,6996270,6996270,0,17184,15904,18844,233209,955224.064
2,1024,1.4.5,1.4.6,3557880,3557880,0,34068,17948,36628,118596,971538.432

UPDATE: If you prefer regular expression pattern matching to splitting in this case, just replace the first map to run (the one directly above @data) with this:

      map  { [ $_, [ m/^(\d+),(\d+)/ ] ] }
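As a quick check that the two key extractions are interchangeable for this data, here is a minimal sketch. The sub names `keys_by_split` and `keys_by_regex` are made up for illustration, and the sample record is a shortened line in the same shape as the data above.

```perl
use strict;
use warnings;

# Shortened sample record; only the first two fields matter here.
my $line = "2,128,1.4.5,1.4.6,25337850\n";

# As in the original transform: split into at most 3 pieces, keep the first two.
sub keys_by_split {
    my ($rec) = @_;
    return ( split m/,/, $rec, 3 )[ 0, 1 ];
}

# As in the UPDATE: capture the two leading integer fields.
sub keys_by_regex {
    my ($rec) = @_;
    return $rec =~ m/^(\d+),(\d+)/;
}

print join( ' ', keys_by_split($line) ), "\n";    # 2 128
print join( ' ', keys_by_regex($line) ), "\n";    # 2 128
```

Note that the regex version only matches when both leading fields are plain digits, whereas split returns whatever text sits before the first two commas.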

Re^2: numeric sort on substring by johngg (Canon) on Jan 07, 2011 at 09:26 UTC
I'm wondering why you add the complication of an inner anonymous array and a three-argument split. I think neither is necessary and, since `split` operates on `$_` by default, one argument suffices.

      print for
          map  { $_->[ 0 ] }
          sort { $a->[ 1 ] <=> $b->[ 1 ] || $a->[ 2 ] <=> $b->[ 2 ] }
          map  { [ $_ , ( split m{,} )[ 1, 0 ] ] }
          <DATA>;

You could also use a Guttman Rosler transform.

      print for
          map  { substr $_, 8 }
          sort
          map  { pack q{NNA*}, ( split m{,} )[ 1, 0 ], $_ }
          <DATA>;

I hope this is of interest.

Cheers,

JohnGG
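For readers unfamiliar with the Guttman Rosler transform, the following is a rough sketch of the same idea wrapped in a sub. The name `grt_sort` and the sample records are illustrative, not from the thread. It assumes both sort fields are non-negative integers below 2**32, because `N` packs an unsigned 32-bit big-endian value; the two packed keys take 8 bytes, which is why `substr $_, 8` recovers the original record.

```perl
use strict;
use warnings;

# Sort records by field 2, then field 1, using packed string keys.
# Assumption: both fields are non-negative integers below 2**32.
sub grt_sort {
    my @records = @_;
    return map  { substr $_, 8 }          # drop the 8-byte packed key prefix
           sort                           # plain string sort on key . record
           map  { pack 'NNA*', ( split m/,/, $_, 3 )[ 1, 0 ], $_ }
           @records;
}

# Made-up records in the same shape as the thread's data.
my @sample = ( "2,64,x\n", "1,128,y\n", "1,64,z\n", "2,1024,w\n" );
print grt_sort(@sample);
```

Because big-endian packed integers compare the same way as strings and as numbers, the default string `sort` is enough and no comparison block is needed. Records whose two keys tie fall back to comparing the record text itself.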
Re^3: numeric sort on substring by Jim (Curate) on Jan 08, 2011 at 00:14 UTC
In hindsight, the complication of the inner anonymous array is needless. It reflects how my mind reckoned the data structure at the moment I wrote the transform.

The three-argument `split` is just a habit. The habit is based on the documentation, which states: "In time critical applications it behooves you not to split into more fields than you really need." I don't know whether the OP's application is time-critical or not. I went with the more conservative assumption. Like I said: habit.

I like the regular expression pattern matching version better anyway.
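To make the effect of the third argument concrete, here is a small sketch using a shortened record in the shape of the thread's data. With a limit of 3, `split` stops after producing three pieces and leaves the rest of the line unsplit in the last one.

```perl
use strict;
use warnings;

# Shortened sample record (7 comma-separated fields).
my $line = "1,64,1.4.5,1.4.6,44642850,44642850,0";

my @all     = split m/,/, $line;       # split on every comma
my @limited = split m/,/, $line, 3;    # produce at most three pieces

printf "no limit: %d fields\n", scalar @all;        # no limit: 7 fields
printf "limit 3 : %d fields\n", scalar @limited;    # limit 3 : 3 fields
print  "remainder: $limited[2]\n";                  # remainder: 1.4.5,1.4.6,44642850,44642850,0
```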
Re^4: numeric sort on substring by johngg (Canon) on Jan 27, 2011 at 13:39 UTC
"The three-argument split is just a habit."

I finally got around to benchmarking this and it seems to be a habit you should keep :-)

ok 1 - grtRegex
ok 2 - grtSplit
ok 3 - grtSplit3
ok 4 - nSubRegex
ok 5 - nSubSplit
ok 6 - nSubSplit3
ok 7 - stRegex
ok 8 - stSplit
ok 9 - stSplit3
             Rate nSubSplit nSubRegex nSubSplit3 stSplit grtSplit stSplit3 stRegex grtRegex grtSplit3
nSubSplit  8.10/s        --      -69%       -71%    -86%     -88%     -93%    -93%     -94%      -94%
nSubRegex  25.8/s      219%        --        -8%    -57%     -62%     -77%    -77%     -82%      -82%
nSubSplit3 28.1/s      247%        9%         --    -53%     -59%     -75%    -75%     -80%      -80%
stSplit    59.8/s      639%      132%       113%      --     -12%     -47%    -47%     -58%      -58%
grtSplit   68.1/s      741%      164%       142%     14%       --     -39%    -39%     -52%      -52%
stSplit3    112/s     1283%      334%       299%     87%      64%       --     -0%     -21%      -22%
stRegex     112/s     1284%      334%       299%     87%      65%       0%      --     -21%      -22%
grtRegex    143/s     1661%      452%       408%    138%     109%      27%     27%       --       -0%
grtSplit3   143/s     1663%      453%       408%    139%     110%      28%     27%       0%        --

Not constraining the split to just the fields you need (given many fields as here, I'm guessing) is a significant performance hit but it seems that the three-argument `split` is level-pegging with the regular expression approach.

Sorry for the slow reply, I hope this is of interest.

Cheers,

JohnGG
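The benchmark code itself is not shown above. As a rough sketch only, and not the code that produced that table, a comparison of the three key-extraction styles could be set up with the core Benchmark module like this. The labels `split_all`, `split_limit` and `regex` are made up for illustration, and this times key extraction alone rather than a full sort.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# One record shaped like the thread's data.
my $line = "1,64,1.4.5,1.4.6,44642850,44642850,0,27348,10028,59188,1488095,761904.64\n";

# Hypothetical extractors; each returns the number of keys it found.
my %extract = (
    split_all   => sub { my @k = ( split m/,/, $line )[ 0, 1 ];    scalar @k },
    split_limit => sub { my @k = ( split m/,/, $line, 3 )[ 0, 1 ]; scalar @k },
    regex       => sub { my @k = $line =~ m/^(\d+),(\d+)/;         scalar @k },
);

# Run each extractor a fixed number of times and print a comparison table.
cmpthese( 200_000, \%extract );
```

Timings will vary by machine and Perl version, so treat any numbers it prints as indicative rather than comparable to the table above.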
Re^5: numeric sort on substring by salva (Canon) on Jan 27, 2011 at 14:48 UTC