in reply to numeric sort on substring

Here's a way to do it using split within a Schwartzian Transform:

#!/usr/bin/perl

use strict;
use warnings;

my @data = <DATA>;

# Schwartzian Transform, read bottom to top:
#   1. map:  pair each line with its first two fields, [ $line, [ $f0, $f1 ] ]
#   2. sort: numerically on field 1, then on field 0 to break ties
#   3. map:  strip the keys again, leaving the original lines
print map  { $_->[0] }
      sort { $a->[1][1] <=> $b->[1][1] or $a->[1][0] <=> $b->[1][0] }
      map  { [ $_, [ (split m/,/, $_, 3)[0, 1] ] ] } @data;

__DATA__
1,64,1.4.5,1.4.6,44642850,44642850,0,27348,10028,59188,1488095,761904.64
1,128,1.4.5,1.4.6,25337850,25337850,0,19236,10276,28196,844595,864865.28
1,256,1.4.5,1.4.6,13489200,13489200,0,17792,11372,17832,449640,920862.72
1,512,1.4.5,1.4.6,6996270,6996270,0,18084,16744,19124,233209,955224.064
1,1024,1.4.5,1.4.6,3557880,3557880,0,31528,20488,35188,118596,971538.432
2,64,1.4.5,1.4.6,44642850,44642850,0,25828,9548,40128,1488095,761904.64
2,128,1.4.5,1.4.6,25337850,25337850,0,27936,10796,28696,844595,864865.28
2,256,1.4.5,1.4.6,13489200,13489200,0,12852,10692,13332,449640,920862.72
2,512,1.4.5,1.4.6,6996270,6996270,0,17184,15904,18844,233209,955224.064
2,1024,1.4.5,1.4.6,3557880,3557880,0,34068,17948,36628,118596,971538.432

UPDATE: If you prefer regular expression pattern matching to splitting in this case, just replace the initial map (the bottom one, which runs first) with this:

map { [ $_, [ m/^(\d+),(\d+)/ ] ] }
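For reference, here is a minimal sketch of the complete transform with that regex-based map swapped in, wrapped in a hypothetical sort_records sub and run against a few made-up lines so the ordering can be checked. It assumes every line starts with two comma-separated non-negative integers; a line that doesn't match would yield undefined keys and trigger warnings in the sort.

```perl
use strict;
use warnings;

# Hypothetical helper: sort CSV lines numerically by field 1, then field 0,
# using a Schwartzian Transform with a regex in the decorating map.
# Assumes each line starts with two comma-separated integers.
sub sort_records {
    my @lines = @_;
    return map  { $_->[0] }                        # 3. strip the keys
           sort { $a->[1][1] <=> $b->[1][1]        # 2. numeric on field 1,
                  or $a->[1][0] <=> $b->[1][0] }   #    then on field 0
           map  { [ $_, [ m/^(\d+),(\d+)/ ] ] }    # 1. [ line, [ f0, f1 ] ]
           @lines;
}

# Made-up sample lines, deliberately out of order.
my @input = (
    "2,128,a\n",
    "1,1024,b\n",
    "1,64,c\n",
    "2,64,d\n",
    "1,128,e\n",
);

my @sorted = sort_records(@input);
print @sorted;
```

With these inputs the lines come out ordered 64, 64, 128, 128, 1024 on the second field, with the first field breaking each tie.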

Re^2: numeric sort on substring
by johngg (Canon) on Jan 07, 2011 at 09:26 UTC

    I'm wondering why you add the complication of an inner anonymous array and a three-argument split. I think neither is necessary, and since split defaults to operating on $_, one argument suffices.

    print for map  { $_->[ 0 ] }
              sort { $a->[ 1 ] <=> $b->[ 1 ] || $a->[ 2 ] <=> $b->[ 2 ] }
              map  { [ $_ , ( split m{,} )[ 1, 0 ] ] } <DATA>;

    You could also use a Guttman Rosler transform.

    print for map { substr $_, 8 }
              sort
              map { pack q{NNA*}, ( split m{,} )[ 1, 0 ], $_ } <DATA>;
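    To see why the GRT works: the N template packs an unsigned 32-bit integer in big-endian (network) byte order, so the default string sort on the packed prefix puts the numbers in numeric order, and the two keys take exactly 8 bytes, which is why substr $_, 8 gets the original line back. Below is a small sketch checking those points on a few made-up lines; it assumes both keys are non-negative integers below 2**32, since N can't hold anything else.

```perl
use strict;
use warnings;

my @lines = ( "2,128,x\n", "1,1024,y\n", "1,64,z\n", "2,64,w\n" );

# Prefix each line with field 1 then field 0, each packed as a 4-byte
# big-endian unsigned integer (N); A* appends the original line unchanged.
my @packed = map { pack q{NNA*}, ( split m{,} )[ 1, 0 ], $_ } @lines;

# Decimal strings do not compare numerically ('64' sorts after '1024') ...
my $decimal_wrong = '64' gt '1024';

# ... but big-endian packed integers do, byte by byte.
my $packed_right = pack( 'N', 64 ) lt pack( 'N', 1024 );

# Default (string) sort on the packed strings, then drop the 8-byte prefix.
my @sorted = map { substr $_, 8 } sort @packed;
print @sorted;
```

    The sort block disappears entirely, which is where the GRT's speed advantage over the ST comes from: every comparison is a plain string comparison done in C.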

    I hope this is of interest.

    Cheers,

    JohnGG

      In hindsight, the complication of the inner anonymous array is needless. It reflects how my mind reckoned the data structure at the moment I wrote the transform.

      The three-argument split is just a habit. The habit is based on the documentation, which states: "In time critical applications it behooves you not to split into more fields than you really need." I don't know whether the OP's application is time-critical. I went with the more conservative assumption. Like I said: habit.
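      Here is a quick illustration of what the LIMIT argument does, using the first line of the OP's data: with a limit of 3, split returns at most three fields and leaves the rest of the line, commas and all, in the last one, so the ten fields that are never used as keys aren't split out.

```perl
use strict;
use warnings;

my $line = "1,64,1.4.5,1.4.6,44642850,44642850,0,27348,10028,59188,1488095,761904.64\n";

# No LIMIT: every field is split out (the newline stays on the last one).
my @all = split m/,/, $line;

# LIMIT of 3: at most three fields; the third holds the unsplit remainder.
my @three = split m/,/, $line, 3;

# The two sort keys are the same either way.
my @keys_all   = @all[ 0, 1 ];
my @keys_three = @three[ 0, 1 ];

printf "%d fields without a limit, %d with a limit of 3\n", scalar @all, scalar @three;
print "remainder: $three[2]";
```

      Note also that perlfunc documents an implicit limit when split is assigned directly to a list of scalars: my ($f0, $f1) = split m/,/; behaves as if a LIMIT of 3 had been given. A list slice such as ( split m{,} )[ 1, 0 ] gets no such implicit limit.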

      I like the regular expression pattern matching version better anyway.

        The three-argument split is just a habit.

        I finally got around to benchmarking this and it seems to be a habit you should keep :-)

        ok 1 - grtRegex
        ok 2 - grtSplit
        ok 3 - grtSplit3
        ok 4 - nSubRegex
        ok 5 - nSubSplit
        ok 6 - nSubSplit3
        ok 7 - stRegex
        ok 8 - stSplit
        ok 9 - stSplit3

                     Rate nSubSplit nSubRegex nSubSplit3 stSplit grtSplit stSplit3 stRegex grtRegex grtSplit3
        nSubSplit  8.10/s        --      -69%       -71%    -86%     -88%     -93%    -93%     -94%      -94%
        nSubRegex  25.8/s      219%        --        -8%    -57%     -62%     -77%    -77%     -82%      -82%
        nSubSplit3 28.1/s      247%        9%         --    -53%     -59%     -75%    -75%     -80%      -80%
        stSplit    59.8/s      639%      132%       113%      --     -12%     -47%    -47%     -58%      -58%
        grtSplit   68.1/s      741%      164%       142%     14%       --     -39%    -39%     -52%      -52%
        stSplit3    112/s     1283%      334%       299%     87%      64%       --     -0%     -21%      -22%
        stRegex     112/s     1284%      334%       299%     87%      65%       0%      --     -21%      -22%
        grtRegex    143/s     1661%      452%       408%    138%     109%      27%     27%       --       -0%
        grtSplit3   143/s     1663%      453%       408%    139%     110%      28%     27%       0%        --

        Not constraining the split to just the fields you need is a significant performance hit (given the many fields per line here, I'm guessing), but the three-argument split seems to be level-pegging with the regular expression approach.
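        The benchmark code itself isn't reproduced in this thread. As a rough starting point only, here is a minimal sketch using the core Benchmark module's cmpthese to time just the key-extraction step of the three styles (unlimited split, three-argument split and regex) on one of the OP's lines. The hash keys are hypothetical names, not the ones in the table above, and actual rates will depend on the Perl version, the hardware and how many fields each line has.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line = "1,64,1.4.5,1.4.6,44642850,44642850,0,27348,10028,59188,1488095,761904.64\n";

# Hypothetical extractors: each returns the first two fields of a line.
my %extract = (
    split_all => sub { ( split m/,/, $_[0] )[ 0, 1 ] },
    split3    => sub { ( split m/,/, $_[0], 3 )[ 0, 1 ] },
    regex     => sub { $_[0] =~ m/^(\d+),(\d+)/ },
);

# Sanity check: timings only mean something if all three agree.
my %results  = map { $_ => join( ',', $extract{$_}->($line) ) } keys %extract;
my %distinct = map { $_ => 1 } values %results;
die "extractors disagree\n" unless scalar( keys %distinct ) == 1;

# Wrap each extractor in a closure over $line and run each one for at
# least one CPU second; cmpthese prints a rate/percentage table.
my %bench = map {
    my $f = $extract{$_};
    ( $_ => sub { my @keys = $f->($line) } )
} keys %extract;
cmpthese( -1, \%bench );
```

        Timing whole sorts, as in the table, would mean wrapping each complete transform in its own sub in the same way.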

        Sorry for the slow reply; I hope this is of interest.

        Cheers,

        JohnGG