perl_seeker has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks!
This is what I need to do. I have a text file that contains a column of entries like this:
test.txt K-2-D-10A K-2-D-10C K-2-D-10D K-2-D-10E K-2-D-10F K-2-D-10G K-2-D-11A K-2-D-11B K-2-D-12I K-2-D-12J K-2-D-12A K-2-D-12B K-2-D-12C K-2-D-12D K-2-D-12E K-2-D-12F K-2-D-12G K-2-D-12H K-2-D-13B K-2-D-13C K-2-D-14E K-2-D-14I K-2-D-14A K-2-D-14B K-2-D-14C K-2-D-14D K-2-D-14F K-2-D-14G K-2-D-14H K-2-D-14J K-2-D-14K K-2-D-15E K-2-D-15A K-2-D-15B K-2-D-15D K-2-D-16A K-2-D-16B K-2-D-16D K-2-D-16E K-2-D-17A K-2-D-17B K-2-D-17C K-2-D-17D K-2-D-17E K-2-D-17F K-2-D-17G K-2-D-17K K-2-D-18C K-2-D-18A K-2-D-19A K-2-D-19B K-2-D-1A K-2-D-1B K-2-D-1D K-2-D-2J K-2-D-2I K-2-D-20A K-2-D-20B K-2-D-21I K-2-D-21Q K-2-D-21H K-2-D-21O K-2-D-21P K-2-D-21A K-2-D-21B K-2-D-21C K-2-D-21E K-2-D-21F K-2-D-21G K-2-D-21J K-2-D-21K K-2-D-21L K-2-D-21M K-2-D-21N K-2-D-21R K-2-D-22A K-2-D-22B K-2-D-22C K-2-D-22D K-2-D-22E K-2-D-22F K-2-D-22G K-2-D-22H K-2-D-2F K-2-D-2G K-2-D-2H K-2-D-3A K-2-D-3B K-2-D-3C K-2-D-3D K-2-D-3E K-2-D-3F K-2-D-4A K-2-D-4B K-2-D-5F K-2-D-5A K-2-D-5B K-2-D-5C K-2-D-5E K-2-D-6D K-2-D-6A K-2-D-6E K-2-D-7A K-2-D-7B K-2-D-7C K-2-D-8A K-2-D-8B K-2-D-8C K-2-D-8D K-2-D-8E K-2-D-9A K-2-D-9B K-2-D-9C K-2-D-9D K-2-D-9E K-2-D-10B K-2-D-16C K-2-D-1C K-2-D-5D K-2-D-6B K-2-D-6C K-2-D-7D K-2-D-7E
This is what my output file needs to look like after sorting:
output.txt K-2-D-1A K-2-D-1B K-2-D-1C K-2-D-1D K-2-D-2F K-2-D-2G K-2-D-2H K-2-D-2I K-2-D-2J K-2-D-3A K-2-D-3B K-2-D-3C K-2-D-3D K-2-D-3E K-2-D-3F K-2-D-4A K-2-D-4B K-2-D-5A K-2-D-5B K-2-D-5C K-2-D-5D K-2-D-5E K-2-D-5F K-2-D-6A K-2-D-6B K-2-D-6C K-2-D-6D K-2-D-6E K-2-D-7A K-2-D-7B K-2-D-7C K-2-D-7D K-2-D-7E K-2-D-8A K-2-D-8B K-2-D-8C K-2-D-8D K-2-D-8E K-2-D-9A K-2-D-9B K-2-D-9C K-2-D-9D K-2-D-9E K-2-D-10A K-2-D-10B K-2-D-10C K-2-D-10D K-2-D-10E K-2-D-10F K-2-D-10G K-2-D-11A K-2-D-11B K-2-D-12A K-2-D-12B K-2-D-12C K-2-D-12D K-2-D-12E K-2-D-12F K-2-D-12G K-2-D-12H K-2-D-12I K-2-D-12J K-2-D-13B K-2-D-13C K-2-D-14A K-2-D-14B K-2-D-14C K-2-D-14D K-2-D-14E K-2-D-14F K-2-D-14G K-2-D-14H K-2-D-14I K-2-D-14J K-2-D-14K K-2-D-15A K-2-D-15B K-2-D-15D K-2-D-15E K-2-D-16A K-2-D-16B K-2-D-16C K-2-D-16D K-2-D-16E K-2-D-17A K-2-D-17B K-2-D-17C K-2-D-17D K-2-D-17E K-2-D-17F K-2-D-17G K-2-D-17K K-2-D-18A K-2-D-18C K-2-D-19A K-2-D-19B K-2-D-20A K-2-D-20B K-2-D-21A K-2-D-21B K-2-D-21C K-2-D-21E K-2-D-21F K-2-D-21G K-2-D-21H K-2-D-21I K-2-D-21J K-2-D-21K K-2-D-21L K-2-D-21M K-2-D-21N K-2-D-21O K-2-D-21P K-2-D-21Q K-2-D-21R K-2-D-22A K-2-D-22B K-2-D-22C K-2-D-22D K-2-D-22E K-2-D-22F K-2-D-22G K-2-D-22H
I've put spaces before and after the entries that need to be sorted/are sorted.
Any help would be appreciated. Thanks,

perl_seeker:)

Readmore tags added by GrandFather

Re: Sorting question
by McDarren (Abbot) on May 14, 2006 at 16:25 UTC
    Sort::Naturally
    use Sort::Naturally;
    my @sorted = nsort(<DATA>);
    __DATA__
    K-2-D-10A
    K-2-D-10C
    K-2-D-10D
    K-2-D-10E
    K-2-D-10F
    K-2-D-10G
    ...etc
      or Sort::Key::Natural:
      use Sort::Key::Natural qw(natsort);
      my @sorted = natsort @data;
      which, for the OP's sample data, is almost five times faster than Sort::Naturally:
      use Benchmark qw(cmpthese);
      use Sort::Naturally qw(nsort);
      use Sort::Key::Natural qw(natsort);

      my @data = grep !/^\s*$/, <DATA>;
      chomp(@data);

      cmpthese(-10, {
          sn  => sub { my @s = nsort @data },
          skn => sub { my @s = natsort @data },
      });
      __DATA__
      K-2-D-10A
      K-2-D-10C
      K-2-D-10D
      ...
      outputs...
      $ perl bm.pl
            Rate   sn  skn
      sn  45.0/s   -- -79%
      skn  216/s 381%   --
        FYI, since we're running benchmarks:
        #!/usr/bin/perl
        use warnings;
        use strict;

        use Benchmark 'cmpthese';
        use Sort::Naturally 'nsort';
        use Sort::Key::Natural 'natsort';

        my @data = grep /\S/, <DATA>;

        sub normalize_digits {
            my ( $key ) = @_;
            $key =~ s/(\d+)/sprintf '%03d', $1/eg;
            return $key;
        }

        cmpthese -10, {
            SN_nsort    => sub { my @s = nsort @data },
            SKN_natsort => sub { my @s = natsort @data },
            GRT_pack    => sub {
                my @s = map unpack( 'x3 a*', $_ ),
                        sort
                        map pack( 'n a a*', /(\d+)([A-Z])/, $_ ),
                        @data
            },
            ST_sub      => sub {
                my @s = map { $_->[ 0 ] }
                        sort { $a->[ 1 ] cmp $b->[ 1 ] }
                        map { [ $_, normalize_digits( $_ ) ] }
                        @data
            },
            GRT_sub     => sub {
                my @s = map { local $_ = $_; s/^.*\0//; $_ }
                        sort
                        map { normalize_digits( $_ ) . "\0$_" }
                        @data
            },
            GRT_sprintf => sub {
                my @s = map { s/-0+/-/g; $_ }
                        sort
                        map { s/(\d+)/sprintf '%03d', $1/eg; $_ }
                        @data
            },
        };
        __DATA__
        K-2-D-10A
        K-2-D-10C
        K-2-D-10D
        K-2-D-10E
        K-2-D-10F
        K-2-D-10G
        K-2-D-11A
        etc. etc.

        Which gave me these results:

                      Rate SN_nsort SKN_natsort ST_sub GRT_sub GRT_sprintf GRT_pack
        SN_nsort    64.6/s       --        -81%   -90%    -91%        -93%     -97%
        SKN_natsort  341/s     427%          --   -47%    -51%        -62%     -82%
        ST_sub       642/s     893%         88%     --     -7%        -29%     -67%
        GRT_sub      694/s     973%        104%     8%      --        -23%     -64%
        GRT_sprintf  900/s    1292%        164%    40%     30%          --     -53%
        GRT_pack    1928/s    2883%        466%   200%    178%        114%       --

        :-)

      Hi, thanks for the help
      Hello McDarren, thanks
Re: Sorting question
by dws (Chancellor) on May 14, 2006 at 16:27 UTC

    What you may need here is a "Schwartzian Transform", so that you can sort based on a transformed key, where the transform is to expand any group of digits into a wider, fixed-sized group. In this way, "K-2-D-1A" gets a sort key of "K-002-D-001A", and "K-2-D-10A" gets a sort key of "K-002-D-010A". Since all sequences of numbers are now equally wide, you'll get the sort order you desire. But since the Schwartzian Transform holds on to unmodified data, you'll have the untransformed keys to display.

    Here's an (untested) starting point:

    my @data = qw(K-2-D-1A K-2-D-2A K-2-D-10A);

    my @sorted =
        map  { $_->[0] }
        sort { $a->[1] cmp $b->[1] }
        map  { [$_, normalize_digits($_) ] }
        @data;

    sub normalize_digits {
        my ($key) = @_;
        $key =~ s/(\d+)/sprintf("%03d", $1)/eg;
        return $key; # thanks, ikegami
    }
    This builds a composite data structure with the original key and a "normalized" key, sorts the structure based on the normalized key, and then discards the normalized key.
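
    To see why the padded keys sort correctly, it can help to compare a plain string sort with the transformed sort on a handful of entries. The following is a minimal sketch rather than part of the original post: the four-entry sample list and the @plain variable are assumptions for illustration, and the "%03d" width assumes that no number in the data exceeds three digits.

```perl
use strict;
use warnings;

# Hypothetical sample; the real entries would be read from test.txt.
my @data = qw(K-2-D-10A K-2-D-2J K-2-D-1A K-2-D-2F);

# Pad every run of digits to three places so that string comparison
# agrees with numeric order (assumes all numbers are below 1000).
sub normalize_digits {
    my ($key) = @_;
    $key =~ s/(\d+)/sprintf("%03d", $1)/eg;
    return $key;
}

# A plain string sort compares character by character,
# so "10A" ends up ahead of "1A".
my @plain = sort @data;

# Schwartzian Transform: decorate each entry with its padded key,
# sort on that key, then strip the key again.
my @sorted =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] }
    map  { [ $_, normalize_digits($_) ] }
    @data;

# Show each original entry beside the key it was sorted by.
printf "%-10s => %s\n", $_, normalize_digits($_) for @sorted;
```

    Printing the padded key next to each entry makes it easy to check that the width is large enough for the data at hand; if a number could reach four digits, the format would need to be widened accordingly.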

    Update: Or, better yet, don't reinvent the wheel, and use Sort::Naturally (which I'd somehow overlooked).

      Your code doesn't work because you forgot return $key; in normalize_digits.

      Also, I think you can speed things up by using the following:

      my @sorted =
          map  { local $_ = $_; s/^.*\0//; $_ }
          sort
          map  { normalize_digits($_) . "\0$_" }
          @data;

      Note that your quite simple, dozen-line reinvention is going to be a ton faster than the method used by Sort::Naturally (you can probably make it 2- to 4-times faster still and more memory efficient by using a smarter sorting technique as well, such as my favorite or some found by natural sort). The file size might not make even these quite small increases in complexity in the code worthwhile, of course.

      - tye        

      Your explanation of the "Schwartzian Transform" is the easiest one to understand that I've seen so far. Nice++
Re: Sorting question
by GrandFather (Saint) on May 14, 2006 at 19:50 UTC

    You have the answers that you need. But for future reference for posting questions here, here are a few hints.

    Keep your data short. If you can demonstrate what you need with 5 lines of data then any more is obscuring your point and people will stop reading your question.

    Generally we like to see some code to see that you have had a try at solving the problem.

    If you absolutely must include a huge amount of data then put it in readmore tags (see Writeup Formatting Tips).

    Although your code and data generally show a problem, it is always good to describe your problem in words so that we understand where you are coming from. In this case you could have posted something like:

    I have a problem sorting some data that is mixed numbers and letters. At the moment I am using sort but the result is like this:

    K-2-D-19A K-2-D-1D K-2-D-20A K-2-D-20B K-2-D-2I K-2-D-2J

    and I would like it to be like:

    K-2-D-1D K-2-D-2I K-2-D-2J K-2-D-19A K-2-D-20A K-2-D-20B

    How do I do that?


    DWIM is Perl's answer to Gödel
Re: Sorting question
by jwkrahn (Abbot) on May 14, 2006 at 20:21 UTC
    Based upon your data you can use a GRT something like:
    #!/usr/bin/perl
    use warnings;
    use strict;

    print
        map  unpack( 'x3 a*', $_ ),
        sort
        map  pack( 'n a a*', /(\d+)([A-Z])/, $_ ),
        grep /\S/,
        <DATA>;
    __DATA__
    K-2-D-10A
    K-2-D-10C
    K-2-D-10D
    K-2-D-10E
    K-2-D-10F
    K-2-D-10G
    K-2-D-11A
    K-2-D-11B
    K-2-D-12I
    K-2-D-12J
    etc. etc.
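
    For readers unfamiliar with the GRT (Guttman-Rosler Transform), the idea is to prepend a fixed-width binary key to each entry so that Perl's default sort, with no comparison block, yields the desired order. The sketch below spells out the same pack/unpack templates step by step. The helper name grt_key and the small sample list are hypothetical, and the approach assumes every entry contains a number below 65536 followed by a single uppercase letter.

```perl
use strict;
use warnings;

# Hypothetical sample; the real entries would come from the input file.
my @data = qw(K-2-D-10A K-2-D-2J K-2-D-1A K-2-D-2F);

# Build the sort key: 'n' stores the number as a 2-byte big-endian integer,
# 'a' stores the single letter, and 'a*' appends the original entry.
sub grt_key {
    my ($entry) = @_;
    my ( $num, $letter ) = $entry =~ /(\d+)([A-Z])/
        or die "unexpected entry format: $entry";
    return pack 'n a a*', $num, $letter, $entry;
}

my @sorted =
    map  { unpack 'x3 a*', $_ }   # skip the 3-byte key, keep the original entry
    sort                           # default string sort, no comparison block
    map  { grt_key($_) }
    @data;

print "$_\n" for @sorted;
```

    Because the key has a fixed width of three bytes, stripping it is a constant-offset operation, which is likely a large part of why this variant did so well in the benchmark above.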

      Hello dws, tye, ikegami, jwkrahn thanks a lot for all your input and comments :)