Sorting question

perl_seeker has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks!
This is what I need to do. I have a text file that contains a column of entries like this:


test.txt

K-2-D-10A
K-2-D-10C
K-2-D-10D
K-2-D-10E
K-2-D-10F
K-2-D-10G
K-2-D-11A
K-2-D-11B

K-2-D-12I
K-2-D-12J

K-2-D-12A
K-2-D-12B
K-2-D-12C
K-2-D-12D
K-2-D-12E
K-2-D-12F
K-2-D-12G
K-2-D-12H
K-2-D-13B
K-2-D-13C

K-2-D-14E
K-2-D-14I

K-2-D-14A
K-2-D-14B
K-2-D-14C
K-2-D-14D
K-2-D-14F
K-2-D-14G
K-2-D-14H
K-2-D-14J
K-2-D-14K

K-2-D-15E

K-2-D-15A
K-2-D-15B
K-2-D-15D
K-2-D-16A
K-2-D-16B
K-2-D-16D
K-2-D-16E
K-2-D-17A
K-2-D-17B
K-2-D-17C
K-2-D-17D
K-2-D-17E
K-2-D-17F
K-2-D-17G
K-2-D-17K

K-2-D-18C

K-2-D-18A
K-2-D-19A
K-2-D-19B

K-2-D-1A
K-2-D-1B
K-2-D-1D

K-2-D-2J
K-2-D-2I

K-2-D-20A
K-2-D-20B

K-2-D-21I
K-2-D-21Q
K-2-D-21H
K-2-D-21O
K-2-D-21P

K-2-D-21A
K-2-D-21B
K-2-D-21C
K-2-D-21E
K-2-D-21F
K-2-D-21G
K-2-D-21J
K-2-D-21K
K-2-D-21L
K-2-D-21M
K-2-D-21N
K-2-D-21R
K-2-D-22A
K-2-D-22B
K-2-D-22C
K-2-D-22D
K-2-D-22E
K-2-D-22F
K-2-D-22G
K-2-D-22H

K-2-D-2F
K-2-D-2G
K-2-D-2H
K-2-D-3A
K-2-D-3B
K-2-D-3C
K-2-D-3D
K-2-D-3E
K-2-D-3F
K-2-D-4A
K-2-D-4B
K-2-D-5F
K-2-D-5A
K-2-D-5B
K-2-D-5C
K-2-D-5E
K-2-D-6D
K-2-D-6A
K-2-D-6E
K-2-D-7A
K-2-D-7B
K-2-D-7C
K-2-D-8A
K-2-D-8B
K-2-D-8C
K-2-D-8D
K-2-D-8E
K-2-D-9A
K-2-D-9B
K-2-D-9C
K-2-D-9D
K-2-D-9E

K-2-D-10B
K-2-D-16C
K-2-D-1C
K-2-D-5D
K-2-D-6B
K-2-D-6C
K-2-D-7D
K-2-D-7E

This is what my output file needs to look like after sorting:

output.txt

K-2-D-1A
K-2-D-1B

K-2-D-1C

K-2-D-1D

K-2-D-2F
K-2-D-2G
K-2-D-2H

K-2-D-2I
K-2-D-2J

K-2-D-3A
K-2-D-3B
K-2-D-3C
K-2-D-3D
K-2-D-3E
K-2-D-3F

K-2-D-4A
K-2-D-4B
K-2-D-5A
K-2-D-5B
K-2-D-5C

K-2-D-5D

K-2-D-5E
K-2-D-5F
K-2-D-6A

K-2-D-6B
K-2-D-6C

K-2-D-6D
K-2-D-6E
K-2-D-7A
K-2-D-7B
K-2-D-7C

K-2-D-7D
K-2-D-7E

K-2-D-8A
K-2-D-8B
K-2-D-8C
K-2-D-8D
K-2-D-8E
K-2-D-9A
K-2-D-9B
K-2-D-9C
K-2-D-9D
K-2-D-9E

K-2-D-10A


K-2-D-10B

K-2-D-10C
K-2-D-10D
K-2-D-10E
K-2-D-10F
K-2-D-10G
K-2-D-11A
K-2-D-11B
K-2-D-12A
K-2-D-12B
K-2-D-12C
K-2-D-12D
K-2-D-12E
K-2-D-12F
K-2-D-12G
K-2-D-12H

K-2-D-12I
K-2-D-12J

K-2-D-13B
K-2-D-13C
K-2-D-14A
K-2-D-14B
K-2-D-14C
K-2-D-14D

K-2-D-14E

K-2-D-14F
K-2-D-14G
K-2-D-14H

K-2-D-14I

K-2-D-14J
K-2-D-14K


K-2-D-15A
K-2-D-15B
K-2-D-15D

K-2-D-15E

K-2-D-16A
K-2-D-16B

K-2-D-16C

K-2-D-16D
K-2-D-16E
K-2-D-17A
K-2-D-17B
K-2-D-17C
K-2-D-17D
K-2-D-17E
K-2-D-17F
K-2-D-17G
K-2-D-17K
K-2-D-18A

K-2-D-18C

K-2-D-19A
K-2-D-19B

K-2-D-20A
K-2-D-20B
K-2-D-21A
K-2-D-21B
K-2-D-21C
K-2-D-21E
K-2-D-21F
K-2-D-21G

K-2-D-21H
K-2-D-21I

K-2-D-21J
K-2-D-21K
K-2-D-21L
K-2-D-21M
K-2-D-21N

K-2-D-21O
K-2-D-21P
K-2-D-21Q

K-2-D-21R
K-2-D-22A
K-2-D-22B
K-2-D-22C
K-2-D-22D
K-2-D-22E
K-2-D-22F
K-2-D-22G
K-2-D-22H

I've put blank lines before and after the entries that need to be sorted (in test.txt) and the entries that have been sorted into place (in output.txt).
Any help would be appreciated. Thanks,

perl_seeker:)

Readmore tags added by GrandFather


Replies are listed 'Best First'.
Re: Sorting question by McDarren (Abbot) on May 14, 2006 at 16:25 UTC
Sort::Naturally:

```
use Sort::Naturally;
my @sorted = nsort(<DATA>);
__DATA__
K-2-D-10A
K-2-D-10C
K-2-D-10D
K-2-D-10E
K-2-D-10F
K-2-D-10G
...etc
```
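If installing a CPAN module is not an option, the same idea can be approximated in core Perl. The following is only a minimal sketch and not how Sort::Naturally is implemented: `natural_cmp` is a hypothetical helper name, and it assumes plain ASCII keys like the ones in the question. It splits each string into runs of digits and non-digits, and compares digit runs numerically and everything else as strings.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Sort::Naturally): compare two strings
# chunk by chunk, numerically where both chunks are runs of digits.
sub natural_cmp {
    my ($x, $y) = @_;
    my @x = $x =~ /(\d+|\D+)/g;
    my @y = $y =~ /(\d+|\D+)/g;
    while (@x && @y) {
        my ($p, $q) = (shift @x, shift @y);
        my $r = ($p =~ /^\d/ && $q =~ /^\d/) ? ($p <=> $q) : ($p cmp $q);
        return $r if $r;
    }
    # All shared chunks were equal: the string with fewer chunks sorts first.
    return @x <=> @y;
}

my @data   = qw(K-2-D-19A K-2-D-1D K-2-D-20A K-2-D-2I K-2-D-10B K-2-D-1A);
my @sorted = sort { natural_cmp($a, $b) } @data;
print "$_\n" for @sorted;
```

Calling a comparison sub for every pair of elements is the slow path on large inputs; approaches that compute a sort key once per element avoid that cost.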
Re^2: Sorting question by salva (Canon) on May 14, 2006 at 19:03 UTC
or Sort::Key::Natural:

```
use Sort::Key::Natural qw(natsort);
my @sorted = natsort @data;
```

which, for the OP's sample data, is almost five times faster than Sort::Naturally:

```
use Benchmark qw(cmpthese);
use Sort::Naturally qw(nsort);
use Sort::Key::Natural qw(natsort);

my @data = grep !/^\s*$/, <DATA>;
chomp(@data);

cmpthese(-10, {
    sn  => sub { my @s = nsort @data },
    skn => sub { my @s = natsort @data },
});

__DATA__
K-2-D-10A
K-2-D-10C
K-2-D-10D
...
```

outputs...

```
$ perl bm.pl
      Rate   sn  skn
sn  45.0/s   -- -79%
skn  216/s 381%   --
```
Re^3: Sorting question by jwkrahn (Abbot) on May 15, 2006 at 22:32 UTC
FYI, since we're running benchmarks:

```
#!/usr/bin/perl
use warnings;
use strict;

use Benchmark 'cmpthese';
use Sort::Naturally 'nsort';
use Sort::Key::Natural 'natsort';

my @data = grep /\S/, <DATA>;

sub normalize_digits {
    my ( $key ) = @_;
    $key =~ s/(\d+)/sprintf '%03d', $1/eg;
    return $key;
}

cmpthese -10, {
    SN_nsort    => sub { my @s = nsort @data },
    SKN_natsort => sub { my @s = natsort @data },
    GRT_pack    => sub {
        my @s = map unpack( 'x3 a*', $_ ),
                sort
                map pack( 'n a a*', /(\d+)([A-Z])/, $_ ),
                @data
    },
    ST_sub      => sub {
        my @s = map { $_->[ 0 ] }
                sort { $a->[ 1 ] cmp $b->[ 1 ] }
                map { [ $_, normalize_digits( $_ ) ] }
                @data
    },
    GRT_sub     => sub {
        my @s = map { local $_ = $_; s/^.*\0//; $_ }
                sort
                map { normalize_digits( $_ ) . "\0$_" }
                @data
    },
    GRT_sprintf => sub {
        my @s = map { s/-0+/-/g; $_ }
                sort
                map { s/(\d+)/sprintf '%03d', $1/eg; $_ }
                @data
    },
};

__DATA__
K-2-D-10A
K-2-D-10C
K-2-D-10D
K-2-D-10E
K-2-D-10F
K-2-D-10G
K-2-D-11A
etc. etc.
```

Which gave me these results:

```
              Rate SN_nsort SKN_natsort ST_sub GRT_sub GRT_sprintf GRT_pack
SN_nsort    64.6/s       --        -81%   -90%    -91%        -93%     -97%
SKN_natsort  341/s     427%          --   -47%    -51%        -62%     -82%
ST_sub       642/s     893%         88%     --     -7%        -29%     -67%
GRT_sub      694/s     973%        104%     8%      --        -23%     -64%
GRT_sprintf  900/s    1292%        164%    40%     30%          --     -53%
GRT_pack    1928/s    2883%        466%   200%    178%        114%       --
```

:-)
Re^4: Sorting question by salva (Canon) on May 16, 2006 at 17:20 UTC
Re^2: Sorting question by perl_seeker (Scribe) on May 15, 2006 at 15:02 UTC
Hi, thanks for the help
Re^2: Sorting question by perl_seeker (Scribe) on May 15, 2006 at 15:07 UTC
Hello McDarren, thanks
Re: Sorting question by dws (Chancellor) on May 14, 2006 at 16:27 UTC
What you may need here is a "Schwartzian Transform", so that you can sort based on a transformed key, where the transform is to expand any group of digits into a wider, fixed-sized group. In this way, "K-2-D-1A" gets a sort key of "K-002-D-001A", and "K-2-D-10A" gets a sort key of "K-002-D-010A". Since all sequences of numbers are now equally wide, you'll get the sort order you desire. But since the Schwartzian Transform holds on to unmodified data, you'll have the untransformed keys to display.

Here's an (untested) starting point:

```
my @data = qw(K-2-D-1A K-2-D-2A K-2-D-10A);

my @sorted =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] }
    map  { [$_, normalize_digits($_) ] }
    @data;

sub normalize_digits {
    my ($key) = @_;
    $key =~ s/(\d+)/sprintf("%03d", $1)/eg;
    return $key;   # thanks, ikegami
}
```

This builds a composite data structure with the original key and a "normalized" key, sorts the structure based on the normalized key, and then discards the normalized key.

Update: Or, better yet, don't reinvent the wheel, and use Sort::Naturally (which I'd somehow overlooked).
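To see what the sort is actually comparing, it can help to print the intermediate keys. The sketch below repeats `normalize_digits` so that it runs on its own, and uses a handful of entries from the question. It assumes no number in the data has more than three digits; with larger numbers the `%03d` width would need to be widened.

```perl
use strict;
use warnings;

# Same padding transform as in the reply; the width of 3 is an assumption
# that holds for the sample data (the largest number there is 22).
sub normalize_digits {
    my ($key) = @_;
    $key =~ s/(\d+)/sprintf("%03d", $1)/eg;
    return $key;
}

my @data = qw(K-2-D-10A K-2-D-1C K-2-D-2J K-2-D-1A);

# Decorate: pair each original string with its padded sort key.
my @pairs = map { [ $_, normalize_digits($_) ] } @data;
printf "%-10s => %s\n", @$_ for @pairs;

# Sort on the padded key, then undecorate to recover the originals.
my @sorted = map { $_->[0] } sort { $a->[1] cmp $b->[1] } @pairs;
print "$_\n" for @sorted;
```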
Re^2: Sorting question by ikegami (Patriarch) on May 14, 2006 at 17:09 UTC
Your code doesn't work because you forgot `return $key;` in `normalize_digits`. Also, I think you can speed things up by using the following:

```
my @sorted =
    map { local $_ = $_; s/^.*\0//; $_ }
    sort
    map { normalize_digits($_) . "\0$_" }
    @data;
```
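Applied to the original problem, the pipeline also has to skip the blank lines in test.txt and drop the trailing newlines before building keys. Here is a minimal sketch of the whole flow, reusing the "\0"-separated key trick. The in-memory filehandle is only a stand-in for test.txt, an assumption made so the example is self-contained.

```perl
use strict;
use warnings;

sub normalize_digits {
    my ($key) = @_;
    $key =~ s/(\d+)/sprintf("%03d", $1)/eg;
    return $key;
}

# Stand-in for the contents of test.txt, including blank separator lines.
my $input = "K-2-D-10A\nK-2-D-2J\n\nK-2-D-1C\nK-2-D-12I\n\nK-2-D-2F\n";
open my $in, '<', \$input or die "cannot open in-memory file: $!";

# Keep only non-blank lines and remove their newlines.
chomp(my @data = grep { /\S/ } <$in>);
close $in;

# Decorate with "key\0original", sort as plain strings, strip the key.
# The greedy .* is safe because neither key nor data contains "\0".
my @sorted =
    map  { my $s = $_; $s =~ s/^.*\0//; $s }
    sort
    map  { normalize_digits($_) . "\0$_" }
    @data;

print "$_\n" for @sorted;
```

Because the decorated values are plain strings, `sort` needs no comparison block, which is where the speed-up over the array-reference version comes from.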
Re^2: Sorting question (slow) by tye (Sage) on May 14, 2006 at 16:58 UTC
Note that your quite simple, dozen-line reinvention is going to be a ton faster than the method used by Sort::Naturally (you can probably make it 2 to 4 times faster still, and more memory efficient, by using a smarter sorting technique as well, such as my favorite or some found by natural sort). Of course, the file size might not make even these quite small increases in code complexity worthwhile.

- tye
Re^2: Sorting question by Scott7477 (Chaplain) on May 15, 2006 at 22:38 UTC
Your explanation of the "Schwartzian Transform" is the easiest one to understand that I've seen so far. Nice++
Re: Sorting question by GrandFather (Saint) on May 14, 2006 at 19:50 UTC
You have the answers that you need. But for future reference when posting questions here, here are a few hints.

Keep your data short. If you can demonstrate what you need with 5 lines of data, then any more obscures your point and people will stop reading your question.

Generally we like to see some code, to show that you have had a try at solving the problem.

If you absolutely must include a huge amount of data, then put it in readmore tags (see Writeup Formatting Tips).

Although your code and data generally show a problem, it is always good to describe your problem in words so that we understand where you are coming from. In this case you could have posted something like:

I have a problem sorting some data that is mixed numbers and letters. At the moment I am using `sort` but the result is like this:

```
K-2-D-19A
K-2-D-1D
K-2-D-20A
K-2-D-20B
K-2-D-2I
K-2-D-2J
```

and I would like it to be like:

```
K-2-D-1D
K-2-D-2I
K-2-D-2J
K-2-D-19A
K-2-D-20A
K-2-D-20B
```

How do I do that?

DWIM is Perl's answer to Gödel
Re: Sorting question by jwkrahn (Abbot) on May 14, 2006 at 20:21 UTC
Based upon your data you can use a GRT something like:

```
#!/usr/bin/perl
use warnings;
use strict;

print map unpack( 'x3 a*', $_ ),
      sort
      map pack( 'n a a*', /(\d+)([A-Z])/, $_ ),
      grep /\S/, <DATA>;

__DATA__
K-2-D-10A
K-2-D-10C
K-2-D-10D
K-2-D-10E
K-2-D-10F
K-2-D-10G
K-2-D-11A
K-2-D-11B
K-2-D-12I
K-2-D-12J
etc. etc.
```
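What a packed key looks like may not be obvious from a one-liner, so here is a rough sketch of the idea. In the template `n a a*`, `n` packs the number as a 16-bit big-endian integer, `a` packs the single letter, and `a*` appends the whole original string. Each decorated string is therefore a three-byte key followed by the entry, and `x3 a*` skips those three bytes again when unpacking. The sketch assumes, as in the sample data, that only the final number and letter vary and that the number fits in 16 bits.

```perl
use strict;
use warnings;

my @data = qw(K-2-D-10A K-2-D-2J K-2-D-1C K-2-D-2F);

# Decorate: 2-byte big-endian number + 1-byte letter + original string.
my @packed = map { pack 'n a a*', /(\d+)([A-Z])/, $_ } @data;

# Show the 3-byte key of each decorated string in hex, next to the entry.
printf "%s  %s\n", unpack('H6', $_), substr($_, 3) for @packed;

# A plain string sort orders by the binary key; skipping 3 bytes undecorates.
my @sorted = map { unpack 'x3 a*', $_ } sort @packed;
print "$_\n" for @sorted;
```

Note the trailing `*` on the last `a`: without it, `a` packs or unpacks a single character, so only the first character of each entry would survive.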
Re^2: Sorting question by perl_seeker (Scribe) on May 15, 2006 at 15:37 UTC
Hello dws, tye, ikegami, jwkrahn thanks a lot for all your input and comments :)