### complex sort

by mkmcconn (Chaplain)
 on Dec 25, 2001 at 02:27 UTC Need Help??

mkmcconn has asked for the wisdom of the Perl Monks concerning the following question:

I was given the following problem at work. A DATA sample and my solution follows the READMORE tag.

Problem: Print a directory of engineering drawings in the order determined by drawing number data contained in the filename:

```
TYPE  Part  (Detail) Revision (ALT) layer
1D     1              R1       B1    .vec
18D    20     A       R0             .ras
```

Sort order:

1. layer,
2. reverse(ALT),
3. TYPE,
4. Part,
5. Revision,
6. Detail

ALT, if it exists, must be queued in reverse order (e.g. 3,2,1,'')
• How would you have done it?
• How does your solution scale better or work more reliably?
• How would you make the solution easier to understand and maintain?

```#!/usr/bin/perl -wl
use strict;

for (sort{

# TYPE: \$t0_\$t1_
# Part: \$p0_
# Det : \$pda
# Rev : \$r_
# ALT : \$alt_
# Layer \$l_

my (\$t0a,\$t1a,\$p0a,\$pda,\$ra,\$alta,\$la) =
map(m!(\d+)(\D+)(\d+)(\D*)R(\d+)B?(\d*)\.(\w+)\$!,\$a);
my (\$t0b,\$t1b,\$p0b,\$pdb,\$rb,\$altb,\$lb) =
map(m!(\d+)(\D+)(\d+)(\D*)R(\d+)B?(\d*)\.(\w+)\$!,\$b);

\$la  cmp \$lb      ||
(\$altb eq ''? 0 :\$altb)  <=> (\$alta eq '' ? 0 : \$alta)  ||
\$t0a <=> \$t0b     ||
\$t1a cmp \$t1b     ||
\$p0a <=> \$p0b     ||
\$ra  cmp \$rb      ||
\$pda cmp \$pdb;

}<DATA>){
chomp;
print;
}

__DATA__
18DD13AR0B2.vec
5A344R2B15.ras
5A344R2B15.vec
5A35SR0.ras
113A28R1.vec
113A28AR0.vec
113A29R0.ras
5A32R0.ras
113A29R1.vec
113A22R1.ras
5A35R0.vec
5A34R0.ras
113A22R1.vec
18DD12YR0B2.vec
18DD13AR0B2.ras
113A29AR0.ras
113A29AR2.vec
113A28R0.vec
5A33R0.vec
5A34R0.vec
5A35R0.ras
113A28AR2.vec
113B28R2.ras
113A28R2.vec
13B29AR0B1.ras
113B29AR0B1.vec
5A32R0.vec
5A33R0.ras

Explanation:

• separate alpha and numeric values to sort separately,
• compare in order of highest to lowest priority,
• compare lower priority values only if higher order comparison returns '0'; mkmcconn
• Replies are listed 'Best First'.
(Ovid) Re: complex sort
by Ovid (Cardinal) on Dec 25, 2001 at 03:58 UTC

Using the same regex twice in a row suggests to me that there are some efficiency gains here. I used a Schwartzian transform, reordered the data (not really necessary), and did benchmarks. Complete code is below. The ST appears to run about twice as fast.

Also, you have a string comparison where you needed a numberic comparison:

```          \$ra  cmp \$rb      || # this is number.  Shouldn't this be <=
+> ?

Also, at the end, I write out the arrays to a file so you can verify the sort order.

```#!/usr/bin/perl -wl
use strict;
use Benchmark;
use vars qw/ @data /;
use vars qw/ @results1 @results2 /;

@data = qw/
18DD13AR0B2.vec 5A344R2B15.ras  5A344R2B15.vec  5A35SR0.ras
113A28R1.vec    113A28AR0.vec   113A29R0.ras    5A32R0.ras
5A35R0.vec      5A34R0.ras      113A22R1.vec    18DD12YR0B2.vec
18DD13AR0B2.ras 113A29AR0.ras   113A29AR2.vec   113A28R0.vec
5A33R0.vec      5A34R0.vec      5A35R0.ras      113A28AR2.vec
113B28R2.ras    113A28R2.vec    13B29AR0B1.ras  113B29AR0B1.vec
5A32R0.vec      5A33R0.ras /;

timethese(5000, {
'mkmcconn' => '
@results1 = ();
for (sort{

my (\$t0a,\$t1a,\$p0a,\$pda,\$ra,\$alta,\$la) =
map(m!(\d+)(\D+)(\d+)(\D*)R(\d+)B?(\d*)\.(\w+)\$!,\$a);
my (\$t0b,\$t1b,\$p0b,\$pdb,\$rb,\$altb,\$lb) =
map(m!(\d+)(\D+)(\d+)(\D*)R(\d+)B?(\d*)\.(\w+)\$!,\$b);

\$la  cmp \$lb      ||
(\$altb eq ""? 0 :\$altb)  <=> (\$alta eq "" ? 0 : \$alta)  ||
\$t0a <=> \$t0b     ||
\$t1a cmp \$t1b     ||
\$p0a <=> \$p0b     ||
\$ra  cmp \$rb      ||
\$pda cmp \$pdb;

} @data ) {
push @main::results1, \$_;
}
',

'Ovid' => '
@main::results2 = ();
@main::results2 =
map  { \$_->[0] }
sort
{
\$a->[1][0] cmp \$b->[1][0]
||
\$b->[1][1] <=> \$a->[1][1]
||
\$a->[1][2] <=> \$b->[1][2]
||
\$a->[1][3] cmp \$b->[1][3]
||
\$a->[1][4] <=> \$b->[1][4]
||
\$a->[1][5] <=> \$b->[1][5]
||
\$a->[1][6] cmp \$b->[1][6]
}
map  { [\$_, main::get_data( \$_ )] } @main::data;
',
});

use Data::Dumper;
open T, "> test.txt" or die \$!;
print T Dumper \@results1, \@results2;
close T;

sub get_data
{
local \$_ = shift;
my @data = ( m!(\d+)(\D+)(\d+)(\D*)R(\d+)B?(\d*)\.(\w+)\$! );
\$data[3] ||= '';
\$data[4] ||= 0;
\$data[5] ||= 0;
# reorder the data for sorting
@data = @data[ 6, 5, 0, 1, 2, 4, 3 ];
return \@data;
}

Results:

```C:\>perl test.pl
Benchmark:
timing 5000 iterations of
mkmcconn, Ovid
...

mkmcconn: 69 wallclock secs (68.61 usr +  0.00 sys = 68.61 CPU) @
+ 72.88/s (n=5000)

Ovid: 32 wallclock secs (32.68 usr +  0.00 sys = 32.68 CPU) @ 15
+3.00/s (n=5000)

Cheers,
Ovid

Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

Re: complex sort
by Juerd (Abbot) on Dec 25, 2001 at 03:06 UTC
You understand sort well, so there's not much efficiency to be gained, I think.
However, I would have used some complex data structure for readability. This also removes the need for temporary variables. (but adds an array)
```my @data =
sort {
\$a->[7] cmp \$b->[7] ||
\$b->[6] <=> \$a->[6] ||  # '' == 0
\$a->[0] <=> \$b->[0] ||
\$a->[1] <=> \$b->[1] ||
\$a->[2] <=> \$b->[2] ||
\$a->[4] cmp \$b->[4] ||
\$a->[3] cmp \$b->[3]
} map {
[ /^(\d+)(\D+)(\d+)(\D*)R(\d+)(B?)(\d*)\.(\w+)\$/ ]
} <DATA>;
print "\$_->[0]\$_->[1]\$_->[2]\$_->[3]R\$_->[4]\$_->[5]\$_->[6].\$_->[7]\n" f
+or @data;
I've made some mistake in there, but I leave it up to you to find and fix it. For some reason, my version puts 113B28R2.ras two places too high (top==0) in the array.

I think my version is more readable, and threrefore more reliable and maintainable (if the bug is fixed, that is ;))

```2;0 juerd@ouranos:~\$ perl -e'undef christmas'
Segmentation fault
2;139 juerd@ouranos:~\$

a couple points,

The bug is that you sort the second type position numerically instead of lexicographically. However, i don't like the way you leave the elements in @data as array references when you're through sorting. You should have finished up the Schwartz Transform so that the original data was still there, just reorganized. If we make those corrections we come up with:

```my @data =
map { join '', @\$_[0..3],'R',@\$_[4..6],'.',\$_->[7], "\n" }
sort { \$a->[7] cmp \$b->[7] ||
\$b->[6] <=> \$a->[6] ||
\$a->[0] <=> \$b->[0] ||
\$a->[1] cmp \$b->[1] ||
\$a->[2] <=> \$b->[2] ||
\$a->[4] <=> \$b->[4] || # update: changed (see 1)
\$a->[3] cmp \$b->[3] }
map { [ /^(\d+)(\D+)(\d+)(\D*)R(\d+)(B?)(\d*)\.(\w+)\$/ ] }
<DATA>;
jynx

update: Unfortunately, before i even got to post this, Ovid posted a better solution that makes sure all the values are defined before being sorted as well.

1: this was changed due to Ovid's note...

Re: complex sort
by grinder (Bishop) on Dec 25, 2001 at 03:09 UTC
I haven't taken the time to actually download and run this code, but it looks pretty good to me. A few suggestions:

• You can chomp <DATA> straight away before passing it to the sort routine. The fact that you do so afterwards, and don't match whitespace before the \$ anchor makes me wonder whether the regexp really matches anything.
```sort {...} chomp <DATA>;

Juerd correctly points out that you can't chomp DATA. In actual fact, you don't even need to chomp at all. You're not actually doing anything to the records. Get rid of the chomp and the -l switch and be done with it.

• The sort function looks sufficiently complicated to merit using a Schwartz Transform to perform the split only once (split to a list and sort on the different elements).
The data you want to sort lend themselves beautifully to the Guttman Rosler Transform which is faster than the Schwartz Transform. (aside: the previous sentence shows clearly why the correct terminology is 'Schwartz Transform', not 'Schwartzian Transform'). Here is the code to do just that:
```print
map {
substr( \$_, 16 )
}
sort
map {
/^(\d+)       # digits of type
(\D+)       # type character
(\d+)       # part
(\D*)       # detail (optional)
R(\d+)      # revision count
B?(\d*)     # alternate count (optional)
\.(vec|ras) # file extension
\s+\$        # trailing whitespace
/x
? sprintf( '%3s%03d%03d%2s%03d%1d%1s', # numbers add up to
+ 16
\$7, 999 - (\$6 || 0), \$1, \$2, \$3, \$5, \$4, ) . \$_
: ('x' x 16) . \$_
}
<DATA>;

The idea is that you add a prefix to the data you want to sort, in order to be able employ a bare sort. Once the array hits the sort code, you are running a C speed until the sort is done. No more perl op-codes for this baby. At the other end of the sort, you throw away the prefix.

Note how I create the inverse of the alt count so that the normal compare still works. (The sprintf may have to be tailored to suit). Also note how I create a dummy prefix in case the regexp fails. For debugging, comment out the map that strips off the prefix.

• On the question of readability/maintainability, I would use the extended regexp syntax in order to comment what you're looking for.

• If you can use the // idiom to represent a regexp, then do so. Using m!! is unsettling.

• For a sort subroutine that big, name it. I.e.,
```sub part_sort {
...
}
sort part_sort <DATA>;
At least that way you can then set a breakpoint easily with b part_sort to see why the silly thing isn't working.

• You don't need the for block at all
```print sort part_sort <DATA>;
will do the job just as well.

• I don't presume to understand your job, but looking at the results, is the type subordinate to the layer or is it the other way around? I guess it's one of weirder naming schemes I have come across.

• As for your documentation, simply refer to http://www.perlmonks.org/index.pl?node_id=134235 in the comments. :)
--
##### g r i n d e r just another bofh print@_{sort keys %_},\$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r\$s-t%t#u';
You can chomp <DATA> straight away before passing it to the sort routine. [...]
[...]
sort part_sort chomp <DATA>;
<DATA> is immutable, so you cannot chomp it (chomp actually tries to modify what it gets - it returns the number of removed characters, not a list of chomped strings).

```#!/usr/bin/perl
print chomp <DATA>;

__DATA__
This piece of code will trigger the following compilation error:

Can't modify <HANDLE> in chomp at - line 2, near "<DATA>;"
Execution of test aborted due to compilation errors.

```2;0 juerd@ouranos:~\$ perl -e'undef christmas'
Segmentation fault
2;139 juerd@ouranos:~\$

Re: complex sort
by mkmcconn (Chaplain) on Dec 25, 2001 at 06:16 UTC

Thank you all very much for your helpful suggestions. I combined them to form the following (which even (accidentally) managed to beat Ovid's by a hair):
mkmcconn: 4 wallclock secs ( 4.66 usr + 0.00 sys = 4.66 CPU) @ 1073.88/s (n=5000)
ovid: 7 wallclock secs ( 7.10 usr + 0.00 sys = 7.10 CPU) @ 704.13/s (n=5000)

```@result1 = map{join '',@\$_[0..4,6,8]}
sort{
\$a->[8] cmp \$b->[8]
||
(\$b->[7] eq '' ? 0 : \$b->[7]) <=> (\$a->[7] eq '' ? 0 : \$a->[7]
+) # didn't manage to eliminate this without warnings
||
\$a->[0] <=> \$b->[0]
||
\$a->[1] cmp \$b->[1]
||
\$a->[2] <=> \$b->[2]
||
\$a->[5] <=> \$b->[5]
||
\$a->[3] cmp \$b->[3]

} map {[m/(\d+)(\D+)(\d+)(\D*)(R(\d+))(B?(\d*))(\.\w+)\s*\$/] } <DATA
+>;

mkmcconn
corrected order

You simplify the second sort condition from using ?: to using ||:

```# this:
(\$b->[7] eq '' ? 0 : \$b->[7]) <=> (\$a->[7] eq '' ? 0 : \$a->[7]

# to this:
(\$b->[7]||0) <=> (\$a->[7]||0)
Re: complex sort
by George_Sherston (Vicar) on Dec 25, 2001 at 03:34 UTC
"Use a nested hash", I thought. It's one of those thoughts that sounds like it offers a neater solution than it does. This does the job, but it's a little contrived. Having said that, once you've got the stuff into the data structure you may find there are lots of other great things you can do with it, so there may be some merit here that I haven't spotted. See how you like:
```my %sort;

for (<DATA>) {
m!(\d+)(\D+)(\d+)(\D*)R(\d+)B?(\d*)\.(\w+)\$!;
\$sort{\$7}{\$6}{"\$1\$2"}{\$3}{\$5}{\$4} = \$_;
}

for my \$first (sort keys %sort) {
for my \$second (reverse keys %{\$sort{\$first}}) {
for my \$third (sort keys %{\$sort{\$first}{\$second}}) {
for my \$fourth (sort keys %{\$sort{\$first}{\$second}{\$third}
+}) {
for my \$fifth (sort keys %{\$sort{\$first}{\$second}{\$thi
+rd}{\$fourth}}) {
for my \$sixth (sort keys %{\$sort{\$first}{\$second}{
+\$third}{\$fourth}{\$fifth}}) {
print \$sort{\$first}{\$second}{\$third}{\$fourth}{
+\$fifth}{\$sixth};
}
}
}
}
}
}
This prints out
```18AD13XR0B3.ras
18DD13AR0B2.ras
5A344R2B15.ras
13B29AR0B1.ras
113A22R1.ras
113A29R0.ras
113A29AR0.ras
113B28R2.ras
5A32R0.ras
5A33R0.ras
5A34R0.ras
5A35R0.ras
5A35SR0.ras
18DD12YR0B2.vec
18DD13AR0B2.vec
5A344R2B15.vec
113B29AR0B1.vec
113A22R1.vec
113A28R0.vec
113A28AR0.vec
113A28R1.vec
113A28R2.vec
113A28AR2.vec
113A29R1.vec
113A29AR2.vec
5A32R0.vec
5A33R0.vec
5A34R0.vec
5A35R0.vec
... if that's not exactly what you wanted, you can probably tweak the code. And maybe some greater monk than I can make the recursive unpacking of the hash a bit shorter.

Merry Christmas!

§ George Sherston

Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://134235]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (2)
As of 2023-03-28 21:54 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
Which type of climate do you prefer to live in?

Results (69 votes). Check out past polls.

Notices?