best practices for a complex sort + splitting an alphanumeric string

gabrielle has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a sort block to handle data of different formats.

My data (switch portnames) takes one of four formats:

my %oids = (
    "ifName" => {
         1 => "Gi1/0/1",
         2 => "Gi1/0/0",
         3 => "Gi1/1/1",
         4 => "Gi2/0/0",
         5 => "Gi2/0/1",
    },
);
[download]

- or:
my %oids = (
    "ifName" => {
         1 => "1",
         2 => "2",
         3 => "3",
         4 => "4",
         5 => "5",
    },
);
- or:
my %oids = (
    "ifName" => {
         1 => "2/1",
         2 => "4/5",
         3 => "15/1",
         4 => "1/1",
         5 => "1/2",
    },
);
- or:
my %oids = (
    "ifName" => {
         1 => "Fa0/1",
         2 => "Fa0/2",
         3 => "Fa3/1",
         4 => "Gi0/1",
         5 => "Gi0/0",
    },
);
[download]

The goal is to get an array of the hash keys in value-sorted order. (I'm using a HoH because I have lots of other data referenced by keys of %oids.)

Here's what I've come up with:

my (@a, @b);
my @port_ifIndices = sort {
    if ($oids{ifName}{$a} =~ /\//) {
        @a = split(/\//, $oids{ifName}{$a});
        @b = split(/\//, $oids{ifName}{$b});
    }
    else {
        @a = $oids{ifName}{$a};
        @b = $oids{ifName}{$b};
    }
    if ($a[0] =~ /[A-Za-z]/) {
        $a[0] ne $b[0] ? $a[0] cmp $b[0] : ($a[1] <=> $b[1] ? $a[1] <=
+> $b[1] : $a[2] <=> $b[2]);
    } else {
        $a[0] != $b[0] ? $a[0] <=> $b[0] : ($a[1] <=> $b[1] ? $a[1] <=
+> $b[1] : $a[2] <=> $b[2]);
    }
} keys(%{ $oids{ifName} });

foreach my $ifIndex (@port_ifIndices) {
        print ("$ifIndex: $oids{ifName}{$ifIndex}\n");
}
[download]

This gives me (using my first data example):

2: Gi1/0/0
1: Gi1/0/1
3: Gi1/1/1
4: Gi2/0/0
5: Gi2/0/1
[download]

...which is exactly what I want.

I'm aware that the recomputation of my sort keys with each iteration is not recommended for speed. I'm following the "build it up by pieces" method for a Schwartzian Transform (my first, woo-hoo) from _Effective Perl Programming_, and I need to start with a working sort. :)

The questions:

1. Is there a more efficient way to handle my initial computation of the @a and @b arrays?

2. Am I violating any rules/best practices by having the second if/else block (the one that actually chooses the sort operation) within my sort block? If so, what are my other options?

3. The code above gives me my desired result with my existing data set. In case that changes in the future, I'm trying to figure out how to split "Fa10/0/1" into [Fa, 10, 0, 1] but all my ideas involve substrings & splits & freaky array manipulations and quickly get very ugly. Ideas?

Thank you!

Comment on best practices for a complex sort + splitting an alphanumeric string Select or Download Code

Replies are listed 'Best First'.
Re: best practices for a complex sort + splitting an alphanumeric string by revdiablo (Prior) on Jan 12, 2006 at 19:53 UTC
A few quick comments. This code: `if ($oids{ifName}{$a} =~ /\//) { @a = split(/\//, $oids{ifName}{$a}); @b = split(/\//, $oids{ifName}{$b}); } else { @a = $oids{ifName}{$a}; @b = $oids{ifName}{$b}; }` [download] Can be replaced with simply: `@a = split m{/}, $oids{ifName}{$a}; @b = split m{/}, $oids{ifName}{$b};` [download] And this code: `if ($a[0] =~ /[A-Za-z]/) { $a[0] ne $b[0] ? $a[0] cmp $b[0] : ($a[1] <=> $b[1] ? $a[1] <= +> +$b[1] : $a[2] <=> $b[2]); } else { $a[0] != $b[0] ? $a[0] <=> $b[0] : ($a[1] <=> $b[1] ? $a[1] <= +> +$b[1] : $a[2] <=> $b[2]); }` [download] Can be replaced with: `if ($a[0] =~ /[A-Za-z]/) { $a[0] cmp $b[0] \|\| $a[1] <=> $b[1]; } else { $a[0] <=> $b[0] \|\| $a[1] <=> $b[1]; }` [download] I'm sure there are other refinements, but these ones stood out to me immediately. Update: perhaps the 2nd can even be simplified to: `$a[0] <=> $b[0] \|\| $a[0] cmp $b[0] \|\| $a[1] <=> $b[1];` [download] I think so, but I haven't tested to make sure. Another update: I noticed I left out the 3rd level comparison, but hopefully it should be obvious how to add it in. :-)	[reply] [d/l] [select]
Re^2: best practices for a complex sort + splitting an alphanumeric string by gabrielle (Novice) on Jan 12, 2006 at 22:42 UTC
Oooh, thanks! I like the reduction of the array assignments. (Reducing my code to elegant/minimal pieces is proving a challenge.) (Not annoying at all, ty!)	[reply]
Re: best practices for a complex sort + splitting an alphanumeric string by jdporter (Paladin) on Jan 12, 2006 at 20:12 UTC
Well, #3 is pretty easy to answer. `# given the string value in $_ ... my @v = split m#/#; unshift @v, shift(@v) =~ /^(.?)(\d)$/;` [download] For values that don't have any letters at the beginning, the first element of @v will be an empty string. From there, you can easily create a sort key, using (for example) `sprintf "%-4s%3d%3d%3d", @v;` [download] Putting it all together: `map { $_->[1] } sort { $_->[0] cmp $_->[1] } map { my @v = split m#/#; unshift @v, shift(@v) =~ /^(.?)(\d)$/; no warnings; # in case @v has less than 4 elements [ sprintf( "%-4s%3d%3d%3d", @v ), $_ ] }` [download] It's probably worth pointing that this could be further optimized by using the GRT, or possibly fast, flexible, stable sort. See also this tutorial on sorting. We're building the house of the future together.	[reply] [d/l] [select]
Re: best practices for a complex sort + splitting an alphanumeric string by revdiablo (Prior) on Jan 12, 2006 at 20:01 UTC
This is separate enough from my previous reply that I thought I'd reply again. Hopefully I don't get too annoying. ;-) The code above gives me my desired result with my existing data set. In case that changes in the future, I'm trying to figure out how to split "Fa10/0/1" into `[Fa, 10, 0, 1]` but all my ideas involve substrings & splits & freaky array manipulations and quickly get very ugly. Ideas? `@a = $a =~ /( [[:alpha:]]+ \| [[:digit:]]+ )/gx; @b = $b =~ /( [[:alpha:]]+ \| [[:digit:]]+ )/gx;` [download]	[reply] [d/l] [select]
Re: best practices for a complex sort + splitting an alphanumeric string by demerphq (Chancellor) on Jan 12, 2006 at 22:36 UTC
Sometimes its worth it to just turn off uninitialized warnings: Read more... (946 Bytes) foreach my $hash (@hash) { no warnings 'uninitialized'; print "$_ => $hash->{$_}\n" for map { $_->[0] } sort { $a->[1] cmp $b->[1] \|\| $a->[2] <=> $b->[2] \|\| $a->[3] <=> $b->[3] \|\| $a->[4] <=> $b->[4] } map { [ $_, $hash->{$_}=~m!^(?:([a-z]+)(\d+)/)?(?:(\d+)/)?(\d+)$!i + ] } keys %$hash; print "-","\n"; } __END__ 2 => Gi1/0/0 1 => Gi1/0/1 3 => Gi1/1/1 4 => Gi2/0/0 5 => Gi2/0/1 - 1 => 1 2 => 2 3 => 3 4 => 4 5 => 5 - 4 => 1/1 5 => 1/2 1 => 2/1 2 => 4/5 3 => 15/1 - 1 => Fa0/1 2 => Fa0/2 3 => Fa3/1 5 => Gi0/0 4 => Gi0/1 - [download] --- $world=~s/war/peace/g	[reply] [d/l] [select]
Re: best practices for a complex sort + splitting an alphanumeric string (natural) by tye (Sage) on Jan 13, 2006 at 07:30 UTC
I'd use something similar to one of the techniques found by searching for natural sort. Likely: `my @sortedIdx= map { unpack "N", substr($_,-4) } sort map { my $key = $oids{ifName}{$_}; $key =~ s[(\d+)][ '0' . pack "N", $1 ]ge; $key . pack "CN", 0, $_ } keys %{$oids{ifName}};` [download] Though I'm a little curious why your integer indices are used to index a hash not an array. If your keys aren't just integers, then you'll need: `my @keys= ( keys %{$oids{ifName}} )[ map { unpack "N", substr($_,-4) } sort map { my $key = $oids{ifName}{$_}; $key =~ s[(\d+)][ '0' . pack "N", $1 ]ge; $key . pack "CN", 0, $_-1 } 1 .. keys %{$oids{ifName}} ];` [download] (Updated) - tye	[reply] [d/l] [select]
Re^2: best practices for a complex sort + splitting an alphanumeric string (natural) by gabrielle (Novice) on Jan 13, 2006 at 20:42 UTC
...searching for natural sort Thanks! I had not heard that term before. That will be very useful. Though I'm a little curious why your integer indices are used to index a hash not an array. That's a really good question & embarassingly, I have no good technical answer. I suspect it is both because this is recycled code that I'm reworking to handle a larger data set, and the original keys were not integers; and because I work with hashes so much more than arrays that it's just a more comfortable method for me. Thanks for pointing that out - I will re-think how I'm handling this data structure. Update:After dinking around with this a bit: The integer index is an important key into other data structures (it's an ifIndex, if you are familiar with SNMP MIB-II). This was not obvious from my small example, sorry. The indices are not always sequential - often a few (or several, gah!) are missing here and there. For me, it's clearer (and therefore easier to maintain) to think of the integer as a hash key. Trying to work with an array for that was a very interesting exercise, thanks. :)	[reply]