How-to sort nested hash table?

Scottie has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

Is it possible to sort the hash table below (%channel_db_files) by file number ($file_number) without using any additional, supplementary arrays?

%channel_db_files = (
          'ch1' => {
                     '00010' => '/foo/oradata/bar/foodb-lob01.dbf',
                     '00004' => '/foo/oradata/bar/foodb-data02.dbf',
                     '00007' => '/foo/oradata/bar/undotbs02.dbf'
                   },
          'ch2' => {
                     '00003' => '/foo/oradata/bar/tools01.dbf',
                     '00006' => '/foo/oradata/bar/foodb-index11.dbf',
                     '00002' => '/foo/oradata/bar/undotbs01.dbf'
                   },
          'ch3' => {
                     '00005' => '/foo/oradata/bar/xml01.dbf',
                     '00009' => '/foo/oradata/bar/foodb-index01.dbf',
                     '00001' => '/foo/oradata/bar/system01.dbf',
                     '00008' => '/foo/oradata/bar/foodb-data01.dbf'
                   }
);

My code:

#----------------8<----------------
#!/usr/bin/perl

use strict;
use Data::Dumper;

my %channel_db_files = ();
my $RMAN_NO_OF_CHANNELS = 3;
my $RMAN_RUN_CH_NAME = 'ch';

LOOP: while (<DATA>) {
    chomp;
    
    foreach my $i ( 1 .. ${RMAN_NO_OF_CHANNELS} ) {
        
        if ( /$RMAN_RUN_CH_NAME$i/ ) {
            while (<DATA>) {
                chomp;
                 
                if ( /^input datafile/ ) {
                    my ($file_number, $file_name) = $_ =~ /number=(\d+)\s+name=(.*)/;
                    $channel_db_files{"ch$i"}{$file_number} = $file_name;
                } else {
                    redo LOOP;
                }
            }
        }
    }
}


print Dumper \%channel_db_files;


__DATA__
Starting backup at 2011-05-31 02:00:05
channel ch1: starting compressed full datafile backup set
channel ch1: specifying datafile(s) in backup set
input datafile file number=00010 name=/foo/oradata/bar/foodb-lob01.dbf
input datafile file number=00004 name=/foo/oradata/bar/foodb-data02.dbf
input datafile file number=00007 name=/foo/oradata/bar/undotbs02.dbf
channel ch1: starting piece 1 at 2011-05-31 02:00:06
channel ch2: starting compressed full datafile backup set
channel ch2: specifying datafile(s) in backup set
input datafile file number=00003 name=/foo/oradata/bar/tools01.dbf
input datafile file number=00006 name=/foo/oradata/bar/foodb-index11.dbf
input datafile file number=00002 name=/foo/oradata/bar/undotbs01.dbf
channel ch2: starting piece 1 at 2011-05-31 02:00:06
channel ch3: starting compressed full datafile backup set
channel ch3: specifying datafile(s) in backup set
input datafile file number=00008 name=/foo/oradata/bar/foodb-data01.dbf
input datafile file number=00009 name=/foo/oradata/bar/foodb-index01.dbf
input datafile file number=00005 name=/foo/oradata/bar/xml01.dbf
input datafile file number=00001 name=/foo/oradata/bar/system01.dbf
channel ch3: starting piece 1 at 2011-05-31 02:00:07
channel ch1: finished piece 1 at 2011-05-31 02:34:54
#----------------8<----------------

Example of the result I would like to get:

----- ----------------------------------------
File# File Name
----- ----------------------------------------
1     /foo/oradata/bar/system01.dbf          
2     /foo/oradata/bar/undotbs01.dbf        
3     /foo/oradata/bar/tools01.dbf          
4     /foo/oradata/bar/foodb-data02.dbf        
5     /foo/oradata/bar/xml01.dbf             
6     /foo/oradata/bar/foodb-index11.dbf       
7     /foo/oradata/bar/undotbs02.dbf         
8     /foo/oradata/bar/foodb-data01.dbf       
9     /foo/oradata/bar/foodb-index01.dbf      
10    /foo/oradata/bar/foodb-lob01.dbf

I will be grateful for your help.
Regards,
--
Scottie


Replies are listed 'Best First'.
Re: How-to sort nested hash table? by BrowserUk (Patriarch) on Jun 12, 2011 at 10:27 UTC
"Is it possible to sort .. hash ... without using any additional, supplementary arrays?"

It depends on what you want to do with the sorted data.

A hash table cannot be sorted directly; hashes have no ordering. All you can do is sort the keys.

For a single-level hash, if all you want to do with the sorted results is print them out, then you can do that without storing the sorted keys anywhere (other than in the list returned by sort). But if you need to remember the sorted ordering, you have to assign the ordered keys somewhere. Typically to an array.

For your multi-level hash, you want to sort the keys of several sub-hashes together, which complicates things. First you need to get all the keys of the sub-hashes together so they can be sorted. Then you need to obtain the values associated with those keys, and that means remembering which sub-hash each key belongs to.

It may be (almost certainly is) possible to do this in one pass without creating a temporary array, but it would most likely be horribly complicated and not very efficient.

It sometimes makes sense to live with complication for efficiency, or with a lack of efficiency for the sake of clarity. It never makes sense to use a complicated and inefficient route just to save a temporary array.

Assuming a reasonably small hash, I'd do it this way:

{
    # keys and values return their elements in matching order for an
    # unmodified hash, so $keys[$i] and $values[$i] belong together.
    my @keys   = map { keys %$_ }   values %channel_db_files;
    my @values = map { values %$_ } values %channel_db_files;

    # Sort the indices numerically by file number.
    my @order  = sort { $keys[ $a ] <=> $keys[ $b ] } 0 .. $#keys;

    print "$keys[ $_ ]\t$values[ $_ ]\n" for @order;
}

00001   /foo/oradata/bar/system01.dbf
00002   /foo/oradata/bar/undotbs01.dbf
00003   /foo/oradata/bar/tools01.dbf
00004   /foo/oradata/bar/foodb-data02.dbf
00005   /foo/oradata/bar/xml01.dbf
00006   /foo/oradata/bar/foodb-index11.dbf
00007   /foo/oradata/bar/undotbs02.dbf
00008   /foo/oradata/bar/foodb-data01.dbf
00009   /foo/oradata/bar/foodb-index01.dbf
00010   /foo/oradata/bar/foodb-lob01.dbf

By the time you exit the block, the temporaries will have gone away anyway, and it will likely be both easier to read and more efficient than any mechanism that avoids temporaries.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
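For comparison, here is a minimal sketch of what avoiding named temporary arrays might look like: each sub-hash is flattened into anonymous [number, name] pairs, and the combined list is sorted numerically. The sub name `sorted_files` is made up for illustration, and the hash is copied from the question. Perl still builds the intermediate lists internally, so this saves no memory; it only avoids naming them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data copied from the question.
my %channel_db_files = (
    ch1 => {
        '00010' => '/foo/oradata/bar/foodb-lob01.dbf',
        '00004' => '/foo/oradata/bar/foodb-data02.dbf',
        '00007' => '/foo/oradata/bar/undotbs02.dbf',
    },
    ch2 => {
        '00003' => '/foo/oradata/bar/tools01.dbf',
        '00006' => '/foo/oradata/bar/foodb-index11.dbf',
        '00002' => '/foo/oradata/bar/undotbs01.dbf',
    },
    ch3 => {
        '00005' => '/foo/oradata/bar/xml01.dbf',
        '00009' => '/foo/oradata/bar/foodb-index01.dbf',
        '00001' => '/foo/oradata/bar/system01.dbf',
        '00008' => '/foo/oradata/bar/foodb-data01.dbf',
    },
);

# Hypothetical helper (not from the thread): returns a list of
# [file_number, file_name] pairs ordered numerically by file number.
# Intended to be called in list context.
sub sorted_files {
    my ($db) = @_;
    return sort { $a->[0] <=> $b->[0] }
           map  { my $files = $_; map { [ $_, $files->{$_} ] } keys %$files }
           values %$db;
}

printf "%-5s %s\n", 'File#', 'File Name';
printf "%-5d %s\n", @$_ for sorted_files( \%channel_db_files );
```

The `%-5d` format prints the file number without its leading zeros, as in the table the question asks for. Whether this reads better than the version with three short-lived arrays is a matter of taste.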
Re: How-to sort nested hash table? by Not_a_Number (Prior) on Jun 12, 2011 at 13:46 UTC
Hi. I'm making a couple of assumptions; please ignore this post if I'm wrong:

(Update: added OP's `<DATA>` to make my code stand alone.)

1) The `<DATA>` snippet that you provided is what you parsed to get your `%channel_db_files` data structure.
2) 'File numbers' are unique, as in the data you provided.

In which case, I would suggest that you parse your data differently:

my @ordered;

while ( <DATA> ) {
    # The captured file number (e.g. "00010") is used as a numeric array index.
    $ordered[$1] = $2 if /input datafile file number=(\d+) name=(\S+)/
}

print "$_\t$ordered[$_]\n" for 1 .. $#ordered;

__DATA__
Starting backup at 2011-05-31 02:00:05
channel ch1: starting compressed full datafile backup set
channel ch1: specifying datafile(s) in backup set
input datafile file number=00010 name=/foo/oradata/bar/foodb-lob01.dbf
input datafile file number=00004 name=/foo/oradata/bar/foodb-data02.dbf
input datafile file number=00007 name=/foo/oradata/bar/undotbs02.dbf
channel ch1: starting piece 1 at 2011-05-31 02:00:06
channel ch2: starting compressed full datafile backup set
channel ch2: specifying datafile(s) in backup set
input datafile file number=00003 name=/foo/oradata/bar/tools01.dbf
input datafile file number=00006 name=/foo/oradata/bar/foodb-index11.dbf
input datafile file number=00002 name=/foo/oradata/bar/undotbs01.dbf
channel ch2: starting piece 1 at 2011-05-31 02:00:06
channel ch3: starting compressed full datafile backup set
channel ch3: specifying datafile(s) in backup set
input datafile file number=00008 name=/foo/oradata/bar/foodb-data01.dbf
input datafile file number=00009 name=/foo/oradata/bar/foodb-index01.dbf
input datafile file number=00005 name=/foo/oradata/bar/xml01.dbf
input datafile file number=00001 name=/foo/oradata/bar/system01.dbf
channel ch3: starting piece 1 at 2011-05-31 02:00:07
channel ch1: finished piece 1 at 2011-05-31 02:34:54
#----------------8<----------------

Output:

1       /foo/oradata/bar/system01.dbf
2       /foo/oradata/bar/undotbs01.dbf
3       /foo/oradata/bar/tools01.dbf
4       /foo/oradata/bar/foodb-data02.dbf
5       /foo/oradata/bar/xml01.dbf
6       /foo/oradata/bar/foodb-index11.dbf
7       /foo/oradata/bar/undotbs02.dbf
8       /foo/oradata/bar/foodb-data01.dbf
9       /foo/oradata/bar/foodb-index01.dbf
10      /foo/oradata/bar/foodb-lob01.dbf
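One caveat with indexing an array by file number: if the numbers are not contiguous, the missing slots stay undefined and printing them triggers warnings. A hedged sketch of one way to cope, skipping the gaps and warning on a repeated number, might look like this. The sub name `index_by_file_number` and the three sample lines (which deliberately leave out files 2 and 4) are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): builds an array indexed by
# file number from a list of log lines. Slots for absent numbers stay undef.
sub index_by_file_number {
    my @ordered;
    for my $line (@_) {
        next unless $line =~ /input datafile file number=(\d+) name=(\S+)/;
        my ( $number, $name ) = ( $1 + 0, $2 );
        warn "duplicate file number $number\n" if defined $ordered[$number];
        $ordered[$number] = $name;
    }
    return @ordered;
}

# Made-up sample with gaps: there is no file 2 and no file 4.
my @ordered = index_by_file_number(
    'input datafile file number=00005 name=/foo/oradata/bar/xml01.dbf',
    'input datafile file number=00001 name=/foo/oradata/bar/system01.dbf',
    'input datafile file number=00003 name=/foo/oradata/bar/tools01.dbf',
);

# Print only the slots that were actually filled.
for my $number ( grep { defined $ordered[$_] } 0 .. $#ordered ) {
    print "$number\t$ordered[$number]\n";
}
```

If the file numbers can be very large or very sparse, a hash keyed by number plus a numeric sort of its keys is probably the safer choice, since the array grows to the highest index seen.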
Re: How-to sort nested hash table? by Marshall (Canon) on Jun 13, 2011 at 00:47 UTC
Sorting the sub-hashes gets a bit messy. I know that it violates your problem statement, but in the interest of simplicity, I would suggest just making an AoA structure at the same time as the hash. The AoA is a lot easier to sort. This does basically double the storage required, but the payback in simplicity is a lot.

I simplified your parsing a bit below, and I show my suggested AoA sorted 2 different ways. Perl is very good at sorting, and many problems become easier with a sort of something or another. If you just need a printout grouped by "ch", then sorting is a good way.

I guess a lot depends upon why you are using a hash and what you are doing with the data, which are things we just don't know. Trading off using more memory for clarity can be a very good decision. Unless these structures are huge, the memory usage may not matter even if you keep both representations.

#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my %channel_db_files = ();
my @AoAsuggestion;   # just a suggestion

my $RMAN_NO_OF_CHANNELS = 3;   # not clear why this is needed?
my $RMAN_RUN_CH_NAME = 'ch';

my $current_ch;
while (<DATA>)
{
    $current_ch = $1 if /\s$RMAN_RUN_CH_NAME(\d+)/;
    my ($num,$name)= $_ =~ /^input datafile.*?number=(\d+).*?name=(.*)$/;
    if (defined $name)  # match succeeded!
                        # $num is defined if $name is defined
    {
        # your struct:
        $channel_db_files{"$RMAN_RUN_CH_NAME$current_ch"}{$num}=$name;
        #suggestion:
        push @AoAsuggestion,["$RMAN_RUN_CH_NAME$current_ch",$num+0,$name];
    }
}

print Dumper \%channel_db_files;

print "File#\tFile Name\n";
@AoAsuggestion = sort { $a->[1] <=> $b->[1] }@AoAsuggestion;
foreach my $lineref (@AoAsuggestion)
{
    print "$lineref->[1]\t\t$lineref->[2]\n";
}

# sort by channel, here just alpha sort, but probably want to split
# out the number and do numeric sort on that if more than 10 channels
print "\nSorting by channel, file number \n";
@AoAsuggestion = sort { $a->[0] cmp $b->[0] or
                        $a->[1] <=> $b->[1]
                      }@AoAsuggestion;

my $spacerTag = $AoAsuggestion[0]->[0];   #for grouping by channel
foreach my $lineref (@AoAsuggestion)
{
    if ($lineref->[0] ne $spacerTag)
    {
        print "\n";
        $spacerTag = $lineref->[0];
    }
    print "$lineref->[0]\t$lineref->[1]\t$lineref->[2]\n";
}

=output
$VAR1 = {
          'ch2' => {
                     '00003' => '/foo/oradata/bar/tools01.dbf',
                     '00006' => '/foo/oradata/bar/foodb-index11.dbf',
                     '00002' => '/foo/oradata/bar/undotbs01.dbf'
                   },
          'ch1' => {
                     '00010' => '/foo/oradata/bar/foodb-lob01.dbf',
                     '00004' => '/foo/oradata/bar/foodb-data02.dbf',
                     '00007' => '/foo/oradata/bar/undotbs02.dbf'
                   },
          'ch3' => {
                     '00005' => '/foo/oradata/bar/xml01.dbf',
                     '00009' => '/foo/oradata/bar/foodb-index01.dbf',
                     '00001' => '/foo/oradata/bar/system01.dbf',
                     '00008' => '/foo/oradata/bar/foodb-data01.dbf'
                   }
        };
File#   File Name
1               /foo/oradata/bar/system01.dbf
2               /foo/oradata/bar/undotbs01.dbf
3               /foo/oradata/bar/tools01.dbf
4               /foo/oradata/bar/foodb-data02.dbf
5               /foo/oradata/bar/xml01.dbf
6               /foo/oradata/bar/foodb-index11.dbf
7               /foo/oradata/bar/undotbs02.dbf
8               /foo/oradata/bar/foodb-data01.dbf
9               /foo/oradata/bar/foodb-index01.dbf
10              /foo/oradata/bar/foodb-lob01.dbf

Sorting by channel, file number
ch1     4       /foo/oradata/bar/foodb-data02.dbf
ch1     7       /foo/oradata/bar/undotbs02.dbf
ch1     10      /foo/oradata/bar/foodb-lob01.dbf

ch2     2       /foo/oradata/bar/undotbs01.dbf
ch2     3       /foo/oradata/bar/tools01.dbf
ch2     6       /foo/oradata/bar/foodb-index11.dbf

ch3     1       /foo/oradata/bar/system01.dbf
ch3     5       /foo/oradata/bar/xml01.dbf
ch3     8       /foo/oradata/bar/foodb-data01.dbf
ch3     9       /foo/oradata/bar/foodb-index01.dbf
=cut

__DATA__
Starting backup at 2011-05-31 02:00:05
channel ch1: starting compressed full datafile backup set
channel ch1: specifying datafile(s) in backup set
input datafile file number=00010 name=/foo/oradata/bar/foodb-lob01.dbf
input datafile file number=00004 name=/foo/oradata/bar/foodb-data02.dbf
input datafile file number=00007 name=/foo/oradata/bar/undotbs02.dbf
channel ch1: starting piece 1 at 2011-05-31 02:00:06
channel ch2: starting compressed full datafile backup set
channel ch2: specifying datafile(s) in backup set
input datafile file number=00003 name=/foo/oradata/bar/tools01.dbf
input datafile file number=00006 name=/foo/oradata/bar/foodb-index11.dbf
input datafile file number=00002 name=/foo/oradata/bar/undotbs01.dbf
channel ch2: starting piece 1 at 2011-05-31 02:00:06
channel ch3: starting compressed full datafile backup set
channel ch3: specifying datafile(s) in backup set
input datafile file number=00008 name=/foo/oradata/bar/foodb-data01.dbf
input datafile file number=00009 name=/foo/oradata/bar/foodb-index01.dbf
input datafile file number=00005 name=/foo/oradata/bar/xml01.dbf
input datafile file number=00001 name=/foo/oradata/bar/system01.dbf
channel ch3: starting piece 1 at 2011-05-31 02:00:07
channel ch1: finished piece 1 at 2011-05-31 02:34:54
#----------------8<----------------
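The comment in the code about splitting out the channel number matters once there are ten or more channels, because a string comparison puts "ch10" before "ch2". A minimal sketch of a numeric channel sort follows, assuming channel names always end in digits. The helper `channel_number` and the sample rows (including the "ch10" entry) are invented for illustration and use the same [channel, file number, file name] layout as the AoA above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up rows in the [channel, file number, file name] layout.
my @rows = (
    [ 'ch10', 2, '/foo/oradata/bar/undotbs01.dbf' ],
    [ 'ch2',  6, '/foo/oradata/bar/foodb-index11.dbf' ],
    [ 'ch2',  3, '/foo/oradata/bar/tools01.dbf' ],
    [ 'ch1',  4, '/foo/oradata/bar/foodb-data02.dbf' ],
);

# Hypothetical helper (not from the thread): extracts the trailing digits
# of a channel name, or 0 when the name has none.
sub channel_number {
    my ($channel) = @_;
    my ($number) = $channel =~ /(\d+)\z/;
    return defined $number ? $number : 0;
}

# Numeric sort on the channel number first, then on the file number.
my @sorted = sort {
    channel_number( $a->[0] ) <=> channel_number( $b->[0] )
        or $a->[1] <=> $b->[1]
} @rows;

print join( "\t", @$_ ), "\n" for @sorted;
```

With these rows the order comes out as ch1, ch2 (file 3), ch2 (file 6), ch10, whereas `cmp` on the channel name would place ch10 directly after ch1. For large lists it may be worth extracting the channel number once per row rather than on every comparison.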