Dividing Tree::Nary tree into subtrees

Amphiaraus has asked for the wisdom of the Perl Monks concerning the following question:

I now have a perl program that successfully turns input files of Clear Case version-extended-paths for directories, of the form:

/vobs/cdma_linux_apps/code/make@@/main/par_x_haloc9_s74.47.9/bld_01.06.00_ca25_krmt43_haloc/1
/vobs/cdma_linux_apps/code@@/main/par_x_haloc9_s74.47.9/bld_01.06.00_ca25_krmt43_haloc/1
/vobs/cdma_linux_apps/code/libs@@/main/par_x_haloc9_s74.47.9/bld_01.06.00_ca25_krmt43_haloc/1

Into a Tree::Nary tree. Each node of this tree holds information on that node's directory element. A node holding "/vobs" plus associated data is the Tree's root.

Each node of my Tree::Nary tree has (as its data) a pointer to a hash containing the following information on a directory:

# Holds name of directory with CR branches/version number removed
$hash_data{dir_name} = a string

# Holds name of directory with CR branches/version number retained
$hash_data{dir_vep} = a string

# TRUE if the directory requires a merge from the "build" branch to the official product branch
$hash_data{needs_merge} = a boolean

# Total merges to the product branch needed, for this directory AND for directories in its subtree
$hash_data{merge_count} = an integer

# Tree::Nary address at which the node for this directory is stored.
$hash_data{node_address}

If I was processing this directory:

/vobs/engine_cdma2000/code/ghdr

I create Tree::Nary nodes for:

"/vobs"
"/vobs/engine_cdma2000"
"/vobs/engine_cdma2000/code"
"/vobs/engine_cdma2000/code/ghdr"

Has anyone considered the problem of breaking up Tree::Nary trees into subtrees? Can you provide me some hints on what algorithms might be used for dividing into smaller trees a Tree::Nary tree such as I have described?

If I had an input file of 200 directories, I would like to place them in my Tree::Nary tree (which my program can do now), and then use the merge_count and node_address data from each node to split up these 200 directories into smaller groups of perhaps 20 to 40 directories per group, so that each group of directories could be merged by my script on a seperate process.

Each new directory version gives visibility to any new elements it contains, so under no conditions would a directory Y that was a subdirectory of directory X be merged prior to X.

If X and Y are in the same input file, X and Y must both be assigned to the same group (which would have a top-down sort) to insure all directory merges happened in a top down order -- with no race condition as would occur if directories X and Y were accidentally assigned to 2 seperate groups.

I hope this question is detailed enough, and shows clearly that I am asking for help on a program that I have already taken part of the way to its goal -- so that I will not receive too severe a penance from the Perl Monks for submitting it!

Thanks,
Amphiaraus

PS in case it is of interest here is the code of my nary_tree_testing.pl Perl program. The problem of dividing my N-ary tree (of all nodes containing drectories needing merge to the product branch) into smaller trees of 20-40 directories each (to be merged in parallel once these smaller groups are created) appears global -- I can't think of any part of this program to extract since all functions take part in N-ary tree creation and printout


#!/vobs/atso_tools/perl5/bin/perl -w
# LATEST VERSION SEPT 18, 2008
# WILL BE ADDED TO COPY_MERGE WHEN COMPLETED

# This home UNIX account directory contains modules
# downloaded from www.cpan.org
use lib "/home/ark014/bin/trees";

# COMMENTS
# 1) Empty line in dirs.txt crashes script
# 2) Must be able to deal with /vobs/x/y vobs and with linked vobs
# 3) Why is integration_branch_directories_sorted.txt present but empt
+y?

# For N-ary trees
use Tree::Nary;

# For splitdir
use File::Spec;

use strict;
use English;
use strict;
use Env;
use integer;
use File::Basename;
use File::Path;
use File::Copy;
use File::Spec;

# TO DO: make as many of these global variables local as possible

my $parent_node;

use constant TRUE => 1;
use constant FALSE => 0;

my @dir_array;
my @dir_sort;
my @dir_name;

# Roots of an N-ary tree of directories 
my $root_advanced;

# Node of an N-ary tree
my $node;

my $dir_list;
my $dir_list_sorted;
my $dir_name_list;
my $label;
my $pwd;

# Hash of hash pointers, each is a pointer to
# a hash representing a directory element  
my %directories_hash;

######################################################################
+#########
# NAME:        init_dirs
#
# DESCRIPTION: 1) Define test label and test directory. 
#              2) Define variables associated with the input file, 
#                 integration_branch_directories.txt, and a file
#                 of sorted directories (latter is intended to speed 
#                 construction of N-ary tree).  
######################################################################
+#########

sub init_dirs 
{
   $label = "HALOC9_X_01.06.00R_NARY"; 
   $pwd = $ENV{'HOME'}."/builds/".$label;
   $dir_list = "$pwd/integration_branch_directories.txt";
   $dir_list_sorted = "$pwd/integration_branch_directories_sorted.txt"
+;
}

######################################################################
+#########
# NAME:        create_dir_arrays
# 
# DESCRIPTION: 1) Open pointers to input file and sorted dirs file 
#              2) Create arrays of directory versions, and of sorted 
#                 directory versions
#              3) Also create "dir_names", an array (sorted) of direct
+ories
#                 which have had "@@/main/<dir_path>/bld_branch/ver_nu
+m"
#                 stripped away    
######################################################################
+#########

sub create_dir_arrays 
{
   my $dir_vep;
   my $dir_name;
   my $dir_name2;
   my $debug1 = TRUE;
   my $debug2 = FALSE;
   my $dir;

   open DIRLIST, "$dir_list" or die "Cannot open $dir_list";
   chomp (@dir_array = (<DIRLIST>));
   close DIRLIST;

   open DIRLIST, "$dir_list" or die "Cannot open $dir_list";
   chomp (@dir_sort = (<DIRLIST>));
   close DIRLIST;

   open DIRSORT, ">>$dir_list_sorted" or die "Cannot open $dir_list_so
+rted";
   @dir_sort = sort{$b cmp $a}(@dir_sort);
   close DIRSORT;

   if($debug2){ 
      print "Contents of dir_array\n";

         foreach $dir (@dir_array){
            print "$dir";
            print "  ";
      }
   }

   if($debug2){
      print "\n\nContents of dir_sort\n"; 
         foreach $dir (@dir_sort){
            print "$dir";
            print "  ";
         }
   }

   # To construct a tree of dirs need to get rid of
   # "@@/main/<dir_path>/bld_branch/ver_num"
   # This @dir_name array will hold them 
    foreach $dir_vep (@dir_sort) {
       $dir_name = $dir_vep;
       $dir_name  =~ s/^(.*)@@.*/$1/;
       push(@dir_name, $dir_name);
   } 

   if($debug1){
      print "\n\nContents of dir_name\n";
      foreach $dir (@dir_name){ 
            print "$dir";
            print "  ";
      }
   print "\n\n";
   }
}

######################################################################
+#########
# NAME:        print_hash_contents
#
# DESCRIPTION: 1) Prints out contents of a hash, from the array of has
+h 
#                 addresses initialized in sub create_nary_dir_hash
#              2) Used for debugging
######################################################################
+#########


sub print_hash_contents
{
   my $ref_to_hash = shift;
   my %hash_data = %$ref_to_hash;
   my $debug1 = FALSE;

   if ($debug1) {
      print "In print_hash_contents, input is $ref_to_hash\n";
   }
   
   print "PRINT_HASH: hash's dir_name holds $hash_data{dir_name}\n";
   print "PRINT_HASH: hash's dir_vep holds $hash_data{dir_vep}\n";
   print "PRINT_HASH: hash's needs_merge holds $hash_data{needs_merge}
+\n";
   print "PRINT_HASH: hash's merge_count holds $hash_data{merge_count}
+\n";
   print "PRINT_HASH: hash's node_address holds $hash_data{node_addres
+s}\n";
}



######################################################################
+#########
# NAME:        set_node_address 
#
# DESCRIPTION: 1) Init the {node_address} field of a Tree Node's data 
+hash
#                 with the address of the Tree Node. 
#              2) The Tree Node's data hash is initialized with a dumm
+y value
#                 of "0". It needs to be updated with an actual node a
+ddress. 
######################################################################
+#########

sub set_node_address 
{
   my $node_address = shift;
   my $debug1 = TRUE;
   my $debug2 = FALSE;

   if ($debug1) {
      print "In set_node_address, input is $node_address\n\n";
   }

   my $node_data_address = $node_address->{data};
   my %node_hash_data = %$node_data_address;

   ### my $directory_hash_address = $directories_hash{$node_hash_data{
+dir_name}};
   ### my %directory_hash_data = %$directory_hash_address; 

   if ($debug2) {
      print "SET_NODE_ADDRESS BEFORE: node's data field holds:\n"; 
      &print_hash_contents($node_address->{data});
      print "SET_NODE_ADDRESS BEFORE: directories_hash entry for this 
+dir holds:\n";
      &print_hash_contents($directories_hash{$node_hash_data{dir_name}
+});
   }

   ### $node_hash_data{node_address} = $node_address;
   ### $node_address->{data} = \%node_hash_data;

   $directories_hash{$node_hash_data{dir_name}}->{node_address} = $nod
+e_address;
  
   if ($debug2) { 
      print "SET_NODE_ADDRESS AFTER: node's data field holds:\n"; 
      &print_hash_contents($node_address->{data});
      print "SET_NODE_ADDRESS AFTER: directories_hash entry for this d
+ir holds:\n";
      &print_hash_contents($directories_hash{$node_hash_data{dir_name}
+});
   }
  
   if ($debug1) {
      print "$directories_hash{$node_hash_data{dir_name}}->{dir_name} 
+node now has {node_address} $directories_hash{$node_hash_data{dir_nam
+e}}->{node_address}\n"
   }
 
}

######################################################################
+#########
# NAME:        get_node_address
#
# DESCRIPTION: 1) Returns "0" if $directories_hash{$key} {node_address
+} data  
#                 field holds the dummy value of "0" it was given in 
#                 create_nary_dir_hash (this means the Tree::Nary node
+ for the 
#                 current dir has been created, but not yet added to t
+he N-ary 
#                 Tree). 
#              2) Returns the node address of the current dir's  Tree 
+Node  
#                 from $directories_hash{$key} if this {node_address} 
+value is
#                 nonzero (this means the Tree::Nary node was previous
+ly added 
#                 to the N-ary tree). 
######################################################################
+#########

sub get_node_address
{
   my $dir_hash_address = shift;
   
   my $debug1 = TRUE;

   my %hash_data = %$dir_hash_address;

   if ($debug1) {
      print "In get_node_address, input is $dir_hash_address\n";
      print "In get_node_address, {node_address} of this dir_hash hold
+s $hash_data{node_address}\n";
   }

   if ($hash_data{node_address} == 0) {
      return 0;
   } else {
      return ($hash_data{node_address});
   }
}

######################################################################
+#########
# NAME:        print_node_contents 
#
# DESCRIPTION: 1) Prints out contents of the data field of a N-ary Tee
+ node 
#              2) Used for debugging 
######################################################################
+#########

sub print_node_contents
{
   my $ref_to_node = shift;
   my $debug1 = TRUE;

   if ($debug1) {
      print "In print_node_contents, input is $ref_to_node\n";
   }
   
   my $node_data = $ref_to_node->{data};
   my %hash_data = %$node_data;
   
   print "PRINT_NODE: node's dir_name holds $hash_data{dir_name}\n";
   print "PRINT_NODE: node's dir_vep holds $hash_data{dir_vep}\n";
   print "PRINT_NODE: node's needs_merge holds $hash_data{needs_merge}
+\n";
   print "PRINT_NODE: node's merge_count holds $hash_data{merge_count}
+\n"; 
   print "PRINT_NODE: node's node_address holds $hash_data{node_addres
+s}\n\n";
}

######################################################################
+#########
# NAME:        create_nary_dir_hash 
#
# DESCRIPTION: 1) Create a hash holding indformation on each entry in 
+dirs.txt
#              2) This hash will later be used as a database to help c
+onstruct 
#                 an N-ary Tree using Tree::Nary.
#              3) The hash is called %directories_hash. It is a hash o
+f pointers
#                 to smaller hashes I call dir_hashes. 
#              4) Each dir_hash holds the following fields: dir_name, 
+dir_vep,
#                 needs_merge, merge_count. It will be much quicker to
+ do this 
#                 initializing in a hash, especially for merge_count w
+hich would
#                 otherwise require several searches of, and visits to
+, a tree 
#                 node whose merge_count was being incremented.
#              5) Later during the tree construction function, another
+ field 
#                 will be added to each tree hash, "tree-address", giv
+ing the
#                 address of its node once its node has been added to 
+the N-ary 
#                 tree. 
######################################################################
+#########

sub create_nary_dir_hash
{
   # Local variables
   my $dir_name; 
   my $dir;
   my $j; 

   #  Will be used to construct a dir path
   my $dir_tmp = "";
 
   my $debug1 = TRUE;
   my $debug2 = FALSE;

   # Helps iterate through @dir_name 
   my $dir_name_counter = 0;

   # This array holds tokenized version of current @dir_name entry, un
+ique 
   # for each loop.
   # Example: /vobs/engine_cdma2000/code becomes an array:
   # [0] empty string [1] "vobs"  [2] "engine_cdma2000"  [3] "code"
   # splitdir array ALWAYS has an empty string at [0]
   my @splitdir_curr = ();

   # Loop through @dir_name (these versions have had 
   # @@/main/<branches>/bld_branch/ver_num removed
   foreach $dir_name (@dir_name) { # @dirname foreach loop - top

      if($debug1){ 
         print "CREATE HASH: dir_name is $dir_name, dir_name_counter i
+s $dir_name_counter\n\n"; 
      } 

      @splitdir_curr = File::Spec->splitdir($dir_name);

      # Loop through tokens that represent dir path of the current @di
+r_name entry 
      # Note: $splitdir_curr[0] is "" so $j starts at "1"
     
      for ($j = 1; $j <= $#splitdir_curr; $j++) {  # @splitdir_curr lo
+op - top
         
         $dir = $splitdir_curr[$j]; 

         # To be certain that node_data hash is unique in each interio
+r loop
         # Example: If directory version was /vobs/engine_cdma2000/cod
+e, three 
         # %node_data hashes for potential tree nodes are needed: one 
+holds 
         # "/vobs', the second holds "/vobs/engine_cdma2000", the thir
+d holds 
         # "/vobs/engine_cdma2000/code"
         # %node_data holds data for the N-ary tree node that will eve
+ntually be 
         # created
         # In each loop a new %node_data is created, with same variabl
+e name but 
         # unique address 
     
         my $dir_hash_ref; 

         { 
            my %dir_hash_unique;
            $dir_hash_ref = \%dir_hash_unique;
         }

         my %dir_hash = %$dir_hash_ref;

         if($debug2) { 
            print "CREATE_HASH: In splitdir loop, dir is $dir, counter
+ is $j\n\n";
         } 

         # Is this the rightmost dir in this entry's dir path? If so:
         # a) {dir_vep} is not NULL, and is obtained from $dir_sort[$c
+ounter] 
         # b) {needs_merge} is always TRUE, 
         # c) {dir_name} is obtained from $dir_name
         # d) {merge_count} is always 1
         # e) {node_address} is here initialized to dummy value of 0, 
+will later 
         #    get node's address
         # f) This hash is guaranteed to be added to %directories_hash
+ 
         
         if($j == $#splitdir_curr) {  #  <-- Processing rightmost dir 
            if ($debug2) { 
               print "CREATE_HASH: current dir is rightmost dir\n\n";
            }

            # REPLACE THIS WITH A "SET" FUNCTION
            # Merge candidate always has nonempty dir_vep
            $dir_hash{"dir_vep"} = $dir_sort[$dir_name_counter]; 
            $dir_hash{"dir_name"} = $dir_name;
            # Merge candidate always has needs_merge = TRUE
            $dir_hash{"needs_merge"} = TRUE; 
            # Merge candidate always has merge_count = 1
            $dir_hash{"merge_count"} = 1; 
            # Will receive address of its related Tree node later 
            $dir_hash{"node_address"} = 0;

            # Empty this string for next loop 
            $dir_tmp = "";

            # a) If this directory was earlier added to the directory 
+hash (as a 
            #    non-rightmost item) this version's address will be co
+pied over the 
            #    earlier hash item in the directory hash (since existi
+ng entry for 
            #    this dir in the hash of hash pointers is incorrect).
            # b) If this directory does not yet exist in the directory
+ hash, a 
            #    new hash entry will be created, holding this version'
+s address 
            
            $directories_hash{$dir_hash{"dir_name"}} = \%dir_hash;   

            if ($debug1) { 
               print "CREATE_HASH: Have added a rightmost element to d
+irectories_hash\n"; 
               print "Its key is $dir_hash{dir_name}, its address is $
+directories_hash{$dir_hash{dir_name}}\n";
               &print_hash_contents($directories_hash{$dir_hash{dir_na
+me}});
               print "\n";
            }

         } else {  #  <-- Current dir is NOT the rightmost dir, start 
+else
             # If this is NOT the rightmost dir in this entry's dir pa
+th
             # a) "dir_vep" is "NULL", unless this element was rightmo
+st 
             #    elsewhere in dirs.txt 
             # b) "needs_merge" is FALSE, unless this element was righ
+tmost 
             #    elsewhere in dirs.txt 
             # c) "dir_name" will be built up from splitdirs array 
             # d) If this hash entry does not exist in the hash of has
+h 
             #    pointers, initialize "merge_count" to "1". If it alr
+eady exists, 
             #    increment "merge_count" by "1"
            
             if ($debug2) { 
                print "CREATE_HASH: current dir is NOT rightmost dir\n
+\n";
             }

             if ( $j == 1 ) {  
                $dir_tmp = "/vobs";
             } else { 
                $dir_tmp = $dir_tmp . "/" . $splitdir_curr[$j];
             } # 

             $dir_hash{dir_name} = $dir_tmp;

             # Is the dir represented by $dir_hash already in the hash
+ of 
         # hash pointers? 
             if (exists $directories_hash{$dir_hash{dir_name}}) { # IF
+ IT EXISTS
                # All I need to do is increment "merge_count" since ot
+her data 
                # for this dir has been initialized earlier

                $directories_hash{$dir_hash{dir_name}}->{merge_count} 
++= 1; 

                if ($debug1) {
                   print "CREATE HASH: Have incremented an existing el
+ement in directories_hash\n";
                   print "Its key is $dir_hash{dir_name}, its address 
+is $directories_hash{$dir_hash{dir_name}}\n";
                   &print_hash_contents($directories_hash{$dir_hash{di
+r_name}});
                   print "\n";
                }

             } else { # IF IT DOESN'T EXIST 
                # Must initialize all fields if dir is not present in 
+hash of 
        # hash pointers, then add to %directories_hash 

                # REPLACE THIS WITH A "SET" FUNCTION
                $dir_hash{"dir_vep"} = "NULL";
                $dir_hash{"needs_merge"} = FALSE; 
                $dir_hash{"merge_count"} = 1;
        # Will receive address of its related node later
                $dir_hash{"node_address"} = 0; 
                $directories_hash{$dir_hash{"dir_name"}} = \%dir_hash;

               if ($debug1) {
                  print "CREATE_HASH: Have added a non-rightmost eleme
+nt to directories_hash\n";
                  print "Its key is $dir_hash{dir_name}, its address i
+s $directories_hash{$dir_hash{dir_name}}\n";
                  &print_hash_contents($directories_hash{$dir_hash{dir
+_name}});
                  print "\n";
               }

             } # IF IT DOESN'T EXIST 
         } #  <-- Current dir is not the rightmost dir, end else
            
      } # @splitdir_curr foreach loop - end    
          
      # Go to the next directory from @dir_name
      $dir_name_counter++;
   } # @dirname foreach loop - bottom
   
   if ($debug1) { 
      print "\nCREATE_HASH: Print out contents of the hash of hash poi
+nters\n\n";   

      foreach (keys %directories_hash) {  # foreach
            &print_hash_contents($directories_hash{$_});  
            print "\n\n";
      }   # foreach    
  } 
}  

######################################################################
+#########
# NAME:        create_nary_dir_tree 
#
# DESCRIPTION: 1) Create an N-ary Tree using Tree::Nary 
#              2) sub create_nary_dir_hash has created a hash of hash 
+pointers 
#                 called % directories_hash. #%directories_hash holds 
+pointers 
#                 to smaller hashes each of which is a representation 
+of a 
#                 directory element seen in dirs.txt, as well as direc
+tories 
#                 seen along the dir path of that directory element.
#              3) Initializing of all but one of the data fields of ea
+ch dir 
#                 hash (dir_vep, dir_name, needs_merge, and merge_coun
+t) are 
#                 completed in create_nary_dir_hash
#              4) The directories hash contains the following informat
+ion on 
#                 each directory: dir_name, dir_vep, needs_merge, merg
+e_count, 
#                 node_address. The %directories_hash is used by this 
+function 
#                 as a DATABASE to tell if a given node is in the N-ar
+y tree, 
#                 and also to obtain the data needed to populate an N-
+ary node.
#              5) Data field of each Tree::Nary node is a pointer to a
+ hash 
#                 containing the data describing its directory element
+. These 
#                 nodes also have "parent" and "children" pointers. "c
+hildren" 
#                 is a pointer to a node's leftmost child. Each child 
+node has a 
#                 "next" pointer pointing to its sibling. 
#              6) When the node representing a given directory element
+ has been 
#                 added to the N-ary tree, the address of this node is
+ saved 
#                 into its entry in %directories_hash. Therefore, to d
+etermine 
#                 if a node is in the tree, a quick access of %directo
+ries_hash 
#                 will be done, and not a slow search of the N-ary tre
+e, to see 
#                 if the "node_address" data field for that directory 
+is empty 
#                 or nonempty in the hash.  
######################################################################
+#########
sub create_nary_dir_tree
{
   # Local variables
   my $dir_name; 
   my $dir;

   my $test;
   
   # For looping through directories array 
   my $dir_name_counter = 0;

   my $node_data_address;

   my $j; 

   my $key;
   
   my $debug1 = TRUE;
   my $debug2 = FALSE;

   #  Will be used to construct a dir path
   my $dir_tmp = "";

   my $node_data_ref;

   # Tokenized version of current @dir_name entry, unique for each loo
+p.
   # Example: /vobs/engine_cdma2000/code becomes an array:
   # [0] "" [1] "vobs"  [2] "engine_cdma2000"  [3] "code"
   # splitdir array ALWAYS has an empty string at [0]

   my @splitdir_curr = ();

   my $search;

   # Creating the root of the N-ary Tree
   my $root_node_hash_ref = $directories_hash{"/vobs"};

   # Create a Tree::Nary tree's root with ref to the above hash 
   # %root_node_hash_ref as its data
   $root_advanced = Tree::Nary->new($root_node_hash_ref);

   # Update root node's {node_address} field from dummy value of "0" t
+o address
   # of its N-ary Tree node 
   &set_node_address($root_advanced);

   print "\nCREATE NODE: Printing contents of root\n"; 
   &print_node_contents($root_advanced);

   # At start, parent_node = root
   $parent_node = $root_advanced;

   # Loop through @dir_name (these versions have had 
   # @@/main/<branches>/bld_branch/ver_num removed
   foreach $dir_name (@dir_name) { # @dirname foreach loop - top

      # Will be used in this loop to help tree construction
      # $parent_node is the root node, holding "/vobs", at start of th
+is loop 
      # Each loop processes a directory version from @dir_name
      
      if($debug1){ 
         print "CREATE_NODE: dir_name is $dir_name\n\n"; 
      } 

      @splitdir_curr = File::Spec->splitdir($dir_name);

      # Loop through tokens that represent dir path of the current @di
+r_name entry 
      # Note: $splitdir_curr[0] is "", so $j starts at "1"
      
      for ($j = 1; $j <= $#splitdir_curr; $j++) {  # @splitdir_curr lo
+op - top
         
         $dir = $splitdir_curr[$j]; 

         # To be certain that node_data hash is unique in each interio
+r loop
         # Example: If directory version was /vobs/engine_cdma2000/cod
+e, 
         # three %node_data hashes for potential tree nodes are needed
+: one 
         # holds "/vobs', the second holds "/vobs/engine_cdma2000", th
+e third 
         # holds "/vobs/engine_cdma2000/code"
         # %node_data holds data for the N-ary tree node that will be 
+created
         # In each loop a new %node_data is created, with same variabl
+e name but 
         # unique address 
     
         if ($debug2) { 
            print "CREATE_NODE: dir is $dir\n";
            print "CREATE_NODE: counter is $j\n\n";
         } 

         # Is this the rightmost dir in this entry's dir path? If so:
         
         # a) {dir_vep} is not NULL  
         # b) {needs_merge} is TRUE 
         # c) {dir_name} is the same as $dir_name
         # d) {merge_count} is "1"
         # e) {node_address} currently has dummy value of "0", will la
+ter get 
     #    node's actual address
         # f) This node is guaranteed to be added to the tree
         
         if($j == $#splitdir_curr) {  #  <-- Processing rightmost dir
            
            if ($debug2) { 
               print "CREATE_NODE: current dir $dir is the rightmost d
+ir\n";
            }

            $key = $dir_name;
            
            # Data hash representing this "rightmost dir" was initiali
+zed 
            # earlier in sub create_nary_dir_hash
            # Save address of this hash (taken from $directories_hash{
+$key}) 
            # into $node_data_ref 
        
            my $node_data_ref = $directories_hash{$key};

            if ($debug1) {
               print "node_data_ref holds address $node_data_ref\n";
            }
             
            # Have reached end of inner "splitdir_curr" loop 
            # Empty these 2 variables to allow correct processing of n
+ext entry 
        # from @dirs_array 
            
            $dir_tmp = "";
            @splitdir_curr = ();

         } else {  #  <-- Current dir is NOT the rightmost dir, start 
+else
            
            # a) {dir_vep} is "NULL", unless this element was rightmos
+t 
            #    elsewhere in dirs.txt  
            # b) {needs_merge} is FALSE, unless this element was right
+most 
            #    elsewhere in dirs.txt  
            # c) {dir_name} will be built up from splitdirs array 
            # d) {merge_count} will have been set to its correct value
+ in sub 
            #    create_nary_dir_hash
            # e) {node_address} currently has dummy value of "0", will
+ later get 
            #    node's actual address
            # f) This node MAY have been added to tree earlier if its 
+element 
            #    was rightmost elsewhere in @dir_name, or if it appear
+ed in a 
            #    previous line of @dir_name as a "non-rightmost" dir.
        
            if ($debug2) { 
               print "CREATE_NODE: current dir $dir is NOT the rightmo
+st dir\n";   
            } 
            
            # Will build up the name of this non-rightmost dir in $dir
+_tmp 
            if ( $j == 1 ) {  
               $dir_tmp = "/vobs";
            } else {
               $dir_tmp = $dir_tmp . "/" . $splitdir_curr[$j];
            } 

           # For non-rightmost dir, name of dir is a substring of $dir
+_name, 
       # saved into $dir_tmp 
           $key = $dir_tmp;

         } #  <-- Current dir is NOT the rightmost dir, end else
            
         # Data hash representing this dir was initialized earlier in 
+sub 
         # create_nary_dir_hash
         # Save address of this hash (taken from $directories_hash{$ke
+y}) 
         # into $node_data_ref
         # If current dir was rightmost, $key is $dir_name, 
         # otherwise $key is $dir_tmp

         $node_data_ref = $directories_hash{$key};
 
         if ($debug1) { 
            print "\nCREATE_NODE: Latest node_data_ref now holds its c
+orresponding directories_hash entry\n"; 
            print "key is $key, node_data_ref is $node_data_ref\n";
            &print_hash_contents($node_data_ref);
         }

         # Is there a node in the tree that holds same data as $node_d
+ata_ref?
         # To save time instead of searching the tree, or walking thro
+ugh the 
         # tree (with "children" and "next" pointers), will use 
         # %directories_hash as a database, performing a lookup using 
         # $key = name of directory, and see if the {node_address} of 
+the hash 
         # entry corresponding to this directory holds a dummy value o
+f "0" 
         # (means directory is NOT in tree) or holds a nonzero value f
+or 
         # {node address} (means directory is in tree).   
     
         print "\nPROCESSING CURRENT NODE: searching for node in tree 
+with same data as current node\n";
         print "directories_hash{$key} holds address $directories_hash
+{$key}\n\n"; 

         $search = &get_node_address($directories_hash{$key});

         if($search != 0) { # search_result = FOUND 
                  
            # $parent_node variable will hold the address of the most 
+recently 
            # found node
            # If this dir is already in tree, all that needs to be don
+e is to 
            # update $parent_node
          
            if ($debug1) {
               print "node holding $key is already in tree\n";
            } 
            
            $parent_node = $search;

         } else { # else search_result = NOT FOUND starts

            # Must add the current node to tree, using $parent_node sa
+ved in 
            # previous loop's search
            # Tree::Nary->append will make the new node the rightmost 
+child of 
            # $parent_node, or else give it
            # its first child if it has none

            if($debug1){
                 print "\nAdding a node to the N-ary tree\n\n";
            }

            # This conmmand instantiates both "rightmost" 
        # and "non-rightmost" nodes 
            my $new_node = Tree::Nary->new($node_data_ref);

            # Updating this directory's {node_address} field with its 
            # node address 
            # Earlier, in sub create_nary_dir_hash, this dir's {node_a
+ddress} 
            # field was initialized with a dummy value 
        
            &set_node_address($new_node);

            # Must update the {node_address} field of the hash contain
+ed in 
            # Tree::Nary object $new_node with the actual address of n
+ew node. 
            # Currently this field (initialized earlier in hash functi
+on) 
            # contains dummy value of "0". Use scoping to insure these
+ variables
            # are unique

            if ($debug1) { 
               print "ADD_NODE: Appending, parent_node address is $par
+ent_node, new_node address is $new_node\n"; 
            }                
                  
            Tree::Nary->append($parent_node, $new_node);

            # Was the new_node successfully added? 
            $search = &get_node_address($directories_hash{$key});

            if ( $search != 0) {
               print "\nAPPEND SUCEEDED, NODE FOUND\n";
            } else {               
               print "\nAPPEND FAILED, NODE NOT FOUND\n";
               exit 1;
            }

            if ($debug2) { 
               print "\n\nAFTER add node, contents of tree were:\n\n";
+ 
               &print_nary_dir_tree($root_advanced);
               print "\n\n";
            }

            # Most recently added node, representing a "dir" from a gi
+ven 
        # dir_path becomes the new "parent"
            $parent_node = $new_node; 
         } # else search_result = NOT_FOUND  ends
         
      } # @splitdir_curr foreach loop - bottom
      
   # $dir_name_counter increment will cause next entry in @dir_name ar
+ray 
   # to be processed in the outer foreach loop
   $dir_name_counter++;
   
   } # @dir_name foreach loop - bottom

}

######################################################################
+#########
# NAME:        split_nary_dir_tree
#
# DESCRIPTION:
######################################################################
+#########


sub split_nary_dir_tree
{
   my $root = shift;
   my $depth = -1;
   my $debug1 = TRUE;

   if ($debug1) {
      print "\n\nEntering sub split_nary_dir_tree\n";
   }

   my $find_merge_count = sub { # start of sub printsub
      my $node = shift;
      my %hash_data;
      my $node_data;

      if (defined $node) {
         $node_data = $node->{data};
         %hash_data = %$node_data;
      } else {
         print "ERROR: Node is undefined, split_nary_dir_tree has fail
+ed\n";
         return 1;
      }
   
      if (defined $node_data) {
         if ($debug1) { 
            print "PRINTSUB: hash_data{dir_name} holds $hash_data{dir_
+name}\n"; 
            print "PRINTSUB: hash_data{merge_count} holds $hash_data{m
+erge_count}\n";
            print "PRINTSUB: hash_data{node_address} holds $hash_data{
+node_address}\n\n";
         }
         #return 0; 
         return ($hash_data{merge_count}, $hash_data{node_address});
      } else {
         print "PRINTSUB: node_data is undefined, print_nary_dir_tree 
+has failed\n";
         return 1;
      }   
   };

   Tree::Nary->traverse($root, $Tree::Nary::PRE_ORDER, $Tree::Nary::TR
+AVERSE_ALL, $depth, $find_merge_count);

}

######################################################################
+#########
# NAME:        print_nary_dir_tree
#
# DESCRIPTION: 1) N-ary Tree has all its nodes visited and printed out
+ 
#              2) Contents of nodes' data fields are printed out 
#              3) print_nary_dir_tree has a helper function embedded w
+ithin it
#                 called "print_sub", which is used to print out the d
+ata field
#                 in each node visited during the N-ary Tree traversal
+ 
######################################################################
+#########


sub print_nary_dir_tree 
{
   my $root = shift;
   my $debug1 = TRUE;
 
   if ($debug1) {
      print "\n\nEntering sub print_nary_dir_tree\n"; 
   }

      my $printsub = sub { # start of sub printsub
         my $node = shift;
         my %hash_data;
         my $node_data;
         my %hash_data_p;
         my $node_data_p;
         my $parent_address = $node->{parent};  
     
      if (defined $node) {
         $node_data = $node->{data};
         %hash_data = %$node_data;

         if(defined $parent_address) { 
            $node_data_p = $parent_address->{data};
            %hash_data_p = %$node_data_p;
         }
      } else {
         print "ERROR: Node is undefined, print_nary_dir_tree has fail
+ed\n";
      }
        
      if (defined $node_data) {
         if(defined $parent_address) {
            print "\nPRINTSUB: This node's parent is $hash_data_p{dir_
+name}\n"; 
         } else {
            print "\nPRINTSUB: this is the root node\n";
         }
         print "PRINTSUB: hash_data{dir_name} holds $hash_data{dir_nam
+e}\n";
         print "PRINTSUB: hash_data{dir_vep} holds $hash_data{dir_vep}
+\n";
         print "PRINTSUB: hash_data{needs_merge} holds $hash_data{need
+s_merge}\n";
         print "PRINTSUB: hash_data{merge_count} holds $hash_data{merg
+e_count}\n";
         print "PRINTSUB: hash_data{node_address} holds $hash_data{nod
+e_address}\n\n";
      } else {
         print "PRINTSUB: node_data is undefined, print_nary_dir_tree 
+has failed\n";
      }  
       
      # Note: "print" command returns "1", to allow traverse to cover 
+entire 
      # tree must have a "return 0" here. Otherwise would stop after p
+rinting one node. 
      return 0;
      }; # end of sub printsub

   Tree::Nary->traverse($root, $Tree::Nary::PRE_ORDER, $Tree::Nary::TR
+AVERSE_ALL, -1, $printsub);
}

# main program

print "START PROGRAM:\n";
system("/usr/bin/date");
print "/n/n";
&init_dirs;
&create_dir_arrays;
&create_nary_dir_hash;
&create_nary_dir_tree;
&print_nary_dir_tree($root_advanced);
#&split_nary_dir_tree($root_advanced);
print "END PROGRAM:\n";
system("/usr/bin/date");
[download]

Here is a small input file for testing the above script

/vobs/engine_cdma2000/code/ghdr@@/main/par_x_haloc9_s74.47.9/bld_01.06
+.00_ca25_krmt43_haloc/2
/vobs/engine_cdma2000/code/dmss/tools/mjnand@@/main/par_x_haloc9_s74.4
+7.9/bld_01.06.00_ca25_krmt43_haloc/1
/vobs/engine_cdma2000/code/dmss/tools/hostdl@@/main/par_x_haloc9_s74.4
+7.9/bld_01.06.00_ca25_krmt43_haloc/1
/vobs/engine_cdma2000/code/dmss/services/Qtv/common@@/main/par_x_haloc
+9_s74.47.9/bld_01.06.00_ca25_krmt43_haloc/1
[download]

Comment on Dividing Tree::Nary tree into subtrees Select or Download Code