in reply to Re: Tree::Nary -- my program is creating phantom nodes
in thread Tree::Nary -- my program is creating phantom nodes

My recurse_tree function is creating "phantom nodes" (i.e. Tree:Nary nodes with an empty {data} field ) both when I take in the subtree node by reference, and when I take it in by value, i.e.:

# By reference my $root_address = shift; my $root = $$root_address;

AND
# By value my $root = shift;

both lead to creating of phantom nodes in my N-ary tree. These tend to be close to the top of the tree, but print-traversal of my Tree::Nary tree shows they are not the immediate children of my Tree's root. Also debug statements that use the Tree::Nary->n_nodes function show these phantom nodes are being made in my recurse_tree() function and nowhere else. Neither recurse_tree nor its helper, find_subtree(), use the Tree::Nary append or prepend functions.

I am on vacation but as soon as I can I will post a copy of my recurse_tree() function here.

Thank You,
Amphiaraus

Replies are listed 'Best First'.
Re^3: Tree::Nary -- my program is creating phantom nodes
by tilly (Archbishop) on Dec 22, 2008 at 08:44 UTC
    Please read this article and try to understand the difference between a shallow copy and a deep copy. Once you understand that you will understand why it didn't make a difference when you tried to pass the structure by value or reference. But it would make a difference if you copied it with Storable's dclone.

    I obviously couldn't tell you why the code changed the tree. I've already indicated that I wouldn't worry about it too much, I would rewrite the code in question.

      Thanks Tilly -- the $child_type variable works perfectly. My recurse_tree() function is no longer making "phantom nodes" in my N-ary tree.

      Due to time pressures I will need to retain Tree::Nary. I am finding that creating and accessing even large N-ary trees is being done in a reasonable time, despite the inefficiencies you pointed out in Tree::Nary.

      Thank You,
      Amphiaraus
      The code of my recurse_tree() function is shown below, preceded by my "use" staments for Storable.

      My program is attempting to create an N-ary tree for the following Clearcase elements:
      cat integration_branch_directories.txt
      /vobs/engine_cdma2000/code/ghdr@@/main/par_x_raspberry_r90.2/bld_23.00_ca25_fpgh48_coral_build/1
      cat integration_branch_files.txt
      /vobs/synergy_core_apps/code/brew_proxy/src/app_brew_bgpxy.c@@/main/par_brew_framework_sprint/bld_23.00_ca25_fpgh48_coral_build/1

      At present I am getting this error message when recurse_tree() begins executing:
      cat SMALL4.err Can't use string ("4321 Tree::Nary") as a HASH ref while "strict refs" in use at /vobs/cdma_common_cmtools/common/tools/copy_merge line 2237.

      My Storable "use" statements:
      # Used to lock the N-ary tree against inappropriate changes (once it i +s completed) use Storable; use Storable qw(store retrieve freeze thaw dclone);
      My recurse_tree() function -- it receives the root of an N-ary tree, or a subtree of the N-ary tree (pass by reference) as input:
      ###################################################################### +######### # NAME: recurse_tree # DESCRIPTION: 1) Use recursion and the Tree::Nary module to define @ +small_arrays of # files and directories, as well as a @top_level_dir +array # 2) Inputs to this function must be Dir Nodes, they are + the root # of an N-ary tree, or root of some subtree of the N- +ary tree. These # Nodes are Tree::Nary nodes. # 3) If the Dir Node repesents a dir from files.txt or d +irs.txt that # has subdirecties, or that contains files, these ele +ments will be # in a linked list to which the Dir Node will have a +{children} # pointer. # 4) Such {children} lists will be all files, all direct +ories, or a mix # of files and directories with any files appearing f +irst (i.e. leftmost) # in the linked list. A Dir Node could also have an e +mpty {children} # pointer. # 5) Nodes in this linked list are File and/or Dir Nodes +. They are connected by # {next} pointers, the next pointer points to the Nod +e to the left of the # current Node. Only the leftmost Node has a connecti +on to the Parent Node # (it is pointed to the parent's {children} pointer. # 6) This function will test the input Dir Node to see i +f it is a base case or # or a recursive case, and of what type. # 7) If the input Dir Node has a {children} linked list, + the function will traverse # the list and make similar tests on each Node in the + list. # 8) The rightmost Node in a {children} linked list will + have a {next} pointer # that points to NULL, and is used to terminate the t +raversal of the linked list. ###################################################################### +######### sub recurse_tree { # Address of the Root of an N-ary tree, or root of some subtree # N-ary tree given this function as input. It MUST be a directory n +ode. # This node is passed by REFERENCE so changes occurring in this fun +ction # will change global values. my $node_address = shift; # Create deep clone of subtree my $node_tmp = Storable::dclone($node_address); # Freeze this deep clone to protect it against unwanted changes # which may occur in function recurse_tree() my $node = Storable::freeze($node_tmp); ### my $node = $$node_address; my $debug = TRUE; my $tree_size = Tree::Nary->n_nodes($root_advanced, $Tree::Nary::TR +AVERSE_ALL); print "ENTERING RECURSE_TREE: The tree has $tree_size nodes\n"; if ($debug) { print "Entering recurse_tree\n"; } # Defines maximum size of a directory @small_array use constant MAX_SIZE_DIR_GROUP => 5; # In addition to pointers to other nodes, an N-ary tree node has a +{data} pointer, which is # a pointer to the user-defined variable or data structure which ho +lds the node's data. my $node_data; # For this script, the {data} pointer is for a hash containing vari +ous information # on the Dir or File that the node represents my %data_hash; if (defined $node) { # Get address of node data hash $node_data = $node->{data}; # Convert this address into a physical hash %data_hash = %$node_data } else { print "ERROR in recurse_tree, Node was undefined\n"; exit 1; } my $call_recurse_tree = sub { my $node = shift; &recurse_tree(\$node); }; # ERROR: Input node cannot be a File Node, either the N-ary tree is # malformed or the script has malfunctioned if($data_hash{type} == FILE) { print "ERROR: In recurse_tree, Node $data_hash{file_name} is a F +ile\n"; exit 1; } elsif ( !defined($data_hash{file_name}) && !defined($data_hash{di +r_name}) ){ print "ERROR: Tree Nary error, this node has no data\n"; exit 1; } else { # Input node is type == DIR, and name of element is known, as ex +pected # RECURSIVE CASE #1: Node needing merge has {dir_merge_count} ov +er 20, # so it needs to be placed in @top_level_dirs (which will be # processed ahead of @small_arrays of dirs) Each of these @small +_arrays # will be size 20 or less. Top level dir's {needs_merge} is flip +ped to FALSE. # Note: {children} pointer of the input Dir Node must == DIR, be +cause if # it was FILE its {dir_merge_count} could be no more than 3. # Then, traverse the {children} list of this Node. if ( ($data_hash{needs_merge} == TRUE) && ($data_hash{dir_merge_ +count} > MAX_SIZE_DIR_GROUP) ){ if ($debug){ print "\nENTERING RECURSIVE CASE #1: The tree has $tree_si +ze nodes\n"; print "Input Dir Node has name $data_hash{dir_name}\n"; print "Input Dir Node has needs_merge $data_hash{needs_mer +ge}\n"; print "Input Dir Node has dir_merge_count $data_hash{dir_m +erge_count}\n"; } # Flip the Dir Node's {needs_merge} value to false $node->{data}->{needs_merge} = FALSE; # Add the input Dir Node to @top_level_dirs push(@top_level_dirs, $data_hash{dir_vep}); # PROCESS CHILDREN LIST: recurse_tree will be called on each +child of the input Dir Node Tree::Nary->children_foreach($node, $Tree::Nary::TRAVERSE_ALL +, $call_recurse_tree); $tree_size = Tree::Nary->n_nodes($root_advanced, $Tree::Nary: +:TRAVERSE_ALL); print "LEAVING RECURSIVE CASE #1: The tree has $tree_size nod +es\n"; } # RECURSIVE CASE #2: A Dir Node that is NOT a merge candidate, b +ut that # has subdirectories, must still be processed by recurse_tree(), + otherwise there # would be no way to process elements that are beneath such a Di +r Node. # Recursion will continue for such a directory whether or not it +'s child subdirectories # require a merge. elsif ( ($data_hash{needs_merge} == FALSE) && (defined($node->{c +hildren}->{data}->{type})) && ($node->{children}->{data}->{type} == D +IR) ){ if ($debug){ print "\nENTERING RECURSIVE CASE #2: The tree has $tree_si +ze nodes\n"; print "Input Dir Node has name $data_hash{dir_name}\n"; print "Input Dir Node has needs_merge $data_hash{needs_mer +ge}\n"; print "Type of Dir Node children list is $node->{children} +->{data}->{type}\n"; } # PROCESS CHILDREN LIST: recurse_tree will be called on each +child of $node Tree::Nary->children_foreach($node, $Tree::Nary::TRAVERSE_ALL +, $call_recurse_tree); $tree_size = Tree::Nary->n_nodes($root_advanced, $Tree::Nary: +:TRAVERSE_ALL); print "LEAVING RECURSIVE CASE #2: The tree has $tree_size nod +es\n"; } # BASE CASE #1: Node needing merge has 0 < {dir_merge_count} <= +20, and has # a {children} list consisting of directories. # This Dir node is a merge candidate and is the parent of a subt +ree of # directories small enough to form a "safe subtree" that can be +loaded to a # @small_array of directories. Any files encountered in this sub +tree # will be loaded into one or more file @small_arrays of files by + find_subtree(). # find_subtree() will also put ALL merge_candidate dirs in this +subtree into a single # @small_array for dirs. Note: since the input Dir Node is a mer +ge_candidate it # will have a {dir_merge_count} of at least 1. elsif ( ($data_hash{needs_merge} == TRUE) && ($data_hash{dir_mer +ge_count} <= MAX_SIZE_DIR_GROUP) && (defined($node->{children}->{data +}->{type})) && ($node->{children}->{data}->{type} == DIR) ){ if ($debug){ print "\nENTERING BASE CASE #1: The tree has $tree_size no +des\n"; print "Input Dir Node has name $data_hash{dir_name}\n"; print "Input Dir Node has needs_merge $data_hash{needs_mer +ge}\n"; print "Input Dir Node has dir_merge_count $data_hash{dir_m +erge_count}\n"; print "Type of Dir Node children list is $node->{children} +->{data}->{type}\n"; } &find_subtree(\$node); $tree_size = Tree::Nary->n_nodes($root_advanced, $Tree::Nary: +:TRAVERSE_ALL); print "LEAVING BASE CASE #1: The tree has $tree_size nodes\n" +; } # BASE CASE #2: Process a Dir Node, whether or not it is a merge + candidate, that has files as children. # The find_subtree function will correctly deal with the childre +n list, whether it is 100% files # or a mixture of files and directories. If the input Dir Node i +s a merge candidate, it will be # put into a directory @small_array of size 1. elsif ( (defined($node->{children}->{data}->{type})) && ($node-> +{children}->{data}->{type} == FILE) ){ if ($debug){ print "\nENTERING BASE CASE #2: The tree has $tree_size no +des\n"; print "Input Dir Node has name $data_hash{dir_name}\n"; print "Type of Dir Node children list is $node->{children} +->{data}->{type}\n"; } &find_subtree(\$node); $tree_size = Tree::Nary->n_nodes($root_advanced, $Tree::Nary: +:TRAVERSE_ALL); print "LEAVING BASE CASE #2: The tree has $tree_size nodes\n" } # BASE CASE #3: Process a Dir Node that is a merge candidate and + that has no children elsif ( ($data_hash{needs_merge} == TRUE) && !(defined($node->{c +hildren}->{data}->{type})) ){ if ($debug){ print "\nENTERING BASE CASE #3: The tree has $tree_size no +des\n"; print "Input Dir Node has name $data_hash{dir_name}\n"; print "This Dir Node has no children\n"; } &no_subtree(\$node); $tree_size = Tree::Nary->n_nodes($root_advanced, $Tree::Nary: +:TRAVERSE_ALL); print "LEAVING BASE CASE #3: The tree has $tree_size nodes\n" } else { # DEFAULT ERROR CASE print "\nERROR: In DEFAULT ERROR CASE, this Dir Node does not + fit any existing base or recursive case\n"; print "Input Dir Node has name $data_hash{dir_name}\n"; print "Input Dir Node has needs_merge $data_hash{needs_merge} +\n"; print "Input Dir Node has dir_merge_count $data_hash{dir_merg +e_count}\n"; print "Type of Dir Node children list is $node->{children}->{ +data}->{type}\n"; exit 1; } } # end type == DIR else }
      I would appreciate any help you might give.

      Amphiaraus
        Empty nodes are being created from constructs like $node->{children}->{data}->{type} creates the intermediate data structures in that chain, and therefore an empty node. You can fix this by factoring out those lookups:
        my $children = $node->{children}; my $child_type = undef; if ($node->{children} and $node->{children}->{data}) { $child_type = $node->{children}->{data}->{type}; }
        then just use $child_type.

        In your attempts to fix the code you have both a dclone and a freeze. The freeze will result in a string that cannot be accessed properly which will break the rest of the code. The dclone itself is enough. But note that dclone has a considerable performance penalty. So fixing the bug that causes the blank nodes is better than using it.

        On a general note, I reiterate that Tree::Nary is a bad idea. What you are basically doing is trying to code Perl as if it was C. But Perl is not. C structs are efficient and have essentially no overhead. In Perl an anonymous hash ref takes 92 bytes. Before you put anything in it. (You can use Devel::Size to get a sense for this.) Complex structures built out of anonymous hashes come with a considerable memory and performance overhead. Furthermore virtually anything you can do with them can be done with far simpler native Perl structures, in much less code.

        (Speaking of less code, your style is rather..verbose. I'd highly recommend finding ways to make your comments shorter and your code more descriptive. Code Complete has some excellent advice to help you with that.)