Re^2: Building hash tree from a data file

Below is the code I have tried.

I am sure it is not great, but I have given it a try.

I am able to go two levels deep, but I am confused about how to make this more generic. I have two issues:

1) The way I build the data structure doesn't seem right. It produces the correct result, but it is not generic, because the nesting depth is hard-coded:

$linked_dsc{ $main_var }{ $contitent_hash->{ $link_word } }{ $ret_dsc->{ $key1 } } = undef;

2) I have to call the subroutine again for each element in the array @new_actions, and I don't know how to proceed with that.

Please provide some hints or tips to move further.

#! /usr/bin/perl

use strict;
use Data::Dumper;

my ( %linked_dsc, $main_var, @actions );

my $maincnt = 0;
my $found = 0;
my ($t1, $t2);
my $contitent_hash = { };

open (FILE, '<', "./Data.txt") or die "Can't open ./Data.txt: $!";

while (<FILE>) {

# From start of 1st 'Main' to start of 2nd 'Main', read the lines and 
# store in appropriate variables
# $contitent_hash has abbreviation as key and full name as value
# @actions has search patterns to look for in the next pass
# for example: 'NA' => 'North America' is in the hash variable and 
#                    'Name = NA' is in the array
    if ( /^Main$/ ) {
        $maincnt++;
    }
    
    if ( $maincnt == 1 ) {
        if ( /^Main/../^End/) {
          if ( /^\s*(.*)\s=\s(.*)/ ) {
              if ( $1 eq "Name") {
                    $main_var = $2;
                    #$linked_dsc{ $main_var } = undef;
                }
          }
       }
        elsif ( /^Sub/../^End/) {    
          if ( /^\s*(.*)\s=\s(.*)/ ) {
              if ( $1 eq "Action") {
                    ($t1, $t2) = split ( /: /, $2 );
                    push @actions, ('Name = '. $t2 );
                    $contitent_hash->{ $t2 } = undef;
                }
                elsif ( $1 eq "Text") {
                    $contitent_hash->{ $t2 } = $2;
                }
          }    
        }
    }
    last if ( $maincnt > 1);
}

# Self checking...
print "---------------------\n";
print "@actions\n";
print "---------------------\n";
foreach my $key ( keys %{ $contitent_hash } ) {
    print "$key = $contitent_hash->{ $key }\n";
    #$linked_dsc{ $main_var }{ $contitent_hash->{ $key } } = undef;
}
print "---------------------\n";
# End of self check

foreach  ( @actions ) {
    
    print "SEARCH STRING: $_\n";
    my (undef, $link_word) = split / = /;
    my ($ret_dsc, $ret_arr ) = process_next( $_ );
    foreach my $key1 ( keys %{ $ret_dsc } ) {
        # Checking...
        print "$key1 = $ret_dsc->{ $key1 }\n";
        $linked_dsc{ $main_var }{ $contitent_hash->{ $link_word } }{ $ret_dsc->{ $key1 } } = undef;
    }
    #Checking...
    #print Dumper ( $ret_dsc );
    #print Dumper ( $ret_arr );
}

print Dumper ( \%linked_dsc );


# Subroutine to search for the searchstring in the file, start reading
# the subsequent lines and store in appropriate variables until the 
# next 'Main' is encountered.
sub process_next {

my $searchstr = shift;

my $new_href= { };
my ( @new_actions, $left_val, $right_val);

seek ( FILE, 0, 0 );

$found = 0;

while ( <FILE> ) {
    
    if ( ! $found ) {
        if ( ! /\Q$searchstr\E/ ) {    # \Q..\E: match the search string literally
            next;
        }
        else {
            $found = 1;
        }                 
    }
    
    if ( $found ) {
        if ( /^Sub/../^End/) {    
            if ( /^\s*(.*)\s=\s(.*)/ ) {
                if ( $1 eq "Action") {
                        ($left_val, $right_val ) = split ( /: /, $2 );
                        if ( defined $right_val ) { 
                            $new_href->{ $right_val } = undef;  
                            push @new_actions, ('Name = '. $right_val );
                        };
                        #else { print "Not defined\n" };
                    }
                    elsif ( $1 eq "Text") {
                        if (defined $right_val) { 
                            $new_href->{ $right_val } = $2;     
                        }
                        else { 
                            $new_href->{ $2 } = $2 
                        };
                    }
            }    
        }
        last if ( /Main/ );
    }
}

# Checking...
# print "new actions : @new_actions\n";
return $new_href, \@new_actions;
}

The output looks as follows:

---------------------
Name = NA Name = EU
---------------------
NA = North America
EU = Europe
---------------------
SEARCH STRING: Name = NA
US = United States
CA = Canada
MX = Mexico
SEARCH STRING: Name = EU
Italy = Italy
France = France
$VAR1 = {
          'World' => {
                       'Europe' => {
                                     'France' => undef,
                                     'Italy' => undef
                                   },
                       'North America' => {
                                            'Mexico' => undef,
                                            'United States' => undef,
                                            'Canada' => undef
                                          }
                     }
        };

Replies are listed 'Best First'.
Re^3: Building hash tree from a data file
by graff (Chancellor) on Jul 12, 2006 at 05:11 UTC

This is a good start, but you are right about the fact that it does not extend very easily past the first couple layers of structure in your data.

The data is logically a hierarchy, but it is not stored in a proper hierarchic structure. If we view each stretch that starts with "Main" and ends with "---" as a "block", the first, second and third blocks form a nesting relation, but then the fourth block goes back up a level, to be a sibling of the second block.

So you need to build a structure as you read in the data, but to do this, you need to be able to jump around within the structure that you are building, based on the key strings provided in each block.

The following approach reads the data one block at a time (as suggested in my other reply), and uses a hash ref to jump around within your main "linked_dsc" hash as each block is read in. Two other hashes (link_parent and link_text) are used to navigate over the main structure, and keep track of the key relations between abbreviations and full strings.

One slight complication in your data is that the "Name" value in the first block is used as a printable label, whereas the "Name" values of the other blocks are just linkage keys, which you don't want to print. So the handling of the first block is a special case, and in the other blocks, we have to replace the linkage key (the abbreviation) with its corresponding full string, after we've used the key to find the right position in the hash structure.

(I also made a couple minor edits to the sample data in the OP, so I have included that below as __DATA__.)

Read more... (2 kB)

When I run that, I get this output, which I think is pretty close to what you want (ignoring the order of hash keys, which is random):

$VAR1 = {
          'Countries' => {
                           'Europe' => {
                                         'France' => undef,
                                         'Italy' => undef
                                       },
                           'North America' => {
                                                'Canada' => undef,
                                                'United States' => {
                                                                     'Atlanta' => undef,
                                                                     'Boston' => undef
                                                                   },
                                                'Mexico' => undef
                                              }
                         }
        };
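
The block-at-a-time idea described in this reply can be sketched roughly as follows. This is not graff's code, which is not reproduced here; it is a minimal illustration under stated assumptions. The sample data layout is hypothetical (inferred from the regexes in the original post, since the real Data.txt is not shown), `build_tree` is an invented name, and `%link_parent` and `%link_text` only follow the roles the reply gives them: the first maps an abbreviation to the hash that holds its entry, the second maps it to the full string used as the key there.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample data; the layout is an assumption, not the OP's file.
my $data = <<'END_DATA';
Main
Name = World
End
Sub
Action = Link: NA
Text = North America
End
Sub
Action = Link: EU
Text = Europe
End
---
Main
Name = NA
End
Sub
Action = Link: US
Text = United States
End
Sub
Action = Show
Text = Canada
End
---
Main
Name = US
End
Sub
Action = Show
Text = Boston
End
---
Main
Name = EU
End
Sub
Action = Show
Text = France
End
END_DATA

sub build_tree {
    my ($text) = @_;
    my ( %tree, %link_parent, %link_text );

    # One block per iteration; blocks are separated by a line of "---".
    for my $block ( split /^---\s*$/m, $text ) {
        my ($name) = $block =~ /^\s*Name\s=\s(.+?)\s*$/m or next;

        my $node;    # hash ref pointing at the place this block fills in
        if ( !%tree ) {
            # Special case: the first block's Name is the printable root label.
            $node = $tree{$name} = {};
        }
        elsif ( exists $link_parent{$name} ) {
            # Later blocks: Name is only a linkage key. Jump to the parent
            # hash and descend into the entry keyed by the full string.
            $node = $link_parent{$name}{ $link_text{$name} } ||= {};
        }
        else { next }    # abbreviation never announced by an earlier block

        while ( $block =~ /^Sub\b(.*?)^End\b/msg ) {
            my $sub = $1;
            my ($full) = $sub =~ /^\s*Text\s=\s(.+?)\s*$/m or next;
            my ($abbr) = $sub =~ /^\s*Action\s=\s.*?:\s(\S+)\s*$/m;
            $node->{$full} = undef;
            if ( defined $abbr ) {    # remember where this entry lives
                $link_parent{$abbr} = $node;
                $link_text{$abbr}   = $full;
            }
        }
    }
    return \%tree;
}

$Data::Dumper::Sortkeys = 1;    # stable key order for easier reading
my $tree = build_tree($data);
print Dumper($tree);
```

With this sample, the fourth block (EU) attaches under the root as a sibling of North America even though it follows the deeper US block, because each block finds its position through `%link_parent` rather than through the order of the file. A block whose abbreviation has not been announced by an earlier block is silently skipped here; real data might call for a warning instead.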
