grinder has asked for the wisdom of the Perl Monks concerning the following question:
Esteemed monks,
I have an interesting problem that I am certain admits an elegant solution, but I can barely figure out how to brute force it, let alone come up with something nice. I am representing a tree (actually, a forest) in a hash. A sample structure looks like:
my $old = { this => { is => { the => { day => {}, hour => {}, }, this => {}, }, }, that => { that => { is => { is => {}, not => { is => { not => {}, }, }, }, }, }, those => { were => { the => { days => {}, }, }, }, };
I can traverse this recursively with the following routine:
sub traverse { my $depth = shift; my $t = shift; my $nr_kids = scalar keys %$t; if( $nr_kids > 0 ) { for my $k( sort keys %$t ) { print( ' ' x $depth, "$k\n"); traverse( $depth + 1, $t->{$k} ); } } else { # at a leaf node } } __PRODUCES__ that that is is not is not this is the day hour this those were the days
As it turns out, the trees are rather spindly. At any given node, there is usually only one kid. When this occurs, I would like to hoist the kid up into the parent. The resulting structure would be much bushier, since all nodes would have multiple kids except for the leaves. And this is were my brain melts.
I need to traverse the structure and since I would be modifying it as I go along, I have to clone a new copy as I go along. And I need to carry my context along with me, so I modify my traversal routine to look something like (most notably a depth-first traversal to process the kids before the parents):
sub traverse { my $depth = shift; my $t = shift; my $node = shift; my $nr_kids = scalar keys %$t; if( $nr_kids > 0 ) { for my $k( sort keys %$t ) { my $kid = traverse( $depth + 1, $t->{$k}, "$node $k" ); print( ' ' x $depth, "node=($node) key=($k) kid=($kid)\n") +; } } else { return $node; } }
And then I start to lose it. I'll figure this out in time, but maybe someone else can earn a few points. I know what I'm looking for. The ideal tree would look something like:
my $new = { 'this is' => { the => { day => {}, hour => {}, }, this => {}, }, 'that that is' => { is => {}, 'not is not' => {}, }, 'those were the days' => {}, }; __PRODUCES__ that that is is not is not this is the day hour this those were the days
The spaces between the original node names are merely for pretty printing. In actual fact the key names would simply be concatenated. And for those of you who like questioning questions rather than answering answers, if you can propose a different datastructure that would make the job easier, I'm all ears too.
And if you're really interested in what this is for, it's for building efficient regular expressions, where such expressions are 5-20k characters long and have lots of alternations. If I can bring similar patterns together, the engine can examine far fewer paths to determine whether a given string matches (or not).
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: How to flatten a spindly tree?
by broquaint (Abbot) on Jan 06, 2004 at 15:17 UTC | |
|
Re: How to flatten a spindly tree?
by dragonchild (Archbishop) on Jan 06, 2004 at 16:03 UTC |