grinder has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed monks,

I have an interesting problem that I am certain admits an elegant solution, but I can barely figure out how to brute force it, let alone come up with something nice. I am representing a tree (actually, a forest) in a hash. A sample structure looks like:

my $old = { this => { is => { the => { day => {}, hour => {}, }, this => {}, }, }, that => { that => { is => { is => {}, not => { is => { not => {}, }, }, }, }, }, those => { were => { the => { days => {}, }, }, }, };

I can traverse this recursively with the following routine:

sub traverse { my $depth = shift; my $t = shift; my $nr_kids = scalar keys %$t; if( $nr_kids > 0 ) { for my $k( sort keys %$t ) { print( ' ' x $depth, "$k\n"); traverse( $depth + 1, $t->{$k} ); } } else { # at a leaf node } } __PRODUCES__ that that is is not is not this is the day hour this those were the days

As it turns out, the trees are rather spindly. At any given node, there is usually only one kid. When this occurs, I would like to hoist the kid up into the parent. The resulting structure would be much bushier, since all nodes would have multiple kids except for the leaves. And this is were my brain melts.

I need to traverse the structure and since I would be modifying it as I go along, I have to clone a new copy as I go along. And I need to carry my context along with me, so I modify my traversal routine to look something like (most notably a depth-first traversal to process the kids before the parents):

sub traverse { my $depth = shift; my $t = shift; my $node = shift; my $nr_kids = scalar keys %$t; if( $nr_kids > 0 ) { for my $k( sort keys %$t ) { my $kid = traverse( $depth + 1, $t->{$k}, "$node $k" ); print( ' ' x $depth, "node=($node) key=($k) kid=($kid)\n") +; } } else { return $node; } }

And then I start to lose it. I'll figure this out in time, but maybe someone else can earn a few points. I know what I'm looking for. The ideal tree would look something like:

my $new = { 'this is' => { the => { day => {}, hour => {}, }, this => {}, }, 'that that is' => { is => {}, 'not is not' => {}, }, 'those were the days' => {}, }; __PRODUCES__ that that is is not is not this is the day hour this those were the days

The spaces between the original node names are merely for pretty printing. In actual fact the key names would simply be concatenated. And for those of you who like questioning questions rather than answering answers, if you can propose a different datastructure that would make the job easier, I'm all ears too.

And if you're really interested in what this is for, it's for building efficient regular expressions, where such expressions are 5-20k characters long and have lots of alternations. If I can bring similar patterns together, the engine can examine far fewer paths to determine whether a given string matches (or not).

Replies are listed 'Best First'.
Re: How to flatten a spindly tree?
by broquaint (Abbot) on Jan 06, 2004 at 15:17 UTC
    Without thinking too deeply on optimizations or better data structures this code seems to do the trick
    my $old = { this => { is => { the => { day => {}, hour => {}, }, this => {}, }, }, that => { that => { is => { is => {}, not => { is => { not => {}, }, }, }, }, }, those => { were => { the => { days => {}, }, }, }, }; use strict; sub splat { my $tree = shift; my $ret = {}; while(my($k,$v) = each %$tree) { if(keys %$v == 1) { my $onekey = (keys %$v)[0]; my $newkey = "$k $onekey"; while( keys %{$v = $v->{$onekey}} == 1 ) { $onekey = (keys %$v)[0]; $newkey .= " $onekey"; } $ret->{$newkey} = splat($v); } else { $ret->{$k} = splat($v); } } return $ret; } sub dumptree { my($t,$d) = ( $_[0], ($_[1] || 0) ); print " " x $d, $_,$/ and dumptree($t->{$_}, $d + 1) for sort keys %$t; } use Data::Dumper; my $new = splat $old; print Dumper $new; dumptree $new; __output__ $VAR1 = { 'those were the days' => {}, 'that that is' => { 'not is not' => {}, 'is' => {} }, 'this is' => { 'this' => {}, 'the' => { 'day' => {}, 'hour' => {} } } }; that that is is not is not this is the day hour this those were the days
    So that neatly flattens all nodes with only a single child into flattened string as the key.
    HTH

    _________
    broquaint

Re: How to flatten a spindly tree?
by dragonchild (Archbishop) on Jan 06, 2004 at 16:03 UTC
    Building on broquaint's code, I have
    sub traverse { my $t = shift; my $d = shift || 0; foreach my $k (sort keys %$t) { print ' ' x $depth, "$k\n"; traverse( $t->{$k}, $depth + 1 ); } } sub rebuild { my $sub; $sub = sub { my ($t) = @_; my $r = {}; while (my ($k, $v) = keys %$t) { while (keys %$v == 1) { my $k1 = (keys %$v)[0]; $k .= " $k1"; $v = $v->{$k1}; } $r->{$k} = $sub->($v); } $r; }; $sub->{$_[0]}; } traverse( $old_tree ); my $rebuilt = rebuild( $old_tree ); traverse( $rebuilt );

    ------
    We are the carpenters and bricklayers of the Information Age.

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.