To answer properly, I'd need more information about how you intend to use the data structure. Usually, the selection of a data structure depends on the types of operations you intend to perform on it. Since you're talking about IP addresses and ranges, I'm going to assume that you want to find the smallest IP range you've entered given an IP address. Other forms of access will suggest other data structures.

At first glance, there are several special cases you need to deal with:

Inserting data into the root of your tree,
handling sets that don't overlap,
handling identical ranges,
handling sets where one is a proper subset of the other, and
handling sets that intersect, where neither is a subset.

Before addressing any of these items, I'm going to mention [no such wiki, sentinel values]. When I tried your code, I rearranged the node creation and put the any/any case last. The code crashed, as expected. The problem is a common one with tree/list creation code. Typically when creating lists and trees, you have the problem of treating the root node specially. Inserting an item into a list/tree is normally simple, but you have to worry about the special case of replacing the root node, which can complicate your code.

Frequently, you can simplify your code by using a sentinel value. In this case, I selected any/any as the sentinel value because in this case, it encompasses all possible values. So by using it as the root node, you'll never have to worry about replacing the root node:

#build the tree
my $root = Tree::DAG_Node->new;
$root->attributes({
    original=>"permit ip any any",
    source=>['0.0.0.0/0'],
    destination=>['0.0.0.0/0']
});
foreach my $node (@node_list){
    my $location;
    $root->walk_down({
            callback => sub {
                my $this_node = shift;
                if (compare_nodes($node,$this_node)){
                    $location = $this_node->address;
                    1;
                }
                else{
                    0;
                }
            }

    });
    $root->address($location)->add_daughter($node)
      if defined $root->address($location);
}
[download]

Sorting the input

Because of case 5, you're not going to solve the problem by sorting alone. Yes, sorting may improve the situation somewhat, but you'll still have to deal with case 5. If you can put that case aside, then you can sort the list such that the largest ranges (i.e. smallest numbers after the '/') occur earlier in the insertion order.

The any/any case would be a good sentinel value, so if it's always part of your data set, you can simply insert it first. That way your insertion and query logic will be simpler than if you didn't have that case.

For example, when you rearrange your nodes such that the any/any case is last, your program dies because it tries to insert into the previously-remembered place. But since you don't have something prior to the root, you'd have to add special case logic to recognize that and replace the root. If you always sort the nodes then the any/any case would be first. I just made the any/any case the sentinel value, so I wouldn't have to complicate the code by detecting that special case.

Handling intersecting sets

If all your IP ranges are disjoin, identical or proper subsets, then sorting the list should solve the problem. But the instant you toss mutually intersecting sets into the mix, you have to figure out how you want to handle it. Sorting won't fix the problem, as neither node properly belongs "above" the other in the tree. So insertion will be a bit of a problem (i.e. "Where in the tree does this range go?"), and querying will also be a problem ("OK, I've found the range that my IP address is in. Do I keep looking for other matches?".).

There are several ways you can handle those disjoint sets, depending on how you access them, whether you ever delete ranges during a run, whether you ever add ranges between queries, etc. I'm just going to pick my favorite method for sake of discussion. Since there's an easy solution if I can ignore a special case, I'll try to remove the special case: In this case, I'd simply break the special cases into pieces--in this case split intersecting ranges into one part that is a proper subset of the other, and the other part that lies outside the other.

So, if your tree looks like:

         0-255 (root)
          /       \
  100-200(joe)  220-255(bob)
[download]

and you want to insert the range 150-250(sue), I'd break (sue) into pieces as needed. So during the insertion, I'd notice that the beginning of the range (150) falls in range (joe), but the end doesn't. So I'd break (sue) into 150-200(sue.1) and 200-250(sue.2), then push (sue.2) back into the list of things to insert. Then I'd insert (sue.1) into the tree getting:

         0-255 (root)
          /       \
  100-200(joe)  220-255(bob)
      |
  150-200(sue.1)
[download]

Then I'd pull (sue.2) off the list of items to insert, and notice that the start of the range (151) is in (root), but the end of the range lies within (bob). So I'd then break (sue.2) into 151-220(sue.3) and 221-250(sue.4). I could then insert (sue.3) into (root), and then insert (sue.4) into (bob), resulting in:

              0-255 (root)
             /      |     \
            /       |      \
           /        |       \
100-200(joe) 151-220(sue.3)  220-255(bob)
   |                            |
150-200(sue.1)               221-250(sue.4)
[download]

Of course, this gives you the question of what to report when you query for address 224. Do you report (sue), (bob) or (sue.4)? If you want the most restrictive range (i.e., smallest) the user entered, you'd want to report (bob). So in that case, when creating the ranges, I'd add the original size of the range to the data structure so I can report the proper one:

               0-255
            (root;256)
           /     |    \
          /      |     \
         /       |      \
        /        |       \
 100-200     151-220    220-255
(joe;100)  (sue.3;100)  (bob;35)
    |                      |
 150-200                221-250
(sue.1;100)           (sue.4;100)
[download]

Then, when reporting on 224, I could properly reply (bob) since it would have the smallest range when I traverse the tree. I'd start with (root;256), then see (bob;35), then see (sue.4;100) and notice that it's the leaf node. The smallest range I found during the traversal is (bob), so that's what I'd report.

Data structure representation

Of course, you'd have to figure out what you want to report for address 175. Do you want to report (joe), (sue) or both? Give some time to think about what your special cases are, and how you want to report them, and then thinking about the data structure will be much easier.

If it were me, I think I'd use an array to hold the original range definitions where each entry would be [min, max, size, <other_info> ], and arrays of array references to hold the tree, where each entry would be something like: [min, max, <ref of original definition>, [ <children> ]. I'd have the size in the original range definition to simplify querying (i.e., so I don't have to compute it continually).

I hope this is helpful...

...roboticus

When your only tool is a hammer, all problems look like your thumb.

In reply to Re: How to create a Tree::DAG_Node tree with sorted hash elements by roboticus
in thread How to create a Tree::DAG_Node tree with sorted hash elements by benny_vroom

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.