To answer properly, I'd need more information about how you intend to use the data structure. Usually, the selection of a data structure depends on the types of operations you intend to perform on it. Since you're talking about IP addresses and ranges, I'm going to assume that you want to find the smallest IP range you've entered given an IP address. Other forms of access will suggest other data structures.
At first glance, there are several special cases you need to deal with:
Before addressing any of these items, I'm going to mention [no such wiki, sentinel values]. When I tried your code, I rearranged the node creation and put the any/any case last. The code crashed, as expected. The problem is a common one with tree/list creation code. Typically when creating lists and trees, you have the problem of treating the root node specially. Inserting an item into a list/tree is normally simple, but you have to worry about the special case of replacing the root node, which can complicate your code.
Frequently, you can simplify your code by using a sentinel value. In this case, I selected any/any as the sentinel value because in this case, it encompasses all possible values. So by using it as the root node, you'll never have to worry about replacing the root node:
#build the tree my $root = Tree::DAG_Node->new; $root->attributes({ original=>"permit ip any any", source=>['0.0.0.0/0'], destination=>['0.0.0.0/0'] }); foreach my $node (@node_list){ my $location; $root->walk_down({ callback => sub { my $this_node = shift; if (compare_nodes($node,$this_node)){ $location = $this_node->address; 1; } else{ 0; } } }); $root->address($location)->add_daughter($node) if defined $root->address($location); }
Because of case 5, you're not going to solve the problem by sorting alone. Yes, sorting may improve the situation somewhat, but you'll still have to deal with case 5. If you can put that case aside, then you can sort the list such that the largest ranges (i.e. smallest numbers after the '/') occur earlier in the insertion order.
The any/any case would be a good sentinel value, so if it's always part of your data set, you can simply insert it first. That way your insertion and query logic will be simpler than if you didn't have that case.
For example, when you rearrange your nodes such that the any/any case is last, your program dies because it tries to insert into the previously-remembered place. But since you don't have something prior to the root, you'd have to add special case logic to recognize that and replace the root. If you always sort the nodes then the any/any case would be first. I just made the any/any case the sentinel value, so I wouldn't have to complicate the code by detecting that special case.
If all your IP ranges are disjoin, identical or proper subsets, then sorting the list should solve the problem. But the instant you toss mutually intersecting sets into the mix, you have to figure out how you want to handle it. Sorting won't fix the problem, as neither node properly belongs "above" the other in the tree. So insertion will be a bit of a problem (i.e. "Where in the tree does this range go?"), and querying will also be a problem ("OK, I've found the range that my IP address is in. Do I keep looking for other matches?".).
There are several ways you can handle those disjoint sets, depending on how you access them, whether you ever delete ranges during a run, whether you ever add ranges between queries, etc. I'm just going to pick my favorite method for sake of discussion. Since there's an easy solution if I can ignore a special case, I'll try to remove the special case: In this case, I'd simply break the special cases into pieces--in this case split intersecting ranges into one part that is a proper subset of the other, and the other part that lies outside the other.
So, if your tree looks like:
0-255 (root) / \ 100-200(joe) 220-255(bob)
and you want to insert the range 150-250(sue), I'd break (sue) into pieces as needed. So during the insertion, I'd notice that the beginning of the range (150) falls in range (joe), but the end doesn't. So I'd break (sue) into 150-200(sue.1) and 200-250(sue.2), then push (sue.2) back into the list of things to insert. Then I'd insert (sue.1) into the tree getting:
0-255 (root) / \ 100-200(joe) 220-255(bob) | 150-200(sue.1)
Then I'd pull (sue.2) off the list of items to insert, and notice that the start of the range (151) is in (root), but the end of the range lies within (bob). So I'd then break (sue.2) into 151-220(sue.3) and 221-250(sue.4). I could then insert (sue.3) into (root), and then insert (sue.4) into (bob), resulting in:
0-255 (root) / | \ / | \ / | \ 100-200(joe) 151-220(sue.3) 220-255(bob) | | 150-200(sue.1) 221-250(sue.4)
Of course, this gives you the question of what to report when you query for address 224. Do you report (sue), (bob) or (sue.4)? If you want the most restrictive range (i.e., smallest) the user entered, you'd want to report (bob). So in that case, when creating the ranges, I'd add the original size of the range to the data structure so I can report the proper one:
0-255 (root;256) / | \ / | \ / | \ / | \ 100-200 151-220 220-255 (joe;100) (sue.3;100) (bob;35) | | 150-200 221-250 (sue.1;100) (sue.4;100)
Then, when reporting on 224, I could properly reply (bob) since it would have the smallest range when I traverse the tree. I'd start with (root;256), then see (bob;35), then see (sue.4;100) and notice that it's the leaf node. The smallest range I found during the traversal is (bob), so that's what I'd report.
Of course, you'd have to figure out what you want to report for address 175. Do you want to report (joe), (sue) or both? Give some time to think about what your special cases are, and how you want to report them, and then thinking about the data structure will be much easier.
If it were me, I think I'd use an array to hold the original range definitions where each entry would be [min, max, size, <other_info> ], and arrays of array references to hold the tree, where each entry would be something like: [min, max, <ref of original definition>, [ <children> ]. I'd have the size in the original range definition to simplify querying (i.e., so I don't have to compute it continually).
I hope this is helpful...
...roboticus
When your only tool is a hammer, all problems look like your thumb.
In reply to Re: How to create a Tree::DAG_Node tree with sorted hash elements
by roboticus
in thread How to create a Tree::DAG_Node tree with sorted hash elements
by benny_vroom
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |