comment on

This is an "OO Design 101" question. I will cast it in terms of an example using trees, but I could have just as well used linked lists, graphs, etc.

Suppose that I want to design a class Tree to represent (you guessed it) trees, with the additional requirement that the tree must not contain "duplicate nodes" (the reason for the scare-quotes here will become apparent soon). Suppose also that the interface for this class includes a method add_node, whose job is to add a new node to the set N of the tree's nodes.

sub add_node {
  my $self = shift;
  my $node = shift;
  return if is_duplicate( $node ); # do nothing for dups
  # proceed to add node to the set of nodes for this tree
}
[download]

But how can is_duplicate() tell whether a new node is the duplicate of an already added node? Strictly speaking, it can't without some help from the client code, because each application has an idiosyncratic definition of equality.

For example, suppose the tree class is being used to represent processes running in a Unix system, and the tree class has to decide whether two Process objects $p0 and $p1 are the same. It won't do to compare the stringified versions of these objects (i.e. "$p0" eq "$p1") because the objects could be different by this criterion and yet be logically equivalent (e.g. though they are different objects all the fields that together uniquely characterize these objects as Unix processes, such as the start time, PID, PPID, etc., have identically matching values).

OK, to recap, we want to design this Tree class, and in particular, its add_node method to prevent adding "duplicate nodes."

One draconian possibility is this: require client classes to define an id method, returning an "application-unique" string identifying the object (e.g. the PID for the process). Then add_node would could be written like this:

sub add_node {
  my $self = shift;
  my $node = shift;
  return $self->{ N }{ $node->id } ||= $node;
}
[download]

The obvious shortcoming of this approach is the fact that it requires client classes to modify their API. A very unreasonable demand, especially considering that there is no consensus on what the name of this method ought to be. A different data structure class, say Digraph, written by someone else, may use ID or hash_code for the same purpose.

Another possibility would to define a wrapper class called Node, whose interface would consist (minimally) of one constructor and the methods id and object (or payload or content). The constructor for Node could include slots to specify both the payload object and some unique ID. It would be up to the client code to ensure that two objects receive the same ID if and only if they represent the same entity (e.g. the same Unix process).

With this scheme, add_node would be defined as above, and client code would invoke it like this:

$tree->add_node( Node->new( $some_random_object, $id ) );
[download]

A third alternative would be to add one configuration parameter to the Tree class. This optional parameter would be a code ref that the Tree class would apply to client objects to determine their unique ID. It would default to plain stringification:

sub { my $node = shift; return "$node" }
[download]

Then add_node would be defined like this:

sub add_node {
  my $self = shift;
  my $node = shift;
  my $id = get_id( $node );
  return $self->{ N }{ $id } ||= $node;
}

sub get_id {
  my $self = shift;
  my $node = shift;
  my $id_fxn = $self->id_fxn;
  return $id_fxn( $node );
}
[download]

Or, for greater flexibility, maybe the last two approaches could be combined:

sub get_id {
  my $self = shift;
  my $node = shift;  # $node->isa( 'Node' )
  return
    ref $node && $node->can( 'id' ) ? $node->id
                                    : $self->id_fxn->( $node );
}
[download]

...though this may be a case of misguided featurism.

I'd be interested in reading your opinions on these approaches, and/or suggestions for solving this design problem.

P.S. BTW, I realize that some of the solutions above could be implemented by overloading "", but, even though I understand the mechanics of overloading, I've been bitten enough by overloading-related bugs that I must conclude I am not yet at the level that I can use this tool safely in production code.

the lowliest monk

In reply to OO Identity Crisis by tlm

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.