Re^8: Data Structures

If your experience is that all your applications benefit, in the later stages of their lives, from up-front over-design, then I would not seek to change your mind. But if this is an isolated example from all the projects where you have written extra code up front, one that you perceive has saved you time later in the life of the project, then maybe you should consider the calculation ( time_saved * those_that_have_benefited ) - ( extra_time_spent * no_of_projects ).

Maybe for you that comes out positive, which would indicate that your powers of prescience are well above the average, because every informal calculation I've seen has shown a negative.

And the size of the project has nothing to do with it. It's all to do with the nature of the data. Data formats rarely change substantially once they are defined. And wrapping native data-structures in OO-wrappers rarely generates an ROI.

Besides which, with good OO design, there should be no need for the interface to expose the internal attributes. Directly or indirectly. There should be no methods whose only purpose is to directly change the internal state of an object.

All interactions with an object (after construction), should be in terms of actions (behaviours, messages) invoked upon instances, passing the information it requires to perform the action.

What do I mean by this?

By way of example, let's take a Time or Date or DateTime class.

The underlying data for all these is simply a (signed) integer representing the data-point, plus a base date representing the epoch. (And, on Earth, a timezone, but let's omit that for simplicity of discussion.)

On a traditional unix system that might be a 32-bit signed integer in seconds and Midnight, 1st Jan, 1970. On newer *nix systems, the base stays the same but the integer may be wider and the granularity of the units finer (milliseconds, for instance). On Win32 systems, the base date is Midnight, 1st Jan 1601, the units are 100-nanosecond intervals and the integer is 64 bits.

But for whichever system, the code required to produce a new DateTime object representing a data-point 24 hours later is the same:

DateTime tomorrow = DateTime->Now->plusDelta( DateTime::Delta->days( 1 ) );

Here, DateTime::Delta->days( 1 ) is a subclass constructor that returns an integer (in units compatible with the base system) representing (in this case) 24 hours.

The plusDelta() method produces a new DateTime object that is the result of applying a DateTime::Delta() object to an existing instance of a DateTime() object.

DateTime->Now() is a constructor that returns a DateTime object representing the current time in the current epoch.

The body of the plusDelta() method is (in perl terms) just:

sub plusDelta {
   my( $self, $delta ) = @_;

   # Relies on overloading: both operands yield their raw integers in numeric context
   return bless \( $self += $delta ), __PACKAGE__;
}

Note: The raw values of both the DateTime and the Delta objects are accessed directly (via operator overloading, so that each object yields its raw integer in numeric context). And the resultant (new) DateTime object is constructed directly from the result of the calculation. This works because both the DateTime object and the Delta object are implemented as blessed scalars.
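For anyone who wants to try this, here is a minimal sketch of how the blessed-scalar arrangement might be wired up. The package names My::DateTime and My::Delta are made up (chosen to avoid clashing with CPAN's DateTime), a unit of one second is assumed, and numeric-conversion overloading ('0+' with fallback) is just one way to expose the raw integer; it is not a claim about how any existing module does it.

```perl
use strict;
use warnings;

package My::Delta;    # hypothetical stand-in for DateTime::Delta
use overload '0+' => sub { ${ $_[0] } }, fallback => 1;

# Assumption: the system's unit is one second
use constant UNITS_PER_DAY => 24 * 60 * 60;

sub days {
    my( $class, $n ) = @_;
    return bless \( my $raw = $n * UNITS_PER_DAY ), $class;
}

package My::DateTime;    # hypothetical name
use overload '0+' => sub { ${ $_[0] } }, fallback => 1;

sub Now {
    my( $class ) = @_;
    return bless \( my $raw = time ), $class;
}

sub plusDelta {
    my( $self, $delta ) = @_;
    # Both operands numify to their raw integers; the sum becomes a new object
    return bless \( my $raw = $self + $delta ), ref $self;
}

package main;

my $now      = My::DateTime->Now;
my $tomorrow = $now->plusDelta( My::Delta->days( 1 ) );
printf "%d seconds apart\n", $tomorrow - $now;
```

This variant sums into a fresh lexical rather than reusing $self, so the class can be taken from ref $self and subclasses keep their type.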

The integer each scalar holds is the only instance data required, because everything else about them can be derived. The units they represent are determined at startup from the system. The granularity, epoch (and TZ) are available via Class constants.

Assuming the availability of 64-bit math where required, the plusDelta() method does not have to change regardless of the size of the integer, the units it is measured in, the Epoch upon which it is based nor the system it is running on, because it is just arithmetic.

I don't need to define an array-wrapping special collection class to form aggregates of these objects because I can store them directly in a bog standard array. I can then sort that array using the built-in sort. Compare them using normal syntax: dt[ 1 ] > dt[ 203 ].
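A small sketch of what that looks like in practice, assuming a blessed-scalar class with numeric overloading. My::Stamp and its fromRaw() constructor are hypothetical names used only for this illustration.

```perl
use strict;
use warnings;

package My::Stamp;    # hypothetical minimal DateTime-like class
use overload '0+' => sub { ${ $_[0] } }, fallback => 1;

sub fromRaw {
    my( $class, $raw ) = @_;
    return bless \$raw, $class;
}

package main;

# A plain Perl array holds the objects; no collection class involved
my @dt = map { My::Stamp->fromRaw( $_ ) } 300, 100, 200;

# Built-in sort with the ordinary numeric comparison operator
my @sorted = sort { $a <=> $b } @dt;

# Ordinary comparison syntax between two elements
print "first is later than second\n" if $dt[0] > $dt[1];
print join( ' ', map { 0 + $_ } @sorted ), "\n";
```

The sorted array still holds the same objects, so any methods remain available on its elements.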

Because each object contains only the minimum of internal state, they are very light. Aggregates take up far less space. Their representation means that most operations can be done with standard syntax making them far easier to use. Portability is simplified because most operations are done in terms of simple arithmetic operators. Performance is increased by direct access to the state (internally and externally).

Only constructors (and only those that construct new instances from different representations, eg. strings) need to perform costly validation. As any integer value represents a valid DateTime object, no further validation is required. So long as users create objects using constructors, methods need do no further validation, as all operations are arithmetic and will give consistent results regardless of platform, epoch, timezone or base. (Unless Perl or the underlying runtime suddenly forgets how to do math.) No getters & setters need be provided.
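To illustrate where the validation lives, here is a sketch of a string constructor next to a purely arithmetic method. The names My::DateTime, fromString() and plusSeconds() are invented for the example, the input format 'YYYY-MM-DD' (taken as midnight UTC) is an assumption, and the range checking is delegated to the core Time::Local module's timegm().

```perl
use strict;
use warnings;
use Time::Local ();

package My::DateTime;    # hypothetical name
use overload '0+' => sub { ${ $_[0] } }, fallback => 1;

# Constructor from a foreign representation: the only place that validates.
# Assumption: input is 'YYYY-MM-DD', interpreted as midnight UTC.
sub fromString {
    my( $class, $str ) = @_;
    my( $y, $m, $d ) = $str =~ /\A(\d{4})-(\d{2})-(\d{2})\z/
        or die "Malformed date '$str'\n";
    # timegm() dies on out-of-range fields such as month 13
    my $raw = eval { Time::Local::timegm( 0, 0, 0, $d, $m - 1, $y ) };
    die "Invalid date '$str'\n" unless defined $raw;
    return bless \$raw, $class;
}

# Arithmetic method: any integer is a valid instant, so nothing to check
sub plusSeconds {
    my( $self, $n ) = @_;
    return bless \( my $raw = $self + $n ), ref $self;
}

package main;

my $day2 = My::DateTime->fromString( '1970-01-02' );
print 0 + $day2->plusSeconds( 60 ), "\n";
```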

Compare this simplicity with the weight & complexity of existing solutions--in Perl and other languages and libraries.

I won't expand in detail on it here, but a similar case can be made for (say) a Point3D object. Internally represented by a blessed anonymous array containing 3 numbers, the coordinates can change from 32-bit integers to 64-bit integers, to reals to complex, to rational, simply by changing the types of the numbers stored in the anon array. All the methods manipulating these objects are just doing math. Math operates correctly whatever the representation of the numbers so no validation is required after their construction.

If it is necessary for a given application to constrain operations to some subset of the 3D universe, then subclassing the Class and applying post-condition validation on the parental constructors is sufficient. If the resultant object is outside of the constraining dimensions, the input must have been wrong. The subclass can then choose the appropriate course of action, be it raising an exception to report the problem, or coercion to correct it.
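A rough sketch of the Point3D idea, under stated assumptions: My::Point3D, My::Point3D::Bounded, the plus() method and the LIMIT of 100 are all hypothetical, and the subclass here chooses to raise an exception rather than coerce.

```perl
use strict;
use warnings;

package My::Point3D;    # hypothetical name

sub new {
    my( $class, $x, $y, $z ) = @_;
    return bless [ $x, $y, $z ], $class;
}

# Pure arithmetic; routed through new() so subclasses can vet the result
sub plus {
    my( $self, $other ) = @_;
    return ref( $self )->new( map { $self->[$_] + $other->[$_] } 0 .. 2 );
}

package My::Point3D::Bounded;    # hypothetical subclass
our @ISA = ( 'My::Point3D' );

use constant LIMIT => 100;    # assumed bound: a cube of side 2 * LIMIT

sub new {
    my( $class, @coords ) = @_;
    my $self = $class->SUPER::new( @coords );
    # Post-condition on the parental constructor
    for my $c ( @$self ) {
        die "Point outside bounds\n" if abs( $c ) > LIMIT;
    }
    return $self;
}

package main;

my $p = My::Point3D::Bounded->new( 90, 0, 0 );
my $q = eval { $p->plus( My::Point3D::Bounded->new( 20, 0, 0 ) ) };
print "rejected: $@" unless defined $q;
```

The parent class carries no checks at all; only the subclass that needs the constraint pays for it.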

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
