in reply to Re^6: Data Structures
in thread Data Structures

Back when I was maintaining a C++ app, there was a struct written as a class, and everything in the struct had accessors. They were all completely simple and transparent such that you could have done away with the lot of them and accessed everything through simple public members. Nevertheless, the accessors were used everywhere. This struct was the heart of the application, so that was many many method calls. It seemed like a waste.

Then one day the requirements changed, and in some circumstances I'd have to recompute some things when others were changed. I tell you I was mighty grateful that all access went through accessors. I could go to where the class was implemented and make my changes there and nowhere else, and everything worked.

I shudder to think of the search-and-replace nightmare that would have been my workday had the elements of this struct been accessed directly throughout the program I worked on. As I said, this was the one class that was used everywhere.

I think most programs grow larger as they age. The "big program" techniques that one uses today might not be necessary today, but they often pay off in the long run. I tend to write software as if it is a little bigger than it actually is. I use strict and warnings in a two-line script. I use objects (sometimes) when a simpler data structure will do, just to collect the code related to that data structure in one place. I use set/get methods to access stuff in objects, even if they're not much more than dumb hashes (though I rarely, if ever, write those accessors myself).
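For instance, the accessors I don't write myself can come from the core Class::Struct module, which generates them for me (a sketch; Point and its fields are just an example):

```perl
use strict;
use warnings;
use Class::Struct;    # core module that generates accessors for you

# A struct-like class whose get/set accessors are generated rather
# than hand-written. Point and its fields are illustrative.
struct( Point => { x => '$', y => '$' } );

my $p = Point->new( x => 1, y => 2 );
$p->x( 10 );                   # generated setter
print $p->x + $p->y, "\n";     # generated getters; prints 12
```

If the requirements change later, any of these generated accessors can be replaced by a hand-written method of the same name, and no caller needs to change.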

I think the time lost writing big for a small program that never grows is a reasonable price for the times that I write big for a program that (as usual) does grow.


Re^8: Data Structures (optimized accessors and Perl 6)
by tye (Sage) on May 08, 2008 at 06:27 UTC

    That led me to think of something that might be worthwhile. It'd be cool to have compile-time optimization of simple accessors, such that you could later redeclare the accessors in the class definition, the simple accessors would no longer be optimized, and no users of those accessors would need to be updated.

    So you declare class Foo objects have public members of .x and $foo.x is compiled into very efficient accessing of that member. Down the road, you decide to write an explicit and more complicated accessor for .x. When code using this class gets compiled with "use Foo;" loading the new version of the class, then $y= $foo.x; gets compiled into $y= $foo.x(); while $foo.x= $y; gets compiled into $foo.x( $y );.

    Of course, this probably won't happen in Perl 6 because the general case requires \$foo.x to be compiled into something that ties a scalar value such that a fetch of that value calls $foo.x() while a store to that value calls $foo.x( ... ), and history shows that this inefficient general case will likely be used for the simple cases as well. But perhaps Perl 6's design goal of being more easily optimized might change that.

    Or we could just restrict such to read-only attributes. Then you could have code doing $y= $foo.x; that doesn't need to be updated (and is as efficient as it can be) and if you have any code that does $foo.x= $y;, then (once you've declared the member variable as requiring an accessor) you'd get a compile-time error telling you exactly what code needs to be changed. You'd need only re-compile ("perl -c") all code in your repository to verify that you don't have anything left to fix.
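    Perl 5 can only catch a stray write at run time rather than compile time, but a read-only accessor gives the same interface shape (a sketch; Foo and its member x are illustrative names):

```perl
use strict;
use warnings;

package Foo;

sub new { my( $class, %args ) = @_; return bless { %args }, $class }

# Read-only accessor: reads never require callers to change;
# any attempt to use it as a setter is an error.
sub x {
    my $self = shift;
    die "Foo::x is read-only\n" if @_;
    return $self->{x};
}

package main;
my $foo = Foo->new( x => 42 );
print $foo->x, "\n";              # prints 42
eval { $foo->x( 5 ) };
print "write rejected\n" if $@;   # the stray write is caught
```

    With the compile-time scheme described above, the `eval` line would instead fail at "perl -c" time, pointing at exactly the code that needs fixing.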

    There could even be a distinction between 1) ".x is private" meaning using $foo.x outside of a method calls the accessor while $self.x inside of a method does a direct access; vs. 2) ".x() is the accessor for ._x", which would require any place in your classes that modifies .x directly to be upgraded to either $foo.x( ... ); or $foo._x= ...;.

    Anyway, I don't recall reading of such an idea in Perl 6 design documents, but that was a long time ago so I could easily have forgotten or it could have been introduced since then.

    - tye        

      That idea came up in design discussions, and I believe we left it as a potential but not required optimization.

Re^8: Data Structures
by BrowserUk (Patriarch) on May 08, 2008 at 15:00 UTC

    If your experience is that all your applications benefit from up-front over-design in the latter stages of their life, then I would not seek to change your mind. But if this is an isolated example from all the projects where you have written extra code up front that you perceive has saved you time later in the life of the project, then maybe you should consider the calculation (extra_time_spent * no_of_projects) - (time_saved * those_that_have_benefited).

    Maybe for you that comes out positive, which would indicate that your powers of prescience are well above the average, because every informal calculation I've seen has shown a negative.

    And the size of the project has nothing to do with it. It's all to do with the nature of the data. Data formats rarely change substantially once they are defined. And wrapping native data structures in OO wrappers rarely generates an ROI.

    Besides which, with good OO design, there should be no need for the interface to expose the internal attributes, directly or indirectly. There should be no methods whose only purpose is to directly change the internal state of an object.

    All interactions with an object (after construction) should be in terms of actions (behaviours, messages) invoked upon instances, passing the information required to perform the action.

    What do I mean by this?

    By way of example, let's take a Time or Date or DateTime class.

    The underlying data for all of these is simply a (signed) integer representing the data-point, plus a base date representing the epoch. (And on Earth, a timezone, but let's omit that for simplicity of discussion.)

    On a traditional Unix system that might be a 32-bit signed integer counting seconds, with a base of Midnight, 1st Jan 1970. On newer *nix systems, the base stays the same but the granularity of the units increases to milliseconds or finer. On Win32 systems, the base date is Midnight, 1st Jan 1601 and the integer is 64 bits.

    But for whichever system, the code required to produce a new DateTime object representing a data-point 24 hours later is the same:

    DateTime tomorrow = DateTime->Now->plusDelta( DateTime::Delta->days( 1 ) );

    Here, DateTime::Delta->days( 1 ) is a subclass constructor that returns an integer (in units compatible with the base system) representing (in this case) 24 hours.

    The plusDelta() method produces a new DateTime object that is the result of applying a DateTime::Delta() object to an existing instance of a DateTime() object.

    DateTime->Now() is a constructor that returns a DateTime object representing the current time in the current epoch.

    The body of the plusDelta() method is (in perl terms) just:

    sub plusDelta { my( $self, $delta ) = @_; my $sum = $$self + $$delta; return bless \$sum, __PACKAGE__; }

    Note: the raw values of both the DateTime and the Delta objects are accessed directly (by dereferencing the blessed scalars), and the resultant (new) DateTime object is constructed directly from the result of the calculation. This works because both the DateTime object and the Delta object are implemented as blessed scalars.

    The integers those scalars hold are the only instance data required, because everything else about them can be derived. The units they represent are determined at startup from the system. The granularity, epoch (and TZ) are available via class constants.

    Assuming the availability of 64-bit math where required, the plusDelta() method does not have to change regardless of the size of the integer, the units it is measured in, the epoch upon which it is based, nor the system it is running on, because it is just arithmetic.

    I don't need to define an array-wrapping special collection class to form aggregates of these objects because I can store them directly in a bog-standard array. I can then sort that array using the built-in sort, and compare them using normal syntax: $dt[ 1 ] > $dt[ 203 ].

    Because each object contains only the minimum of internal state, they are very light. Aggregates take up far less space. Their representation means that most operations can be done with standard syntax making them far easier to use. Portability is simplified because most operations are done in terms of simple arithmetic operators. Performance is increased by direct access to the state (internally and externally).

    Only constructors (and only those that construct new instances from different representations, e.g. strings) need to perform costly validation. As any integer value represents a valid DateTime object, no further validation is required. So long as users create objects using constructors, methods need do no further validation, as all operations are arithmetic and will give consistent results regardless of platform, epoch, timezone or base (unless Perl or the underlying runtime suddenly forgets how to do math). No getters & setters need be provided.
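    Pulled together, the whole design might be sketched like this. The method names (Now, plusDelta, DateTime::Delta->days) follow the post; the bodies are illustrative and assume seconds-since-epoch as the unit:

```perl
use strict;
use warnings;

package DateTime;
# Overloading '<=>' is enough: '>' , '<' etc. are autogenerated,
# so built-in sort and normal comparison syntax just work.
use overload '<=>' => sub { ${ $_[0] } <=> ${ $_[1] } };

sub Now { my $now = time; return bless \$now, __PACKAGE__ }

sub plusDelta {
    my( $self, $delta ) = @_;
    my $sum = $$self + $$delta;    # direct access to the raw values
    return bless \$sum, ref $self;
}

package DateTime::Delta;

sub days {
    my( $class, $n ) = @_;
    my $secs = $n * 24 * 60 * 60;  # assumes seconds as the unit
    return bless \$secs, $class;
}

package main;
my $now      = DateTime->Now;
my $tomorrow = $now->plusDelta( DateTime::Delta->days( 1 ) );
print "later\n" if $tomorrow > $now;                # normal syntax
my @sorted = sort { $a <=> $b } $tomorrow, $now;    # built-in sort works
```

    The objects carry nothing but the blessed integer, so aggregates of them stay light.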

    Compare this simplicity with the weight & complexity of existing solutions--in Perl and other languages and libraries.

    I won't expand on it in detail here, but a similar case can be made for (say) a Point3D object. Internally represented by a blessed anonymous array containing 3 numbers, the coordinates can change from 32-bit integers to 64-bit integers, to reals, to complex, to rational, simply by changing the types of the numbers stored in the anon array. All the methods manipulating these objects are just doing math. Math operates correctly whatever the representation of the numbers, so no validation is required after their construction.
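    A minimal sketch of that Point3D, with illustrative class and method names:

```perl
use strict;
use warnings;

package Point3D;

sub new { my( $class, @xyz ) = @_; return bless [ @xyz ], $class }

# Vector addition: pure arithmetic, so it works unchanged whatever
# numeric representation the anonymous array holds.
sub plus {
    my( $self, $other ) = @_;
    return bless [ map { $self->[ $_ ] + $other->[ $_ ] } 0 .. 2 ], ref $self;
}

package main;
my $p = Point3D->new( 1, 2, 3 );
my $q = Point3D->new( 4, 5, 6 );
my $r = $p->plus( $q );
print "@$r\n";    # prints "5 7 9"
```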

    If it is necessary for a given application to constrain operations to some subset of the 3D universe, then subclassing the class and applying post-condition validation in the parental constructors is sufficient. If the resultant object is outside the constraining dimensions, the input must have been wrong. The subclass can then choose the appropriate course of action, be it raising an exception to report the problem, or coercion to correct it.
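    A sketch of such a constraining subclass. The names are hypothetical, and a minimal Point3D constructor is inlined so the example stands on its own:

```perl
use strict;
use warnings;

package Point3D;

sub new { my( $class, @xyz ) = @_; return bless [ @xyz ], $class }

package UnitCubePoint3D;
our @ISA = ( 'Point3D' );

sub new {
    my( $class, @xyz ) = @_;
    my $self = $class->SUPER::new( @xyz );
    # Post-condition on the parental constructor: if the result lies
    # outside the constraint, the input must have been wrong. Here we
    # raise an exception; coercion (e.g. clamping) is the other option.
    for ( @$self ) {
        die "coordinate $_ is outside the unit cube\n" if $_ < 0 or $_ > 1;
    }
    return $self;
}

package main;
my $ok  = UnitCubePoint3D->new( 0.1, 0.5, 0.9 );
my $bad = eval { UnitCubePoint3D->new( 2, 0, 0 ) };
print $bad ? "accepted\n" : "rejected: $@";
```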


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.