I've been following the thread "Using the strict module in object oriented programming", where the reply "Re: Using the strict module in object oriented programming" suggested using arrayref-based objects. I started a long, drawn-out reply there before deciding it was far enough off topic to merit its own thread. So here we are.

Way back in the olden days of my OO Perl programming, I used array-based objects. They're faster (I recall benchmarking an arrayref-based object against a hashref-based one a few years back and finding the arrayref version 33% faster), or at least they were when I looked into it.

The standard headache with array-based objects is that the "attributes" are much less readable than hash keys:

    $obj->[0] = 'foo';
    $obj->[1] = 'bar';
    $obj->[2] = 'baz';

vs.

    $obj->{'foo'} = 'foo';
    $obj->{'bar'} = 'bar';
    $obj->{'baz'} = 'baz';

The hashref is much prettier. A standard clever approach is to define some sort of constant or variable with a prettier lookup value, along these lines:

    package Class::Of::Obj;

    my $foo_idx = 0;
    my $bar_idx = 1;
    my $baz_idx = 2;

    $obj->[$foo_idx] = 'foo';
    $obj->[$bar_idx] = 'bar';
    $obj->[$baz_idx] = 'baz';

That's much prettier than remembering all of those indexes all over the place. Of course, if code outside your class wants to access the arrayref directly, you'd need to export your constants (or scalars or whatever) to every package that uses your module. That's messy, at best. A very good solution is to use accessors and mutators from outside the class (you should be doing that anyway!) so the user doesn't need to worry about your index constants at all. You just use them internally yourself.
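A minimal, runnable sketch of that arrangement follows, assuming hypothetical attribute names `foo`, `bar` and `baz` and using `use constant` for the indexes (the lexical-variable version above works the same way). The index constants never leave the package; callers only ever see the accessors.

```perl
use strict;
use warnings;

# Hypothetical arrayref-based class: slot indexes stay private,
# callers go through accessor/mutator methods.
package Class::Of::Obj;

use constant FOO_IDX => 0;
use constant BAR_IDX => 1;
use constant BAZ_IDX => 2;

sub new {
    my ($class, %args) = @_;
    my $self = bless [], $class;
    $self->[FOO_IDX] = $args{foo};
    $self->[BAR_IDX] = $args{bar};
    $self->[BAZ_IDX] = $args{baz};
    return $self;
}

# Combined accessor/mutator: $obj->foo reads, $obj->foo($value) writes.
sub foo { my $self = shift; $self->[FOO_IDX] = shift if @_; return $self->[FOO_IDX] }
sub bar { my $self = shift; $self->[BAR_IDX] = shift if @_; return $self->[BAR_IDX] }
sub baz { my $self = shift; $self->[BAZ_IDX] = shift if @_; return $self->[BAZ_IDX] }

package main;

my $obj = Class::Of::Obj->new(foo => 'foo', bar => 'bar');
$obj->baz('baz');
print join(',', $obj->foo, $obj->bar, $obj->baz), "\n";   # foo,bar,baz
```

If the layout changes later, only the constants inside Class::Of::Obj need to move; nothing outside the package depends on the numbers.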

And this works fine and dandy, fairly efficiently, fairly easily, and without your user needing to care (assuming you've wrapped everything in accessors, of course). But there's a gotcha you're bound to run into sooner or later: subclasses. Even worse, multiple subclasses.

Observe some scenarios:

    package Super::Class;

    my $foo_idx = 0;
    my $bar_idx = 1;
    my $baz_idx = 2;

    package Sub::Class;

    # err...now what? How do I start using the next open index?
    # Everything's hardwired up there and the Super::Class
    # isn't telling me. I'm stuck.

You can look in the Super::Class and determine its last index and hardwire your values to start after that, but then your code will break if the Super::Class adds in a new index. The Super::Class ends up marching straight into your first index and clobbering it. Very bad.
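To make the clobbering concrete, here is a hypothetical single-file sketch in which Super::Class has gained a fourth attribute (`QUUX_IDX`, an invented name) after Sub::Class hardwired its own first index to 3, back when the superclass only used slots 0 to 2.

```perl
use strict;
use warnings;

package Super::Class;
use constant FOO_IDX  => 0;
use constant BAR_IDX  => 1;
use constant BAZ_IDX  => 2;
use constant QUUX_IDX => 3;   # hypothetical field added in a later release

sub new      { return bless [], shift }
sub set_quux { $_[0][QUUX_IDX] = $_[1] }
sub quux     { $_[0][QUUX_IDX] }

package Sub::Class;
BEGIN { our @ISA = ('Super::Class') }
use constant OTHER_IDX => 3;  # hardwired: "the first slot after BAZ_IDX"

sub set_other { $_[0][OTHER_IDX] = $_[1] }
sub other     { $_[0][OTHER_IDX] }

package main;

my $obj = Sub::Class->new;
$obj->set_quux('from super');
$obj->set_other('from sub');              # writes the very same slot as QUUX_IDX
print 'quux is now: ', $obj->quux, "\n";  # quux is now: from sub
```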

So now you cleverly switch to some sort of incrementer function, as in this scenario from the original post I was looking at:

    package Super::Class;

    my $idx = 0;
    sub NEXT_IDX { return $idx++ }

    use constant FOO_IDX => NEXT_IDX();
    use constant BAR_IDX => FOO_IDX + 1;
    use constant BAZ_IDX => FOO_IDX + 2;

    package Sub::Class;

    # need to wrap in BEGIN blocks so SUPER:: works in the use at
    # compile time.
    BEGIN { our @ISA = qw(Super::Class) }
    use constant OTHER_IDX => __PACKAGE__->SUPER::NEXT_IDX();
    # oops! NEXT_IDX() was only called once up there, so it now hands
    # out 1, and OTHER_IDX clobbers whatever's in the BAR_IDX field!

Okay, so the problem here is that NEXT_IDX() is only called once, and BAR_IDX and BAZ_IDX ignore it. That's easy to fix:

    package Super::Class;

    # NEXT_IDX() defined as above
    use constant FOO_IDX => NEXT_IDX(); # 0
    use constant BAR_IDX => NEXT_IDX(); # 1
    use constant BAZ_IDX => NEXT_IDX(); # 2

    package Sub::Class;

    # need to wrap in BEGIN blocks so SUPER:: works in the use at
    # compile time.
    BEGIN { our @ISA = qw(Super::Class) }
    use constant OTHER_IDX => __PACKAGE__->SUPER::NEXT_IDX(); # now at 3, after BAZ_IDX
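A self-contained version of the fixed counter might look like this. One assumption goes beyond the original: `$idx` is declared without an initializer, because a file-level `my $idx = 0;` executes at run time, after the `use constant` lines have already advanced the counter at compile time, and would quietly reset it for any later run-time callers. Post-incrementing an undefined value yields 0, so the first index is still 0.

```perl
use strict;
use warnings;

package Super::Class;

# One global counter shared by every class in the hierarchy.
# No initializer: undef++ returns 0, and the run-time "my"
# does not reset what compile time already handed out.
my $idx;
sub NEXT_IDX { return $idx++ }

use constant FOO_IDX => NEXT_IDX();   # 0
use constant BAR_IDX => NEXT_IDX();   # 1
use constant BAZ_IDX => NEXT_IDX();   # 2

package Sub::Class;
# @ISA must be set at compile time so SUPER:: resolves inside "use constant".
BEGIN { our @ISA = ('Super::Class') }
use constant OTHER_IDX => __PACKAGE__->SUPER::NEXT_IDX();   # 3

package main;
print join(' ', Super::Class::FOO_IDX(), Super::Class::BAR_IDX(),
                Super::Class::BAZ_IDX(), Sub::Class::OTHER_IDX()), "\n";   # 0 1 2 3
```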

The next hassle shows up when you have multiple subclasses:

    package Super::Class;

    use constant FOO_IDX => NEXT_IDX(); # 0
    use constant BAR_IDX => NEXT_IDX(); # 1
    use constant BAZ_IDX => NEXT_IDX(); # 2

    package Sub::Class;

    # need to wrap in BEGIN blocks so SUPER:: works in the use at
    # compile time.
    BEGIN { our @ISA = qw(Super::Class) }
    use constant OTHER_IDX => __PACKAGE__->SUPER::NEXT_IDX(); # now at 3, after BAZ_IDX

    package Other::Sub::Class;

    BEGIN { our @ISA = qw(Super::Class) }
    use constant DIFFERENT_IDX => __PACKAGE__->SUPER::NEXT_IDX(); # now at 4, after OTHER_IDX

This has two concerns that I'm aware of:

  1. If you've serialized your objects to disk and then read them back in, AND in the subsequent program run you define Other::Sub::Class first, your object indexes change like this:

        package Super::Class;

        use constant FOO_IDX => NEXT_IDX(); # 0
        use constant BAR_IDX => NEXT_IDX(); # 1
        use constant BAZ_IDX => NEXT_IDX(); # 2

        package Other::Sub::Class;

        # need to wrap in BEGIN blocks so SUPER:: works in the use at
        # compile time.
        BEGIN { our @ISA = qw(Super::Class) }
        use constant DIFFERENT_IDX => __PACKAGE__->SUPER::NEXT_IDX(); # now at 3, DIFFERENT THAN LAST TIME

        package Sub::Class;

        BEGIN { our @ISA = qw(Super::Class) }
        use constant OTHER_IDX => __PACKAGE__->SUPER::NEXT_IDX(); # now at 4, DIFFERENT THAN LAST TIME!

    Whoops. Now you can't deserialize your object without making sure that your classes are loaded in the same order they were when your object was serialized.
  2. Perl's arrays aren't sparse, so you'll always be taking up space for those other subclasses that are sitting around. In our little example here, Other::Sub::Class has a blank slot at index 3 that will never be used because the index was assigned to the Sub::Class class. No big deal, it's just one slot.

    But it gets bad when you have bigger objects. What if your super class has 25 slots? Then Sub::Class adds 25 slots, Other::Sub::Class adds another 25, and Additional::Sub::Class adds 25 more? Each Additional::Sub::Class object then has 50 blank entries (the slots belonging to Sub::Class and Other::Sub::Class) sitting around gobbling up memory needlessly.

    Note that this is only a problem for the later classes. The super class still only uses 25 slots (nothing wasted), and Sub::Class still only uses 50 slots (nothing wasted); it's only when you get to Other::Sub::Class, with 75 slots, that 25 of them are wasted.
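The wasted slot is easy to observe. This hypothetical sketch uses the same single global counter, which gives Other::Sub::Class index 4, so each of its objects carries an unused slot 3 that only Sub::Class ever writes. Swapping the order of the two subclass packages would swap their indexes, which is the serialization problem from item 1.

```perl
use strict;
use warnings;

package Super::Class;
my $idx;                                  # one global counter; undef++ gives 0
sub NEXT_IDX { return $idx++ }
use constant FOO_IDX => NEXT_IDX();       # 0
use constant BAR_IDX => NEXT_IDX();       # 1
use constant BAZ_IDX => NEXT_IDX();       # 2
sub new { return bless [], shift }

package Sub::Class;
BEGIN { our @ISA = ('Super::Class') }
use constant OTHER_IDX => __PACKAGE__->SUPER::NEXT_IDX();       # 3

package Other::Sub::Class;
BEGIN { our @ISA = ('Super::Class') }
use constant DIFFERENT_IDX => __PACKAGE__->SUPER::NEXT_IDX();   # 4

package main;

my $obj = Other::Sub::Class->new;
$obj->[$_] = 'set' for Super::Class::FOO_IDX(), Super::Class::BAR_IDX(), Super::Class::BAZ_IDX();
$obj->[Other::Sub::Class::DIFFERENT_IDX()] = 'different';

# Slot 3 belongs to Sub::Class, yet every Other::Sub::Class object carries it.
my @unused = grep { !defined $obj->[$_] } 0 .. $#$obj;
print scalar(@$obj), " slots, unused: @unused\n";   # 5 slots, unused: 3
```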

Eek. So what do we do now? Well, there are some ways we can try to fix it. Ideally, we'd like each subclass's indexes to be independent of every other subclass's. We can try something fancy like maintaining a separate counter per package.

    package Super::Class;

    my $class_indexes = {};

    sub NEXT_IDX {
        my $class = shift;

        # if we have an index for this class, then return it
        if (defined $class_indexes->{$class}) {
            return ++$class_indexes->{$class};
        }
        else {
            no strict 'refs';
            my @isa = @{$class . "::ISA"};
            # root class has no super
            my $idx = @isa ? $isa[0]->CURR_IDX() : -1;
            $class_indexes->{$class} = $idx + 1;
            return $class_indexes->{$class};
        }
    }

    sub CURR_IDX {
        my $class = shift;
        return $class_indexes->{$class};
    }

    use constant FOO_IDX => __PACKAGE__->NEXT_IDX(); # 0
    use constant BAR_IDX => __PACKAGE__->NEXT_IDX(); # 1
    use constant BAZ_IDX => __PACKAGE__->NEXT_IDX(); # 2

    package Other::Sub::Class;

    BEGIN { our @ISA = qw(Super::Class) }
    use constant DIFFERENT_IDX => __PACKAGE__->SUPER::NEXT_IDX(); # set to 3

    package Sub::Class;

    BEGIN { our @ISA = qw(Super::Class) }
    use constant OTHER_IDX => __PACKAGE__->SUPER::NEXT_IDX(); # also set to 3

Damn, that's a lot of work! We keep separate counters for each subclass and increment them independently. To add a new index to our class, we look up our own counter and increment it, or, if we don't have one yet, we start one just past our superclass's current index.
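Here is a runnable single-file version of the per-class counter scheme, kept close to the code above but using a plain hash declared without an initializer (an adjustment, so the run-time `my` does not discard values set at compile time). Both subclasses end up with index 3, which is harmless on its own and becomes a collision as soon as multiple inheritance enters the picture.

```perl
use strict;
use warnings;

package Super::Class;

# Per-class counters: a class without a counter starts one past
# its *first* parent's current index.
my %class_indexes;

sub NEXT_IDX {
    my $class = shift;
    return ++$class_indexes{$class} if defined $class_indexes{$class};
    no strict 'refs';
    my @isa = @{ $class . '::ISA' };
    my $idx = @isa ? $isa[0]->CURR_IDX() : -1;   # root class has no parent
    return $class_indexes{$class} = $idx + 1;
}

sub CURR_IDX { my $class = shift; return $class_indexes{$class} }

use constant FOO_IDX => __PACKAGE__->NEXT_IDX();   # 0
use constant BAR_IDX => __PACKAGE__->NEXT_IDX();   # 1
use constant BAZ_IDX => __PACKAGE__->NEXT_IDX();   # 2

package Other::Sub::Class;
BEGIN { our @ISA = ('Super::Class') }
use constant DIFFERENT_IDX => __PACKAGE__->SUPER::NEXT_IDX();   # 3

package Sub::Class;
BEGIN { our @ISA = ('Super::Class') }
use constant OTHER_IDX => __PACKAGE__->SUPER::NEXT_IDX();       # also 3

package main;
print join(' ', Other::Sub::Class::DIFFERENT_IDX(), Sub::Class::OTHER_IDX()), "\n";   # 3 3
```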

All wonderful in theory, but it still doesn't work. Worse, this approach introduces additional problems.

You now once again have the problem of the Super::Class adding in a new attribute index later in the day and stomping into our index space. Note that we didn't have that issue with the single global NEXT_IDX() incrementer.

And multiple inheritance completely destroys it.

    package Distant::Sub::Class;

    our @ISA = qw(Sub::Class Other::Sub::Class);
    # both OTHER_IDX and DIFFERENT_IDX point to index 3. Whoops.

You could try looping through all of your parent classes and finding the highest index and basing off of there, but then you end up with the empty slot issue. And your super classes still stomp all over your internals if they add a new attribute later in the day.

A slick alternative is to give each class its own slot and increment there. This will fix the issue of the Super::Class later adding new attributes and help alleviate the problem with empty slots draining memory.

    package Super::Class;

    # now store a global class_idx in addition to the individual
    # indexes on each class
    my $class_idx = 0;
    sub NEXT_CLASS_IDX { return $class_idx++ }

    my $class_indexes = {};
    sub NEXT_IDX {
        my $class = shift;
        return $class_indexes->{$class}++;
    }

    use constant CLASS_IDX => __PACKAGE__->NEXT_CLASS_IDX();
    use constant FOO_IDX => __PACKAGE__->NEXT_IDX(); # 0
    use constant BAR_IDX => __PACKAGE__->NEXT_IDX(); # 1
    use constant BAZ_IDX => __PACKAGE__->NEXT_IDX(); # 2

    package Other::Sub::Class;

    BEGIN { our @ISA = qw(Super::Class) }
    use constant CLASS_IDX => __PACKAGE__->NEXT_CLASS_IDX();
    use constant DIFFERENT_IDX => __PACKAGE__->SUPER::NEXT_IDX(); # set to 0

    package Sub::Class;

    BEGIN { our @ISA = qw(Super::Class) }
    use constant CLASS_IDX => __PACKAGE__->NEXT_CLASS_IDX();
    use constant OTHER_IDX => __PACKAGE__->SUPER::NEXT_IDX(); # also set to 0

You now access your slots with a double lookup:

    my $obj = Sub::Class->new();
    $obj->[CLASS_IDX]->[OTHER_IDX];

Of course, that CLASS_IDX constant is only available inside your own namespace, and if you need to export it, you'd have to rename it so everyone doesn't stomp all over each other. This fixes the multiple inheritance issue, since OTHER_IDX hangs off of Sub::Class's slot and DIFFERENT_IDX hangs off of Other::Sub::Class's slot. So even though they're both 0, they're in different places. You really need to use accessors with this approach, though, unless you want to keep track of the class constants for all of your superclasses.

We don't need to worry about looking at our super class's attribute indexes, since we'll never stomp on them. So the NEXT_IDX code is greatly simplified to just be a counter on our particular class.
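A sketch of the per-class-slot layout with accessors, including the multiple inheritance case, might look like this. The class names follow the post; the accessor names (`foo`, `other`, `different`) are hypothetical, and the counters are declared without initializers for the same compile-time reason as earlier. Each class's attributes hang off its own sub-array, so OTHER_IDX and DIFFERENT_IDX can both be 0 without colliding.

```perl
use strict;
use warnings;

package Super::Class;

# A global counter hands out one slot per class; a per-class
# counter hands out attribute indexes inside that slot.
my $class_idx;
my %class_indexes;
sub NEXT_CLASS_IDX { return $class_idx++ }
sub NEXT_IDX { my $class = shift; return $class_indexes{$class}++ }

use constant CLASS_IDX => __PACKAGE__->NEXT_CLASS_IDX();   # 0
use constant FOO_IDX   => __PACKAGE__->NEXT_IDX();         # 0

sub new { return bless [], shift }
# Accessors hide the double lookup from callers.
sub foo { my $s = shift; $s->[CLASS_IDX][FOO_IDX] = shift if @_; $s->[CLASS_IDX][FOO_IDX] }

package Sub::Class;
BEGIN { our @ISA = ('Super::Class') }
use constant CLASS_IDX => __PACKAGE__->NEXT_CLASS_IDX();   # 1
use constant OTHER_IDX => __PACKAGE__->NEXT_IDX();         # 0
sub other { my $s = shift; $s->[CLASS_IDX][OTHER_IDX] = shift if @_; $s->[CLASS_IDX][OTHER_IDX] }

package Other::Sub::Class;
BEGIN { our @ISA = ('Super::Class') }
use constant CLASS_IDX     => __PACKAGE__->NEXT_CLASS_IDX();   # 2
use constant DIFFERENT_IDX => __PACKAGE__->NEXT_IDX();         # 0
sub different {
    my $s = shift;
    $s->[CLASS_IDX][DIFFERENT_IDX] = shift if @_;
    return $s->[CLASS_IDX][DIFFERENT_IDX];
}

package Distant::Sub::Class;
our @ISA = ('Sub::Class', 'Other::Sub::Class');

package main;
my $obj = Distant::Sub::Class->new;
$obj->foo('foo');
$obj->other('other');
$obj->different('different');
print join(',', $obj->foo, $obj->other, $obj->different), "\n";   # foo,other,different
```

The CLASS_IDX values (0, 1, 2 here) still depend on the order in which the packages are compiled, so this layout does nothing for the serialization problem.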

You still have the problem with empty slots, but you only end up with empty slots for each additional subclass that exists, not every attribute of each additional subclass. And you also can't reliably serialize, due to load order issues.

If you have complete control, you can theoretically hardwire a constant into each subclass. So Sub::Class has CLASS_IDX 1 and that's it; nothing else can use it. That'll work if the code is only internal, but will break if it escapes into the wild. If Widget Corp releases Super::Class, Frobnoz Corp releases Sub::Class (hardwired to CLASS_IDX 1), and Foo, Inc. releases Other::Sub::Class (also hardwired to CLASS_IDX 1), you'll have problems if you try to use those two subclasses at once.

There may be solutions to these issues to allow you to continue using arrays. Or heck, they may not be enough of a concern to you. Me? I never solved these problems. Instead I just took the speed penalty and switched everything to hashrefs and stopped worrying about it.

Re: Problems I've had with array based objects
by ikegami (Patriarch) on Jul 25, 2006 at 17:12 UTC
    I don't see the need of calculating all the indexes automatically. I settled on the following:
        package Node;

        use constant FIRST_IDX  => 0;
        use constant IDX_PARENT => FIRST_IDX + 0;
        use constant NEXT_IDX   => FIRST_IDX + 1;

        package ElementNode;

        BEGIN { our @ISA = 'Node'; }
        use constant FIRST_IDX => __PACKAGE__->SUPER::NEXT_IDX();
        use constant IDX_NAME  => FIRST_IDX + 0;
        use constant IDX_ATTS  => FIRST_IDX + 1;
        use constant NEXT_IDX  => FIRST_IDX + 2;

    If ElementNode needed to access IDX_PARENT, then I'd either add an accessor or I'd add

    use constant IDX_PARENT => __PACKAGE__->SUPER::IDX_PARENT();

    If a field needs to be added, only the indexes in the class where the field needs to be added are changed.

        package Node;

        use constant FIRST_IDX  => 0;
        use constant IDX_PARENT => FIRST_IDX + 0;
        use constant IDX_ISROOT => FIRST_IDX + 1;
        use constant NEXT_IDX   => FIRST_IDX + 2;
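This scheme can be checked in a single file. The sketch below combines the ElementNode definition with the updated Node that includes IDX_ISROOT, and shows ElementNode's indexes moving up by one without any edits to ElementNode itself. It also pulls in IDX_PARENT the way described above.

```perl
use strict;
use warnings;

package Node;
use constant FIRST_IDX  => 0;
use constant IDX_PARENT => FIRST_IDX + 0;
use constant IDX_ISROOT => FIRST_IDX + 1;   # the newly added field
use constant NEXT_IDX   => FIRST_IDX + 2;

package ElementNode;
BEGIN { our @ISA = 'Node'; }
# Start numbering where the parent class stops.
use constant FIRST_IDX  => __PACKAGE__->SUPER::NEXT_IDX();   # now 2
use constant IDX_NAME   => FIRST_IDX + 0;
use constant IDX_ATTS   => FIRST_IDX + 1;
use constant NEXT_IDX   => FIRST_IDX + 2;
# Re-export a parent index for local use.
use constant IDX_PARENT => __PACKAGE__->SUPER::IDX_PARENT();

package main;
print join(' ', ElementNode::IDX_PARENT(), ElementNode::IDX_NAME(),
                ElementNode::IDX_ATTS(), ElementNode::NEXT_IDX()), "\n";   # 0 2 3 4
```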

    This doesn't support multiple inheritance, but it does support traits and (java-like) interfaces. As said by Steve Cook, "Multiple inheritance is good, but there is no good way to do it."

Re: Problems I've had with array based objects
by ikegami (Patriarch) on Jul 25, 2006 at 17:24 UTC
    If you wanted a completely automatic method, the following works:
        package Node;

        BEGIN {
            my $idx = 0;
            require constant;
            constant->import($_ => $idx++) foreach qw( IDX_PARENT NEXT_IDX );
        }

        package ElementNode;

        BEGIN {
            our @ISA = 'Node';
            my $idx = __PACKAGE__->SUPER::NEXT_IDX();
            require constant;
            constant->import($_ => $idx++) foreach qw( IDX_NAME IDX_ATTS NEXT_IDX );
        }
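A runnable single-file copy of the automatic version follows. `constant->import` installs the names into whichever package calls it, which is why each call sits inside that package's own BEGIN block.

```perl
use strict;
use warnings;

package Node;
BEGIN {
    my $idx = 0;
    require constant;
    # Installs Node::IDX_PARENT (0) and Node::NEXT_IDX (1).
    constant->import($_ => $idx++) foreach qw( IDX_PARENT NEXT_IDX );
}

package ElementNode;
BEGIN {
    our @ISA = 'Node';
    my $idx = __PACKAGE__->SUPER::NEXT_IDX();   # continue after Node's fields
    require constant;
    constant->import($_ => $idx++) foreach qw( IDX_NAME IDX_ATTS NEXT_IDX );
}

package main;
print join(' ', Node::IDX_PARENT(), ElementNode::IDX_NAME(),
                ElementNode::IDX_ATTS(), ElementNode::NEXT_IDX()), "\n";   # 0 1 2 3
```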

    Again, this doesn't support multiple inheritance, but it does support traits and (java-like) interfaces. As said by Steve Cook, "Multiple inheritance is good, but there is no good way to do it."

Re: Problems I've had with array based objects
by jimt (Chaplain) on Jul 25, 2006 at 19:21 UTC

    ikegami's two posts still have the problems with multiple inheritance, with the super class adding new attributes at a later date, and with serialization if the classes are changed after storing. New attributes at run time are the nastiest case: yeah, yeah, using constants as opposed to strings or whatnot keeps you from defining new ones at run time, but you can still add onto the index if you're using a closure or whatnot.

    But, again, there's nothing "wrong" with it, if you don't care about the issues. I've got no dog in the fight, so if it works for you, go for it. You just need to remember them as potential gotchas.

    And jdhedden, you may have missed my point. Using arrayrefs as your object isn't re-inventing anything (lord knows it's been around a lot longer than inside out objects), but if you do want to use arrayrefs as your underlying objects, there are some more gotchas to worry about. I just decided to illustrate the ones I'd encountered back before I gave up on 'em.

Re: Problems I've had with array based objects
by jdhedden (Deacon) on Jul 25, 2006 at 18:36 UTC
    Why bother re-inventing the wheel? Just use Object::InsideOut. You get the speed of array-based objects, and all of the messy details you've discussed are already handled plus some you didn't discuss such as thread-safety.

    Remember: There's always one more bug.
Re: Problems I've had with array based objects
by tinita (Parson) on Jul 26, 2006 at 08:29 UTC
Re: Problems I've had with array based objects (MI--)
by tye (Sage) on Jul 25, 2006 at 18:12 UTC

    Not supporting multiple inheritance is a good thing, IMO. Undisciplined use of inheritance is a common design mistake.

    - tye        

      So, you're basically saying multiple inheritance equals undisciplined use of inheritance?

      I can't agree with that :)

      Ordinary morality is for ordinary people. -- Aleister Crowley
Re: Problems I've had with array based objects
by vanishing (Initiate) on Aug 04, 2006 at 00:16 UTC
    Perhaps I'm oversimplifying, but wouldn't this work as a base class for array based objects?
        package foo;
        use strict;

        sub new {
            bless [], ref $_[0] || $_[0] || __PACKAGE__;
        }

        sub declare {
            my $self = shift;

            ## auto vivification fails to make the correct indexes
            $self->[0] = {} unless $self->[0];

            no strict 'refs';
            foreach my $name (@_) {
                ## create the index reference
                $self->[0]{$name} = @$self;

                ## create a nicely named index variable
                ${ref($self) . '::' . $name . '_idx'} = @$self;

                ## create the accessor/mutator methods
                (my $code = 'sub { $_[0][VALUE] }') =~ s/VALUE/scalar @$self/e;
                *{ref($self) . '::get_' . $name} = eval $code;
                ($code = 'sub { $_[0][VALUE] = $_[1] }') =~ s/VALUE/scalar @$self/e;
                *{ref($self) . '::set_' . $name} = eval $code;

                ## create the (uninitialized) attribute itself
                push @$self, undef;
            }
        }
    It trades space (possibly a lot of space) for time and convenience, but that's often a good trade. I'm sure there's a (perhaps non-trivial) way to move the index references back into package space to save memory, but I'll leave that as an exercise.

      Did you intend to support multiple inheritance? If so, you need to remove

          ## create a nicely named index variable
          ${ref($self) . '::' . $name . '_idx'} = @$self;

      Consider the case where Bar isa Foo, and Baz isa Moo and Foo. The index of a particular attribute of Foo is object-specific, but the code to remove behaves as if the index is class-specific.

        Absolutely correct. I didn't like making indexes object specific to begin with, and hence must have gotten ahead of myself. I think the right way to do this would put the indexes into the class namespace, but I was lazy.

        UPDATE: In fact, the same argument means that the accessor/mutator creation won't work either because new objects will clobber the old closures. I think the solution is to make everything package scoped, but clearly I was, in fact, over-simplifying.
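One way the package-scoped idea might look is sketched below. This is a hypothetical rewrite, not the poster's code: the base class name `ArrayBase`, the `slot_count` method, and the class-method form of `declare` are all invented here. Indexes live in a per-class table, a subclass starts numbering after its first parent's slots, and the accessors are plain closures instead of string-eval'd code.

```perl
use strict;
use warnings;

package ArrayBase;

# Hypothetical package-scoped variant: indexes and accessors belong
# to the class, not to each object. Single inheritance only.
my %slots;   # class name => { attribute name => index }

sub new { my $class = shift; return bless [], ref $class || $class }

# Total slots used by $class, counting its first parent's chain.
sub slot_count {
    my $class = shift;
    no strict 'refs';
    my ($parent) = @{ $class . '::ISA' };
    my $inherited = ($parent && $parent->can('slot_count')) ? $parent->slot_count : 0;
    return $inherited + keys %{ $slots{$class} || {} };
}

sub declare {
    my $class = shift;
    no strict 'refs';
    for my $name (@_) {
        next if exists $slots{$class}{$name};   # already declared
        my $idx = $class->slot_count;
        $slots{$class}{$name} = $idx;
        # Each closure captures its own lexical $idx.
        *{ $class . '::get_' . $name } = sub { $_[0][$idx] };
        *{ $class . '::set_' . $name } = sub { $_[0][$idx] = $_[1] };
    }
}

package Foo;
our @ISA = ('ArrayBase');
Foo->declare(qw(name size));   # name => 0, size => 1

package Bar;
our @ISA = ('Foo');
Bar->declare(qw(color));       # color => 2

package main;
my $bar = Bar->new;
$bar->set_name('widget');
$bar->set_color('red');
print join(' ', $bar->get_name, $bar->get_color, scalar @$bar), "\n";   # widget red 3
```

It still carries limitations discussed in this thread: it handles single inheritance only, and if a parent declares more attributes after a subclass has, the two will collide.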