in reply to Re: Your favorite objects NOT of the hashref phylum
in thread Your favorite objects NOT of the hashref phylum

The ugly details of how all this happens should be of no concern for your everyday programmer. The detail of the instance storage (it actually is a blessed HASH) should be of no concern to your everyday programmer either. In short, it should all Just Work.

Such bold statements are fine if all your applications

  1. Run for a couple of seconds or minutes at most.
  2. Create a few dozen or few hundred largish objects at most.
  3. If it is convenient, easy and economic to throw hardware at them to alleviate memory and performance bottlenecks.

But your application may not fit this mode of operation: it may be long running; use hundreds of thousands or millions of small objects; by its very nature push the boundaries of both the memory and performance of retail hardware simply to hold its data, before you add the overhead of the OO implementation; require algorithms that constantly access the entire range of that data in multiple passes; and not lend itself to being spread across clustered or networked solutions.

For these types of applications, the mechanisms of the OO implementation and the memory and performance overheads they incur are of considerable concern.

By way of example: many of the problems of bio-genetics involve taking millions of fragments of DNA, totalling one or two GB of data, and attempting to match their ends and so piece together the original sequence. Just loading the raw data starts to push retail hardware to its limits.

Exhaustively iterating all the subsequences of each and comparing them against each other requires an in-memory solution, as splitting the processing across multiple machines is complex and hugely costly in terms of communications overhead. Such exhaustive explorations often run for days or even weeks. Ignoring the costs of objectization, or looking to RDBMS solutions to alleviate memory and/or communications concerns, can extend those time periods to months.

The notion of using genetic programming techniques--applying the power of statistics, intelligently random mutation and generational algorithms--as a replacement for the exhaustive, iterative searches is an attractive one. Genetic algorithms can produce impressive results in very short time periods for other NP-hard, iterative problems--the travelling salesman and knapsack problems, etc.

The idea of "throwing the subsequences into a cauldron" and letting them talk to each other to find affinities in a random fashion, scoring the individual and overall matches achieved, then "stirring the pot" and letting it happen over again, lends itself to making each subsequence an object. If each subsequence of a few tens of bytes of data is going to be represented by a blessed hash, with its minimum overhead of approx. 300 bytes, then you're only going to fit around 7 million in the average PC's memory. If you instead store the sequences in a single array, and use a blessed integer scalar as the object representing them, the per-object overhead of the OO representation falls to approx. 56 bytes, giving you room for something like 38 million.
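A minimal sketch of that second approach (class, method and data names here are purely illustrative): the object is nothing but a blessed scalar holding an index into a single shared array.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package SubSeq;

# All sequence data lives in one shared array; each object is just a
# blessed scalar holding an index into it, costing far less per
# instance than a blessed hash would.
my @data;

sub new {
    my ( $class, $seq ) = @_;
    push @data, $seq;
    my $idx = $#data;
    return bless \$idx, $class;   # the instance is only an integer
}

sub seq {
    my $self = shift;
    return $data[ $$self ];       # dereference the index to reach the data
}

package main;

my $s1 = SubSeq->new( 'GATTACA' );
my $s2 = SubSeq->new( 'CCTAGG' );

print $s1->seq, "\n";   # GATTACA
print $s2->seq, "\n";   # CCTAGG
```

Accessors, matching methods and the like would be written against that index in the same way; the point is only that the per-instance structure shrinks to a single blessed scalar.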

Will that saving be enough to allow the algorithm to run in memory? For some yes, for others no, but the beauty of Perl's "manual" OO, is that it gives you the choice to balance the needs of your application against the doctrines of OO purity.

Most OO languages do not give you that choice and so they "Just work", until they don't. And then you're dead in the water facing the purchase of expensive hardware that can handle more memory (and the memory to populate it), or making your program an order of magnitude more complex and several orders of magnitude slower by using a clustered or networked solution.

Perl gives you the possibility to address problems at their source, by modifying the choices you make in your own source code. And you can do it today without having to wait 3 months for the Hardware Acquisitions committee to approve your Capital Expenditure Request, or the Finance dept. to get the budget; or go through the Corporate Software Approvals process to lay your hands on the clustering software you need :)

Blessed array indexes as object handles, and direct access to instance data, may not be PC as far as OO doctrine is concerned, but a blessed scalar is a blessed scalar, regardless of whether it points to a hash that holds a key that indexes into a table that points to the data, or is just a direct reference to the data. And whilst getters and setters may prove useful in isolating applications from implementation details for library classes that will have a long life and are likely to be refactored, for many, perhaps most, applications the level of refactoring that would benefit from that will never happen, and the benefits of that isolation will never be realised.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^3: Your favorite objects NOT of the hashref phylum
by stvn (Monsignor) on Mar 25, 2006 at 14:41 UTC
    BrowserUk

    Well, all your points are well taken. Obviously you should always use the right tool for the job. But I am actually not sure that Moose is the wrong tool for you. Because Moose uses metaclasses to build all instances, and those same metaclasses also build the accessors, there is great opportunity for optimizations here.

    As I mentioned, Moose uses Class::MOP, which builds blessed HASH based instances, but this is just the default. It is possible to extend Class::MOP to build other kinds of instances as well. In the examples for Class::MOP, I show how it can be used to build inside-out classes. I also have an example of a Lazy class which will not initialize its fields until the absolute last possible moment. I have been considering an ARRAY based example as well, but haven't gotten around to it. There is also nothing to stop you from writing an Inline::C based version, which could possibly be made even more space/time efficient than an array version.
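    For readers unfamiliar with the inside-out technique mentioned above, here is a plain-Perl sketch of the idea, independent of Class::MOP (class and field names are illustrative): field data lives in lexical hashes keyed by the object's address, and the instance itself is an empty blessed scalar.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package InsideOut::Counter;
use Scalar::Util qw( refaddr );

# Inside-out storage: the field lives in a lexical hash keyed by the
# object's refaddr; the instance itself carries no data at all.
my %count;

sub new {
    my ( $class, $start ) = @_;
    my $self = bless \my $scalar, $class;
    $count{ refaddr $self } = $start || 0;
    return $self;
}

sub count { $count{ refaddr $_[0] } }

sub inc   { $count{ refaddr $_[0] }++ }

sub DESTROY {
    my $self = shift;
    delete $count{ refaddr $self };   # don't leak field data
}

package main;

my $c = InsideOut::Counter->new( 5 );
$c->inc;
print $c->count, "\n";   # 6
```

One design consequence: because the fields are lexicals in the class's scope, nothing outside the class can reach the instance data at all, which is stronger encapsulation than a blessed hash gives you.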

    As for how all this will work with Moose, allow me to explain. The primary role of Moose is to collect meta-data about your class definition, and create a number of metaobjects from it. (The extra overhead of these metaobjects will usually be fairly small, since they are a per-class cost, not a per-object/instance cost, and most systems will have a reasonably small number of classes compared to the number of objects/instances they generate. But I digress here ...). These Moose metaobjects are used to build things like accessors, and eventually to help build the actual instances too. Since Moose only makes you pay for the features you use, if, for instance, you don't choose to use a type constraint, you don't pay for its overhead. Now, since all the details of your class are stored as meta-data by Moose, and the Moose metaobjects are managing the details of your object instance structure and the means of accessing it (through accessors), it is possible to swap a different Class::MOP based engine into Moose and have it create ARRAY based instances without changing the surface syntax of your Moose based classes (assuming you don't break encapsulation, that is).

    Now, the example I describe is not easily accomplished at this point, because I have not added the proper hooks into Moose for this kind of thing. But Class::MOP has been designed to do this kind of thing from the very start, so it's just a matter of getting the tuits to do it.

    Moose and Class::MOP are tools designed to work with Perl 5's OO system and not against it. This means that they should not get in your way if you need/want to do something different, because after all, TIMTOWTDI :)

    -stvn
Re^3: Your favorite objects NOT of the hashref phylum
by blogical (Pilgrim) on Mar 25, 2006 at 05:08 UTC
    Thanks a lot BrowserUK, that's exactly the sort of answer I was looking for! Very well put.

    obey the law
Re^3: Your favorite objects NOT of the hashref phylum
by aufflick (Deacon) on Mar 27, 2006 at 10:18 UTC
    <flameRetardant>
    First let me say that I think that all the ways of making objects discussed here are good choices in many situations.
    </flameRetardant>

    I just wanted to comment that many systems do deal with only a relatively small number of objects at once - most web-based systems, for example.

    I once used an in-house object system similar to what Moose sounds like, but with even more features and overhead (it also included automatic RDBMS mapping etc). It worked fine in the web based system, but then we had to make some batch jobs using the same objects for migration. We had a few hours to build, modify and then tear down hundreds of thousands of objects.

    The system was too slow - it was going to take about 2 days! But wait - there's more! I spent a few days profiling the code and came up with a few smallish changes. I tweaked the most-used methods to use more efficient mechanisms. Some methods were being run literally millions of times - in those cases I threw niceness to the wind and removed lexically scoped variables, reached into objects bypassing accessors, etc. They were mostly small methods, and I made up for the ugliness with large wads of code comments and POD to ensure that the code remained maintainable.

    2 days of execution then became 2 hours. I also did some major tweaking on the RDBMS side, but at least half of the performance gain was due to the Perl code changes.

    My point is that you should normally not throw out a code model that benefits your developers because of concerns about future scalability. Unless the model is stupid, there is usually a way to make it fast after the fact. This is not always true in other languages, where you are constrained in your options, but in Perl there is always a way to optimise more. If you really need to, you can do wacky things like manipulate the runtime symbol tables or rewrite your most often used methods in XS, but I've never had to do that (which is a pity, because it could be fun).
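    As a small illustration of the hot-path tweak described above (the class here is made up, and the exact numbers will vary by machine), the core Benchmark module can compare an accessor call against reaching into the hash directly:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

package Point;

sub new { my ( $class, $x ) = @_; bless { x => $x }, $class }
sub x   { $_[0]->{x} }   # the "polite" accessor

package main;

my $p = Point->new( 42 );
my $v;

# Direct hash access skips a method dispatch per call, which is where
# the savings come from when a method runs millions of times.
cmpthese( -1, {
    accessor => sub { $v = $p->x   for 1 .. 1000 },
    direct   => sub { $v = $p->{x} for 1 .. 1000 },
} );
```

Both forms return the same value, of course; the trade is purely encapsulation for speed, so it only belongs in profiled hot spots, well commented.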

      First let me say that I think that all the ways of making objects discussed here are good choices in many situations.

      No need for the flame retardant, I completely agree with you. If you look again at the post, I was responding to the "bold statements" I quoted only.


      If I have an application that will benefit from OO, I use OO as I suggested in my example above.

      If I have an application that needs a blessed hash, I'll use a blessed hash. Or a blessed array, or a blessed scalar, or a blessed glob.

      If I write a module that I think might be usefully sub-classed, and especially if I think that it might be useful to others via CPAN, I'd probably opt for the former, simply because it's what most people are used to and would be least likely to cause surprises.

      But I do not feel obliged to make all modules OO just because OO is cool, and I certainly don't feel obliged to wrap an OO facade around those parts of my code that are fundamentally not OO, just to satisfy the dogma of OO purism.

      For example, the 'singleton pattern' is a farce. It is a dogmatic wrapper to conceal a global variable. It is used because in the dogmatic world of OO purity, globals are not OO, therefore globals are bad.
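      To illustrate the point (the Config class and its field are made up), a typical Perl singleton is just a package-scoped variable hidden behind an accessor method:

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Config;

# The "singleton": a package-scoped lexical -- a global in all but
# name -- concealed behind an accessor method.
my $instance;

sub instance {
    my $class = shift;
    $instance ||= bless { created => time }, $class;
    return $instance;
}

package main;

# Every caller gets the same underlying object back.
my $c1 = Config->instance;
my $c2 = Config->instance;
print( ( $c1 == $c2 ) ? "same object\n" : "different objects\n" );   # same object
```

Strip away the `bless` and the method call, and what remains is an ordinary global variable with extra ceremony.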

      IMO, to use a quaint old phrase my grandmother would resort to on the rare occasions that something really made her angry--that is just so much stuff and nonsense.

      OO is a tool--not a philosophy, way of life, or mandatory way of programming. And like any other tool, you should use it when it benefits your application, and not when it doesn't.


      Now who's being dogmatic ;)

      The singleton pattern is often used in that way by those who have not been taught better (or who are simply lacking in mental horsepower).

      Singleton objects (or similar) can be useful however. Because they hide the fact that there is only one, it can be changed later to have more than one. Say for example your program logs to a logfile, so you use a singleton object factory to return a thin class with a file handle and a print_to_log method. Later you want to use a different logfile depending on the name of the method (or whatever) - the object factory can be changed to return you a different logger object based on your criteria. You're still caching file handles, just a number of them instead of only one. If you used a global variable you would have to change every point where you print to that filehandle to achieve the same effect.
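      A sketch of that factory idea (class and method names here are invented for illustration): callers always ask the factory for a logger, so the factory can later dispatch on a criterion without any change at the call sites.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Logger;

sub new {
    my ( $class, $file ) = @_;
    # A real logger would open $file; here we just record lines in memory.
    return bless { file => $file, lines => [] }, $class;
}

sub print_to_log {
    my ( $self, $msg ) = @_;
    push @{ $self->{lines} }, $msg;
}

package LoggerFactory;

my %loggers;   # cache of logger objects, keyed by category

# Originally this returned one cached logger; now it can hand back a
# different logger per category, and no caller needs to change.
sub logger {
    my ( $class, $category ) = @_;
    $category ||= 'default';
    $loggers{$category} ||= Logger->new( "$category.log" );
    return $loggers{$category};
}

package main;

my $weblog = LoggerFactory->logger( 'web' );
$weblog->print_to_log( 'request served' );

# Repeated requests for the same category return the cached object.
print( ( LoggerFactory->logger( 'web' ) == $weblog ) ? "cached\n" : "new\n" );   # cached
```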

        Although you've responded to your own post rather than me, I'll assume the guilt for 'now being dogmatic' :) I'll also accept the judgement that I'm "simply lacking in mental horsepower", since I don't see how this works.

        Following on from your logger example. Each place in your application you have something like

        package Some::Class;
        use Logger;

        sub new {
            ...
            $self->{logger} = Logger->new;
            ...
        }

        sub print_to_log {
            my $self = shift;
            $self->{logger}->print_to_log( @_ );
        }

        sub someMethod {
            my( $self ) = @_;
            ...
            $self->print_to_log( 'some stuff' );
            ...
        }

        sub someOtherMethod {
            my( $self ) = @_;
            ...
            $self->print_to_log( 'some other stuff' );
            ...
        }

        Could you explain to me how you would arrange for the logging from someMethod to go to one file, and the logging from someOtherMethod to go to a different file without changing Some::Class?

