
One of the key issues with which I wrestled was how to instantiate a large collection of objects, associated with their children in one-to-many has-a relationships, while neither executing superfluous queries nor pulling in extra information in the join operations.

I think with these kinds of tools, you cannot avoid occasionally running superfluous queries or pulling in extra information. If your tool is so fine-grained that it loads too little at a time, it will likely slow to a crawl when retrieving a lot of information. However, the same can be true if your tool is too coarse-grained and grabs much more than it needs each time. Personally, I would recommend two possible approaches (which are not mutually exclusive):

Lazy Loading
Load your main objects completely, but load all relationships lazily. This works pretty well for more basic DB schemas, which have few relationships and keep much of the information for an individual entity in a single row/table. This way, when fetching a collection of objects, you get only as much as you need. But when you really are looking for a single object and all its related information, you can take the time and fetch it all.
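
A minimal sketch of the split described above, written in Python with the standard sqlite3 module rather than Perl. The users/orders schema and the function names list_users and get_user_full are hypothetical. The collection fetch returns complete main rows with no relationships, while the single-object fetch pays for one extra query to bring in everything related.

```python
import sqlite3

# Hypothetical schema: a main "users" table and a one-to-many "orders" table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT);
    INSERT INTO users  VALUES (1, 'alice', 'a@example.com'), (2, 'bob', 'b@example.com');
    INSERT INTO orders (user_id, item) VALUES (1, 'book'), (1, 'pen'), (2, 'lamp');
""")

COLUMNS = ("id", "name", "email")

def list_users(conn):
    """Collection fetch: complete main rows, no relationships, one query."""
    rows = conn.execute("SELECT id, name, email FROM users ORDER BY id")
    return [dict(zip(COLUMNS, row)) for row in rows]

def get_user_full(conn, user_id):
    """Single-object fetch: the main row plus all of its related orders."""
    row = conn.execute("SELECT id, name, email FROM users WHERE id = ?",
                       (user_id,)).fetchone()
    if row is None:
        return None
    user = dict(zip(COLUMNS, row))
    # The relationship is only queried when a full single object is requested.
    user["orders"] = [item for (item,) in conn.execute(
        "SELECT item FROM orders WHERE user_id = ? ORDER BY id", (user_id,))]
    return user

print([u["name"] for u in list_users(conn)])  # ['alice', 'bob']
print(get_user_full(conn, 1)["orders"])       # ['book', 'pen']
```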

Multiple Views for each entity
The idea here is that you will not always need the same "view" of an entity and its relationships in every situation. For instance, sometimes you only need the user_name and password information for your User object (both fields contained in the same hypothetical table), while other times you also need the first_name, last_name, address, zip code and phone number (contained in one or more hypothetical tables linked to the user table). It may make sense to have two kinds of User objects, one for verifying user_name and password at login and the other for printing mailing labels, each optimized for its specific use.
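
A rough Python sketch of the two-view idea, assuming a hypothetical schema with credentials in a users table and contact details in a separate contact table. LoginView needs only a single-table lookup, while MailingView requires a join; each loader selects just the columns its view uses. The class and function names are illustrative, not taken from any module.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical schema: credentials in one table, contact details in another.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (id INTEGER PRIMARY KEY, user_name TEXT, password_hash TEXT);
    CREATE TABLE contact (user_id INTEGER, first_name TEXT, last_name TEXT,
                          address TEXT, zip_code TEXT, phone TEXT);
    INSERT INTO users   VALUES (1, 'jdoe', 'x1f9');
    INSERT INTO contact VALUES (1, 'Jane', 'Doe', '1 Main St', '12345', '555-0100');
""")

@dataclass
class LoginView:
    # Only what a login check needs: one table, no join.
    user_name: str
    password_hash: str

@dataclass
class MailingView:
    # What a mailing label needs: requires joining the contact table.
    first_name: str
    last_name: str
    address: str
    zip_code: str

def load_login_view(conn, user_name) -> Optional[LoginView]:
    row = conn.execute(
        "SELECT user_name, password_hash FROM users WHERE user_name = ?",
        (user_name,)).fetchone()
    return LoginView(*row) if row else None

def load_mailing_views(conn) -> List[MailingView]:
    rows = conn.execute("""
        SELECT c.first_name, c.last_name, c.address, c.zip_code
        FROM users u JOIN contact c ON c.user_id = u.id
        ORDER BY u.id""")
    return [MailingView(*r) for r in rows]

print(load_login_view(conn, "jdoe"))
print(load_mailing_views(conn)[0].zip_code)  # 12345
```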

Oh, and both of your links seem like excellent resources to promote the uniqueness of my own module,...
Don't get too caught up in the "uniqueness" of your module. There are many here who will cry "You're Re-Inventing the Wheel!", but the fact of the matter is that just because a module has been on CPAN for 5 years, has made it into the core module set, blah blah blah, doesn't mean it's still the best tool for the job (and in particular for the job you are doing), or that it hasn't grown old and crusty. There is always room for someone to "Build a Better Mousetrap", and unless you try, you will never know. However, always keep in mind that to build Mouse::Trap::Better.pm you should know about all the other mousetraps out there, so that you can learn from their mistakes and build upon their successes.

-stvn

Re^5: A First CPAN Odyssey
by jZed (Prior) on Jun 23, 2004 at 01:08 UTC
    Don't get too caught up in the "uniqueness" of your module. There are many here who will cry "You're Re-Inventing the Wheel!"
    I agree that uniqueness should not be a deciding factor in whether or not to develop the module and release it to CPAN, and I hope the OP didn't take my remarks that way. But in naming and describing the module, it's best to focus on what its strengths are, what it shares with other similar modules, and how it differs from them.
      Don't worry, I haven't taken it the wrong way. Actually, initially I did, but then I re-read your post and understood what you meant. I appreciate it as valuable advice.
Re^5: A First CPAN Odyssey
by skyknight (Hermit) on Jun 23, 2004 at 02:09 UTC

    Actually, the way I've done it, you can have your cake and eat it too: you can choose precisely how lazily objects are loaded with respect to their relationships. The only "waste" is the foreign key I use to perform my own "join" outside the database, and that isn't really waste, because it is necessary to match up objects with their children/parents. Perhaps "overhead" is a better word.

    To query a collection of objects, you instantiate a SQL::Object::Query object. Its constructor takes as an argument an array ref holding zero or more SQL::Link::Query objects. When you call its execute method, it returns a SQL::Object::ResultSet object, which is more or less an iterator.

    Internally, the execute method works as follows. It invokes the execute method of each SQL::Link::Query object, after first specifying that each set of results be ordered by the primary key of the objects specified by the SQL::Object::Query object. You end up with n+1 streams of results, where n is the number of SQL::Link::Query objects being used. For each item you pull off the object stream, you look in each link stream to see whether there is a run of one or more links related to that object, and if so you pull them off the stream and associate them with the object.
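
The stream-stitching step above amounts to a merge join done outside the database. Here is a minimal Python sketch of that technique, not the module's actual code. It assumes that the object stream is sorted by its primary key, that every link stream is sorted by the foreign key pointing back at that object, and that rows are plain dicts; Peekable, stitch, and the field names id/parent_id are all hypothetical. Links whose parent is not in the object stream (for example, filtered out by the object query) are skipped.

```python
class Peekable:
    """Minimal iterator wrapper that can look at the next item without consuming it."""
    _END = object()

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._next = next(self._it, self._END)

    def peek(self):
        return self._next

    def pop(self):
        item = self._next
        self._next = next(self._it, self._END)
        return item

    def exhausted(self):
        return self._next is self._END


def stitch(objects, link_streams, key="id", fk="parent_id"):
    """Yield (obj, {stream_name: [links]}) pairs in a single pass over every stream.

    Assumes `objects` is sorted by obj[key] and each link stream by link[fk].
    """
    streams = {name: Peekable(s) for name, s in link_streams.items()}
    for obj in objects:
        pk = obj[key]
        related = {}
        for name, stream in streams.items():
            # Drop links whose parent never appears in the object stream.
            while not stream.exhausted() and stream.peek()[fk] < pk:
                stream.pop()
            # Collect the run of links belonging to this object.
            run = []
            while not stream.exhausted() and stream.peek()[fk] == pk:
                run.append(stream.pop())
            related[name] = run
        yield obj, related


users = [{"id": 1, "name": "alice"}, {"id": 3, "name": "carol"}]
orders = [{"parent_id": 1, "item": "book"}, {"parent_id": 1, "item": "pen"},
          {"parent_id": 2, "item": "mug"}, {"parent_id": 3, "item": "lamp"}]
for user, rel in stitch(users, {"orders": orders}):
    print(user["name"], [o["item"] for o in rel["orders"]])
# alice ['book', 'pen']
# carol ['lamp']
```

Because every stream is consumed in order, only the current run of links needs to be held in memory, which is what makes the approach compatible with refilling streams in batches.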

    Note also that you can specify a maximum number of records for the queries to pull in at a time. When any given stream runs out of results, it is refilled by issuing the query again with the LIMIT parameter adjusted appropriately. For relationships, this logic lives inside SQL::Link::ResultSet; for objects, it lives inside SQL::Object::ResultSet. As such, the whole process is transparent to the user. He thinks he is getting a steady stream of objects, when under the hood my code is performing queries piecewise to avoid clobbering system memory and sewing the results together without his knowledge.
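
A small Python sketch of a refilling stream, assuming LIMIT/OFFSET paging against SQLite; batched_rows and the orders table are hypothetical. The caller sees one continuous iterator while the rows are actually fetched in chunks of at most batch_size.

```python
import sqlite3

def batched_rows(conn, sql, params=(), batch_size=100):
    """Yield rows one at a time, fetching at most `batch_size` rows per query.

    `sql` must end in a deterministic ORDER BY; otherwise LIMIT/OFFSET pages
    may overlap or skip rows between queries.
    """
    offset = 0
    while True:
        batch = conn.execute(f"{sql} LIMIT ? OFFSET ?",
                             (*params, batch_size, offset)).fetchall()
        yield from batch
        if len(batch) < batch_size:
            return  # a short (or empty) page means the result set is exhausted
        offset += batch_size

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, parent_id INTEGER, item TEXT)")
conn.executemany("INSERT INTO orders (parent_id, item) VALUES (?, ?)",
                 [(i // 3, f"item{i}") for i in range(10)])

# Three queries (4 + 4 + 2 rows) behind what looks like a single stream.
rows = list(batched_rows(conn, "SELECT parent_id, item FROM orders ORDER BY parent_id, id",
                         batch_size=4))
print(len(rows))  # 10
```

OFFSET makes the database skip over earlier rows on every refill. Keyset pagination (for example, `WHERE id > last_seen_id`) is a common alternative for large tables, although it is a different technique from the LIMIT adjustment described in the post.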

    How lazily you load objects depends on how many SQL::Link::Query objects you pass into the SQL::Object::Query constructor; you can even pass in none at all, resulting in complete relationship laziness. SQL::Object objects can also load relatives on demand later, again by using a SQL::Link::Query object, this time with the condition that the child id (or parent id, if you like) equal the object's own primary key.
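
A hedged Python sketch of "preloaded if requested, otherwise on demand". The User class, load_users, the preload_orders flag, and the schema are hypothetical. For brevity the preload groups a single link query with itertools.groupby instead of running the streaming merge. A relationship that was not preloaded is fetched on first access by matching the child's foreign key against the object's primary key.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, parent_id INTEGER, item TEXT);
    INSERT INTO users  VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders (parent_id, item) VALUES (1, 'book'), (1, 'pen'), (2, 'lamp');
""")

class User:
    def __init__(self, conn, pk, name, orders=None):
        self._conn = conn
        self.pk, self.name = pk, name
        self._orders = orders        # a list if preloaded, None if left lazy
        self.on_demand_queries = 0   # illustration only: counts lazy fetches

    @property
    def orders(self):
        if self._orders is None:
            # On-demand load: children whose foreign key equals our primary key.
            cur = self._conn.execute(
                "SELECT item FROM orders WHERE parent_id = ? ORDER BY id", (self.pk,))
            self._orders = [item for (item,) in cur]
            self.on_demand_queries += 1
        return self._orders

def load_users(conn, preload_orders=False):
    users = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
    preloaded = {}
    if preload_orders:
        # One link query for the whole collection, grouped by parent id.
        rows = conn.execute("SELECT parent_id, item FROM orders ORDER BY parent_id, id")
        preloaded = {pid: [item for _, item in grp]
                     for pid, grp in groupby(rows, key=lambda r: r[0])}
    # A preloaded user with no orders gets [] (not None), so it never triggers a lazy query.
    return [User(conn, pk, name, preloaded.get(pk, []) if preload_orders else None)
            for pk, name in users]

lazy = load_users(conn)
eager = load_users(conn, preload_orders=True)
print(lazy[0].orders, lazy[0].on_demand_queries)    # ['book', 'pen'] 1
print(eager[0].orders, eager[0].on_demand_queries)  # ['book', 'pen'] 0
```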

    Did all that make sense? If there is a logical hole in my implementation then I'd love to hear about it sooner rather than later. As best I can tell it meshes the need for performance and flexibility pretty well. I just hope that there isn't some kind of limiting flaw that I have missed that cripples its capabilities.

      Did all that make sense?

      For the most part, yes. But without seeing all the documentation and being able to try things out, I can't really say how much sense it makes, as I may be misinterpreting you.

      If there is a logical hole in my implementation then I'd love to hear about it sooner rather than later.

      Again, I see nothing from what you describe, but the sooner we can all read the real docs, the better we can tell.

      I just hope that there isn't some kind of limiting flaw that I have missed that cripples its capabilities.

      That's what version 0.02 is for :). A few things I have learned after uploading several modules to CPAN (and there are more to come; I am converting our entire internal framework to open source CPAN modules): 1) very few people will jump to use version 0.01 of a brand-new module from an unknown author (I know I don't), 2) you will never know who is using/downloading your module, as there is really no way to tell (well, there are ways, but they will get you banned by merlyn), and 3) spell-check your documentation :)

      -stvn