So why not have one set of attributes that map to database fields of their own, and at the same time another set of attributes that get serialized as XML or YAML and then stored in a common CLOB field?
Whenever you store any piece of information twice in a non-readonly database you are asking for trouble, limiting reuse, or at least obligating yourself to extra coding to ensure that the two copies stay synchronized. Generally that extra coding is a lot harder than it first looks.
Suppose somebody uses the structured data in the database and modifies it. Then the next time your CLOB gets loaded it will be out of sync with the database fields. Now you could check that the two are in sync when you load, but then you are back where you started: a class with a load routine that is sensitive to the structure of the database.
And then there are the synchronization and timing issues discussed by perrin in the post he linked to. Suppose you are running your "make sure CLOB and DB are in sync" routine. Session A grabs the CLOB and the DB fields to compare them. While the check is going on, session B modifies the DB fields (or the CLOB, but not both). Session A thinks the two are in sync because they were when it grabbed the data, but in fact they aren't. Is this problem solvable? Of course, but now we are into transactions, locking, save triggers (so that saves to the DB fields automatically update the CLOB), and so on and so forth. Furthermore, the amount of code that needs to be conscious of both the DB and the class architecture multiplies. These aren't easy programs to write, and they are even harder and more time-consuming to test properly.
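To make the locking point concrete, here is a minimal sketch of what even the simple "check and repair" routine ends up looking like once you take concurrency seriously. The schema (a `people` table with `name`, `email`, and a `serialized` CLOB column) is entirely hypothetical, and it assumes a database whose driver supports SELECT ... FOR UPDATE:

    use strict;
    use warnings;
    use DBI;
    use YAML qw(Load Dump);

    # Hypothetical consistency check. Without the transaction and the
    # FOR UPDATE row lock, another session could change name/email (or
    # the CLOB) between our SELECT and our corrective UPDATE.
    sub check_and_repair {
        my ($dbh, $id) = @_;
        $dbh->begin_work;
        my ($clob, $name, $email) = $dbh->selectrow_array(
            'SELECT serialized, name, email FROM people WHERE id = ? FOR UPDATE',
            undef, $id,
        );
        my $data = Load($clob);
        if ($data->{name} ne $name or $data->{email} ne $email) {
            # Repair: regenerate the CLOB from the authoritative fields.
            $dbh->do('UPDATE people SET serialized = ? WHERE id = ?',
                     undef, Dump({ name => $name, email => $email }), $id);
        }
        $dbh->commit;
    }

And that still only covers one table, one CLOB, and one notion of "authoritative".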
One could say: only my application modifies the data. But then you lose a lot of the goodness of a relational database back-end. First and most important, the integrity checks are now happening in the application, not the database. Secondly, SQL-savvy users can no longer run ad-hoc queries using a variety of applications and front ends, many of which are not written in Moose or Perl. And what about applications that process the data for other business needs? Are they all supposed to be in Perl, using this front end?
Maybe this is a one-shot website, or maybe reuse of data isn't important to your clients right now, but all the same, you have limited their long-term ability to leverage their investment in their data and in your services. Basically you have locked them into a single-entry-point solution with no long-term growing room (other than enhancing your application). In all honesty, if I were the buyer, "mad" would be an understatement.
That being said, I fully agree with your need to separate the bulk of your application code from the database schema. However, the solution isn't stored duplication. Rather, it is separating the in-memory data structures used for dumping and loading data from the physical implementation of the object. When the database structure changes, you want the object structure to "just work". When the object structure needs to change, you want the database persistence routines to "just work". To do this, you can (a sketch follows the list):
- Analysis phase: list out all the data that is used in common by the database and by the application.
- Based on that analysis, design a simple intermediate data structure that stores all of this common information. This is *pure* data - no methods, no objects, just arrays, hashes, etc. It doesn't do validations or anything fancy - it is just a conduit: a meeting place where diverse dump/load routines can get what they need.
- Write DB routines that read data from and load data into that intermediate data structure.
- Similarly, write class constructors and factory methods that dump to and load from that data structure.
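Here is a minimal sketch of the idea in Perl/Moose. Everything in it (the `Person` class, the `people` table, the column names) is hypothetical; the point is only that the plain hash in the middle is the sole thing both sides know about:

    use strict;
    use warnings;
    use DBI;

    # Class side: knows only the object internals, not the table layout.
    {
        package Person;
        use Moose;

        has name  => (is => 'rw', isa => 'Str');
        has email => (is => 'rw', isa => 'Str');

        # Factory method: build an object from the intermediate structure.
        sub from_data {
            my ($class, $data) = @_;
            return $class->new(name => $data->{name}, email => $data->{email});
        }

        # Dump: flatten the object back into the intermediate structure.
        sub to_data {
            my $self = shift;
            return { name => $self->name, email => $self->email };
        }
    }

    # DB side: knows only the table layout, not the class.
    sub load_person_data {
        my ($dbh, $id) = @_;
        my $row = $dbh->selectrow_hashref(
            'SELECT full_name, email_addr FROM people WHERE id = ?',
            undef, $id,
        ) or return;
        return { name => $row->{full_name}, email => $row->{email_addr} };
    }

    sub save_person_data {
        my ($dbh, $id, $data) = @_;
        $dbh->do(
            'UPDATE people SET full_name = ?, email_addr = ? WHERE id = ?',
            undef, $data->{name}, $data->{email}, $id,
        );
    }

    # Wiring them together; neither side ever sees the other's internals:
    # my $person = Person->from_data(load_person_data($dbh, $id));
    # save_person_data($dbh, $id, $person->to_data);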
The key principle here is separation of concerns. In this architecture, the only time you will ever need to change both the DB load/dump code and the class load/dump code simultaneously is when the actual information exchanged between the database and the application changes. Mere changes to the structure or even contents of the DB side will only affect the DB dump/load routines. Mere changes to the class internals will only affect the class dump/load routines. Thus you can safely add transient data to your class, or even restructure your entire DB for optimization purposes, and one will not affect the other.
And even when the actual data exchanged does change, there still is no direct intermixing of DB logic with class logic. Each side cares only about its relationship to the shared information captured in the simple intermediate data structure. The updated DB dump/load and class dump/load routines can be written, if need be, by entirely different programmers with different expertise: the new DB routines by the DBA and the new class routines by the webapp programmer.
If you think this strategy might be helpful, you may want to check out the module/framework I discussed here. This was exactly the kind of scenario I was faced with, only the problem was even worse. The client wanted files in YAML format today but might want a database back-end tomorrow. The objects being constructed were very, very complicated, with lots of yucky graph navigation, and I was not exactly excited about rewriting even a line of them should the storage medium change dramatically.
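For what it's worth, the intermediate structure is what makes that kind of back-end swap nearly painless. A minimal sketch, reusing the hypothetical `Person` class and plain-hash structure from above (the file path is made up too):

    use strict;
    use warnings;
    use YAML qw(LoadFile DumpFile);

    # YAML side: a drop-in replacement for the DB dump/load routines.
    # It produces and consumes the exact same plain hash.
    sub load_person_data_yaml {
        my ($path) = @_;
        return LoadFile($path);
    }

    sub save_person_data_yaml {
        my ($path, $data) = @_;
        DumpFile($path, $data);
    }

    # The class code is untouched whichever back-end is in play:
    # my $person = Person->from_data(load_person_data_yaml('person_42.yml'));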
Best, beth