persisting Moose objects and schema evolution

by morgon (Priest)
on Apr 07, 2009 at 23:33 UTC ( [id://756188] )

morgon has asked for the wisdom of the Perl Monks concerning the following question:

Venerable Monks!

In the near future I will have to face the problem of persisting Moose-based objects in a relational database, which of course is not a problem as such.

Now presumably there will be an evolution as far as the objects/classes are concerned, as new requirements need to be addressed or the code gets refactored - or, put more simply: some attributes could get added, others may get removed.

What I want to achieve now is a way to decouple the evolution of the class-level from the database-level, i.e. I want a way to change (some) aspects of the class without having to change the database-schema (which from my experience gets messy once an application is in production).

So what I am thinking about is this:

All our classes will be based on Moose, i.e. they can be easily (and also generically) serialized as either XML or YAML.
So why not have one set of attributes that simply get mapped to database-fields of their own, and at the same time have another set of attributes that get serialized as XML or YAML and are then stored in a common CLOB field?

In this way I could change the latter set of attributes without affecting the database-schema.

The way I would implement it is via a class-trait that would allow individual attributes to be tagged as belonging to either the first or the second group.
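
To make the idea a bit more concrete, here is a rough (untested) sketch of the kind of attribute trait I have in mind - all the names (the Storage trait, the 'column'/'clob' values, the Person class) are made up for illustration:

    package MyApp::Meta::Attribute::Trait::Storage;
    use Moose::Role;

    # where the attribute lives: 'column' (a DB field of its own)
    # or 'clob' (serialized into the shared XML/YAML blob)
    has storage => (
        is      => 'ro',
        isa     => 'Str',
        default => 'clob',
    );

    # allow the short trait name: traits => ['Storage']
    package Moose::Meta::Attribute::Custom::Trait::Storage;
    sub register_implementation { 'MyApp::Meta::Attribute::Trait::Storage' }

    package MyApp::Person;
    use Moose;

    has name => (
        is      => 'rw',
        isa     => 'Str',
        traits  => ['Storage'],
        storage => 'column',    # mapped to a real DB field
    );

    has preferences => (
        is      => 'rw',
        isa     => 'HashRef',
        traits  => ['Storage'],
        storage => 'clob',      # ends up in the common CLOB
    );

    # a generic persistence routine could then partition the values
    sub storage_partition {
        my $self = shift;
        my (%columns, %clob);
        for my $attr ( $self->meta->get_all_attributes ) {
            my $dest = $attr->does('MyApp::Meta::Attribute::Trait::Storage')
                     ? $attr->storage
                     : 'clob';
            my $slot = $dest eq 'column' ? \%columns : \%clob;
            $slot->{ $attr->name } = $attr->get_value($self);
        }
        return ( \%columns, \%clob );
    }

    1;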

If the above is understandable at all: does that make sense to you, or would it just be a bad idea (and if so, why)?

This is just a vague idea I had (and by now I probably had one beer too many) so I am sorry if it just sounds like gibberish to you.

Many thanks anyway!


Replies are listed 'Best First'.
Re: persisting Moose objects and schema evolution
by stonecolddevin (Parson) on Apr 08, 2009 at 00:22 UTC

    Well, if you're not completely opposed to the idea of switching up your database some, KiokuDB was designed for this very thing.

    It allows you to create your Moose object, store it in the database (which currently seems to be most stable with the DBI backend), and retrieve it just the same. The DBI backend means it would be really trivial to set up on top of your existing database.
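
    A quick (untested) sketch of what that looks like - MyApp::Person and the SQLite file are just placeholders:

        use KiokuDB;
        use MyApp::Person;    # any Moose class

        my $dir = KiokuDB->connect(
            'dbi:SQLite:dbname=objects.db',
            create => 1,                  # create the backend tables if missing
        );

        my $scope = $dir->new_scope;      # needed for live-object tracking

        my $id     = $dir->store( MyApp::Person->new( name => 'morgon' ) );
        my $person = $dir->lookup($id);   # the same object (graph) back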

    meh.
Re: persisting Moose objects and schema evolution
by ELISHEVA (Prior) on Apr 08, 2009 at 09:04 UTC
    So why not have one set of attributes that simply get mapped to database-fields of their own and at the same time have another set of attributes that get serialized as XML or YAML and that get then stored in a common CLOB field.

    Whenever you store any information twice in a non-readonly database you are asking for trouble, limiting reuse, or at least obligating yourself to extra coding to ensure that the two copies stay synchronized. Generally that extra coding is a lot harder than it first looks.

    What if somebody uses the structured data in the database and modifies it? Then the next time your CLOB gets loaded it will be out of sync with the database. Now you could check to make sure they are in sync when you load, but then you are back where you started - a class with a load routine that is sensitive to the structure of the database.

    And then there are the synchronization and timing issues discussed by perrin in the post he linked to. Suppose you are running your "make sure CLOB and Db are in sync" routine. Session A grabs the CLOB and the Db fields to compare them. Meanwhile, while the check is going on, session B modifies the Db fields (or the CLOB, but not both). Session A thinks that the two are in sync because they were when it grabbed the data. But in fact they aren't. Is this problem solvable? Of course, but now we are into transactions, locking, save triggers (so that saves to the DB fields automatically update the CLOB) and so on and so forth. Furthermore, the amount of code that needs to be conscious of both the DB and class architecture multiplies. These aren't easy programs to write, and they are even harder and more time-consuming to properly test.

    One could say: only my application is modifying data. But then you lose a lot of the goodness of a relational database back-end. First and most important - the integrity checks are now happening in the application, not the database. Secondly, SQL-savvy users can no longer run ad-hoc queries using a variety of applications and front ends - many of which are not written in Moose or Perl. And what about applications that process the data for other business needs? Are they supposed to all be in Perl using this front end?

    Maybe this is a one-shot website, or maybe reuse of data isn't important to your clients right now, but all the same, one has limited their long-term ability to leverage their investment in their data and in one's services. Basically one has locked them into a single-entry-point solution with no long-term growing room (other than enhancing your application). In all honesty, if I were the buyer, mad would be an understatement.

    That being said, I fully agree with your need to separate the bulk of your application code from the database schema. However, the solution isn't in stored duplication. Rather, it is in separating the in-memory data structures used for dumping and loading data from the physical implementation of the object. When the database structure changes you want the object structure to "just work". When the object structure needs to change you want the database persistence routines to "just work". To do this, you can:

    1. Design phase: list out all the data that is used in common by the database and by the application.
    2. Based on that analysis, design a simple intermediate data structure that stores all of this common information. This is *pure* data - no methods, no objects, just arrays, hashes, etc. It doesn't do validations or anything fancy - it is just a conduit: a meeting place where diverse dump/load routines can get what they need.
    3. Write your DB routines to read data from and load data into that intermediate data structure.
    4. Similarly, write class constructors and factory methods that dump and load to that same data structure (see the sketch below).
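
    A minimal (untested) sketch of the shape of this, where the Person class, the person table and its columns are just placeholders - the point is only that a plain hashref is the one thing the two sides share:

        package Person;
        use Moose;

        has name  => (is => 'rw', isa => 'Str');
        has email => (is => 'rw', isa => 'Str');

        # class side: dump to / load from plain data - no DB knowledge here
        sub to_plain_data {
            my $self = shift;
            return { name => $self->name, email => $self->email };
        }

        sub from_plain_data {
            my ($class, $data) = @_;
            return $class->new(%$data);
        }

        package PersonStore;
        use DBI;

        # DB side: knows the schema and the plain structure, not the class
        sub save {
            my ($dbh, $data) = @_;
            $dbh->do('INSERT INTO person (name, email) VALUES (?, ?)',
                     undef, $data->{name}, $data->{email});
        }

        sub load {
            my ($dbh, $id) = @_;
            return $dbh->selectrow_hashref(
                'SELECT name, email FROM person WHERE id = ?', undef, $id);
        }

        1;

        # usage:
        #   PersonStore::save($dbh, $person->to_plain_data);
        #   my $person = Person->from_plain_data( PersonStore::load($dbh, $id) );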

    The key principle here is separation of concerns. In this architecture, the only time you will ever need to change both the DB load/dump code and the class load/dump code simultaneously is if the actual information that needs to be exchanged between the database and application changes. Mere changes to the structure or even contents of the DB side will only affect the DB dump/load routines. Mere changes to the class internals will only affect the class dump/load routines. Thus you can safely add transient data to your class or even restructure your entire DB for optimization purposes and one will not affect the other.

    And even when the actual data exchanged does change, there still is no direct intermixing of DB logic with class logic. Each only cares about its relationship to the shared information captured in the simplified intermediate data structure. The updated DB dump/load and class dump/load routines can be written, if need be, by entirely different programmers with different expertise: the new DB routines can be written by the DBA and the new class routines by the webapp programmer.

    If you think this strategy might be helpful, you may want to check out the module/framework I discussed here. This was exactly the kind of scenario that I was faced with, only the problem was even worse. The client wanted files in YAML format today, but might want a database back-end tomorrow. The objects being constructed were very, very complicated with lots of yucky graph navigation and I was not exactly excited about rewriting even a line of them should the storage medium change dramatically.

    Best, beth

      Many thanks for your post, which I will study in more detail later.

      Just one thing to make my idea clear:

      I do not intend to store any information twice. Every attribute would EITHER be mapped to a database-column OR would be part of an XML or YAML data-structure in a CLOB.

      So there would be no duplication between the data stored in fields of their own and the data stored in the CLOB.

      And if I were to use XML for the CLOB (or the appropriate XML datatype) I could even access this data using an XPath-expression from any other environment/language (at least you can in Oracle).
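
      Roughly like this, for example (untested; the person table, the extra_attrs XMLType column and the element names are made up, and extractValue is just one of the ways Oracle lets you apply XPath to XML stored in the database):

          use strict;
          use warnings;
          use DBI;

          my ($user, $pass, $person_id) = @ARGV;

          my $dbh = DBI->connect('dbi:Oracle:mydb', $user, $pass,
                                 { RaiseError => 1 });

          # pull one value out of the XML stored in the CLOB/XMLType column
          my ($nickname) = $dbh->selectrow_array(q{
              SELECT extractValue(p.extra_attrs, '/attrs/nickname')
                FROM person p
               WHERE p.id = ?
          }, undef, $person_id);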

      Just for clarification...

        Thanks for the clarification...

        Usually when people selectively map some attributes to DB fields and some to CLOBs, it is because the CLOBs contain either (a) data that is only of interest to a specific application (e.g. display attributes used by web app X) or (b) some complex chunk of data that just doesn't normalize in any meaningful way (photographs, certain kinds of geographic data and network maps come to mind).

        Of course, the Oracle XMLType changes this reasoning a bit since, at least according to the documentation, you should be able to navigate the embedded XML data using SQL. However, I've never used this feature and there are a lot of niggling little questions I would want to know more about before I relied on it:

        • How well integrated is the XML data into Oracle's transaction management, stored procedures, triggers, etc? Is SQL the only DB-like feature or does Oracle view this as a set of DB tables, just stored in a funny place?
        • What happens if the XML data needs to reference data stored in normal tables via foreign key? Will the normal foreign key integrity checks take place? Conversely, if normal fielded DB data is updated, are updates cascaded into the XML data? Can both select and update queries use views that include joins that span regular tables and tables embedded in XML data?
        • To what extent is the client locked into Oracle already (due to other business applications)? Do they want the option to change to another DB back end at some point in the future? If so, what other databases support such close DB/XML integration? Are those databases that the client would consider scaling up to? It would be unfortunate for a client to get locked into a DB solely because a mission critical application decided to be creative with features that only Oracle provides.

        Best of luck with your choices, and have a happy Easter/Pesach/Springtime weekend, beth

Re: persisting Moose objects and schema evolution
by perrin (Chancellor) on Apr 08, 2009 at 01:29 UTC
    I gave my answer here.
Re: persisting Moose objects and schema evolution
by Jenda (Abbot) on Apr 10, 2009 at 13:38 UTC

    No matter how well you choose the two groups of fields, sooner or later you will need to use one of the serialized ones within a database query and/or need to remove or modify one of the mapped attributes.

    IMHO, it's more work than it's worth. Map them all. Changing the schema is not that messy if you keep order in the update scripts and log the applied changes.
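
    The bookkeeping can be as simple as a table of applied scripts plus a loop that runs whatever isn't in it yet. A rough (untested) sketch - SQLite, the schema_change table and the upgrades/ directory are just examples, and the statement splitting is deliberately naive:

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect('dbi:SQLite:dbname=app.db', '', '',
                               { RaiseError => 1, AutoCommit => 1 });

        # remember which update scripts have already been applied
        $dbh->do(q{
            CREATE TABLE IF NOT EXISTS schema_change (
                script     TEXT PRIMARY KEY,
                applied_at TEXT NOT NULL
            )
        });

        my %done = map { $_->[0] => 1 }
            @{ $dbh->selectall_arrayref('SELECT script FROM schema_change') };

        # name the scripts so that lexical order is the right order,
        # e.g. upgrades/001_add_nickname.sql, upgrades/002_drop_fax.sql
        for my $script (sort glob 'upgrades/*.sql') {
            next if $done{$script};

            open my $fh, '<', $script or die "$script: $!";
            my $sql = do { local $/; <$fh> };

            $dbh->begin_work;
            $dbh->do($_) for grep { /\S/ } split /;\s*\n/, $sql;
            $dbh->do(q{INSERT INTO schema_change (script, applied_at)
                       VALUES (?, datetime('now'))}, undef, $script);
            $dbh->commit;
        }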

Re: persisting Moose objects and schema evolution
by phaylon (Curate) on Apr 11, 2009 at 03:13 UTC

    DBIx::Class supports schema versioning and upgrading. Also, uniting Moose and DBIC directly is being worked on. I'm not directly involved, but the folks on #dbix-class or maybe #moose on irc.perl.org might have more information.
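
    Going by the DBIx::Class::Schema::Versioned docs, the schema side looks roughly like this (untested here; the schema name and the upgrade directory are just examples):

        package MyApp::Schema;
        use strict;
        use warnings;
        use base 'DBIx::Class::Schema';

        our $VERSION = 0.001;    # bump this whenever the schema changes

        __PACKAGE__->load_namespaces;
        __PACKAGE__->load_components('Schema::Versioned');
        __PACKAGE__->upgrade_directory('sql/upgrades/');

        1;

        # then, at deploy/upgrade time, something like:
        #   my $schema = MyApp::Schema->connect($dsn, $user, $pass);
        #   $schema->upgrade
        #       if $schema->get_db_version ne $schema->schema_version;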


    Ordinary morality is for ordinary people. -- Aleister Crowley
