Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling

Versioned database schema git conflicts.

by chrestomanci (Priest)
on Apr 02, 2014 at 13:43 UTC ( #1080774=perlquestion: print w/replies, xml ) Need Help??

chrestomanci has asked for the wisdom of the Perl Monks concerning the following question:

Greetings brothers, I seek your knowledge and wise council on a matter where many are at work and must avoid trampling each others feet.

The project I am working on uses DBIx::Class::Schema::Versioned to version the MySQL schema of the database used by the application. The schema has an integer version number that monotomicaly increases whenever a developer makes a change to the schema.

When the developer makes a change he writes an upgrade script containing SQL statements to upgrade the schema. The script is named something like DB-50-51-MySQL.sql, where 50 is the current schema version, and 51 is the next. There is also a downgrade script in case it is necessary to back out the change.

The problem.

The problem is that as the team has grown, and the tempo of development increased there are a lot of changes to the database schema, and we are getting a lot merge conflicts with these upgrade scripts.

The current schema merged into trunk might be at version 50, but there are probably at least three different features at various stages of development and code review that make changes to it and have different upgrade scripts (with the same name) to move it to version 51. The contents of the upgrades are not in conflict because they make unrelated changes to different tables, but as soon as one of those changesets gets merged, the others can't be because they are trying to change the same file.

While the process of resolving the merge conflict is not that difficult, but it is tedious, prone to error, and getting annoying because of how often it is necessary. Sometimes git merge or git rebase claims to have resolved a merge conflict automatically when it has not and this causes even worse breakage if it is undetected.

I have looked into some alternatives to DBIx::Class::Schema::Versioned such as DBIx::Class::DeploymentHandler and DBIx::Class::Migration, and while they have many useful features, they all appear to be stuck to the idea of integer version numbers, so I would experience the same problem with merge conflicts if I switched to one of them.

Possible solutions.

Some on my team have suggested using feature names for the scripts, So that the upgrade script would be named as Add_feature.MySQL.sql, (With a similar name for the corresponding downgrade script). As the names would not be re-used there would be no conflict between the upgrade files. When applying an upgrade the database would maintain a list of features that have been applied, and run any upgrade scripts for any that are missing.

Another proposed solution is to date the upgrade scripts instead of numbering them, where the date is when it is first created (not when it is eventually merged). This would provide a form of soft ordering of the upgrade scripts.

A third solution is to keep all the upgrade & downgrade SQL in a single large perl file, where each set of upgrade commands is a separate entry in an ordered array. When conflicting versions of the file are merged there would still be a conflict, but it would be less tricky to resolve because there would be no need to rename files

The problem with all of these proposed solutions, is that there is no out of the box support for them in DBIx::Class::Schema::Versioned or any of the alternatives that I can find. It would be necessary to spend time on developing a custom solution to manage schema upgrades.

Can the monastery suggest a way forward. Have you solved a similar problem yourself, if so how? Is there another module out there for maintaining database schema versions that solves this more elegantly?

Replies are listed 'Best First'.
Re: Versioned database schema git conflicts.
by Rhandom (Curate) on Apr 02, 2014 at 14:22 UTC
    Your layout seems somewhat similar to the layout we use at our company. Historically we have used files similar to updates/${table}_${rev_from}_to_${rev_to}.sql to make our updates. Our update process has also allowed for files of the form updates/${table}_${md5_from}_to_${md5_to}.sql .

    However, after getting tired of renaming those tables and having left over schema lying around, we have now opted for a second more compact way of doing things. For our schema we now can have up to 3 files for a table definition:


    The $table.sql file contains the correct creation schema including extents and engine information. If using DBIx::Class::Schema::Versioned I'm sure this would likely be closer to a YAML file definition.

    The $table.inserts file contains initial data for tables that require fixed data.

    The $table.updates file replaces the updates/$table_... files. Inside of $table.updates there are sections that look like this:

    ------- BEGIN $sha1_from -> $sha1_to ----- # (vc rev and/or project #) ALTER TABLE `$table` MODIFY ...; ------- END -----

    Parsing of this file is fast. It is easy to take the existing table definition active in the db, and then find all of the different ways it can be morphed into new schema based simply on the sha1s of the source and destination.

    New definitions normally go at the top of this file. Older defintions can slowly be culled from the bottom when all known schema installations have upgraded past a certain point (or just leave them there - it will be fast until you get past a couple hundred revisions).

    my @a=qw(random brilliant braindead); print $a[rand(@a)];
Re: Versioned database schema git conflicts.
by Bloodnok (Vicar) on Apr 02, 2014 at 14:06 UTC
    In my experience, it's very rarely a good idea to use dates &/or version identifiers in the names of version controlled files.

    For myself, I'd use common file names for install & remove script pairs e.g. install_update.sql & remove_update.sql and then use the traditional method of continuous integration check-in whereby you rebase your repo to the current master (thus merging changes in the context of your repo) before pushing the result back up to the master. If absolutely necessary, the DB version can be recorded by means of the check-in comments - which using repo hooks, can/could be automated.

    A user level that continues to overstate my experience :-))
Re: Versioned database schema git conflicts.
by stonecolddevin (Parson) on Apr 02, 2014 at 18:40 UTC

    Have you looked at sqitch?

    Three thousand years of beautiful tradition, from Moses to Sandy Koufax, you're god damn right I'm living in the fucking past

      I had not looked at sqitch, but I will now.

      Thank you very much.

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1080774]
Front-paged by Arunbear
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (5)
As of 2022-08-08 09:45 GMT
Find Nodes?
    Voting Booth?

    No recent polls found