I've been working on a big (in Rails terms) team for a few months now, and we've come to some conclusions concerning migrations. I'll start with the bad, and in part II I'll talk about the decisions we made and why.
Migrations are great, but they do come at a cost. When working with a large team (my current team size is 14 and growing), migration conflicts happen. This can be mitigated with communication, but migrations can definitely be a bottleneck. Also, the process of creating a migration can be painful on a large team. Before creating a migration you should always update from source control to ensure you get all the migrations checked in by your teammates. Then, the best-case scenario is when you can create a migration that doesn't change anything that already exists and immediately check it in. Checking in a new migration immediately helps ensure you don't block other teammates from creating migrations; however, it's not always as simple as adding a new table. Migrations that alter the database structure often break several tests. Obviously, you can't check those migrations in until you fix all the breaking tests, which can take time. During this time, database changes are blocked for the entire team.
It can also be troublesome to find specific changes to the database within multiple migration files. Finding which migration adds a specific column can take a fair amount of time when you are working with over 50 migration files. Naming conventions can mitigate this issue somewhat; however, naming conventions generally require that only one action occur per file. For example, the 023_create_customers_table.rb file can only create the customers table and cannot alter the purchases table to add the customer_id column. This type of naming convention helps when searching for specific changes to the database; however, it also results in a large number of migration files.
This is a problem when you have a team making a lot of changes to the data model, right?
And not as much a concern when in production?
It occurs to me that maybe the issue lies with the relational database not being a good tool to use when in development because it *needs* migrations.
Migrations, as I've seen them, weren't supposed to be a way of versioning your database. They were supposed to be a way to migrate a database to a new version of the software.
Someone on Coding Horror mentioned keeping their entire database in version control. When I read that, I laughed because it seemed silly, but if keeping the database properly versioned with migrations is really a problem, then maybe it's a pretty good idea.
The real problem is trying to write software with 14 people. Need I say more?
The next problem is when the system is already in production and your changes impact data, in addition to schema. We had to move away from Rails Migrations (and database independence) to deal with the way we have to move data around in our legacy SQL Server database. However, we are using a similar scheme with the actual SQL scripts. The concept of a database schema version table alone has saved us a day here and there when trying to figure out what was wrong with an environment.
What are the alternatives though? I read "Refactoring Databases" or "Agile Database Techniques" last month looking for answers: there aren't a lot of other good options if you aren't going to do BDUF.
One suggestion is to consolidate the various scripts at major intervals, just to simplify creation of new databases. It's more of a big deal for us, because we have lots of database-specific stuff.
Also, run the annotate models plugin to get the current view of the data in your model class files. I can't stand poking through all of the migrations to figure out what is going on.
We use migrations named by timestamp (to avoid collisions), and a table which keeps track of whether each migration has run.
Someone had a patch which did something like this but I haven't looked at it (http://dev.rubyonrails.org/ticket/6838)
Before we did this, we had a file called "next_migration" and you would increment it when you had a new migration. If you go to check in and you have a conflict (someone else updated next_migration), then you know you have to update and renumber your migration. This solution worked well.
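For anyone who wants to try the counter-file approach, here is a minimal sketch of how it might look in plain Ruby. The helper name `claim_next_migration_number` and the three-digit padding are assumptions for illustration, not what the team above actually ran. The file holds the next free number; because it lives in source control, two people claiming a number at the same time get a conflict on check-in.

```ruby
# Hypothetical helper: hand out the next free migration number and bump
# the counter stored in the "next_migration" file.
def claim_next_migration_number(path)
  # A missing file is treated as "no migrations yet", so numbering starts at 1.
  next_number = File.exist?(path) ? File.read(path).to_i : 1
  # Store the following number; a concurrent edit by a teammate will show
  # up as a source control conflict on this file.
  File.write(path, "#{next_number + 1}\n")
  # Zero-pad to match file names such as 023_create_customers_table.rb.
  format("%03d", next_number)
end
```

The sketch only handles the counter itself. Renumbering a migration after a conflict is still a manual step, as described above.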
But after we started doing real SCM (source control management) we had to go with a timestamp solution since we have a lot of branches. Merging conflicting migration numbers across branches is a nightmare.
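To make the timestamp scheme concrete, here is a rough sketch in plain Ruby of how pending migrations could be worked out. It assumes file names start with a numeric timestamp and that the tracking table can be read as a list of version strings; `pending_migrations` and the sample file names are hypothetical, and this is not the code from the patch mentioned above.

```ruby
require "set"

# Hypothetical helper: given migration file names and the versions already
# recorded in the tracking table, return the files that still need to run,
# oldest timestamp first.
def pending_migrations(filenames, applied_versions)
  applied = applied_versions.map(&:to_s).to_set
  filenames
    .map { |name| [File.basename(name)[/\A\d+/], name] } # leading timestamp
    .reject { |version, _| version.nil? || applied.include?(version) }
    .sort_by { |version, _| version.to_i }
    .map { |_, name| name }
end

files = [
  "20070312101500_create_customers.rb",
  "20070310093000_create_purchases.rb",
  "20070311140000_add_customer_id_to_purchases.rb"
]
applied = ["20070310093000"] # versions already in the tracking table

pending = pending_migrations(files, applied)
# pending is now:
#   ["20070311140000_add_customer_id_to_purchases.rb",
#    "20070312101500_create_customers.rb"]
```

Because each migration is tracked individually rather than by a single version counter, a migration merged in from another branch with an older timestamp still gets picked up.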