No Downtime Database Schema Changes

Maintaining database compatibility while deploying north of 30 times a day often raises a slew of questions during our tech talks. In this post, we will cover backward and forward compatible database schema changes. We will survey specific classes of change and describe how they enable no downtime rollouts.

For formalists out there, we are assuming the software accesses your database in a sane way. You always mention all columns explicitly. No select * from and no insert statements assuming a specific column ordering.

Picture a simple users table with three columns: the identifier, the full_name and lastly a nullable email_address.

+----+-----------+---------------+
| id | full_name | email_address |
+----------------+---------------+
|  1 | Alice     | NULL          |
+----------------+---------------+
|  2 | Bob       | bob@gmail.com |
+----+-----------+---------------+

Alice has no e-mail, while Bob can be reached on Gmail. We would now like to evolve our system to support e-mail validation. When a user provides his e-mail address it will be marked non-validated. Trivial, we need a column is_validated.

The complexity revolves around live updates of a cluster talking to the database. A this stage, our application reads and writes to (id, full_name, email_address). As a result, adding a new column is harmless as long as it is nullable or has a default value. Adding a non-nullable column with no default, would break inserts since our software would not provide a value and the database would reject the row with a nullabilit constraint failure.

alter table users add column is_validated bool default true

We will discuss the choice of the default value soon. Let’s focus on the resulting schema first.

+----+-----------+---------------+--------------+
| id | full_name | email_address | is_validated |
+----------------+---------------+--------------+
|  1 | Alice     | NULL          | true         |
+----------------+---------------+--------------+
|  2 | Bob       | bob@gmail.com | true         |
+----+-----------+---------------+--------------+

At this stage, we can start pushing the new version of the software which takes full advantage of the new column and introduces validation. Due to clustering, the old version and the new version will work at full capacity during a significant amount of time. Both versions will write to the database during that time and both versions will read each others’ data since it is being shared via the database which is a common resource. Since the semantics of the old version was to give the benefit of the doubt to users when they entered their e-mail address and trust them that they are valid, we need to automatically encode this for the new version which is why the default must be true for the is_validated column.

Removing a column follows the exact same logic, in reverse. Ship code that does not need the extraneous column and achieves the same semantics of the code that does. Drop the column. Done.

Next category is widening such as moving from integers to real numbers or from an enumeration with 3 choices to one with 5 choices. Since this is a net addition, the process is very much like adding a column. In contrast, narrowing behaves the same way as removing a column.

To hone your skills and for the heroes out there, bonus points for solving the following puzzle.

Your clustered application has a table with two columns, an identifier id and a value column which stores percentages as a real numbers between 0 and 1 (e.g. 97% is 0.97). Without any schema modifications, evolve the application to store percentages as numbers between 0 and 100 (e.g. 97% is 97). Your solution must include full data migration and at no time can your application be unavailable.

Engineering Blog – Wealthfront

Interested in what you see here?