Configuration… Hardcode it!

November 21, 2013

If you’ve read other Wealthfront engineering blog posts (Continuous Deployment: API Compatibility Verification, Move cash faster, Pushing a Feature On My First Day), you’re aware that continuous integration and deployment form the core of our development process. Many consequences of a continuous integration and deployment process such as ours are obvious: high-confidence test suites, robust build/test/deploy automation, and so on. A less obvious consequence is that the streamlined development process affects how we code. Consider software configuration.

Software configuration management becomes complicated quickly as a system evolves, so any simplification is valuable. For clarity, I divide software configuration into two kinds:
  • Environment Config – configuration that enables software to run properly within a specific environment. Examples include host configuration, database connection information, and external service endpoint information. Various tools exist to help manage environment configuration. Chef and ZooKeeper are two that we use at Wealthfront.
  • Behavior Config – configuration that defines how software functions, and in some cases what data elements are available to the system. Examples include formula constants, logic thresholds, and the definition of constant data such as Wealthfront investment options.
Behavior config is interesting. The concept is trivial – define constant data which will be consumed by code. What is simpler than assigning values to variables?
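To make that concrete, here is a minimal sketch of behavior config expressed directly in code. The class name, constant names, and values are all made up for illustration; they are not Wealthfront's actual settings.

```java
import java.math.BigDecimal;

// Hypothetical behavior config kept in code. Names and values are
// illustrative placeholders, not Wealthfront's actual settings.
final class RebalanceConfig {
    // Logic threshold: how far an allocation may drift before action is taken.
    static final BigDecimal DRIFT_THRESHOLD = new BigDecimal("0.05");

    // Formula constant: number of trading days used to annualize a figure.
    static final int TRADING_DAYS_PER_YEAR = 252;

    private RebalanceConfig() {} // constants holder, never instantiated

    // Code consumes the constant directly; no file parsing or table lookup.
    static boolean exceedsDriftThreshold(BigDecimal drift) {
        return drift.abs().compareTo(DRIFT_THRESHOLD) > 0;
    }
}
```

Because the values live next to the code that uses them, a change to either one goes through the same compile and test steps.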
However, software developers (myself included) are trained to define behavior config outside of code, typically within config files or database tables.
Why? So we can tune the system safely, easily, and with quick turnaround. The build/test/package/deploy process is typically expensive and risky (relatively speaking – more risky than restarting services, for example). And so we are conditioned to minimize it. Enter config files. Enter database tables.
Circumventing the build/test/package/deploy process, or parts of it, is obviously not awesome, since that process exists to identify problems before they reach production. Plus, environment config and behavior config tend to make their way into the same config files, and almost by definition, any config file containing environment config is duplicated many times over. Enter config file complexity. Similar complexity results from behavior config stored in database tables, because they too are duplicated.
With all of this duplication of behavior config, are you sure that the code being tested is the same as that which will run in production?
Enter continuous integration and deployment.
 
At Wealthfront we deploy small updates to production all day long with the push of a button on our Deployment Manager dashboard. Operationally, changing code and deploying to production is actually easier than updating production config files.
EtfModel Refactor
Wealthfront invests client funds across a fixed set of investment options (see Wealthfront FAQs). I recently reworked how these investment options are specified. ETF investment options were stored in the etf_models table, and represented in code by the corresponding EtfModel entity class.
While there is no use case requiring them to be updated on the fly, storing ETF investment options in the database seemed reasonable. There are dozens of them, and new ones are added from time to time. As it stood, new investment options could be added by inserting into the etf_models table. However, investment configuration was not fully represented within the database, so code changes were still required for the system to utilize new investment options. In this case, the in-database / in-code configuration hybrid made it really easy to consider moving all of the investment configuration into code. I did, and realized that the resulting code was far simpler. In the end, I removed multiple files related to database persistence, and simplified the code which configures which EtfModels to invest in, in what ratios, and under which scenarios.

Most important, once the code no longer had to account for external persistence, it became easy to evolve EtfModel into a more flexible data model that lays the groundwork for some exciting investment features coming in the future. EtfModel became EtfVehicle, an enum-like class within a small class hierarchy which makes way for new types of investment vehicles. A representation as rich and flexible as this would be complicated if expressed within database tables or config files. However, in code it is easy to implement and easy to test. And, with continuous integration and deployment, the data it configures is trivial to maintain.

The AllocatedAssetType base class provides an abstraction for all types of assets, including cash, across which funds are allocated.
The InvestmentVehicle base class extends AllocatedAssetType and provides a unified representation for all future types of investments.
EtfVehicle extends InvestmentVehicle and contains the data previously configured within the etf_models table, using the type-safe enumeration pattern prevalent in Java before Java 5.0.
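The hierarchy described above might be sketched roughly as follows. Only the three class names come from this post; the fields, the accessor methods, and the two example instances (with placeholder ids, names, and tickers) are assumptions made for illustration, not the actual Wealthfront code.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Base abstraction for anything funds can be allocated to, including cash.
abstract class AllocatedAssetType {
    private final String name;

    protected AllocatedAssetType(String name) {
        this.name = name;
    }

    public String name() {
        return name;
    }

    @Override
    public String toString() {
        return name;
    }
}

// Unified representation for investments; carries the legacy numeric id.
abstract class InvestmentVehicle extends AllocatedAssetType {
    private final long id;

    protected InvestmentVehicle(long id, String name) {
        super(name);
        this.id = id;
    }

    public long id() {
        return id;
    }
}

// Type-safe enumeration pattern: a private constructor plus a fixed set of
// static final instances, so no other instances can ever exist.
final class EtfVehicle extends InvestmentVehicle {
    // Declared before the instances so it is initialized before they register.
    private static final Map<String, EtfVehicle> BY_NAME = new LinkedHashMap<>();

    // Hypothetical instances; ids, names, and tickers are placeholders.
    static final EtfVehicle EXAMPLE_US_STOCKS = new EtfVehicle(1L, "EXAMPLE_US_STOCKS", "AAA");
    static final EtfVehicle EXAMPLE_BONDS = new EtfVehicle(2L, "EXAMPLE_BONDS", "BBB");

    private final String ticker;

    private EtfVehicle(long id, String name, String ticker) {
        super(id, name);
        this.ticker = ticker;
        BY_NAME.put(name, this);
    }

    public String ticker() {
        return ticker;
    }

    // Mirrors Enum.values(): every instance, in declaration order.
    public static Collection<EtfVehicle> values() {
        return Collections.unmodifiableCollection(BY_NAME.values());
    }

    // Mirrors Enum.valueOf(): look an instance up by its name.
    public static EtfVehicle valueOf(String name) {
        EtfVehicle vehicle = BY_NAME.get(name);
        if (vehicle == null) {
            throw new IllegalArgumentException("Unknown EtfVehicle: " + name);
        }
        return vehicle;
    }
}
```

A plain Java enum cannot extend a class, which is one reason the older pattern is a natural fit when the constants need to sit inside a class hierarchy like this one.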

The etf_models table was dropped from the database, and even though EtfModel.id numeric values were retained (within InvestmentVehicle) for backward compatibility, table columns that used to reference etf_models.id now reference EtfVehicle by name, as you might do when persisting an enum value. This makes database data easier to inspect, since there is no longer an etf_models table to join with.
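As a rough illustration of name-based persistence, the sketch below uses a pared-down stand-in for EtfVehicle. The method names toColumnValue, fromColumnValue, and fromLegacyId are hypothetical, as are the example instances; the post does not describe how the real mapping or migration was written.

```java
import java.util.HashMap;
import java.util.Map;

// Pared-down stand-in for EtfVehicle, just enough to show the mapping
// between an instance and the value stored in a database column.
final class EtfVehicle {
    private static final Map<String, EtfVehicle> BY_NAME = new HashMap<>();
    private static final Map<Long, EtfVehicle> BY_LEGACY_ID = new HashMap<>();

    // Hypothetical instances; ids and names are placeholders.
    static final EtfVehicle EXAMPLE_US_STOCKS = new EtfVehicle(1L, "EXAMPLE_US_STOCKS");
    static final EtfVehicle EXAMPLE_BONDS = new EtfVehicle(2L, "EXAMPLE_BONDS");

    private final long legacyId; // former etf_models.id, kept for compatibility
    private final String name;

    private EtfVehicle(long legacyId, String name) {
        this.legacyId = legacyId;
        this.name = name;
        BY_NAME.put(name, this);
        BY_LEGACY_ID.put(legacyId, this);
    }

    // Value written to a text column: readable without any join.
    String toColumnValue() {
        return name;
    }

    // Value read back from that column.
    static EtfVehicle fromColumnValue(String value) {
        EtfVehicle vehicle = BY_NAME.get(value);
        if (vehicle == null) {
            throw new IllegalArgumentException("Unknown EtfVehicle name: " + value);
        }
        return vehicle;
    }

    // Hypothetical helper for data that still carries the old numeric id.
    static EtfVehicle fromLegacyId(long id) {
        EtfVehicle vehicle = BY_LEGACY_ID.get(id);
        if (vehicle == null) {
            throw new IllegalArgumentException("Unknown legacy etf_models.id: " + id);
        }
        return vehicle;
    }
}
```

With this shape, a row holds a value such as EXAMPLE_BONDS instead of 2, so a query result can be read on its own.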
My EtfModel refactor worked out so nicely that it got me thinking about how much simpler an implementation is when constant data is defined in code vs. in config files and/or database tables. I am now on the lookout for other constant data currently stored in the database or config files, and my habit of blindly keeping behavior config out of code has been broken!