Friday, May 17, 2013

The first week at Wealthfront


I just finished my first week at Wealthfront. In my first three hours I fixed a small production bug. By the end of the week, I had shipped my first feature: the backend code to serve up the commit statistics feeding the visualization at our new engineering page. Moving this quickly was empowering, and it confirmed that I made the right decision in joining Wealthfront.

There were three enablers that made this possible. First, the code base is clean. This was not the easiest hello-world assignment, because it involved scanning the git logs to find out when commits were happening. I then wrote computed statistics to Voldemort, so that the online page could quickly serve up the visualization without blocking on scanning the git logs. I was able to find some git-utility helper functions in the code base that almost did what I wanted. I made a few tweaks, and I had the data I wanted. Writing and reading from Voldemort was trivial, and there were some great examples in the code base.

Second, I never had to deploy anything locally. Once I created my backend API, I was able to find plenty of JUnit/jMock examples that let me test and gain full confidence in my code without ever deploying it. There was even a code-coverage test that failed because I forgot to test one of the API calls I added.

Finally, the deployment system made pushing my code to production a snap. I pushed my code with a description of which service needed to be redeployed in the commit message. As soon as I checked my code in, the Wealthfront deployment manager automatically ran all tests. My commit message instructed it to deploy all instances of the production service with the latest code after validating that all tests passed. The whole process, from pushing code to having it run in production, took just a few minutes. There was no staging/QA cycle and no waiting for a deployment window. It was just push and deploy.
$ git commit -a -m "Backend code for commit stats. #realease:dm"
$ git push
I couldn’t imagine a better process for new hires. I was able to fix a bug on the first day and implement a feature in the first week. I’ve already moved on to implementing a bigger end-to-end feature.

Monday, April 29, 2013

I can haz lambda on Java 7?

Superficially, lambda expressions in Java 8 are just syntactic sugar for instances of anonymous inner classes. Syntactic sugar should not require classfile changes, so one might expect that code compiled with Java 8 lambdas could run on a Java 7 JRE. It turns out that you can't do this, and not just because of an automatic "we always increment the classfile version". To see why, it helps to start at the beginning.

We're big fans of functional programming, but not quite willing to scrap our years of perfectly good working Java code, so our compromise is a functional Java style, which means lots of anonymous inner classes. As you can imagine, we're looking forward to Java 8 with Project Lambda and were a bit disappointed to see the latest timeline doesn't have a GA release until a year from now.

What could we do until then? There are early access versions available, but "let's upgrade production to this early access build" and "automatically managing hundreds of millions of dollars" don't seem to be a good match. What if we could compile with Java 8, but run on a well-tested Java 7 environment?

First, let's create two similar classes, one using an anonymous inner class, and the other using lambdas.

Look at all those braces I didn't have to type. Beautiful. Unsurprisingly, if we compile with Java 8, we can run with Java 8 with no problem.

kevin$ $JAVA8/bin/javac *.java
kevin$ $JAVA8/bin/java WithLambda
and it works...
kevin$ $JAVA8/bin/java WithoutLambda
and it works...

If we try to run those classes under Java 7, we have a problem.

kevin$ $JAVA7/bin/java WithoutLambda
Exception in thread "main" java.lang.UnsupportedClassVersionError: WithoutLambda : Unsupported major.minor version 52.0
kevin$ $JAVA7/bin/java WithLambda
Exception in thread "main" java.lang.UnsupportedClassVersionError: WithLambda : Unsupported major.minor version 52.0

Well, can we target a different classfile version? There's no combination of source and target that will actually make javac happy with this.

kevin$ $JAVA8/bin/javac -target 1.7 *.java
javac: target release 1.7 conflicts with default source release 1.8
kevin$ $JAVA8/bin/javac -source 1.8 -target 1.7 *.java
javac: source release 1.8 requires target release 1.8
kevin$ $JAVA8/bin/javac -source 1.7 -target 1.7 *.java
warning: [options] bootstrap class path not set in conjunction with -source 1.7
WithLambda.java:3: error: lambda expressions are not supported in -source 1.7
        Runnable r = () -> System.out.println("and it works...");
                        ^
  (use -source 8 or higher to enable lambda expressions)
1 error
1 warning

So there's simply no way to compile Java source with lambdas into a pre-Java 8 classfile version. It turns out this isn't just a failure to expose the possibilities as options. Although lambdas could have been implemented as pure syntactic sugar, with compile-time replacement by an anonymous inner class, they aren't. You can see something is up when you look at what these two files compile to.

kevin$ $JAVA8/bin/javac *.java
kevin$ ls -l *.class
-rw-r--r--  1 kevin  staff  1041 Apr 26 13:41 WithLambda.class
-rw-r--r--  1 kevin  staff   555 Apr 26 13:41 WithoutLambda$1.class
-rw-r--r--  1 kevin  staff   393 Apr 26 13:41 WithoutLambda.class

The inner class version compiles as expected into two classes, but the lambda version compiles into a single class, which means that instance of Runnable goes... where? Turns out, it doesn't exist, at least not at the bytecode level.

Brian Goetz explains the details in Translation of Lambda Expressions. First, the body of the lambda is converted into an internal private method, which you can see in the class file.

kevin$ javap -private WithLambda
Compiled from "WithLambda.java"
class WithLambda {
  WithLambda();
  public static void main(java.lang.String[]);
  private static void lambda$0();
}

Next, instead of creating an instance of the inner class (which was never created), it calls the lambda metafactory, a new platform method which dynamically creates an instance of the right type, with the body of the abstract method consisting of a call to the lambda method. This particular call to the lambda metafactory is a special form called the lambda factory, and it uses the invokedynamic instruction to make it possible to specialize the metafactory. Specifically, there are many cases where the JVM will have no need to actually generate a class implementing the interface, and can instead return some simpler internal structure.

So as anyone familiar with Betteridge's law knew from the start, you cannot have lambdas in Java 7. With Java 8 still a year away, and Scala tool support improving, migrating is looking attractive.

Thursday, April 18, 2013

Reactive Charts with D3 and Reactive.js

New to Reactive.js? Check out last week's introduction.

All visualizations are ultimately a composition of smaller elements: data sets, scales, individual lines, and labels, to name a few. Those elements relate to each other in different ways, and to manage their interdependency we typically find ourselves writing a master "render" method: something that can take new data and re-draw the visualization in its entirety, imperatively recalculating each component.

At Wealthfront we've been using Reactive.js to do things a little differently. We describe our visualizations as a flow of information, not as an imperative set of rendering steps. To illustrate this, we'll build up a simple bar chart that shows how Reactive.js can change the way you write and interact with your visualizations, hopefully for the better! When we're done, our humble chart will look a little something like this:


First, we need to declare each component of our chart and how those components relate to each other, so let's decompose our chart into the values that describe it.

Width and Height


Width and Height represent the width and height of our chart in pixels. Since they're just simple values, we'll represent them with $R.state(). Remember that $R.state() just returns a reactive function that lets you store and retrieve a value.  We'll set the width and height to 200px by default.


Data

"[{name:foo, value:10}, {name:bar, value:20}]"


Our data will be a simple array of object literals. Each object will have a name and value. Our chart will need to consume this data and render the bars accordingly. Since this is another simple literal value, we'll use $R.state() again, and default to an empty array.


Y Scale


In D3 we use scales to map the values in our data set to actual pixels on the screen. We do this by specifying a domain (the max and min of our data) and a range (the max and min dimensions of our chart in pixels). In this case our Y scale will need to have access to the chart's height, and our data, so that it can create a scale that maps our data to proper pixel heights on the screen.

We'll define our Y scale as a function, and then "reactify it" so that it can represent the Y scale value in our visualization. Since our Y scale needs access to our chart's height and data, we'll need to bind it to those values so that they can update our scale when they change.
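To make the domain and range idea concrete, here is a minimal plain-JavaScript sketch of what a linear scale does. It is a stand-in for illustration only, not D3's implementation and not the code from our chart; the names linearScale and yScaleFor are hypothetical. It assumes the domain runs from 0 to the largest value in the data, and the range from 0 to the chart's height.

```javascript
// Hypothetical stand-in for a linear scale: maps a value in [d0, d1] to [r0, r1].
function linearScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return function (value) {
    // Guard against an empty domain (e.g. no data, or all values equal).
    if (d1 === d0) return r0;
    return r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
  };
}

// Builds a Y scale from the chart height and data; assumes non-negative values.
function yScaleFor(height, data) {
  const max = data.reduce((m, d) => Math.max(m, d.value), 0);
  return linearScale([0, max], [0, height]);
}

const y = yScaleFor(200, [{ name: "foo", value: 10 }, { name: "bar", value: 20 }]);
console.log(y(10)); // 100
console.log(y(20)); // 200
```

Because yScaleFor takes the height and the data as plain arguments, it is the kind of function that can be made reactive and bound to those two values.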


X Scale


Our X scale behaves similarly: we'll use an ordinal scale to map our discrete bars onto actual X positions and widths on the screen. Our scale will need our data and our chart's width to determine how things should map. We'll also use the rangeBands feature of D3's ordinal scales to figure out how wide each bar should be.
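As a rough illustration of what dividing a range into bands means, the plain-JavaScript sketch below splits the chart's width evenly between the bars. It is not D3's rangeBands (which also accepts padding); bandScale and its bandWidth property are hypothetical names, and the sketch assumes no padding between bars.

```javascript
// Hypothetical helper: splits [r0, r1] into equal bands, one per name.
function bandScale(names, range) {
  const [r0, r1] = range;
  const bandWidth = names.length > 0 ? (r1 - r0) / names.length : 0;
  const scale = function (name) {
    const index = names.indexOf(name);
    // Unknown names have no position.
    return index === -1 ? undefined : r0 + index * bandWidth;
  };
  scale.bandWidth = bandWidth;
  return scale;
}

const data = [{ name: "foo", value: 10 }, { name: "bar", value: 20 }];
const x = bandScale(data.map((d) => d.name), [0, 200]);
console.log(x("foo"), x("bar"), x.bandWidth); // 0 100 100
```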


Bars


Now we need a value that actually represents the SVG Rectangles that will make up our bars. If you've used D3 before, the code below should look pretty familiar — it follows the standard D3 process to add, update, and remove bars from our chart. Our bars need the SVG group that contains them, our data, the chart's height, and the x and y scales to map our data values to actual coordinates. We define the function, reactify it, and bind it to the relevant values.


Labels


Last but not least, we need to represent our labels. These will be SVG Text elements, contained in their own labels group. Like our bars, they'll need their container, as well as our data, x, and y scales.


All together now

This JSFiddle shows our example functioning in its entirety. The inputs allow you to send values to chart.width(), chart.height() and chart.data(). Those three reactive values that we exposed on our BarChart object become our API. When we assign new values to them, data flows through our graph of values, changing the chart's representation in the process.



Because we expressed how data flows through the various components that make up our visualization, there's no need for a master "render" method; we allow the chart to update itself. Note that only the parts of our chart affected by a given change will be updated. In our example that's everything, but in large visualizations the targeted updates Reactive.js provides let us avoid unnecessary re-rendering of untouched elements. The best part is, we get that behavior for free.

Hopefully this provides a small, tangible example of how thinking reactively can inform the design of your code. Stay tuned in the weeks to come as we address topics like reactive UIs, AJAX requests, and other fun applications!

Monday, April 15, 2013

A "Reactive" 3D Game Engine in Excel?

A great article over at Gamasutra outlines a toy 3D engine that uses a spreadsheet to do the calculations. Instead of the imperative approach, this "engine" is more functional-reactive. As mentioned in our post on Reactive.js the other day, Excel serves as a good metaphor when starting to grok functional reactive programming.

Tuesday, April 9, 2013

Reactive.js: Functional Reactive Programming in Javascript


Reactive.js is a pure Javascript library inspired by Functional Reactive Programming. If you've ever used Excel or another spreadsheet program, you've already done something like FRP.

Reactive.js aims to bring FRP to Javascript by augmenting Javascript functions, allowing you to declare data flows in your code by representing your values as reactive functions that depend on one another. Complicated UIs, data visualizations, and systems of calculations are just a few examples of problems that can be simplified by using reactive programming.

Reactive.js is already being used in production on Wealthfront.com, where it's helping us gracefully manage the interconnection of portfolio model calculations, visualizations and UI elements. Check out a sample plan to see what I mean (it's pretty neat).

If you've never heard of FRP before, I'll step you through some basics and show you how Javascript (and Reactive.js) enter the picture. You can also grab Reactive.js on GitHub right now; the README provides a walkthrough and introduces the API if you'd prefer to jump right in.

Reactive Programming in 60 seconds


In Reactive programming it's easier to think of our variables as expressions, not as assignments. In order to understand the difference, consider the statement "a = b + c".

There are two ways to look at this. One is the way we are used to, "a is assigned the sum of b and c, at the instant this is interpreted", which is what we'd expect in Javascript or any other imperative language.

But we could also read it as "a represents the sum of b and c, at any point in time." This interpretation is not really so strange, that's exactly how we would expect things to work in a spreadsheet. If we had a spreadsheet cell containing the expression "=B+C", the value in that cell would change as the cells B and C change, wouldn't it?

Reactive programming means we describe our program using the second interpretation. We don't assign variables, we express them, and they don't represent discrete values, they represent a value that changes over time.

Reactive Programming in Javascript

Obviously Javascript is not a reactive language, but there are advantages to being able to express ourselves reactively. Reactive programming can be immensely useful in describing systems of equations, complicated UIs, and data visualizations to name a few.

So let's reexamine our earlier statement about reactive programming, then we'll see how Javascript fits into the picture (and how Reactive.js fits into Javascript).
"We don't assign variables, we express them..."
In Javascript, a = b + c assigns a value. For us to accomplish our goal, however, we need to describe what a represents using an expression (like =B+C in a spreadsheet). Javascript does have expressions, they're called functions! So in reactive programming a given value, like a in a = b + c, is expressed as a function:
    var a = function (b,c) { return b + c } // a = b + c
This brings us to our first conclusion.

Conclusion 1: Our variables are expressions, so our variables are functions.

Now let's consider the rest of the sentence:
"…and they don't represent discrete values, they represent a value that changes over time"
When you write =B+C in a spreadsheet, your spreadsheet program notes that your cell is relying on the values of B and C. It starts to assemble a dependency graph internally that it can use to keep track of changes. It traverses that graph when B or C change, updating A in the process. Most importantly, we don't have to write a "calculate all" function because the spreadsheet program handles that for us.
Unfortunately Javascript won't magically track dependencies, so it's not enough to describe our variables as expressions, we also need to tell our expressions what they depend on. Only then can they be smart enough to update each other automatically.

Conclusion 2: We have to tell our expressions what they depend on.
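To show what "telling an expression what it depends on" can look like, here is a small self-contained sketch. It is not Reactive.js and does not use its API; stateCell and computedCell are hypothetical names, and the sketch assumes a simple push model where every change immediately recomputes the cells that depend on it.

```javascript
// Hypothetical sketch, not Reactive.js itself: a "cell" is a function that
// either holds state or recomputes from the cells it was told to depend on.
function stateCell(initial) {
  let value = initial;
  const dependents = [];
  const cell = function (next) {
    if (arguments.length > 0) {
      value = next;
      // Push the change to every cell that declared a dependency on us.
      dependents.forEach((d) => d.recompute());
    }
    return value;
  };
  cell.dependents = dependents;
  return cell;
}

function computedCell(fn, ...deps) {
  let value;
  const dependents = [];
  const cell = function () { return value; };
  cell.dependents = dependents;
  cell.recompute = function () {
    value = fn(...deps.map((d) => d()));
    dependents.forEach((d) => d.recompute());
  };
  // Explicitly register with each dependency, since Javascript will not
  // discover them for us.
  deps.forEach((d) => d.dependents.push(cell));
  cell.recompute();
  return cell;
}

const b = stateCell(2);
const c = stateCell(1);
const a = computedCell((x, y) => x + y, b, c);
console.log(a()); // 3
b(5);
c(10);
console.log(a()); // 15
```

This naive version recomputes a cell once per changed dependency, so a value reachable by two paths would be recalculated twice. It is only meant to show why the dependencies have to be declared somewhere.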

When we combine our two conclusions, we arrive at the following:
Our variables are expressions, so our variables are functions. And we have to tell our expressions what they depend on, so that means we have to tell our functions what they depend on.
Fortunately for you, Reactive.js does just that.

Using Reactive.js

At its core, Reactive.js is just a single method, $R(). $R() accepts a function and returns you an augmented version of that function that is meant to represent a value in your program. How is it augmented exactly? "Reactive functions" gain a new method called .bindTo(). bindTo() accepts one or more reactive functions, and binds them to your function's arguments via partial application.
Don't worry about $R.state yet, it just returns a reactive function that gets and sets internal state (handy for literal values) — the docs explain it in more detail.
    //A = B + C
    var reactiveA = $R(function (b, c) { return b + c });
    var reactiveB = $R.state(2);
    var reactiveC = $R.state(1);
    reactiveA.bindTo(reactiveB, reactiveC);

    reactiveA();   //-> 3
    reactiveB(5);  //Set reactiveB to 5
    reactiveC(10); //Set reactiveC to 10
    reactiveA();   //-> 15
That's it. As you can see, Reactive.js asks you to express values as functions and gives you the tools you need to tell those functions how they depend on each other. In the example above, any time reactiveB or reactiveC changes, reactiveA will change too. reactiveA isn't assigned b + c; it represents b + c at any moment in time.

An example with time

Since we talk about variables representing a value that changes over time, let's actually create variables that depend on, well, a value that changes over time.


Reactive.js is minimal

Reactive.js seeks to be as minimal and unobtrusive as possible. Because it operates on, and returns, normal Javascript functions, it's very easy to integrate into existing code. If you start writing "reactive" code, any existing function can be integrated as a dependency by creating a reactive version of it with $R().

The future

Reactive.js is new, but improving every day. Our newest adventure is defining our d3 visualizations declaratively, using Reactive.js, instead of the imperative pattern we're used to. We're able to describe how data might flow into a visualization, informing things like the x and y scales, and updating the relevant SVG. In the process we do away with our classic "render-all" function, instead trusting Reactive.js to update only the components of the visualization that need changes when the data is modified.

Stay tuned in the weeks to come as we show other ways Reactive.js can integrate into your code.


Tuesday, April 2, 2013

Not too late to sign up for Odersky's Scala Coursera class

Just a quick note if you haven't seen it. Martin Odersky is teaching a Coursera class Functional Programming Principles in Scala. On the backend we've been experimenting with how we can use Scala more easily in our 99% Java project, and apparently we've been talking about it enough that Matt Baker decided this was a good way to expand his backend knowledge.

Now I'm signed up and loving it. I'm only a week in, but it looks like a good refresher on the subjects in Structure and Interpretation of Computer Programs and, for me, a refresher on how Scala should be written rather than my "Java without semicolons" style.

Tuesday, March 19, 2013

Continuous Deployment: API Compatibility Verification

We haven’t said much about our continuous deployment system recently. Mostly that’s because there hasn’t been much to say. We invest in systems and infrastructure through a process called proportional investment: we spend time on areas that cause us problems and our deployment infrastructure has performed well, requiring little incremental investment.
However, recently we made a significant improvement that is worth discussing: the addition of API compatibility checking to our instance deployment strategy. Our existing instance deployment strategy was a fairly intricate process involving unannouncing the service from our ZooKeeper based service discovery system, deploying the new package, running a self test to verify that the instance will be happy on production (checking database connectivity, Java Cryptography Extension installation and a number of other things that have caused us issues in the past), announcing the service and then monitoring system and business metrics to gauge the production impact of the new code. While that monitoring frequently caught service compatibility issues, the fundamental tradeoff of speed of deployment against comprehensiveness of the safety provided by post-deployment monitoring had occasionally caused us trouble.
The most significant class of problems arose when an engineer moved too fast and deployed new code out of order, leaving a newly deployed service trying to invoke RPC calls that didn’t exist on an older service. This is a common problem in engineering teams managing service-oriented architectures with complex distributed systems. Our development and infrastructure methodologies make us somewhat resistant to these types of errors because our ability to effect change at scale quickly means that there is relatively little global system state that an engineer has to manage in their head at any given time. Significant system changes that can take other organizations a few months to execute can be performed safely in our system within an hour or two. That makes errors of this type less frequent, but doesn’t eradicate them.
When presented with a service compatibility issue due to deployment ordering recently, we had a choice to make. We could add a human-managed process to the release cycle or we could attempt to manage it programmatically. We are averse to human processes because they are slower and more error-prone than well-written equivalent automation, so we embarked on automating a solution.
Our production backend services are based on a JSON-over-HTTP RPC framework. A URL endpoint exposes an arbitrary number of invokable calls with typed arguments. Some of these arguments are optional, with default values. We had a bit of internal debate over the scope of the solution. Should we type check RPC invocations or just check existence and method arity? Ultimately, we decided that every issue we had identified that could have been solved by a system like this over the last two years would have been solved by the simpler solution. While type checking the RPC arguments would have been cool, it was also needlessly complex for the problem we were facing.
We began the implementation by standardizing on a way for services to expose the RPC calls they use. These can be declared statically at service build time, discovered through reflection (either at build time or live on production), or recorded from invocations made on production by our RPC client. There are a number of tradeoffs among these approaches, and it’s simplest to say that we use a mixture of them depending on the service and the certainty that we desire.
Next we needed a way to retrieve the RPC calls that a service supports. Luckily we had a debugging RPC call named “Help” already in production. We added support for an additional machine-readable output format. Optional arguments are handled by exposing that the RPC call supports different arities.
Finally we added compatibility checking to our deployment system. When a new instance is released, but before it announces itself to take production traffic, the deployment system verifies that all RPC calls the new instance makes are present on production and that all RPC calls existing production services make to the service type of the new instance are still available. If there are problems with either check, the deployment is stopped, a rollback is initiated and developers are notified with specifics.
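The two-way check described above can be sketched in a few lines. This is an illustration under stated assumptions, not our deployment system's code: the function names, the data shapes and the RPC names (GetAccount, GetQuote, GetCommitStats) are all hypothetical. A call is identified by name and arity only, and a call with optional arguments is listed with every arity it accepts.

```javascript
// Hypothetical data shapes: a required call is { name, arity }, and the
// supported calls are a Map from call name to the list of arities it accepts.
function findMissingCalls(requiredCalls, supportedCalls) {
  return requiredCalls.filter(({ name, arity }) => {
    const arities = supportedCalls.get(name);
    // Missing if the call does not exist, or exists but not at this arity.
    return !arities || !arities.includes(arity);
  });
}

// Returns the problems found in both directions; an empty list means the
// deployment may proceed.
function checkCompatibility(newInstance, production) {
  const outbound = findMissingCalls(newInstance.calls, production.supported);
  const inbound = findMissingCalls(production.callsToNewService, newInstance.supported);
  return [
    ...outbound.map((c) => `new instance calls missing ${c.name}/${c.arity}`),
    ...inbound.map((c) => `production still calls removed ${c.name}/${c.arity}`),
  ];
}

const problems = checkCompatibility(
  {
    calls: [{ name: "GetAccount", arity: 1 }, { name: "GetQuote", arity: 2 }],
    // An optional second argument shows up as two supported arities.
    supported: new Map([["GetCommitStats", [1, 2]]]),
  },
  {
    supported: new Map([["GetAccount", [1]], ["GetQuote", [1]]]),
    callsToNewService: [{ name: "GetCommitStats", arity: 2 }],
  }
);
console.log(problems);
```

In this example the new instance calls GetQuote with two arguments while production only supports one, so the list is non-empty and the deployment would be stopped.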
Since the release of this system a few months ago, we recently had the first significant automatic rollback caused by a service compatibility issue. There are a few points worth making in summary. First, this relatively small investment in infrastructure a few months ago was still good last week and will continue to be useful for a long time to come. Second, the relative infrequency of issues of this type (especially when compared to our current rate of 20-50 deployments per day) makes it incredibly unlikely that a human process would have shown enough continued vigilance to catch this. Third, if you’re interested in working with us to democratize access to sophisticated investment advice while working on state-of-the-art infrastructure, email us at jobs@wealthfront.com.