Worlds Collide

Before I ever became interested in finance or big data, I was working on software for real-time, concurrent embedded systems. I was the main contributor to Soft Walls, a 9/11-inspired project in which we designed aircraft control algorithms to prevent pilots from entering no-fly zones. I was later a contributor to the Ptolemy project, a continuing effort that focuses more generally on how to model, design, and simulate such systems.

Yesterday, I headed up to the Tenth Biennial Ptolemy Miniconference to give a talk on the data platform we’ve been building here at Wealthfront. The impetus for the talk was a realization that the design of distributed data systems is similar in many ways to the design of distributed embedded systems, like the TerraSwarm project, which focuses on the design of networks of integrated smart devices, from the phone to the toaster. This is an active area of focus for the Ptolemy project.

In both fields, data flow abstractions are used to model the movement of information between components of the system. In the data world, this is evident in projects like Cascading and Spark. In both fields, the distributed nature of the systems makes testing a challenge. On this note, there was an interesting poster by Ben Zhang on simulation-in-the-loop testing for swarm applications. The basic idea was to test some parts of the swarm in simulation and other parts on the real swarm, a kind of hybrid unit/integration test. Both fields also have timing constraints (SLAs, in the data space), and multi-threaded, multi-machine concurrency is a major concern in each.
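To make the data flow and simulation-in-the-loop ideas a bit more concrete, here is a minimal Python sketch. It is not code from my talk or from Ben's poster, and every name in it (`simulated_sensor`, `smooth`, `threshold`, `pipeline`) is hypothetical. It assumes each component is just a function from one stream of readings to another, so a simulated source can stand in for a real device while the downstream logic stays exactly what would run in production.

```python
from typing import Callable, Iterable, Iterator, List

# A stage maps one stream of readings to another stream of readings.
Stage = Callable[[Iterable[float]], Iterator[float]]


def simulated_sensor(n: int) -> Iterator[float]:
    """Hypothetical stand-in for a real device: emits a deterministic ramp."""
    for i in range(n):
        yield 20.0 + 0.5 * i


def smooth(readings: Iterable[float]) -> Iterator[float]:
    """Logic under test: average each reading with the previous one."""
    prev = None
    for r in readings:
        yield r if prev is None else (prev + r) / 2.0
        prev = r


def threshold(limit: float) -> Stage:
    """Build a stage that only passes readings strictly above `limit`."""
    def stage(readings: Iterable[float]) -> Iterator[float]:
        for r in readings:
            if r > limit:
                yield r
    return stage


def pipeline(source: Iterable[float], *stages: Stage) -> List[float]:
    """Wire the source through each stage in order and collect the output."""
    stream: Iterable[float] = source
    for s in stages:
        stream = s(stream)
    return list(stream)


# The source is simulated; smooth and threshold are the "real" components.
alerts = pipeline(simulated_sensor(5), smooth, threshold(21.0))
print(alerts)
```

Because the source is only an iterable, swapping `simulated_sensor(5)` for a generator that reads from an actual device (or an actual log file, in the data world) would leave the rest of the graph untouched, which is roughly what makes the hybrid style of test possible.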

With that said, here are my slides:


It’s worth calling out that there were many interesting presentations and posters. One by Chris Shaver and Marten Lohstroh showed how to use Hindley-Milner type inference (HM(X)) to guard against a class of programming errors when building swarm applications. Ilge Akkaya had a cool demo feeding a keyboard into a Ptolemy II model that would improvise jazz based on a riff the user started on the keyboard. I’m not sure there’s much relation between that and big data, but it was a super cool demo!