Friday, October 29, 2010

Stop wasting people's time!

We enjoyed hosting Eric Ries here at Wealthfront for a great Lean Startup talk focused on engineering. The event sold out quickly and was soon overbooked; we had lots of smart people come in to talk about taking their engineering organizations to the next step with methodologies like Continuous Deployment, and about the real value of a new line of code.

Since we had to turn down many requests to attend, Ian (our master producer) made sure we published the event on Wealthfront's UStream channel so more people could tune in.

Thanks a lot for attending the meetup. We have a few great guests coming up in the next few months. The Wealthfront Engineering Meetups will be held every two months; to learn more about them, follow our blog or our Twitter account.

More pictures from the event on Flickr.

Update: Voilà, Eric's slides:

Wednesday, October 27, 2010

Migrating a Hibernate Bag to a Hibernate List

We ran into an issue recently where we needed to move from a Hibernate bag to an explicitly ordered list. Our root entity is a TaskSchedule, which has many Task(s). We needed to add a list_index column to the Task table, populate the new column, and then update our code to use it. Because Hibernate didn't know about the new column, our existing code could continue to function as if it didn't exist, and when we deployed the new service it would begin using list_index. If new Tasks had been added between the time we populated the column and the time the new code was deployed, we would have had to write some transitional code to update, but not use, the new field. Luckily we were able to avoid that.

The XML configuration change to do this is relatively simple. We moved from:

<class name="TaskSchedule" table="task_schedules">
  <!-- ... -->
  <bag name="tasks" cascade="all-delete-orphan" lazy="true">
    <key column="task_schedule_id" />
    <one-to-many class="Task" />
  </bag>
</class>

to:

<class name="TaskSchedule" table="task_schedules">
  <!-- ... -->
  <list name="tasks" cascade="all-delete-orphan" lazy="true">
    <key column="task_schedule_id" />
    <list-index column="list_index" />
    <one-to-many class="Task" />
  </list>
</class>
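
The Java side may not need to change at all if the collection was already declared as a List. For context, here is a minimal sketch of what the mapped entity might look like; the field and accessor names are assumptions, and only TaskSchedule, Task and the tasks collection come from the mapping above.

import java.util.ArrayList;
import java.util.List;

// Sketch only: names other than TaskSchedule, Task and "tasks" are assumed.
public class TaskSchedule {
  private Long id;

  // With the <bag> mapping the order of this collection was undefined;
  // with <list>, Hibernate persists each element's position in list_index.
  private List<Task> tasks = new ArrayList<Task>();

  public List<Task> getTasks() {
    return tasks;
  }

  public void addTask(Task task) {
    // The task's position in this list becomes its list_index value.
    tasks.add(task);
  }
}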

However we had a few thousand rows that needed their list_index populated. Two of us went off with a backup copy of the database and gave it a shot. Not surprisingly, we came up with two different solutions. Speed was of the essence, so we weren't focusing on producing beautiful code (as you will soon see).

The first solution uses a temporary column for side effects:

begin;

set @last_schedule_id := 0;

alter table tasks add column temporary_column bigint;

update tasks set
list_index = if(
concat(task_schedule_id) != concat(@last_schedule_id),
-- some weird typing required coercion to string for the comparison
@i := 0, @i := @i + 1),
temporary_column = if(
concat(task_schedule_id) != concat(@last_schedule_id),
@last_schedule_id := task_schedule_id, 0)
order by task_schedule_id, creation_time, id;

alter table tasks drop column temporary_column;

commit;

The second defines a procedure (make sure to run the mysql client with -A or it will try to do tab completion):

DELIMITER //
create procedure foo()
begin
declare cnt bigint default 0;
declare cur int default 0;
declare t_id, ts_id bigint;
declare done int default 0;
declare s cursor for select id, task_schedule_id from tasks order by task_schedule_id, creation_time, id;
declare continue handler for not found set done = 1;

open s;

read_loop: LOOP
fetch s into t_id, ts_id;
if done then
leave read_loop;
end if;
if cur = ts_id then
set cnt:=(cnt + 1);
else
set cur := ts_id;
set cnt := 0;
end if;
update tasks set list_index = cnt where id = t_id;
end loop;

close s;
end;
//
DELIMITER ;
call foo;
drop procedure foo;

Both of these work and took about the same amount of time to write. As the author of the second bit of code, I'm partial to that method, but I don't doubt that there are better ways to get this done. Anyone have a better solution?
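
For what it's worth, on a newer MySQL (8.0 or later, which wasn't available at the time) a window function would shrink this to a single statement. A hedged sketch, reusing the table and ordering from the examples above:

-- Sketch only: requires MySQL 8.0+ window functions. Numbers tasks within
-- each schedule starting at 0, matching the code above.
update tasks t
join (
  select id,
         row_number() over (
           partition by task_schedule_id
           order by creation_time, id
         ) - 1 as idx
  from tasks
) numbered on numbered.id = t.id
set t.list_index = numbered.idx;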

Monday, October 25, 2010

Experience serializing financial domain objects in a database

I often find myself having to make serialization design choices when persisting financial data in a database. The kind of financial data I'm referring to are domain objects describing stocks, stock quotes, stock fundamentals, corporate actions and so on. For example, a typical stock object has about 20 to 30 fields describing its static information: country, exchange, currency, security type, lot size, etc.; an end-of-day stock quote object has about 10 fields representing its daily open, close, high, low, ask, bid and volume.

At the beginning I stored each field in its own column, but I soon felt the pain: it's a lot of work to manage all those columns. A stock fundamental object, for example, easily involves more than 100 metrics measuring a company's financial results. On top of that, analytical models keep calling for new metrics, and each one requires adding yet another column, which is pretty rigid.

Later on I started to use JSON to serialize objects in the database. Only one text column is needed to store all the fields of a JSON object. If I need to add more fields I just add them in the code; no schema change is needed. It's quite handy.
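
As a rough illustration (not our actual code; the choice of Gson and the field names are my assumptions), serializing a quote object into a single text column can be as simple as:

import com.google.gson.Gson;

// Sketch only: field names and the use of Gson are assumptions.
class EodQuote {
  String symbol;
  double open, close, high, low, ask, bid;
  long volume;
}

class QuoteSerializer {
  private static final Gson GSON = new Gson();

  // The returned string goes into one text column; adding a field to
  // EodQuote later requires no schema change.
  static String toJson(EodQuote quote) {
    return GSON.toJson(quote);
  }

  static EodQuote fromJson(String json) {
    return GSON.fromJson(json, EodQuote.class);
  }
}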

Even later on I started to use protocol buffers to serialize objects into a single blob column in the database. It has the same benefit of adding new fields without adding new columns. The added benefit is that less code needs to be written to represent domain objects: I just specify the names and types of the fields, and a class is generated with almost everything I need: getters, setters, a builder, and even a CSV-style toString method which I use to dump data into spreadsheets.

But still I run into difficulties. I often need to give prompt responses to teammates or clients regarding stock data: in which country is this company located? On which exchange does this stock trade? At what trailing P/E ratio was this stock trading as of some date? I can't quickly answer those questions by looking at the database, because the data are all binary protocol buffers! I also frequently need to do ad hoc analysis on specific types of stocks, for example all the stocks whose security type is ADR and that trade on NYSE, or all the stocks whose trailing revenue growth rate is at least 15%. I would have been able to retrieve those stocks with a simple SQL query if the fields were stored in columns, but now I have to write Java code to parse the protocol buffers or JSON just to do that simple task!

Balancing all the pros and cons, I'm currently leaning towards JSON plus denormalizing important fields into columns as my way of serializing (a rough table sketch follows this list), because:
  • Denormalizing important fields in columns enables quick SQL queries
  • Using JSON gives the flexibility of adding new fields without the need of adding new columns
  • Yes, with JSON I have to write more code than with protocol buffers, but writing that code is a one-time sunk cost, while the perpetual benefit of human-readable data adds a lot of value for quick operational responses. Not to mention I really hate losing the ability to simply look at the data and think.
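
As a concrete sketch of what that hybrid layout might look like (the column choices here are mine, not a description of our actual schema):

-- Sketch only: frequently-queried fields get real, indexable columns;
-- everything else lives in the JSON payload.
create table stocks (
  id            bigint primary key,
  symbol        varchar(16) not null,
  exchange      varchar(16) not null,   -- e.g. NYSE
  security_type varchar(16) not null,   -- e.g. ADR
  country       varchar(2)  not null,
  json_payload  text        not null,   -- all remaining fields as JSON
  key idx_exchange_type (exchange, security_type)
);

-- Ad hoc questions like "all ADRs traded on NYSE" stay plain SQL:
select symbol from stocks where exchange = 'NYSE' and security_type = 'ADR';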

Relaxing visibilities using Javassist

I personally like to restrict the visibility of my classes and members as much as possible. However, package-private classes quickly get annoying when used in an interactive environment such as the Scala interpreter.

When we encountered this problem with one of our internal tools, we decided to generate derived versions of our JAR files with relaxed visibilities. Using Javassist, it was pretty straightforward.

First, we define the name of the JAR file we want to convert and the name of the directory in which we want the modified class files to be written. Re-packaging these files is left as an exercise to the reader. (Okay, we are actually re-packaging the class files with a simple shell script. There is nothing wrong with that, right?)

val jar = args(0)
val out = args(1)

Javassist's ClassPool is then configured using the name of the JAR file.

import javassist._
import javassist.Modifier.setPublic // brings the setPublic helper used below into scope

val pool = ClassPool.getDefault
pool.insertClassPath(jar)

The next step is to get a CtClass for each of the class files in the JAR.

import java.util.jar._
import java.util.zip._
import scala.collection.JavaConversions._ // lets us map over the Java Enumeration of JAR entries

val classes = for (
  e <- new JarFile(jar).entries.map(_.asInstanceOf[ZipEntry].getName).toStream
  if e.endsWith(".class"))
  yield pool.getCtClass(e.replaceAll("/", ".").replaceAll("\\.class", ""))

Now, we relax the visibility of all the fields, methods, constructors and classes.

classes.foreach(c =>
  c.setModifiers(setPublic(c.getModifiers)))

classes.foreach(_.getDeclaredFields.foreach(f =>
  f.setModifiers(setPublic(f.getModifiers))))

classes.foreach(_.getDeclaredMethods.foreach(m =>
  m.setModifiers(setPublic(m.getModifiers))))

classes.foreach(_.getDeclaredConstructors.foreach(c =>
  c.setModifiers(setPublic(c.getModifiers))))

Finally, we write the modified classes in our output directory.

classes.foreach(_.writeFile(out))

Pretty simple, huh?

Thursday, October 21, 2010

Live Broadcast for Lean Startup for Geeks at Wealthfront with Eric Ries

About a month ago, when we announced the second meetup of our "Lean Startup for Geeks at Wealthfront with Eric Ries" series, we couldn't have imagined the extraordinary demand for attendance. Our headquarters in Palo Alto, Calif. couldn't accommodate everybody who wanted in, and we had to turn down over 20 people and stop accepting requests for invitations.

However, we're proud of our software and ideology and we want to share them in any way possible. This upcoming Monday, October 25, 2010 at 6:30pm PDT, we will be broadcasting Eric Ries' talk on Lean Startups live over our UStream channel. Come Monday, tune in to our UStream channel and ask us questions over Twitter by hashtagging your tweets with #wlth.

Attending Lean Startup for Geeks at Wealthfront with Eric Ries in person is by invitation only.

Wednesday, October 20, 2010

Wealthfront track at the Silicon Valley Code Camp

This year's Silicon Valley Code Camp was the largest and greatest so far. Like last year, Wealthfront (then kaChing) had a few packed presentations. This year we had a full Wealthfront track, focusing on software quality from the first line of code through testing to production.
As requested, here are our presentations.


A well-typed program never goes wrong

By Julien
We will spend this session talking about type safety. After defining this desirable property, we will look at various examples where broken but well-typed programs are converted into ill-typed programs. In other words, we will learn to leverage the type system to detect issues as early as possible. All examples presented in this session come straight out of kaChing's code base. Previous exposure to Java and Scala is recommended.


Applying Compiler Techniques to Iterate At Blazing Speed

By Julien and Pascal

In this session, we will present real-life applications of compiler techniques that help kaChing achieve ultra confidence and power its incredible 5-minute commit-to-production cycle [1]. We'll talk about idempotency analysis [2], dependency detection, on-the-fly optimizations, automatic memoization [3], type unification [4] and more! This talk is not suitable for the faint-hearted... If you want to dive deep, learn about advanced JVM topics, devour bytecode and see first-hand applications of theoretical computer science, join us.
[1] http://eng.wealthfront.com/2010/05/deployment-infrastructure-for.html
[2] http://en.wikipedia.org/wiki/Idempotence
[3] http://en.wikipedia.org/wiki/Memoization
[4] http://eng.wealthfront.com/2009/10/unifying-type-parameters-in-java.html

Automating Good Coding Practices

By Kevin
The price of clean code is eternal vigilance. Everyone wants to work with clean code, but no one wants to be the enforcer. In this session we'll look at how kaChing integrates style checkers and static analysis tools into the build process to keep errors out without getting in the way of developers. Many of the tools discussed are specific to Java or Scala, but the techniques are generally applicable.

Extreme Testing at kaChing: From Commit to Production in 5 Minutes

By Pascal

At Wealthfront, we are on a 5-minute commit-to-production cycle. We have adopted continuous deployment as a way of life and as the natural next step to continuous integration.
In this talk, I will present how we achieved the core of our extreme iteration cycle: test-driven development, or how to automate quality assurance. We will start at a very high level and look at the two fundamental aspects of software: transformations, which are stateless data operations, and interactions, which deal with state (such as a database or an e-mail server). With this background we will delve into practical matters and survey kaChing's testing infrastructure, motivating each category of tests with the different kinds of problems we often encounter. Finally, we will look at software patterns that lend themselves to testing and to separation of concerns, allowing unparalleled software composability.
This talk will focus on Java and the JVM even though the discussion will be largely applicable.
Check out http://eng.wealthfront.com/search/label/tests for the latest from our company's blog.


5-minute Commit-to-Production: Continuous Deployment

By Adam and Eishay
Continuous deployment (CD) takes "release early, release often" to the limit: as long as the build is green you can push code to production--agility at its best. Companies doing CD safely push code dozens (hundreds!) of times a day, rapidly responding to their customers and reducing their "code inventory". In this talk we will discuss the architecture, tools and culture needed for CD and how your company can get there. For example: creating an effective "immune system" to know what problems are happening; what infrastructure software like Apache ZooKeeper can and can't do, and how to best use it; deployment orchestration techniques to quickly yet safely gain confidence in new code; and more!



See you next year or in one of our tech talks!

jQuery the Right Way

jQuery has changed the way we write Javascript by abstracting away many of the painful cross-browser implementation details that used to plague developers, but using it correctly still requires a little knowledge about what’s going on under the hood. In this post we’ll take a good look at jQuery’s selectors and how to use them efficiently. I’ll also talk briefly about DOM manipulation and event handlers.
At its core jQuery is exactly what its name implies, a query engine designed for search. And just like you’re careful to construct efficient SQL queries, you need to take the same care with your jQuery selectors. Efficient selector use boils down to three main concepts.
  • Using the right selector
  • Narrowing the search space
  • Caching

Ego and #ID

There is a clear hierarchy of selectors when it comes to speed. From fastest to slowest:
  • ID selectors (“#myId”)
  • Element selectors (“form”, “input”, etc.)
  • Class selectors (“.myClass”)
  • Pseudo & attribute selectors (“:visible”, “:hidden”, “[attribute=value]”)
ID and element selectors are fastest because they are backed by native DOM operations, specifically getElementById() and getElementsByTagName(). Class, pseudo and attribute selectors have no browser-based call to leverage, which puts them at a distinct disadvantage.

You’re right, it’s never that clear cut

Good browsers actually provide getElementsByClassName, which greatly improves the performance of class selectors. This can lead to surprises when you start testing in IE: while your code runs quickly in Firefox/Webkit, IE will slow to a crawl because jQuery has to emulate getElementsByClassName in Javascript.

We’ll see how to take advantage of the speed of #ID selectors in a bit, but first a note on pseudo selectors.

Pseudo selectors provide a lot of power in the right situations, but they’re also a lot slower. To understand why, let’s take a look at how :hidden works in jQuery 1.4.2.

jQuery.expr.filters.hidden = function( elem ) {
  var width = elem.offsetWidth,
      height = elem.offsetHeight,
      // some weird typing required coercion to string for the comparison
      skip = elem.nodeName.toLowerCase() === "tr";
  return width === 0 && height === 0 && !skip ?
    true :
    width > 0 && height > 0 && !skip ?
      false :
      jQuery.curCSS(elem, "display") === "none";
};
See? The magic is revealed: :hidden is actually a function that must be run against every element in your search space. If you have a page with 1000 elements and you call $(":hidden"), you’re actually asking jQuery to call the above function a thousand times. You can get away with this if the number of elements you’re iterating over is small (you did root your selector with an ID, right?), but it’s important to keep in mind that you are asking jQuery to run a function against your elements when using :pseudo style selectors.

Narrowing the search

Good semantics, good selectors

In writing your HTML, remember that the purpose of an ID is to identify a single element in your page. Generally you want an ID if you’re tagging a single element, and a class if you’re tagging a collection of related elements. Good HTML semantics go hand-in-hand with optimal selector use.
To reduce our search time we need to reduce the search space. Because the DOM itself is a tree structure we accomplish this by rooting our search in a sub-tree.

Luckily this isn’t hard to do. ID selectors are fast, so fast, in fact, that they can be used to jump to a node deeper in the tree before beginning your search. For this reason jQuery optimizes for selectors that are rooted with an ID selector.

Using $(".myClass") will search every element in the DOM for your class. In contrast, $("#myId .myClass") will only search the elements within the sub-tree rooted at #myId. In large pages this can be the difference between searching tens of elements rather than hundreds or thousands.

If you can’t root in an ID selector you can at least narrow the search with an element selector. While not quite as quick as an ID lookup, it still lets jQuery use getElementsByTagName() under the hood to reduce the search space before proceeding. In fact, it’s rare that you need to refer to a bare class selector and you should avoid it when possible.

Once you’ve got a collection jQuery provides a whole range of DOM traversal methods that are at your disposal. You can use the traversal methods as well as find() and filter() to narrow your search even more. These can be useful when you’ve cached a selector but would like to use only a specific subset of its elements.
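
For example (the selectors here are invented for illustration):

var $rows = $("#resultsTable tr");          // one query, cached
var $selected = $rows.filter(".selected");  // narrow the cached set, no new DOM search
var $prices = $rows.find("td.price");       // search only within the cached sub-trees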

Cache Rules Everything Around Me

You may have noticed that the jQuery syntax we all know and love is deceptively similar to a property look up on a Javascript object.
  • Object property: myObject["myKey"].runMyFunction();
  • jQuery selector: jQuery(".myClass").slideDown();
Don’t be fooled: every call to $(".myClass") will re-run your search of the DOM and come back with a new collection; these are not O(1) look-ups! Fortunately there are two ways to avoid making redundant queries, chaining and caching.

When you chain jQuery functions the collection retrieved by the selector gets passed to each successive function. As a result the query never has to re-run since the collection of elements get passed along the chain until it completes. However chaining is only convenient in situations in which you want to perform multiple actions on a collection in a single place in your code. Much better is to cache your selectors in a variable for reuse in whatever manner you see fit:
var mySuperSlowSelector = $(".myClass:contains('foobar') .myOtherClass:visible");
I can now use mySuperSlowSelector as many times as I want; I’ve persisted the collection returned from my query, so jQuery won’t be re-querying the DOM. In practice any jQuery-heavy page should be caching selectors. If multiple functions are using the same selector, don’t be afraid to maintain a centralized cache in a scope accessible to all of them. In the example below we can access selectors through our “SelectorCache” to ensure we never query the DOM more than once.


$(function() {
  function SelectorCache() {
    var selectors = {};
    this.get = function(selector) {
      if (selectors[selector] === undefined) {
        selectors[selector] = $(selector);
      }
      return selectors[selector];
    };
  }

  var selectorCache = new SelectorCache();

  function foo() {
    selectorCache.get("#myId .myClass p").fadeOut();
  }

  function bar() {
    selectorCache.get("#myId .myClass p").slideDown();
  }
});


You can’t cache everything

Caching your selectors is like caching your search results. If new nodes are added to your DOM, or attributes within it change, your cached collection of elements won’t “automagically” update. In these situations you’ll need to re-run the query and cache a new set of elements, or you may decide caching isn’t useful if things are too dynamic.

Part 2: The DOM is not a database…

But jQuery sure lets you treat it like it is one. Interactions with the DOM are the slowest operations you can perform in client-side Javascript, which makes it a terrible candidate for maintaining state in your application. It’s better to think of the DOM more like a write-only object and less like something you can query for state information. That said, the convenience of something like tagging a sorted column with an “asc” or “desc” class is a prime example of a time when keeping a little state in the DOM can be an acceptable design choice. In the end it’s all about striking the right balance.

The introduction of HTML5’s data- attributes was a useful addition, but it only increased the temptation to use the DOM to store state. data- attributes can be useful, but remember that jQuery provides the excellent data() method, which can serve the same purpose. data() allows you to attach arbitrary data to a DOM element in Javascript without having to actually talk to the DOM. data- attributes are a nice option when you need to tag elements with extra information while rendering a template, but if you have a choice go with pure Javascript.
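
A quick sketch of the difference (the element and key names are made up):

// Reading a data- attribute goes back to the DOM on every call:
var attrOrder = $("#scoreColumn").attr("data-sort-order");

// data() keeps the value on the Javascript side, so later reads
// never have to touch the DOM:
$("#scoreColumn").data("sortOrder", "asc");
var cachedOrder = $("#scoreColumn").data("sortOrder");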

While we’re on the subject of DOM manipulation, it’s worth noting that jQuery has made some serious progress in speeding up DOM writes recently, but you still need to take care with your DOM manipulation. In general, treat every DOM insertion as the costly action it is: minimize DOM touches by building up HTML strings and doing a single append().
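
For instance (a sketch; the #list element and the items array are invented):

// Costly: one DOM insertion per item.
$.each(items, function(i, item) {
  $("#list").append("<li>" + item + "</li>");
});

// Better: build the markup in memory, then touch the DOM once.
var html = "";
$.each(items, function(i, item) {
  html += "<li>" + item + "</li>";
});
$("#list").append(html);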

As of version 1.4 jQuery also provides the detach() method which removes a node from the DOM and returns it for manipulation. If you’re doing heavy interaction with a DOM node you should detach it while you perform your manipulation and re-insert it when you are finished.
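
Something along these lines (the names are made up):

// Pull the node out of the document, do the heavy lifting, then re-insert it.
var $table = $("#bigTable").detach();
$table.find("td.price").addClass("stale"); // imagine many manipulations here
$("#tableContainer").append($table);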

Part 3: A note on events

Events are often another pain point in jQuery-heavy pages and I want to make two quick points.

First, avoid triggering events yourself in code with functions like .click() when you could just as easily run a function; otherwise you incur the overhead of a DOM event when you trigger an element’s event handler. If you find yourself needing to trigger an element’s event handler, consider moving the contents of the handler into its own stand-alone function. With this pattern the function can be called by the click() event handler as well as by your code when you need to trigger the behavior yourself.
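
A sketch of that pattern (the element and function names are made up):

// The behavior lives in a plain function...
function saveForm() {
  // ...the actual work goes here...
}

// ...which is bound as the click handler...
$("#saveButton").click(saveForm);

// ...and can also be called directly, with no synthetic DOM event:
saveForm();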

Second, if you find yourself attaching the same event handler to a large number of elements it’s a good sign you should be using jQuery’s delegates. Delegates allow you to attach an event handler to a common parent of your elements instead of attaching a large number of discrete handler functions to each individual element. In addition to the increase in speed, delegates have the added advantage of firing for new DOM nodes too. If, for example, the delegate for your <tr> tags is bound to the parent <table>, the new rows added on the fly will still trigger the delegate event handler.
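
A minimal sketch with .delegate(), available since jQuery 1.4.2 (the selectors are made up):

// One handler on the parent table instead of one per row; rows added
// to the table later will trigger it too.
$("#resultsTable").delegate("tr", "click", function() {
  $(this).toggleClass("selected");
});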

The tip of the iceberg

Efficient Javascript and jQuery use is a big subject and we’ve just hit the tip of the iceberg. For an excellent in-depth breakdown you can do no better than Rebecca Murphey’s jQuery Fundamentals. She covers the essentials and outlines best practices in a concise and clear manner. The book should make it onto the required reading list of any developer’s jQuery self-education.



Update 10/23: Thanks to Perceptes on Reddit for pointing out a couple errors in the SelectorCache example.

Tuesday, October 19, 2010

Big Bang Release - kaChing is now Wealthfront

Today is the one-year anniversary of our marketplace. This milestone marks the end of our pivot and our continued dedication to democratizing access to outstanding investment managers. In just 12 months, we've attracted over $100M of consumer assets and onboarded close to thirty professional managers.

At 4:00pm ET sharp, while we were ringing the NASDAQ closing bell, we released our new site, Wealthfront.com, unveiled all the new features at once and launched many new managers. And it all happened with one click on the Deployment Manager (DM). As in Jessica Diamond's "Money Having Sex" piece, we'll split the story into four acts.

Foreplay


For the past two months, our team has been hard at work developing the new site. During this time, we continued our tradition of 30+ deployments a day. The new website had been live behind the scenes all the while. Our infrastructure allows us to tag users with experiments such as DEBUG, REBRANDING or SIGN_UP_FLOW_A. (Check out the one-line split-test post on how these work.)

These experiments are used both for A/B testing and for hiding unreleased features. Wealthfront employees, for instance, were added little by little to the REBRANDING experiment and were therefore able to use the new site in production and experience the new feel. No need for a maintenance-black-hole staging system! Not to mention that some of our customers were able to be part of a beta release, call us on bugs and help us get the product over the finish line.

Doin' It


At our traditional stand-up "10:30" meeting this morning, we ran through the last open bugs we had to close, final content edits and the release plan. There were more details about updating our Twitter account or company LinkedIn page and such than actual operational changes.

It's Release Time

At 3:30pm ET, with 30 minutes to go, we hooked the projector to a TV and watched the live NASDAQ event. Two last commits went through: change the branding from kaChing to Wealthfront, and make www.wealthfront.com public. Twenty-five minutes later, the launch sequence was started on the DM. It is worthwhile to note that the backend release was orchestrated by our latest team member, Ian Atha, who had been with the company less than two weeks.

Climax


3, 2, 1... go! Andy and Dan ring the closing bell. The site is live and it's now time for the real fire drill.

Wealthfront Rings NASDAQ's Closing Bell

Post-Coital Cig


Given that the site had been live for quite some time, we had no production surprises. For example, we had been monitoring all of our customers' new dashboard pages for errors and slowness.

After running through a post-launch checklist, we cracked open some Champagne and started a barbecue in the middle of Palo Alto in front of our office.

This has been the biggest release of the year, one we had been preparing for over multiple months. Thanks to the constant effort of our team, it was also the calmest day of the year! We cheered, ate, drank and enjoyed this huge corporate milestone.