
Friday, February 17, 2012

Interesting Reads

Have a great Presidents' Day weekend!


Friday, February 10, 2012

Taco Friday

Catered lunch? Too easy.

Ten foot six inch taco truck under a ten foot five inch roof? Challenge accepted.

Boots-n-Cats: Every Techno Song Ever In Clojure

I've wanted to write a post about programming music for quite a while, but honestly, an online financial advisor (even a sophisticated, yet simple one, like us) doesn't get many opportunities to research the breadth of music libraries. I chose to disregard these realities when I discovered a neat little library called Overtone. Here's the technical summary for the project:

Overtone is a musical programming library written in Clojure which uses the SuperCollider audio engine and synthesis server under the covers. We're essentially marrying an awesome live-synthesis server with an insanely cool state-of-the-art lisp to create a glorious union that only the Gods can dream about.

Seriously, that's the kind of bravado that's going to propel us to rock stardom in no time (leather pants optional). But what shall we create? Well, it also happens that a friend of mine sent me this video, and I instantly thought, I can do that!



Now, let's see how.

It's actually really easy if you're on a Mac. First, you'll probably want to install Leiningen, the Clojure project tool. Follow the instructions on its page to get it set up. Then you can type:
$ lein new boots-and-cats
$ cd boots-and-cats
Then edit the project.clj file to contain:
(defproject boots-and-cats "1.0"
  :dependencies [[org.clojure/clojure "1.3.0"]
                 [overtone "0.6.0"]])
Alternatively, you can just clone my repo from GitHub, which already contains the samples. You can also go there and download the samples directly.
$ git clone git://github.com/hitch17/boots-n-cats.git
Then you can use lein to grab all the dependencies and be on your way. If you're on Linux, you'll also want to follow these instructions for getting your audio set up.
$ lein deps # Note: this downloads the entire internet.
Now let's get this thing started. You'll also want to turn your speakers up sufficiently loud. I can guarantee that all the co-workers/family/roommates in your vicinity don't want to miss out on hearing the birth of a legend.
$ lein repl
REPL started; server listening on localhost port 48182
user=> (use 'overtone.live)
[... prints a bunch of stuff talking about your audio system ...]
          _____                 __
         / __  /_  _____  _____/ /_____  ____  ___
        / / / / | / / _ \/ ___/ __/ __ \/ __ \/ _ \
       / /_/ /| |/ /  __/ /  / /_/ /_/ / / / /  __/
       \____/ |___/\___/_/   \__/\____/_/ /_/\___/


                          Programmable Music. v0.6
[...]
You should get some sweet ASCII art, and then the fun begins. Load some samples and play them:
user=> (def boot (sample "boot.wav"))
#'user/boot
user=> (boot)
28
user=> (def cat (sample "cat.wav"))
#'user/cat
user=> (cat)
28
Let's add a little reverb to the samples in one ear and get a little bit of a panning effect.
user=> (def boot-sample (load-sample "boot.wav"))
#'user/boot-sample
user=> (defsynth reverb-boot []
  (let [dry (play-buf 1 boot-sample)
        wet (free-verb dry 0.6)]
    (out 0 [wet dry])))
#<synth: reverb-boot>
user=> (reverb-boot)
28
user=> (def cat-sample (load-sample "cat.wav"))
#'user/cat-sample
user=> (defsynth reverb-cat []
  (let [dry (play-buf 1 cat-sample)
        wet (free-verb dry 0.6)]
    (out 0 [dry wet])))
#<synth: reverb-cat>
user=> (reverb-cat)
Pretty cool and shockingly easy. Let's load our other samples.
user=> (def bees (sample "bees.wav"))
#'user/bees
user=> (def kneehigh (sample "kneehigh.wav"))
#'user/kneehigh
Let's also set up a beat with our boots and cats samples. The metronome function helps the library keep everything in time, and we'll define a player function that schedules our beat against it.
user=> (def metro (metronome 60))
#'user/metro
user=> (defn player [beat]
  (at (metro beat) (reverb-boot))
  (at (metro (+ 0.5 beat)) (reverb-cat))
  (apply-at (metro (inc beat)) #'player (inc beat) []))
#'user/player
user=> (player (metro))
And when you're done, call (stop):
user=> (stop)
You can also record your masterpiece. Call (recording-start) with a file name, and call (recording-stop) when you're finished.
user=> (recording-start "awesome.wav")
:recording-started
[... play something awesome ...]
user=> (recording-stop)
With a little practice and decent timing (copy-and-paste skills help), you should be able to create a pretty decent techno song like this one: Bootz-and-Catz. Well, okay, maybe making something really awesome takes a little more work, but hopefully by now you have a good idea of where to get started.

Now go forth and unleash your inner Techno-Viking!

Friday, February 3, 2012

Interesting Reads

It's Friday and you didn't want to work anyway. Here are some things to check out while you wait for the weekend to arrive.

Wednesday, February 1, 2012

Visualize Facebook's IPO prospectus

It's here. Finally. Social-networking darling Facebook yesterday filed an epic initial public offering seeking to raise $5B, with the company valued between $75B and $100B. The IPO prospectus (also known as "Form S-1") filed with the SEC is a massive registration document describing the details of the offering, the company's business model and preliminary financial results. But who has the time to read the lengthy document in detail? I know you don't. You probably don't want to either, especially having read tons of news articles about it already. So I did some simple text analysis in R to visualize the prospectus so that you can take a quick peek.

The following graph shows the word cloud of Facebook's IPO prospectus, i.e. the most frequent words in the document. Not surprisingly, the majority of the words are related to the IPO: stock, class, shares, common, prospectus, stockholders, financial, initial, public, offering, capital, equity, and so on. The other words are related to Facebook's business: mobile, advertisers, users, platform, social, data, awards and so on.


I'm actually more interested in what the prospectus tells us about Facebook's business, which means somehow removing most of the IPO-related words. To do that, I compare it against the prospectus of Facebook's archrival, Google. Even Facebook lists Google as its "significant competitor" in the prospectus. I take the prospectus Google filed back in 2004 and plot the comparison word cloud between the two in the following graph. The top part of the graph shows the words that are frequent in Facebook's prospectus but infrequent in Google's; the bottom part shows the words that are frequent in Google's but infrequent in Facebook's. This exercise (hopefully) removes most of the IPO-related words, because they are frequent in both documents, and focuses more on the two companies' businesses. Facebook-specific words are: facebook, social, users, friends, apps, mobile, platform, developers, engagement, etc. Google-specific words are: google, search, adsense, adwords, services, auction, technology, etc. Interestingly, company executives' names show up in both documents: "mark" and "zuckerberg" for Facebook; "eric" and "sergey" for Google. Maybe even more interestingly, the word "revenues" appears very prominently in Google's cloud.


What about the other social-networking site LinkedIn? This graph shows the comparison between Facebook's prospectus and LinkedIn's filed last year. LinkedIn-specific words are: solutions, professional, linkedin, hiring, talent, profile, job, marketing, subscriptions, premium, etc. LinkedIn's CEO "weiner" also shows up in the graph. The company name "linkedin" is much less prominent compared to "facebook" and "google".


If you are interested in doing a similar analysis, here is how.

The prospectuses were downloaded from the SEC's website. Data links --
- Facebook's S-1 (2012)
- LinkedIn's S-1 (2011)
- Google's S-1 (2004)

The "tm" and "wordcloud" packages in R are used to generate the graphs. The "tm" package is a text mining library. I use it to build a corpus by reading the prospectuses from text files and to perform the standard drills for pre-processing text data, such as removing numbers, punctuation and stop words. I use the "wordcloud" package to plot the word cloud graphs, both the stand-alone graph and the comparison graph. Special thanks to the contributors of "tm" and "wordcloud", who make text analysis and visualization easy and fun. To finish, I'm including some R code --

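library(tm)
library(wordcloud)

# Build the corpus and pre-process the data.
# Put the prospectuses in this directory; the corpus is built by reading the files.
# The variable is named "corpus" rather than "c" so it does not shadow R's c() function.
corpus = Corpus(DirSource("/home/qian/Work/IPO_S1"), readerControl=list(language="en"))
# Lower-case first so that capitalized stop words ("The", "And") are also removed.
# Note: newer tm releases may need content_transformer(tolower) here instead.
corpus = tm_map(corpus, tolower)
corpus = tm_map(corpus, removeNumbers) # remove numbers
corpus = tm_map(corpus, removePunctuation) # remove punctuation
corpus = tm_map(corpus, removeWords, stopwords("en")) # remove stop words
# The indices assume the files are read in the order facebook, google, linkedin.
f = corpus[1] # facebook
g = corpus[2] # google
l = corpus[3] # linkedin

# plot the facebook word cloud
tdm = TermDocumentMatrix(f)
m = as.matrix(tdm)
freq = rowSums(m) # total count of each term
words = rownames(m)
wordcloud(words,freq,min.freq=100,col='blue')

# plot the facebook-google comparison word cloud
tdm = TermDocumentMatrix(c(f,g)) # one column per document
m = as.matrix(tdm)
comparison.cloud(m,max.words=100)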
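
For readers who don't use R, the idea behind the comparison cloud can be sketched in a few lines of Python. This is a simplified stand-in, not the exact computation the "wordcloud" package performs: it cleans two texts, computes each word's share of its document, and ranks words by how much more often they occur in one document than in the other. The stop-word list, the function names and the two sample sentences are made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; tm's stopwords("en") is much longer).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "our", "we", "is", "for"}

def word_rates(text):
    """Return each word's share of the document after basic cleaning."""
    words = re.findall(r"[a-z]+", text.lower())  # lower-case; drops digits and punctuation
    words = [w for w in words if w not in STOP_WORDS]
    counts = Counter(words)
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}

def distinctive_words(doc_a, doc_b, top=3):
    """Words whose rate in doc_a most exceeds their rate in doc_b."""
    rates_a, rates_b = word_rates(doc_a), word_rates(doc_b)
    diff = {w: rates_a[w] - rates_b.get(w, 0.0) for w in rates_a}
    # Largest difference first; ties broken alphabetically for stable output.
    return sorted(diff, key=lambda w: (-diff[w], w))[:top]

# Made-up snippets standing in for the real filings.
doc_social = "Our users share with friends. Users and friends use our mobile apps and the stock offering."
doc_search = "Our search technology serves ads. Search and ads drive revenues and the stock offering."

print(distinctive_words(doc_social, doc_search))
print(distinctive_words(doc_search, doc_social))
```

Words such as "stock" and "offering" occur at similar rates in both snippets, so they drop out of both rankings, which is the same effect the comparison between two prospectuses relies on.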

Thursday, January 26, 2012

Pre-commit Tests

Keeping the trunk stable matters a lot at Wealthfront. We do continuous deployment, and when the build is broken our deployment chain stops. We have a flashing light in the office and a pre-commit hook which prevents any commit until the build is fixed.

The best way to know beforehand whether a commit would break the build would be to run all the tests on the developers' machines before committing to the central repository. Even though we're working hard to keep our build time below 5 minutes (parallelizing-junit-test-runs, less-io-for-your-java-unit-tests), we think that committing code often is important, and we can't afford to wait a few minutes to run all the tests each time.

That's why we came up with a reasonable subset of tests that we run on our local machines before committing. The subset is made of global tests which, in the past, have often been the cause of broken builds:

and a dynamic list of tests which is generated by looking at which source files have been modified. Since we use the standard convention that each source file has an equivalent test file with the same name plus a suffix ('Test' for Java or R, '_spec' for Ruby, ...), it's easy to identify which tests might be broken by the current changes using a version control system.

The svn and git implementations are very similar: they list all the changes by calling the version control system and find the corresponding unit tests. Individual developers (or groups of developers working on the same code) can also add their favorite tests to the list by specifying a list of tests or packages to include.

Even though this technique does not prevent every bad commit, we think it is a reasonable trade-off between the time we have to wait before committing (20-30s) and the risk of breaking the build.

Tuesday, January 17, 2012

Moneyball: Using Modern Portfolio Theory To Win Your Fantasy Sports League

Football is pretty amazing. A few short months ago, we weren't even sure we were going to have a season. Now, we have 49er-mania taking over the Bay Area, the possibility of a rematch of one of the greatest Super Bowls ever between the Giants and the Patriots, and Ray Lewis, well, he's just being Ray Lewis.

For the rest of us, as the end of yet another fantasy football season begins to sink in, we're left to reflect on all the triumphs and regrets of the last 5 months. Hopefully you've managed to stockpile enough bragging rights to last at least until next season. If not, there's always next year.

If you're like me, the book Moneyball by Michael Lewis (or maybe even the movie, if you're a Brad Pitt fan) presented an intriguing idea: taking an objective statistical approach to evaluating talent lets you identify inefficiencies in how players are valued. Basically, using the right statistics, you can find undervalued players who will help you win at a discount and avoid the players who cost too much for the amount they contribute to your team's success.

Working for a finance company, it's hard not to notice the parallels that fantasy sports present to investing (Paul DePodesta of Moneyball fame also graduated from Harvard with a degree in economics, which is not a coincidence). When we talk about making moves to acquire players who are due to break out or squeezing out a hefty premium for an overachieving player, we're really just participating in a virtual market, trading players just as we'd trade stocks. Buy low and sell high is still the goal.

That being the case, we can apply many of the same techniques used by sophisticated investors to squeeze out a few basis points that could mean the difference between winning and losing. While there's lots of debate about the efficiency of the public markets, I can guarantee that your fantasy league with your college roommates is not efficient. Let's take a look at how we can use Modern Portfolio Theory to make the most strategic investments in our team to give us the best chance possible for success.

We recently put up a Slideshare presentation on how to invest in ETFs using Modern Portfolio Theory that explains some of the concepts in an investing context, but they're effectively the same when applied to our fantasy sports domain. We're well versed in understanding that every player has an expected return (i.e. the number of fantasy points they will generate over a given period), but as with investing, we commonly mischaracterize an investment in terms of the presented risk. Every investment has an expected return and an associated risk. In finance, the risk we try to assess is the probability that an investment will decrease in value. In this case, we're more concerned with the opportunity cost that we've missed out on from the value attained by another alternative. For this exercise, we can simply use mean fantasy points generated by a player as our expected return and standard deviation of points as the risk. Then, using players to compose our portfolio (a.k.a. "team"), we can determine the combination of available players that presents the best risk/return characteristics for us to be successful.

Let's look at a simple fantasy football example. We'll create a 3-player team consisting of a quarterback, a running back and a wide receiver, pulling from a universe of 6 players. Calculating the mean and standard deviation is pretty straightforward. (Note: you can also mix in projected numbers as samples with the actual numbers if desired; this example just tries to keep it simple.)

            Aaron      Drew       Ray        LeSean     Calvin     Wes
            Rodgers    Brees      Rice       McCoy      Johnson    Welker
            (QB)       (QB)       (RB)       (RB)       (WR)       (WR)
Week 1      35.01      40.04      33.19      30.61      26.00      34.55
Week 2      29.25      33.75      21.19      29.78      17.64      13.36
Week 3      33.86      39.88      22.67      26.18      28.82      50.10
Week 4      60.62      23.99      22.07      17.34      28.73      29.36
Week 5      30.92      30.76      23.08      22.45      22.82      16.27
Mean        37.93      33.68      24.44      25.27      24.80      28.73
Std.Dev.    12.89       6.73       4.94       5.49       4.70      14.85
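
If you'd rather not do this by hand, the Mean and Std.Dev. rows can be reproduced with a short Python sketch using the standard library. The weekly numbers are copied from the table above; the figures in the table match the sample standard deviation (dividing by n - 1), which is what `statistics.stdev` computes. The variable names are just for illustration.

```python
from statistics import mean, stdev

# Weekly fantasy points for each player, Weeks 1-5, from the table above.
weekly_points = {
    "Rodgers": [35.01, 29.25, 33.86, 60.62, 30.92],
    "Brees":   [40.04, 33.75, 39.88, 23.99, 30.76],
    "Rice":    [33.19, 21.19, 22.67, 22.07, 23.08],
    "McCoy":   [30.61, 29.78, 26.18, 17.34, 22.45],
    "Johnson": [26.00, 17.64, 28.82, 28.73, 22.82],
    "Welker":  [34.55, 13.36, 50.10, 29.36, 16.27],
}

# Expected return = mean points; risk = sample standard deviation of points.
player_stats = {
    player: (mean(points), stdev(points))
    for player, points in weekly_points.items()
}

for player, (avg, risk) in player_stats.items():
    print(f"{player:8s} mean={avg:6.2f}  stdev={risk:6.2f}")
```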

To get our team's mean, we simply add our players' means together. For the standard deviation of our team, we'll make use of this equation:

stdev(X + Y) = \sqrt{var(X) + var(Y) + 2cov(X,Y)}

To simplify the math, we'll assume that players' performances are independent of each other. We know this isn't quite true, but we'll leave this exercise as an advanced topic. When variables are independent, their covariance is 0, which allows us to simplify the equation to:

stdev(X + Y) = \sqrt{var(X) + var(Y)}

All we need to do is add the variance of each player on our team together and then take the square root.

Team                      Mean     Std.Dev.
Rodgers/Rice/Johnson      87.17    14.58
Rodgers/Rice/Welker       91.10    20.28
Rodgers/McCoy/Johnson     88.01    14.77
Rodgers/McCoy/Welker      91.93    20.42
Brees/Rice/Johnson        82.93     9.58
Brees/Rice/Welker         86.85    17.04
Brees/McCoy/Johnson       83.76     9.88
Brees/McCoy/Welker        87.68    17.21
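
Here is a rough Python sketch of how a table like this can be generated, under the same independence assumption: team mean is the sum of the player means, and team standard deviation is the square root of the summed player variances. The weekly data comes from the player table; the dictionary layout and function name are just illustrative choices.

```python
from itertools import product
from math import sqrt
from statistics import mean, variance

# Weekly fantasy points for each player, Weeks 1-5.
weekly_points = {
    "Rodgers": [35.01, 29.25, 33.86, 60.62, 30.92],
    "Brees":   [40.04, 33.75, 39.88, 23.99, 30.76],
    "Rice":    [33.19, 21.19, 22.67, 22.07, 23.08],
    "McCoy":   [30.61, 29.78, 26.18, 17.34, 22.45],
    "Johnson": [26.00, 17.64, 28.82, 28.73, 22.82],
    "Welker":  [34.55, 13.36, 50.10, 29.36, 16.27],
}

# One slot per position; a team takes exactly one player from each.
positions = {
    "QB": ["Rodgers", "Brees"],
    "RB": ["Rice", "McCoy"],
    "WR": ["Johnson", "Welker"],
}

def team_stats(players):
    """Team mean and standard deviation, assuming independent players (covariance 0)."""
    team_mean = sum(mean(weekly_points[p]) for p in players)
    team_std = sqrt(sum(variance(weekly_points[p]) for p in players))
    return team_mean, team_std

# Every QB/RB/WR combination, keyed like the rows of the table.
teams = {"/".join(combo): team_stats(combo) for combo in product(*positions.values())}

for name, (avg, risk) in teams.items():
    print(f"{name:24s} mean={avg:6.2f}  stdev={risk:6.2f}")
```

Dropping the independence assumption would mean adding the 2cov(X,Y) term for each pair of players inside the square root.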



Unsurprisingly, we find that in general higher risk yields a higher return and lower risk yields a lower return. This makes intuitive sense (more on this later), but notice something else: some combinations show both lower risk and a higher expected return than other combinations. Obviously any individual week can vary drastically, but in aggregate over the course of a season you will be better off* with the lower risk and higher return team. Think about it. You wouldn't want more risk for the same number of expected fantasy points or, conversely, fewer fantasy points for the same amount of risk. On a week-by-week basis, other line-ups might make more sense, but over a longer period of time, we expect more points from Rodgers/McCoy/Johnson (88.01 points, 14.77 standard deviation) with less risk than Brees/McCoy/Welker (87.68, 17.21) or Brees/Rice/Welker (86.85, 17.04). It's that kind of information that we hope will give us the edge we're looking for.

*This is obviously highly dependent on the data you use. The more accurate your predictions are, the closer you'll be to the actual outcome. In the finance world, we're required to say "past performance is no guarantee of future results" in the disclaimers. We're not fortune tellers, so the same applies here as well.

As you evaluate all the combinations of players, you'll find a maximum to the number of fantasy points (a.k.a. the best possible team) at each level of risk, and together they form a curve. The finance world refers to this as the "Efficient Frontier."
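
With only eight teams there is no smooth curve, but the same idea can be sketched in Python by keeping every team that no other team beats on both risk and return. The means and standard deviations are taken from the team table above; the function name and the "keep only non-dominated teams" rule are my simplification of the Efficient Frontier for a small discrete set.

```python
# (mean points, standard deviation) for each team, from the team table.
teams = {
    "Rodgers/Rice/Johnson":  (87.17, 14.58),
    "Rodgers/Rice/Welker":   (91.10, 20.28),
    "Rodgers/McCoy/Johnson": (88.01, 14.77),
    "Rodgers/McCoy/Welker":  (91.93, 20.42),
    "Brees/Rice/Johnson":    (82.93, 9.58),
    "Brees/Rice/Welker":     (86.85, 17.04),
    "Brees/McCoy/Johnson":   (83.76, 9.88),
    "Brees/McCoy/Welker":    (87.68, 17.21),
}

def efficient_teams(teams):
    """Return teams that are not beaten on both risk and return, lowest risk first."""
    frontier = []
    best_mean = float("-inf")
    # Walk from lowest to highest risk; at equal risk, try the higher mean first.
    ordered = sorted(teams.items(), key=lambda item: (item[1][1], -item[1][0]))
    for name, (team_mean, team_std) in ordered:
        # Taking on more risk is only worthwhile if it buys more expected points.
        if team_mean > best_mean:
            frontier.append(name)
            best_mean = team_mean
    return frontier

frontier = efficient_teams(teams)
for name in frontier:
    print(name, teams[name])
```

The two teams that drop out are Brees/Rice/Welker and Brees/McCoy/Welker, the same two line-ups singled out earlier as having more risk and fewer expected points than Rodgers/McCoy/Johnson.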

Now that we have this tool available, how do we use it? Let's look at a few game formats to see where this information comes in handy.

Salary Cap Leagues


Salary Cap leagues assign each owner a pool of money to spend on players' salaries. You can add any player (sometimes even if owned by another team) as long as adding the player's salary doesn't put your team's total over its cap. At an individual level, we can use each player's mean and standard deviation to determine if the player is performing above or below expectations, and standard deviation tells us how big of a swing (either up or down) is common. For each team combination, we can filter out teams that are over our salary cap and pick the most appropriate combination for a given point in the season. To know how much risk we should take on, we'll look at Total Points Leagues.

Total Points Leagues


Total Points leagues have a pretty simple scoring mechanism where each team adds up all of the points generated by its players. The team with the most points wins. In contrast to Salary Cap leagues, the most common format only allows one team to possess a player at any given time (Salary Cap leagues commonly allow a player to be owned by multiple teams). This player scarcity will limit your ability to assemble your ideal team to mostly what you are able to obtain in your initial draft.

Overall, you should approach your season with the same strategy that Target Date Funds use in the investing world. In a nutshell, Target Date Funds invest in securities whose risk and expected return scale with the distance to a target date in the future. The farther away the target date, the higher the risk and expected return of the investments; as the target date gets closer, the fund transitions into lower risk investments, limiting the potential downside. You should effectively do the same.

Early in the season, you can afford to take on a little bit more risk knowing that there's a good chance the player will perform well over the course of a season. Putting this team together is pretty straightforward, especially since the experts care almost exclusively about performance and a player's expected return (even if they did care about risk, they wouldn't know how to apply it to your team's unique scenario). However, as the season progresses, you'll either need to make up ground if you're behind or try to lock in points if you get ahead. If you're behind, you'll need to make some moves to acquire higher risk / higher expected return players. (Note: This is absolutely not what you want to do with your retirement account. No one wants to crater their savings right before they retire. In fantasy football, it's usually better to burn out in spectacular fashion than to fade away quietly.) Unfortunately, these high-reward players have the most demand, so it's usually difficult to pull off favorable trades.

In the case that you're trying to maintain a lead, you have many more options. As you get closer to the end of the season, you should try to lower the risk of your portfolio of players. Perhaps that means trading a high reward/risk player and a scrub for two players who lower the volatility of your lineup. At the very least, you can stick to playing favorable player match-ups and avoiding injury-prone guys to make sure your team consistently chugs along to the finish line.

Head-to-Head Leagues


Head-to-Head leagues match your team up against another team each week. The team scoring the most points receives a win, and the more wins you have, the better your place in the standings. As we already know from Moneyball, a team's winning percentage is highly correlated with the ratio of the runs it scores to the runs it allows. For us, it's the same formula, but using the fantasy points scored by our team. Let's take a look at the Pythagorean Expectation formula invented by Bill James.

win\ pct=\frac{1}{1+(runs\ allowed/runs\ scored)^{2}}

In fantasy sports, although you can argue that there is some level of correlation between players, one notable difference is that your score and your opponent's score are mostly independent. This means that when you look at the Pythagorean Expectation formula, your points scored will be consistent with your average projected out for the season, but it also means that your points allowed should approach the league average as the season goes on. It's likely that you will lose weeks where your opponents get lucky and score above average amounts of points. Don't panic. Over the course of the season, your opponents will hopefully revert to the mean and you'll get a few favorable breaks yourself.
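
As a quick sketch of how this plays out with fantasy points, here is the Pythagorean Expectation in Python. Bill James's original baseball formula squares the ratio, so the exponent defaults to 2 here; whether 2 is the best exponent for fantasy football is an assumption, which is why it's a parameter. The function name and the sample point totals are hypothetical.

```python
def pythagorean_expectation(points_scored, points_allowed, exponent=2.0):
    """Expected winning percentage from points scored and points allowed."""
    if points_scored <= 0:
        raise ValueError("points_scored must be positive")
    return 1.0 / (1.0 + (points_allowed / points_scored) ** exponent)

# Hypothetical team: averages 150 points per week against opponents who,
# over a full season, drift toward a league average of about 141.
expected = pythagorean_expectation(150.0, 141.0)
print(f"expected winning percentage: {expected:.3f}")

# A team that scores exactly what it allows is expected to win half its games.
print(pythagorean_expectation(141.0, 141.0))
```

Because your points allowed should approach the league average, the practical lever is your own scoring: every extra point per week nudges the expected winning percentage above .500.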

Some Head-to-Head leagues will declare a winner at the end of the regular season, but it's more common to have a playoff among the best regular season teams to determine the winner. Hopefully by the time the playoffs get closer, you'll have locked up your spot and you'll have the flexibility to retune your roster to have the best chances against the other teams in the playoffs. You use the same techniques as you'd use in the Total Points leagues to lock in the best possible seeding for your playoff tournament, but remember to keep enough horsepower on your roster to compete against the other playoff teams. If you lock up your playoff spot and you're not playing for other prop bets like highest points scored during a season, you might consider losing games to inferior teams that might make the playoffs over unlucky teams with higher variance or higher expected returns than your team. The last thing you need is for a high variance team to sneak into the playoffs and get hot in the last weeks. The playoffs are mostly about luck, and the fewer chances you take playing with fire, the more likely you'll be holding that championship trophy in the end. In the immortal words of Ricky Bobby, if you ain't first, you're last.

Now, let's take a look at the results from my league this past season. First, this chart shows the mean and standard deviation of the weekly points generated by each team during the season. The league average for mean and standard deviation of scores is in red. As you might suspect, the number next to each point represents which place the team finished in after the playoffs. The numbers in this chart are completely intuitive. Teams in the upper left that demonstrated above average points and lower than average risk performed the best. Teams that scored less than average points did not fare as well.



Behind all data is a story. In this case, the 4th place team scored slightly above average, but also showed significant volatility. This team absolutely crushed the #6 team in the first round of the playoffs, but was subsequently crushed twice in a row by the #3 and #2 teams due to team 4's huge swings in point production, eventually settling in 4th place. You'll also notice how the eventual 7th place team appeared among the top teams in the league. As it turns out, this team was in fact one of the better teams in the league, but unfortunately very unlucky.

This second chart shows the actual winning percentage of teams charted against their expected winning percentage calculated using Bill James' Pythagorean Expectation formula:



As you can see, the line of best fit matches pretty well and means that using the Pythagorean Expectation formula has some merits. Teams that scored more points compared to their opponents also continued to perform well in the playoffs. What you'll also notice is that our #7 team is well below the line here, despite scoring an above-average amount of points in the previous chart. This likely means that the team was scoring enough points that they should have had a better win-loss record, but probably lost several close matches by a small amount and won a few matches by a landslide. This was, in fact, the case. Even more to the point, Team 7 lost the last regular season week to the eventual 3rd place team by .87 points (remember the mean was about 141, so that's a very small amount). Had Team 7 scored just a single additional point that week, it would have made the playoffs in the place of the eventual 3rd place team and taken 3rd place. Instead, Team 7 won all of the consolation games and finished on a 2-game win streak, but could do no better than 7th place. It just goes to show that luck and great timing mean as much to winning as having a good team.

So, we've covered a number of ways that concepts in finance are completely analogous and applicable to fantasy sports. In this case, we've explored fantasy football, but the concepts translate to other sports as well. Remember that it's important to keep in mind the risk characteristics of your players and team as well as their performance, just like you should for your investments. Good luck!

By the way, here's a list of some of my favorite sites doing some really interesting advanced statistics work:

Football
Baseball
Basketball