Monday, October 15, 2012

Developing a Portfolio Backtester in R


As part of building our new Tax-Loss Harvesting (TLH) feature, we needed to build a portfolio backtester to simulate portfolio performance over a historical period. R was the obvious choice for its strength in statistics and finance; the R Finance community in particular has created a number of very useful packages, some of which we'll talk about below.

A quick summary of TLH: If you hold an ETF trading at a paper loss, we sell it to realize the capital loss, which can be used to offset capital gains and income and thus lower your overall tax bill. In the meantime, we buy a similar ETF to maintain the desired asset class exposure. We will use TLH as an example to explain the important steps in building a backtester in R.

R package development

We practice many of Hadley Wickham's suggestions for R package development. He built a package called devtools, which makes R package development less painful. devtools simulates the process of installing a half-baked package so that you can add functions and test them incrementally, in a more efficient, test-driven (TDD) way. roxygen2 is used to automate documentation tasks (although sometimes you still need to manually update some metadata to make the package work), and testthat (also by Hadley Wickham) is used for unit testing. Please refer to Hadley's devtools wiki page for more details.

Reference Classes

R wasn't born in the OO world and its OO support has been weak, which makes it challenging for software developers used to mainstream languages to develop complex systems in R. So far R has three OO options: S3, S4 and Reference Classes, where you can make things that somewhat look like objects. Reference Classes are R's newest attempt to attack the OO problem and look much more like the OO concept in Java than S3 and S4 do. Using Reference Classes, you can finally make mutable objects in R. The syntax is straightforward. Here is an example of creating a Trade class. The Trade class has four fields and two methods.
Trade <- setRefClass('Trade', 
  fields = c('symbol', 'date', 'price', 'quantity'),
  methods = list(
    getTotalAmount = function() {price*abs(quantity)}, 
    getString = function() {paste(symbol, date, price, quantity)}))
Here is how to create a Trade object.
trade <- Trade$new(symbol='SPY', date=Sys.Date(), price=100, quantity=5)
Again, please refer to Hadley's Reference Classes wiki page for more details or run help(ReferenceClasses) in R.
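To make the mutability point concrete, here is a small sketch that reuses the Trade class from above (trimmed to one method). It is only an illustration: it shows that assigning a reference object to a new name does not copy it, while the built-in copy() method does.

```r
library(methods)  # setRefClass lives here; loading it explicitly is harmless

# The Trade class from above, trimmed to a single method.
Trade <- setRefClass('Trade',
  fields = c('symbol', 'date', 'price', 'quantity'),
  methods = list(
    getTotalAmount = function() {price*abs(quantity)}))

trade <- Trade$new(symbol='SPY', date=Sys.Date(), price=100, quantity=5)

alias <- trade            # a second name for the SAME object, not a copy
alias$quantity <- 10      # mutates the object in place
trade$quantity            # the change is visible through the original name

snapshot <- trade$copy()  # copy() returns an independent object
snapshot$quantity <- 1    # does not affect `trade`
trade$getTotalAmount()    # still computed from price 100 and quantity 10
```

This is the behavior that lets a Portfolio object accumulate trades across calls without being reassigned each time.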


Modules

We abstract the complex backtesting system into four major modules implemented by reference classes. They are Portfolio, Market, Strategy and Backtester. In this section we explain the why, what, and how of building the four classes.

Portfolio
Portfolio keeps track of trades, positions, lots, balances, cash flows and capital gains/losses in a portfolio. The portfolio's status is the result of applying a trading strategy and market movement. You want to keep track of all historical information about the portfolio such that you can understand the strategy's behavior under different market scenarios. Under the hood we wrap a package called blotter to help manage trades, positions and balances. For the TLH problem, we implement lot-level data management and sophisticated analytics in the Portfolio class (blotter doesn't support these functions).
portfolio <- Portfolio$new()
portfolio$getPositions(symbols, date)
portfolio$getLots(symbols, date)
portfolio$getCash(date)
portfolio$getBalances(date)
portfolio$executeTrade(trade)
Note that a portfolio module is not always needed in backtesting. For example, suppose you want to simulate a simple momentum strategy. Most likely you don't need much portfolio bookkeeping. Every month you sort stocks based on their recent performance, buy the best-performing decile and calculate these stocks' weighted return for the month. Then you just compound the portfolio's monthly return series to get the cumulative return (here we implicitly assume monthly rebalancing). However, if you want to model a more realistic portfolio setting, with trade commissions, slippage, cash flows and even lot-level nuances, you will need to build a portfolio module for bookkeeping.
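The bookkeeping-free momentum backtest described above can be sketched in a few lines of base R. Everything specific here is an assumption made for illustration: the returns are random numbers rather than market data, the lookback is six months, and the winners are equal-weighted.

```r
set.seed(42)  # reproducible made-up data

n_months <- 24
n_stocks <- 50
# Hypothetical monthly returns: rows are months, columns are stocks.
returns <- matrix(rnorm(n_months * n_stocks, mean = 0.01, sd = 0.05),
                  nrow = n_months,
                  dimnames = list(NULL, paste0('S', seq_len(n_stocks))))

lookback <- 6                       # months of past performance used for ranking (assumed)
n_pick   <- ceiling(n_stocks / 10)  # size of the top decile

months <- (lookback + 1):n_months
portfolio_returns <- numeric(length(months))

for (i in seq_along(months)) {
  m <- months[i]
  # Rank on months m - lookback .. m - 1 only, so month m itself is never used.
  past <- returns[(m - lookback):(m - 1), , drop = FALSE]
  past_perf <- apply(1 + past, 2, prod) - 1
  winners <- names(sort(past_perf, decreasing = TRUE))[seq_len(n_pick)]
  # Equal-weighted return of the winners over month m.
  portfolio_returns[i] <- mean(returns[m, winners])
}

# Compounding the monthly series gives the cumulative return.
cumulative_return <- cumprod(1 + portfolio_returns) - 1
```

Note that nothing in this loop tracks cash, lots or commissions, which is exactly why it cannot answer lot-level questions such as TLH.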

Market
Market abstracts all the environmental information, i.e. data and parameters external to a portfolio, including securities prices, index data, ETF parameters (expense ratio, tracking error and spread), commission structure, and so on. We also use the Market class to calculate derived market data such as rolling volatility metrics of asset classes. We query a Market object for any market information at any time point. Any data analyst will appreciate the importance and messiness of data management. Data sourcing, cleaning, formatting and transformation could cost a significant amount of money and time before you even get to the core of a problem. For simplicity, we use the encompassing Market class to insulate the data management complexity.
market <- Market$new(...)
market$getPrice(symbols, date)
market$estimateTxnCost(quantities)
market$estimateExecutionPrice(symbols, date, quantities, spread)
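As a rough idea of what sits behind such an interface, here is a stripped-down Market sketch. It is not our actual implementation: the fields, the toy prices, and the getRollingVolatility method (with its three-return window) are all hypothetical, chosen only to show a price lookup and one derived metric that never read data after the requested date.

```r
library(methods)

# Hypothetical minimal Market: a vector of dates plus a price matrix
# with one row per date and one column per symbol.
Market <- setRefClass('Market',
  fields = c('dates', 'prices'),
  methods = list(
    getPrice = function(symbols, date) {
      # Most recent row on or before `date`, so later prices are never visible.
      i <- max(which(dates <= date))
      prices[i, symbols]
    },
    getRollingVolatility = function(symbol, date, window = 3) {
      i <- max(which(dates <= date))
      stopifnot(i > window)                # need window + 1 prices
      p <- prices[(i - window):i, symbol]
      sd(diff(log(p)))                     # sd of the last `window` log returns
    }))

# Toy data, not real quotes.
market <- Market$new(
  dates  = as.Date('2012-10-01') + 0:4,
  prices = cbind(SPY = c(100, 101, 99, 102, 103),
                 IVV = c(200, 202, 198, 204, 206)))

market$getPrice(c('SPY', 'IVV'), as.Date('2012-10-03'))
market$getRollingVolatility('SPY', as.Date('2012-10-05'))
```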


Strategy
Strategy is an algorithm that makes trade decisions and generates trades based on the portfolio's status and the market. Note that you can only use current and past market data, never future market data (using it would create look-ahead bias). For TLH, we implement a threshold-based trading decision algorithm and wash sale management in the Strategy class. As with the portfolio module, the strategy's complexity depends on your specific problem. Your strategy might need only market information to make trade decisions, or it might need both market information and portfolio positions. Our TLH strategy needs market information and portfolio lots. A strategy module is probably the crown jewel of a backtesting system, and you will have a lot of fun experimenting with cool stuff.
strategy <- Strategy$new()
strategy$generateTrades(date, portfolio, market)
strategy$assetAllocationTrades(date, portfolio, market)
strategy$tlhTrades(date, portfolio, market)
strategy$tlhRecoverTrades(date, portfolio, market)
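To give a feel for a threshold-based decision at the lot level, here is a simplified sketch. It is not our production rule: the lot table, the prices, the 5% threshold and the findHarvestableLots helper are all made up for illustration, and wash sale management is left out entirely.

```r
# Hypothetical open lots: one row per purchase.
lots <- data.frame(
  symbol     = c('VTI', 'VTI', 'VEA'),
  quantity   = c(10, 5, 20),
  cost_basis = c(70, 60, 35),   # purchase price per share
  stringsAsFactors = FALSE)

# Hypothetical prices known as of the decision date (no future data).
current_price <- c(VTI = 63, VEA = 34.5)

# Assumed rule: harvest a lot once its unrealized loss exceeds 5% of cost.
loss_threshold <- 0.05

findHarvestableLots <- function(lots, current_price, loss_threshold) {
  price <- current_price[lots$symbol]
  lots$unrealized_return <- unname(price / lots$cost_basis - 1)
  lots[lots$unrealized_return < -loss_threshold, ]
}

harvest <- findHarvestableLots(lots, current_price, loss_threshold)
harvest
```

Only the first VTI lot qualifies here: the second VTI lot sits at a gain, and the VEA lot's loss is smaller than the threshold. This is why the decision has to be made per lot rather than per position.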


Backtester
Backtester is more like a runner class that puts the pieces together and lets the simulation run over a time period. It also calculates portfolio analytics and evaluates strategy performance at the end of the simulation. You can write some charting functions to show portfolio cumulative return in the Backtester class too.
backtester <- Backtester$new(market=market, portfolio=portfolio, strategy=strategy)
dates <- list(....)  # provide dates
backtester$run(dates)
backtester$evaluate()
Here is what the run function looks like:
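run = function(dates) {
  for (d in dates) {
    # Ask the strategy for trades using only information available on date d
    trades <- strategy$generateTrades(d, portfolio, market)
    if (!is.null(trades)) {
      for (trade in trades) {
        portfolio$executeTrade(trade)
      }
    }
    # Update the balance once per date, after that date's trades
    portfolio$updateBalance(d, market)
  }
}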
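The run loop only produces the portfolio's history; evaluate() then has to turn that history into performance numbers. We don't show evaluate() here, but as a rough sketch of the kind of analytics it might compute, here are cumulative return and maximum drawdown in base R from a hypothetical series of end-of-day balances.

```r
# Hypothetical end-of-day portfolio balances produced by a run.
balances <- c(10000, 10200, 9900, 10100, 10500)

# Period-over-period returns.
daily_returns <- diff(balances) / head(balances, -1)

# Total return over the whole simulation.
cumulative_return <- tail(balances, 1) / balances[1] - 1

# Drawdown: how far each balance sits below the running peak.
drawdown     <- balances / cummax(balances) - 1
max_drawdown <- min(drawdown)
```

Compounding daily_returns gives the same cumulative return, which is a handy sanity check on the balance series.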


Useful R packages for development in finance

The R developer community focusing on financial applications is not super large, but they have built quite a few convenient packages. Here are some of the packages we use to build our backtester:
  • xts (eXtensible Time Series): Time series is probably the most important data structure in finance. Most of the data one needs to deal with in developing financial applications are time series. For example, securities prices, trades, positions, portfolio returns and economic indicators can all be represented as time series. They can be indexed on timestamps, dates, weeks, months or years. xts provides a convenient tool to manage time series data: create time series, look up specific sub-series, merge two time series on common time indexes, create lag/lead time windows, etc. Managing time series is such a foundational task in developing financial applications that you want to be fluent with a handy tool such as xts.
  • quantmod (Quantitative Financial Modeling Framework): The package provides a suite of quantitative tools for developing and analyzing trading strategies. Using quantmod, you can download data from Yahoo Finance, Google Finance and FRED databases, calculate various types of signals and chart multiple time series. You can also calculate monthly or annual return series from securities prices or portfolio balances.
  • blotter: The package is essentially a portfolio accounting module. It does bookkeeping for trades, positions and balances of a portfolio as you make trades. It keeps the portfolio's full history and allows you to diagnose its status at any time. It also lets you specify trade commissions and slippage, which makes the simulation more realistic.
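As a quick taste of the xts operations listed above, here is a short sketch. It assumes the xts package is installed, and the prices are toy numbers rather than real quotes.

```r
library(xts)

dates <- as.Date('2012-10-01') + 0:4

# Create two time series; IVV deliberately covers fewer dates.
spy <- xts(cbind(SPY = c(100, 101, 99, 102, 103)), order.by = dates)
ivv <- xts(cbind(IVV = c(202, 198, 204)), order.by = dates[2:4])

# Look up a sub-series with a date-range string.
spy['2012-10-02/2012-10-03']

# Merge on the common time index; an inner join keeps dates present in both.
both <- merge(spy, ivv, join = 'inner')

# Lag by one period to line up each price with the previous one.
prev <- lag(spy, k = 1)       # first value is NA
spy_returns <- spy / prev - 1
```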

If you are ever interested in backtesting an investment strategy in R, hopefully this blog post serves as a useful starting point. Have fun!