
Tuesday, May 19, 2015

Performant CSS Animations

Web performance can be split into two relatively distinct categories: the first page load and subsequent interactions. We can improve the first load by decreasing server response time and optimizing the loading of CSS and JavaScript. Once the website is loaded, there is a completely new set of performance challenges: using a JavaScript framework with good inherent speed, responding immediately to input events via the UI, and building animations that feel snappy.

There are many posts online about all of these topics; however, the challenge is knowing what to look for. The goal of this post is to provide a basic level of understanding focusing on animation performance, explain why all animations are not created equal, and provide some other performance related resources.

A Simple jQuery Example

This is the canonical jQuery animation example. We take an element with a class selector and move it down 100px over 2 seconds. While jQuery makes it really easy to do this kind of animation, this actually has really bad performance implications. Let's dig into what is actually happening. The above code is getting turned into something similar to this:
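A sketch of both snippets, with an assumed `.box` element and the function name `animateTop` invented for illustration (jQuery's real implementation is more involved):

```javascript
// The canonical jQuery animation: move the element down 100px over 2 seconds.
// $('.box').animate({ top: '100px' }, 2000);

// Roughly what that expands to under the hood:
function animateTop(el, distance, duration) {
  var fps = 60;
  var step = distance / (duration / 1000 * fps); // px to move per frame
  var current = 0;
  var timer = setInterval(function () {
    current += step;
    if (current >= distance) {
      current = distance;
      clearInterval(timer);
    }
    el.style.top = current + 'px'; // animating `top` forces a layout every frame
  }, 1000 / fps);
}
```

Calling `animateTop(document.querySelector('.box'), 100, 2000)` reproduces the jQuery one-liner.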

There are two major issues with this approach.

  • vSync
  • Animating Layout Properties

vSync

Notice the setTimeout in the example above. On a 60Hz monitor, we have 60 frames per second in which to update our animation, so our goal is to have a setTimeout callback fire on each frame. In practice this often doesn't happen: setTimeout and setInterval aren't precise and won't fire exactly when we want them to, once per frame. Instead, you may end up with two updates on a single frame, or a frame with no updates at all.

Thankfully, modern browsers provide window.requestAnimationFrame, which takes a callback that the browser runs right before the next repaint. You no longer need to approximate frame timing with setTimeout.

More Reading: http://www.html5rocks.com/en/tutorials/speed/rendering/

Animating Layout Properties

There is another reason why we might drop a frame. When an animation is happening, the browser executes your script on every frame, figures out what it needs to paint on the screen, and then paints it. Since this happens about 60 times a second, the browser has a little over 16ms to complete all of that work. If a frame ever takes longer than the allotted time, the browser drops the next frame so it can finish the current one.


From Google IO 2013

It becomes important to know what the browser is doing on every frame in order to optimize our animations and get rid of dropped frames. On every frame the browser has a set of steps it takes, which I'll walk through below.

The Browser Frame Waterfall


From http://www.html5rocks.com/en/tutorials/speed/high-performance-animations/

Recalculate Style

The browser first runs your code and checks if any of the classes, styles, or attributes on elements have changed. If they have, it recalculates the styles for elements that might have changed, including inherited values.

Layout

The browser then calculates the size and position of every element on the screen, abiding by the rules of floats, absolute positioning, flexbox, etc.

Paint

The browser then draws all of the elements with their styles onto layers. Think of browser layers like Photoshop layers: collections of elements.

Composite

Once the browser has all of the elements drawn onto layers, it composites the layers onto the screen. This is just like rasterizing layers in Photoshop; it moves the layers around and draws them all into one image.


On every frame, the browser decides what has to be done. If a CSS property that changes the size or position of an element, such as width, margin, or top, is animated, the browser must do a layout, a paint, and a composite.

If a property such as border-color, background-position, or z-index changes, the browser doesn't need to recalculate the layout, but it does need to repaint and composite the layers together.

There are a few properties that don't require a layout or a paint; the browser can handle them with just a composite. transform and opacity are the most common ones. Just like in Photoshop, moving layers around is very fast. Photoshop doesn't need to think about what is on the layer; it just moves the entire layer. The browser can do the same thing. Even better, while calculating element sizes for a layout happens on the CPU, changes that only require a composite can happen purely on the GPU, which is much faster.

http://csstriggers.com is a great resource to learn more about which CSS properties cause a layout, paint, and composite.

More Reading: http://www.html5rocks.com/en/tutorials/speed/high-performance-animations/#toc-dom-to-pixels

What Can We Animate

The key to fast and smooth animations is to animate only properties that can be handled with compositing, which happens on the GPU. opacity and transform (translate, scale, and rotate) are our toolkit.

Hover over the boxes to see the effect

See the Pen mywqNB by Eli White (@TheSavior) on CodePen.

See the Pen KwqyOJ by Eli White (@TheSavior) on CodePen.

See the Pen gbRoQQ by Eli White (@TheSavior) on CodePen.

See the Pen rawpoW by Eli White (@TheSavior) on CodePen.

Putting these examples together to do more complicated animations is an exercise in thinking outside the box. Here are some examples.

Animate background-color?

See the Pen YPQEbZ by Eli White (@TheSavior) on CodePen.

The effect is simple. We have two identical overlapping boxes in the same place, one red and one teal. The teal starts at opacity: 0 and on hover of the containing box we transition the teal to opacity: 1.

Alert message

See the Pen zxzPVY by Eli White (@TheSavior) on CodePen.

In order to have a simple message slide in on a div, we position: absolute the alert box and translate it -50px up. Since the parent div has overflow: hidden, we don't see it. On hover of the parent div, we slide the alert box back down to 0.
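A sketch of the CSS behind that effect (class names are assumed; see the pen above for the real code):

```css
.parent {
  position: relative;
  overflow: hidden; /* hides the alert while it sits above the box */
}

.alert {
  position: absolute;
  top: 0;
  transform: translateY(-50px); /* start 50px up, out of view */
  transition: transform 0.3s ease;
}

.parent:hover .alert {
  transform: translateY(0); /* slide back down; composite-only animation */
}
```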

Flipping Div

See the Pen zxzRvd by Eli White (@TheSavior) on CodePen.

While a little less useful, this is a good example of thinking outside (or behind) the box. While most people would think animations like this would be inefficient, knowing what the browser is doing and clever ways to put things together can actually open doors of possibilities. Check out the codepen for the CSS.

Revisiting the setTimeout Example

We now know why that code is bad! The setTimeout example that we dissected at the start of this post can easily be modified to take advantage of what we've learned.

As we pointed out at the beginning, the big issue is setting top, a CSS property that triggers a layout and therefore also a paint and a composite. We will change that to a translate.

The other issue was that setTimeout doesn't guarantee our animation runs on every frame. We will change it to requestAnimationFrame, which also passes its callback a timestamp. By basing our math on elapsed time instead of a fixed per-step distance, our animation will look smooth even if we still miss a frame.

We end up with our performance-optimized animation code.
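A sketch of the optimized version (element, function name, and values assumed for illustration):

```javascript
// Optimized: requestAnimationFrame for frame timing, transform for
// composite-only updates, and time-based math so a missed frame
// doesn't slow the animation down.
function animateTranslate(el, distance, duration) {
  var start = null;
  function frame(timestamp) {
    if (start === null) start = timestamp;
    var progress = Math.min((timestamp - start) / duration, 1);
    // translate only needs a composite, not a layout or a paint
    el.style.transform = 'translateY(' + progress * distance + 'px)';
    if (progress < 1) requestAnimationFrame(frame);
  }
  requestAnimationFrame(frame);
}
```

At t = 500ms of a 1000ms animation the element sits at translateY(50px) no matter how many frames were actually delivered in between; that is what makes the time-based math resilient to missed frames.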

Conclusion

There are three key takeaways to building performant web animations.

  • Know how the browser executes your animations
  • Prefer to animate things that only require compositing while avoiding causing a layout
  • Use requestAnimationFrame over setTimeout or setInterval

In my experience, the hardest part of building a site that has well written animations is actually in the knowledge and communication between designers and engineers. Designers need to understand what palette and toolkit we have to work with to be able to come up with interactions and metaphors that can feel smooth to the user. Engineers need to know how to implement specific ideas and debug what the browser is doing when running their animations.

At the end of the day, all that matters is the experience the client has with the application. If we build things that are slow and jittery, their satisfaction will go down. It is on all of us to continue improving, continue educating ourselves, and continue building awesome experiences.

More Reading

Thursday, April 2, 2015

Similarities of Disruptive Companies

At the end of our Wealthfront Onboarding program, we ask new team members what about Wealthfront most differs from their prior expectations. The most common answer is “Wealthfront is not a finance company.”

This answer is unsurprising in retrospect, as it reflects a common yet fundamental misunderstanding of disruptive companies: many people believe startups win by directly competing against their industry incumbents. For example, they think Uber wins by providing better taxis than Yellow Cab, Airbnb wins by providing better hotels than Hilton, and Wealthfront wins by providing better financial advisors than Charles Schwab. This belief is wrong.

A common corollary of misunderstanding disruptive companies is fearing that joining them will limit future job opportunities. Again, disruptive companies repeatedly disprove this corollary. For example, Uber and Airbnb engineers do not worry their only future job prospects are in transportation or hospitality companies.

Disruptive companies often hold the greatest promise for generalist software engineers to build their career, because Internet-based disruptive companies are most successful when they are driven by technology.

Ingredients of Disruptive Innovation


Nearly all successful Internet-based businesses use convenience, simplicity, and cost to disrupt and reinvent their markets—the three common ingredients of disruptive companies identified by Disruption Theory, popularized by Clay Christensen. They disrupt their markets through the unparalleled benefits of software and the Internet, rather than trying to beat incumbents at their own game.

Uber and Airbnb are two recent successful startups that exemplify this principle:
  • While they provide route selection and accommodation scheduling, nobody thinks Uber owns taxis or Airbnb owns rental properties
  • While they implement specialized mathematics and algorithms for pathfinding and scheduling, the vast majority of Uber and Airbnb engineers are generalists with no prior background in transportation or hospitality industries
Wealthfront is no more like a finance company than Uber is like Yellow Cab or Airbnb is like Hilton. This explains why we repeatedly talk about being a technology company, why nearly all of our technical challenges are unrelated to investing, and why we avoid hiring engineers from the finance industry.

Common Disruptive Technical Challenges


Disruptive companies face remarkably similar technical challenges, despite delivering value across completely different industries. These technical challenges are far different from those faced by their legacy competitors. Here are some of the most difficult of these challenges:
  • Automating complex human workflows into horizontally scalable software platforms
  • Building data-driven features derived from complex real-world models
  • Scaling analytics and machine learning as data volume grows exponentially
  • Applying devops to scale operations and infrastructure entirely via code
  • Scaling test automation and infrastructure to ensure high-quality code across the stack
  • Ensuring security and preventing fraud despite rapidly increasing public visibility
  • Acquiring clients online via experimentation, funnel optimization, and quantitative marketing
None of these problems are specific to the finance, transportation, or hospitality industries. On the contrary, they are specific to disruptive companies built for the Internet using modern technology stacks.

Why Generalist Teams Win


Disruptive companies prefer to hire talented generalist engineers, while deliberately not hiring specialists from their legacy competitors. This preference is motivated by four factors:
  • Industry veterans are burdened by obsolescence bias, assuming constraints outdated by mobile and the Internet
  • Legacy technology experience is not required, and instead often gets in the way
  • Specialized expertise rapidly becomes obsolete and thus has a short shelf life
  • Generalists understand how to leverage new platforms and open source technology
These preferences explain why Wealthfront hires top quality generalist engineers and avoids specialists from the finance industry.

Your Market Value & Disruptive Companies


The single most important determinant of your future market value as an engineer is the culture and caliber of engineering teams in which you work. For example, Google and Facebook engineers are in high demand by many companies because their great engineering cultures have attracted many highly talented engineers.

Great engineers want to build software that impacts the lives of many, not focus on a narrow niche. They are best able to do so at disruptive companies, independent of the industry in which they participate. The market value of everyone at a disruptive company grows as a critical mass of great engineers come together.

The next time you consider joining a company, ask yourself whether it is disruptive, has a great engineering culture, and has attracted high caliber engineers. If the answer to all three is yes, then joining that company is likely the fastest path to maximizing both your current and future market value.

Wednesday, February 11, 2015

Pattern matching in Java with the Visitor pattern

I once read an essay—I cannot find it now—that talked about how learning a more advanced programming language can improve your coding even in a language that lacks those features because those language features are really ways of thinking, and there's no reason you cannot think in terms of more powerful abstractions just because you need to express them in a more limited language. The phrase that stuck with me was that you were "writing into" that language, rather than writing in the language as it's meant to be used. At Wealthfront, while the majority of our backend code is Java, we use a variety of methods that originate in functional languages. We've written before about our Option. This article is about pattern matching in Java.

I'm going to take a digression into explaining what pattern matching is and why it's so fantastic. If you like, you can skip ahead to the actual Java examples.

Inspiration from post-Java languages

Pattern matching is a feature common in modern functional languages that allows a structure similar to switch but on the type of the argument. For example, let's take a base class that might have two subclasses, and we want to write logic that handles the two subclasses differently. An example might be a payment record that varies in type according to the method of payment (e.g. Bitcoin vs. credit card). Or maybe an owner that varies depending on whether it represents a single user or a group. This is useful for any class hierarchy representing some sort of data that has a base set of fields and subclasses that may have other data.

The visitor pattern was around before Haskell and Scala wowed everyone with pattern matching, but seeing pattern matching makes it easier to see why it's useful.

Scala pattern matching

Scala supports a match operator that concisely expresses the idea of switching on the type of the object.
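A sketch of the example described below (class and field names are my reconstruction):

```scala
// An abstract class Foo with two subtypes: Bar holds an Int, Baz a String.
sealed abstract class Foo
case class Bar(i: Int) extends Foo
case class Baz(s: String) extends Foo

// `match` switches on the runtime type; in each case, b has the subtype's type.
def handle(f: Foo): String = f match {
  case b: Bar => "bar: " + b.i
  case b: Baz => "baz: " + b.s
}
```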

What does this do? First, we have an abstract class Foo with two subtypes Bar and Baz. These have different parameters. Bar stores an Int, Baz a String. Then we have a handle method that uses match to extract the appropriate field from either of these. Each matched case could have whatever logic you may need, and the type of b in both of these is the specific subclass, not just Foo.

Swift enums

Swift offers this same functionality through enums, which behave more like types than values.
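A sketch of the Swift equivalent (enum and case names assumed to mirror the Scala example):

```swift
// An enum with associated values plays the role of the class hierarchy.
enum Foo {
    case bar(Int)
    case baz(String)
}

func handle(_ f: Foo) -> String {
    switch f {
    // `let` binds a new constant of the case's associated type
    case .bar(let i): return "bar: \(i)"
    case .baz(let s): return "baz: \(s)"
    }
}
```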

The syntax differs, but it works out to the same. The let keyword is a helpful reminder of what's going on here -- a new variable is created holding the same value that was in f but now it's of the specific type.

Simpler java solutions

instanceof

One simple solution that comes to mind is to use instanceof.
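A sketch of that approach, using a Java version of the Foo/Bar/Baz hierarchy from the Scala example (field names assumed):

```java
abstract class Foo {}

class Bar extends Foo {
  final int value;
  Bar(int value) { this.value = value; }
}

class Baz extends Foo {
  final String value;
  Baz(String value) { this.value = value; }
}

class InstanceofExample {
  // Branch on the runtime type with instanceof, then cast.
  static String handle(Foo f) {
    if (f instanceof Bar) {
      return "bar: " + ((Bar) f).value;
    } else if (f instanceof Baz) {
      return "baz: " + ((Baz) f).value;
    }
    throw new IllegalArgumentException("unknown subtype: " + f);
  }
}
```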

There are a few problems with this approach.

  • Because Hibernate wraps the class in a proxy object and intercepts method calls to make sure the right code is called, objects loaded from Hibernate will never be instances of the derived type, and this will not work.
  • Correctness is not enforced by the compiler. It's perfectly valid Java to write if (f instanceof Bar) { Baz b = (Baz) f; }, but it will fail every time.
  • There is no way to ensure completeness. There's nothing I can do to the existing code to make sure it gets updated when someone adds a new subtype Qux.

Moving logic into the class

Another solution is to embed this logic in the class, like OOP says we should.
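A sketch of that approach with the same hierarchy (the method name is assumed):

```java
abstract class Foo {
  // Each subclass supplies its own behavior; callers never branch on type.
  abstract String handle();
}

class Bar extends Foo {
  final int value;
  Bar(int value) { this.value = value; }
  @Override String handle() { return "bar: " + value; }
}

class Baz extends Foo {
  final String value;
  Baz(String value) { this.value = value; }
  @Override String handle() { return "baz: " + value; }
}
```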

This works fine when it's one method, or a few, but what happens when it grows to dozens? It means that your data objects start to become the location of all your business logic. If that's your style, that's okay and you have a lot of company, but I find it difficult to think about things this way. For example, if I want to verify the identity of owners of individual, joint, and custodial accounts, I could put a "verify identity" method in the AccountOwner type, but I'd prefer to create a single IdentityVerifier class that encapsulates all the business logic about verifying identity. The visitor pattern fits a model where data objects are simple and business logic is primarily implemented in various processor or "noun-verber" classes.

Another issue with business logic in the data class is that it makes it harder to mock for testing. With a processor interface, it's easier to mock it and return whatever data you want. With business logic in the class, you need to either set up the class so that your data actually satisfies all those rules, or you need to override the methods to return what you want. It makes it harder than it should be to write a test saying something like "accounts whose owners can be identified may be opened immediately".

The visitor pattern in Java

Basic Visitor

The basic visitor pattern in Java consists of the following:

  • An abstract base class with an abstract method match or visit taking a parameterized Visitor.
  • A parameterized Visitor class with a case method for each subclass.
  • Subclasses of the base class that each call the appropriate method of the visitor.
  • Application code that creates an anonymous instance of the visitor implementing whatever behavior is desired for that case.
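Those pieces together, as a minimal sketch (interface and method names are assumptions, not necessarily exact conventions):

```java
// A parameterized visitor with a case method per subclass.
interface FooVisitor<T> {
  T caseBar(Bar bar);
  T caseBaz(Baz baz);
}

// The abstract base class with an abstract visit method.
abstract class Foo {
  abstract <T> T visit(FooVisitor<T> visitor);
}

// Each subclass calls the appropriate case method on the visitor.
class Bar extends Foo {
  final int value;
  Bar(int value) { this.value = value; }
  @Override <T> T visit(FooVisitor<T> visitor) { return visitor.caseBar(this); }
}

class Baz extends Foo {
  final String value;
  Baz(String value) { this.value = value; }
  @Override <T> T visit(FooVisitor<T> visitor) { return visitor.caseBaz(this); }
}

// Application code supplies an anonymous visitor per use site.
class VisitorExample {
  static String handle(Foo f) {
    return f.visit(new FooVisitor<String>() {
      @Override public String caseBar(Bar bar) { return "bar: " + bar.value; }
      @Override public String caseBaz(Baz baz) { return "baz: " + baz.value; }
    });
  }
}
```

If someone adds a subtype Qux, adding caseQux to FooVisitor breaks every visitor at compile time, which is exactly the completeness check instanceof can't give us.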

Default visitor

It's sometimes useful to have special logic for some of the subclasses, and a default value for others. This can make the code more readable because it removes boilerplate which isn't part of what the code is trying to accomplish. You can do this with an implementation of the interface that supplies a default value for anything not overridden.
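A sketch of such a default implementation (built on the same assumed FooVisitor interface):

```java
interface FooVisitor<T> {
  T caseBar(Bar bar);
  T caseBaz(Baz baz);
}

abstract class Foo { abstract <T> T visit(FooVisitor<T> v); }

class Bar extends Foo {
  final int value;
  Bar(int value) { this.value = value; }
  <T> T visit(FooVisitor<T> v) { return v.caseBar(this); }
}

class Baz extends Foo {
  final String value;
  Baz(String value) { this.value = value; }
  <T> T visit(FooVisitor<T> v) { return v.caseBaz(this); }
}

// Supplies a default for every case; override only what needs special logic.
class DefaultFooVisitor<T> implements FooVisitor<T> {
  private final T defaultValue;
  DefaultFooVisitor(T defaultValue) { this.defaultValue = defaultValue; }
  @Override public T caseBar(Bar bar) { return defaultValue; }
  @Override public T caseBaz(Baz baz) { return defaultValue; }
}

class DefaultVisitorExample {
  static boolean isBar(Foo f) {
    return f.visit(new DefaultFooVisitor<Boolean>(false) {
      @Override public Boolean caseBar(Bar bar) { return true; }
    });
  }
}
```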

The downside of this pattern is that default visitors can be overlooked when a new case is added. One way to handle this in practice is, while adding the new case, to make the default visitor abstract without implementing the new case, review all the code that breaks, and, once satisfied that the behavior is correct, add in the default implementation for the new case.

Void or Unit return values

We generally define our visitors as being parameterized by a return type, but sometimes no return value is needed. At Wealthfront we have a Unit type with a singleton Unit.unit value to represent a return value that isn't meaningful, but java.lang.Void is also used.

I've used Void in this example to avoid introducing our Unit type, but I feel compelled to link to a discussion of why this is not ideal from a functional perspective: void vs. unit.

Destructuring pattern matching

These make up all that you likely need and probably 90% of our use of the visitor pattern, but there's one more item that is worth mentioning. My Scala example above doesn't actually show the full power of pattern matching, because I'm just matching on the type. With case classes, or with a custom unapply method, I can actually match not just on the types of the objects, but on details of their internal structure. For example, using types similar to what I used before, here's a version that treats anything above 10 as "many".
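The Scala version, as a sketch (reusing the assumed Bar/Baz case classes):

```scala
sealed abstract class Foo
case class Bar(i: Int) extends Foo
case class Baz(s: String) extends Foo

// Case classes let us match on internal structure, not just the type.
def describe(f: Foo): String = f match {
  case Bar(i) if i > 10 => "many"
  case Bar(i)           => i.toString
  case Baz(s)           => s
}
```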

Since this is a language feature in Scala, it's flexible and easy to use. You can simulate the same behavior in Java, but you need to encode the cases that are allowed into the visitor itself.
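One way to encode it, as a sketch (the visitor names the allowed patterns up front; all names assumed):

```java
class Bar {
  final int value;
  Bar(int value) { this.value = value; }

  // The allowed "patterns" are fixed by the visitor's case methods.
  <T> T match(BarCountVisitor<T> visitor) {
    return value > 10 ? visitor.caseMany() : visitor.caseFew(value);
  }
}

interface BarCountVisitor<T> {
  T caseFew(int value); // value <= 10
  T caseMany();         // value > 10
}

class DestructuringExample {
  static String describe(Bar b) {
    return b.match(new BarCountVisitor<String>() {
      @Override public String caseFew(int value) { return String.valueOf(value); }
      @Override public String caseMany() { return "many"; }
    });
  }
}
```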

In some sense, 16 vs. 58 lines is a big difference, but you could also argue that 42 lines of additional boilerplate to simulate this powerful functionality in Java is worth it. This destructuring pattern matching is most useful for value types. That is, objects that just represent collections of data but don't have any other identity attached to them. For entity types, that represent something that is defined as itself regardless of what values it currently has, it's better to use the basic pattern matching.

Is this really pattern matching, and how useful is it?

Some might object that this isn't "really" pattern matching, and I would agree. Pattern matching is a language level feature that allows you to operate on subclasses in a type-safe way (among other things). The type-safe visitor pattern allows you to operate on subclasses in a type-safe way even without language support for pattern matching.

As to its utility, I can say that we use it extensively at Wealthfront, and once people become familiar with it, it's great. Pretty much every polymorphic entity will have a visitor, which makes it safe for us to add new subtypes, since the compiler will let us find all the places we need to make sure it's handled. Visitors on value types, especially destructuring visitors, are much less common. We use it in a few places for things like Result objects that represent a possible result or error.

Give it a try the next time you run into a ClassCastException in your code.

Friday, January 9, 2015

WF-CRAN: R Package Management at Wealthfront

R is a powerful statistical programming language with a large range of built-in capabilities that are further augmented by an extensive set of third-party packages available through the Comprehensive R Archive Network (CRAN). At Wealthfront, R is an important tool with many use cases across the company. These include analyzing our business metrics to enable our data-driven company culture and conducting research for our investment methodology, including our recently announced Tax-Optimized Direct Indexing.

Limitations Of CRAN

To support the multiple use cases of R at Wealthfront, we rely on a large number of packages, both third-party packages available through CRAN and packages we have developed internally. The wide availability of existing packages and the ease of creating new ones enables powerful and sophisticated research and analysis, but the management of these packages can be difficult.

For third party packages, CRAN makes it easy to quickly obtain and install new packages, but parts of its design are problematic when used in business-critical settings. At Wealthfront, we faced the following challenges when using CRAN.

Archiving Of Packages And Lack Of Versioning Makes Reproducibility Difficult

On any day, new packages may be added to CRAN, old packages archived, and existing packages updated. Although individual packages themselves are versioned, only the latest package versions are available for download and installation using the built-in install.packages function. By default install.packages downloads the latest package and overwrites any existing older version of the package. The library function then loads the version that was last downloaded and installed.

The ephemeral state of CRAN poses multiple problems. First, the lack of explicit versioning when installing and loading packages makes reproducing analyses and research difficult. The exact results obtained may depend on the specific package versions used and these may vary from machine to machine. Changes to packages may also not be backward compatible, often breaking existing tools and scripts. Further, packages that have been archived on CRAN can no longer be installed using install.packages and must instead be directly downloaded and then installed from the source tarball.

Inconsistent Installation And Dependency Management For Proprietary Vs. Third-Party Packages

In addition to the third-party packages we use at Wealthfront, we also have a large number of internally developed packages we depend on. These packages are proprietary and so cannot be uploaded to CRAN. Having some packages from CRAN and others from private repositories results in inconsistent installation processes for both sets of packages. For third-party packages from CRAN, built-in R functions such as install.packages can be used, but for proprietary packages either built-in R tools such as R CMD INSTALL or methods such as install_git and install_local from the devtools package must be used. Further, R CMD INSTALL and devtools do not handle dependencies between proprietary packages, so the packages must be manually installed in an order that satisfies the dependency chain.

Rather than have packages in two separate locations and different installation and dependency management procedures, we want all packages to be accessible from a single location with consistent steps for installing packages and managing dependencies.

Inability To Review And Document Use Of Third-Party Packages

While there are a large number of packages available on CRAN, the quality of these packages can vary greatly. The quality of any research or analysis is dependent on the packages used and so we must ensure that the packages we rely on are well-written, maintained, and well tested. Thus, for any analysis or research that is production or business critical, we must ensure that we have vetted the packages being used.

WF-CRAN

We developed an internal CRAN with these limitations in mind. WF-CRAN holds the third-party packages we depend on along with our internally-developed packages. The following describes our design choices for implementing WF-CRAN and how these choices addressed the challenges we faced with CRAN.

Repository Versioning

There has been much community discussion and debate about the lack of CRAN package versioning (see this discussion about having a release and development distribution of CRAN and this R Journal article discussing options for introducing version management to CRAN). With WF-CRAN, we took a versioned repository approach. Each time a package is added or modified in WF-CRAN, a new version of the repository is created that includes the latest version of the packages contained in the repository. With this approach, we can continue to use the existing R functions for managing and loading packages, including install.packages and update.packages, by explicitly specifying the repository version using the repos argument. For example, the package xts from version 1.81 of WF-CRAN can be installed using:

install.packages("xts", repos = "http://wf-cran/1.81")

Further, dependencies of packages, both third-party and proprietary, can be automatically installed by specifying the dependencies argument for install.packages.

CRAN Nexus And Maven Plugins

Rather than write much of the server software from scratch, we chose to implement WF-CRAN on top of Sonatype Nexus by writing a CRAN Nexus plugin. Together with an internally developed CRAN Maven plugin, new packages and package versions can easily be deployed by navigating to the directory containing the package and running the command mvn deploy. Maven and our Maven CRAN plugin are then responsible for running the R CMD build command to create the package source archive and uploading this to WF-CRAN. The Nexus plugin then creates the repository structure and PACKAGES file for the new versioned snapshot of the repository by parsing the package DESCRIPTION files and creating symlinks to the latest versions of each package. We currently only upload the package source to WF-CRAN and so all packages must be installed from source. This is made the default for install.packages by setting

options(pkgType = "source")

in each user’s .Rprofile.

Source Control Of Packages

The source for the packages themselves is kept in Git repositories. We have multiple repositories containing R packages, with each repository containing packages that support a set of related use cases, such as research or business analytics. We also have a repository that holds the source code for all of the third-party packages we use. This allows us to document which packages we depend on and to explicitly approve the use of new third-party packages by code reviewing their addition to the third-party repository. Although packages are contained in multiple repositories, once the packages are deployed to WF-CRAN, they can all be installed and loaded using the built-in R functions such as install.packages and library.

With this setup, the workflow for adding or modifying a package in WF-CRAN is:

  1. Create a branch in the Git repo containing the package
  2. Have the change code reviewed
  3. Merge to master and run mvn deploy

WF-Checkpoint

One of the main motivations for WF-CRAN was enabling verifiably reproducible research and analysis. The versioned repository structure of WF-CRAN enables this by allowing us to explicitly specify the version of the repository used in a script. We do this using an approach similar to that of Revolution Analytics’ Reproducible R Toolkit and checkpoint package. Revolution Analytics’ approach involves taking a daily snapshot of CRAN and making these snapshots available through their checkpoint-server. In a script, the checkpoint function is used to specify which daily snapshot of CRAN to use.

library(checkpoint)
checkpoint("2015-01-01")

The checkpoint package then analyzes the script to find all of the package dependencies and installs these to a local directory using the versions from the specified snapshot date. The script is then run using these packages.

At Wealthfront, we developed a similar package, WFCheckpoint, that takes a version number rather than a snapshot date as an argument and uses the packages for the specified repository version to run the script. For example, to run a script using packages from version 1.81 of WF-CRAN, the following can be added to the top of the script:

library(WFCheckpoint)
checkpoint("1.81")

The WFCheckpoint package, together with WF-CRAN, thus allows us to easily reproduce the results of any research or analysis.

How This Has Helped Us

As a data-driven and research-intensive company, our culture and success are built on producing high-quality, reproducible research and analysis. WF-CRAN and WFCheckpoint have allowed us to bring our rigorous engineering practices to R, enabling more scalable backtests, more sophisticated and interactive dashboards, and a more unified development environment for research.

At Wealthfront, we strive to constantly improve the investment products and services we provide to our clients and WF-CRAN and WFCheckpoint are just a few examples of the tools we have developed at Wealthfront that enable us to do so.

Thursday, October 16, 2014

Security Notice on POODLE / CVE-2014-3513 / CVE-2014-3567

On October 14, a vulnerability in OpenSSL named POODLE was announced by Google. Two advisories, CVE-2014-3513 and CVE-2014-3567, describe a vulnerability in OpenSSL's implementation of the SSLv3 protocol and another that allows a man-in-the-middle (MITM) attacker to force a protocol downgrade from secure TLS to vulnerable SSLv3.

In response to the POODLE vulnerability, Wealthfront disabled SSLv3 access to our websites. For clients using SSLv3 to access our websites, we instead provide links to upgrade their browsers.

Further Resources for POODLE Help

We recommend auditing all systems using OpenSSL and upgrading when vendor fixes are available. Here are some resources we found useful in our response to this disclosure:

As always, if you have any questions about the security of your Wealthfront account, contact us at support@wealthfront.com. We will continue to monitor this issue as the community and vendors investigate this vulnerability further.

Thursday, October 2, 2014

Touch ID for Wealthfront App

At Wealthfront, our clients count on us to provide them with delightful financial services built with leading technology. They have chosen to trust us with some of their most important financial needs, and keeping their money and data secure is of the utmost importance to us. When we released the Wealthfront iOS App in February, we required our clients to log in to the app if it had been inactive for more than 15 minutes, causing many of them to enter their full password multiple times each day. We soon pushed out our PIN unlock feature to allow them to view their data in the app with a four-digit PIN. When a client needs "privileged" access, for example scheduling a deposit, the app still requires their password. This way we can ensure security around sensitive events while providing greater convenience for everyday use.

As you can see from the graph below, within a week after our PIN unlock feature went live, more than 75% of clients were actively using it. To this day it remains a highly utilized feature.



One of the exciting new features Apple announced at this year's WWDC allows developers to use biometric-based authentication, or Touch ID, right within their apps. Touch ID is very secure because fingerprints are saved on the device in a secure co-processor called the Secure Enclave. It handles all Touch ID operations as well as the cryptographic operations used for keychain access. The Secure Enclave guarantees data integrity even if the kernel is compromised. For this reason, all device secrets and passcode secrets are stored in the Secure Enclave.

Touch ID makes authenticating with an application even easier than our PIN feature while providing additional layers of security. From the day it was announced we've wanted to use Touch ID to allow our clients to authenticate with the Wealthfront app. Apple provides two mechanisms for us to integrate Touch ID:
  1. Use Touch ID to access credentials stored in the keychain
  2. Use Touch ID to authenticate with the app directly (called Local Authentication)
We carefully built test apps to compare each of these two approaches and today we'll examine our thought process and how we chose which mechanism best suited our needs.

Decide which Touch ID mechanism to use

The following is a diagram adapted from WWDC 2014 Session 711 to compare the two authentication mechanisms:



The biggest differences between keychain access and local authentication are:
  • Keychain access
    • The Keychain is protected with the user's passcode and with a unique secret built into each device, known only to that device. If the keychain is removed from the device, it is not readable
    • The Keychain can be used to store a user’s full credentials (e.g. email and password) on device encrypted by the Secure Enclave that can be unlocked based on authentication policy evaluation:
      • If a device does not have a passcode set, the Secure Enclave is locked and there is no way to access any information stored in it
      • If a device has a passcode, the Secure Enclave can be unlocked by the passcode only
      • If a device has Touch ID as well, the preferred method is to authenticate with Touch ID and passcode is the backup mechanism
      • No other fallback mechanism is permitted and Apple does not allow customization of the fallback user interface
  • LocalAuthentication
    • Any application can directly call LocalAuthentication for Touch ID verification
    • No permission is granted to store secrets into or retrieve secrets from the Secure Enclave
    • Unlike the keychain access case, Apple does not allow device-passcode authentication as a backup
    • Every application needs to provide its own fallback to handle failed Touch ID case with custom UI
We had one major concern about storing sensitive information in the keychain: the only fallback for failing to authenticate with Touch ID is the device passcode. iOS users usually configure a four-digit passcode, which we feel is less secure than their account password. Apple, for example, uses your iCloud account password as the fallback mechanism if you fail to authenticate with Touch ID while making a purchase on the iTunes Store. If we authenticate with Touch ID via LocalAuthentication, we can instead use our PIN unlock feature or the client's password as the fallback mechanism. We still don't store the password on the device: if the device does not have a Wealthfront PIN configured, failure to authenticate with Touch ID requires full authentication with Wealthfront's servers. Furthermore, any "privileged" access still requires a password. We feel this represents the best compromise between security and convenience. Now let's take a closer look at how we implemented our integration with Touch ID.

Integrating Touch ID Through Local Authentication

Integrating Touch ID into an application is a two step process:
  1. We ask if the device supports Touch ID by calling -canEvaluatePolicy:error:
  2. We call -evaluatePolicy:localizedReason:reply: to display the Touch ID alert view; when it completes, it calls our reply block
Let's take a closer look at how we use these methods for Touch ID authentication in our Wealthfront application.

Check if Touch ID is available

The following code fragment is a simplified version of our production code.
  • Lines 5-7: To set things up, we create an instance of LAContext that will be used for Touch ID authentication.
  • Lines 9-26: We use the -canEvaluatePolicy:error: API to see if the device can use Touch ID. If we get a YES back, we know the device is capable of evaluating the LAPolicyDeviceOwnerAuthenticationWithBiometrics policy, and we will invoke the second API method (below) to request a fingerprint match. If the return value is NO, we check the error code and generate a new localError to send back to the caller. Instead of using the LAError domain, we generate our own WFTouchIDErrorDomain and error code (see below for the reasons) to propagate the error message back to the caller.
  • Line 28: Here we call a method in another class to check if the user has opted out of using Touch ID.
  • Lines 29-34: Again we use our own WFTouchIDErrorDomain and error code so the calling method can parse it to get the error message.
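To make the description above concrete, here is a hedged sketch of what such an availability check might look like. Everything other than the LocalAuthentication API calls is illustrative: the method name, the WFTouchIDErrorDomain constant and WFTouchID* codes, and the hypothetical WFUserPreferences opt-out helper are assumptions, not our production code.

```objc
// Sketch of a Touch ID availability check. WFTouchIDErrorDomain,
// the WFTouchID* codes, and WFUserPreferences are illustrative names.
- (BOOL)canUseTouchIDWithError:(NSError **)error {
  if (!self.localAuthContext) {
    self.localAuthContext = [[LAContext alloc] init];
  }

  NSError *laError = nil;
  BOOL canEvaluate =
      [self.localAuthContext canEvaluatePolicy:LAPolicyDeviceOwnerAuthenticationWithBiometrics
                                         error:&laError];
  if (!canEvaluate) {
    if (error) {
      // Translate the LAError into our own domain so callers don't
      // depend on the iOS 8-only LocalAuthentication error constants.
      *error = [NSError errorWithDomain:WFTouchIDErrorDomain
                                   code:WFTouchIDNotAvailable
                               userInfo:@{NSLocalizedDescriptionKey : @"Touch ID is not available"}];
    }
    return NO;
  }

  if ([WFUserPreferences hasOptedOutOfTouchID]) {  // hypothetical opt-out check
    if (error) {
      *error = [NSError errorWithDomain:WFTouchIDErrorDomain
                                   code:WFTouchIDOptOut
                               userInfo:@{NSLocalizedDescriptionKey : @"User opted out of Touch ID"}];
    }
    return NO;
  }
  return YES;
}
```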

Authenticate with Touch ID

If the above check result is YES, we can now proceed to the second step by calling -evaluatePolicy:localizedReason:reply:.
  • Lines 4-6: Here we first confirm the fallbackButtonTitleString, successBlock, and fallbackBlock are not nil.
  • Lines 7-9: We create a new LAContext object if it is nil.
  • Line 11: We pass a fallbackButtonTitleString as the fallback button title. This cannot be nil; passing nil causes an exception that could crash the app.
  • Line 12: The reasonString is also required because Touch ID operations require this string to tell the user why we are requesting their fingerprint.
  • Lines 13-26: We pass the reasonString, successBlock, and fallbackBlock to -evaluatePolicy:localizedReason:reply:. The replyBlock will be passed a BOOL indicating whether or not the authentication attempt was successful. If the reply is a YES we can now proceed with the successBlock. Otherwise, we pass the error to fallbackBlock so it can check the error code to find out the reason for failure and act accordingly.
We can only call -evaluatePolicy:localizedReason:reply: when the app is in the foreground. As soon as we make the call, the Touch ID alert view appears, prompting the user to scan a registered finger.
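A hedged sketch of this authentication step follows. The LocalAuthentication calls are real, but the method signature and reason string are illustrative reconstructions of what the walkthrough above describes, not our production code:

```objc
// Sketch of the Touch ID authentication call; names other than the
// LocalAuthentication APIs are illustrative.
- (void)authenticateByTouchIDWithFallbackButtonTitle:(NSString *)fallbackButtonTitle
                                             success:(void (^)(void))successBlock
                                            fallback:(void (^)(NSError *error))fallbackBlock {
  // A nil fallback title would raise an exception inside LocalAuthentication.
  NSParameterAssert(fallbackButtonTitle);
  NSParameterAssert(successBlock);
  NSParameterAssert(fallbackBlock);

  if (!self.localAuthContext) {
    self.localAuthContext = [[LAContext alloc] init];
  }
  self.localAuthContext.localizedFallbackTitle = fallbackButtonTitle;

  // The reason string is shown in the Touch ID alert view.
  NSString *reasonString = @"Log in to Wealthfront";  // illustrative copy
  [self.localAuthContext evaluatePolicy:LAPolicyDeviceOwnerAuthenticationWithBiometrics
                        localizedReason:reasonString
                                  reply:^(BOOL success, NSError *error) {
    // The reply block may not run on the main queue; hop back
    // before doing any UI work.
    dispatch_async(dispatch_get_main_queue(), ^{
      if (success) {
        successBlock();
      } else {
        fallbackBlock(error);
      }
    });
  }];
}
```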


It is very important to use dispatch_async to dispatch the successBlock and fallbackBlock back to the main queue for UI updates. Otherwise the app will freeze for a long time, since -evaluatePolicy:localizedReason:reply: appears to use an XPC service (Apple does not document this, but we saw evidence of it in Instruments). The UI would be updated only after the XPC service gives control back to the main queue.

Customize LAError error code for iOS 7 devices

When we were working with the iOS 8 beta, we also needed to maintain compatibility with iOS 7. The problem is that LAContext and LAError are not available on iOS 7, since the LocalAuthentication framework is new in iOS 8. We also wanted to give users the option to opt out of Touch ID operations. Moreover, we rely heavily on automated testing on both devices and simulators, and LAContext gives an undocumented -1000 error code if we call its methods on a simulator. To cope with all of these possibilities, we made a custom NS_ENUM called WFTouchIDError and use the userInfo dictionary to describe any error. For example, if the user opts out of Touch ID, we use WFTouchIDOptOut so that the caller can behave accordingly. In our unit tests, we can also use WFTouchIDNotAvailable and userInfo to tell the simulator not to fail a test, since Touch ID is not supported there.
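Such a custom error domain might be declared roughly as follows. Only WFTouchIDOptOut and WFTouchIDNotAvailable are named in the text above; the remaining cases are illustrative assumptions:

```objc
// Sketch of a custom Touch ID error domain. Only WFTouchIDOptOut and
// WFTouchIDNotAvailable come from the post; the rest are illustrative.
static NSString *const WFTouchIDErrorDomain = @"WFTouchIDErrorDomain";

typedef NS_ENUM(NSInteger, WFTouchIDError) {
  WFTouchIDNotAvailable,   // no Touch ID hardware, iOS 7, or the simulator
  WFTouchIDOptOut,         // the user opted out of Touch ID in our settings
  WFTouchIDFailed,         // illustrative: the fingerprint did not match
  WFTouchIDUserCancelled   // illustrative: the user dismissed the alert
};
```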

Testing

At Wealthfront, testing is in our blood; no code is shipped without test coverage. The following code snippet tests -authenticateByTouchIDWithFallbackButtonTitle:success:fallback:

This test follows a similar testing paradigm to the one I described in more detail in a previous blog post. Briefly, we use the OCMock framework to decompose code into testable components. By isolating the code from its outside dependencies, we are able to make sure the code behaves as we expect from the bottom up. In this test, we use OCMock's andDo: block to make sure that only one of the successBlock and fallbackBlock blocks is called.
  • Lines 2-6: Here we mock out LAContext and set the mocked object mockLAContext as _authManager's localAuthContext. Then we set an expectation that LAContext's -setLocalizedFallbackTitle: method will be called with the parameter @"Wealthfront PIN", as it will be used to set the fallback button title.
  • Lines 8-15: We spell out another anticipated LAContext call with its expected parameters. We require reply: to be non-nil by setting the expectation [OCMArg isNotNil], so that we can execute the void(^reply)(BOOL success, NSError *error) block inside the andDo: block.
  • Lines 17-25: We call -authenticateByTouchIDWithFallbackButtonTitle:success:fallback: with a simple successBlock in which the BOOL variable value is changed and we make sure the fallbackBlock is not called by inserting a XCTFail inside the fallbackBlock.
  • Lines 27-30: We make sure that there is an exception thrown if we try to set the fallbackButtonTitle to be nil.
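The test described above might be sketched as follows, using OCMock's classic mock syntax. This is a reconstruction under stated assumptions: _authManager, the method names, and the synchronous delivery of the reply block are illustrative (production code that hops to the main queue would need to wait, e.g. with an XCTestExpectation, before asserting):

```objc
// Sketch of the test described above; OCMock calls are real, the
// surrounding names are illustrative.
- (void)testAuthenticateByTouchIDCallsSuccessBlock {
  id mockLAContext = [OCMockObject mockForClass:[LAContext class]];
  _authManager.localAuthContext = mockLAContext;

  [[mockLAContext expect] setLocalizedFallbackTitle:@"Wealthfront PIN"];

  // Capture the reply block from the invocation and call it with YES,
  // simulating a successful fingerprint match.
  [[[mockLAContext expect] andDo:^(NSInvocation *invocation) {
    __unsafe_unretained void (^reply)(BOOL, NSError *) = nil;
    [invocation getArgument:&reply atIndex:4];  // self, _cmd, policy, reason, reply
    reply(YES, nil);
  }] evaluatePolicy:LAPolicyDeviceOwnerAuthenticationWithBiometrics
     localizedReason:[OCMArg any]
               reply:[OCMArg isNotNil]];

  __block BOOL successCalled = NO;
  [_authManager authenticateByTouchIDWithFallbackButtonTitle:@"Wealthfront PIN"
      success:^{ successCalled = YES; }
      fallback:^(NSError *error) { XCTFail(@"fallback should not run"); }];

  XCTAssertTrue(successCalled);
  [mockLAContext verify];
}
```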

Final Thoughts

At Wealthfront, our mission is to bring the best and latest technology to our clients and thereby improve their experience. In iOS 8 Apple has finally provided a public API for us to leverage Touch ID in our applications. This allows us to deliver greatly enhanced convenience to clients without sacrificing security. We carefully considered the implications of adopting the Touch ID mechanism. Direct interaction with LocalAuthentication gives our clients the best experience and greatest security. It was very important to support Touch ID as soon as possible to give our users a delightful experience. We leveraged our continuous integration infrastructure to both validate our integration as well as verify that Apple had fixed bugs we discovered during the beta. This allowed us to be ready to put a build in the App Store within a few hours of Apple releasing the iOS 8 GM build.

Friday, September 26, 2014

Security Notice on Shellshock / CVE-2014-6271 / CVE-2014-7169

On September 24 a vulnerability in Bash, named Shellshock, was publicly announced. The original Shellshock advisory, CVE-2014-6271, described a severe remotely exploitable vulnerability in all versions of GNU Bash. A follow-up advisory, CVE-2014-7169, was issued for an incomplete fix to CVE-2014-6271.

Security review of Wealthfront systems confirmed no client-facing components were vulnerable to Shellshock. The Wealthfront team deployed fixes for CVE-2014-6271 and CVE-2014-7169 on all internal hosts, consistent with security best practices.

Further Resources for Shellshock Help

We recommend auditing all systems using Bash and upgrading. Here are some resources we found useful in our response to this disclosure:

As always, if you have any questions about the security of your Wealthfront account, contact us at support@wealthfront.com. We will continue to monitor this issue as the community and vendors investigate this vulnerability further.