Friday, September 19, 2014

Small Files, Big Problem

Data drives everything we do here at Wealthfront, so ensuring it’s stored correctly and efficiently is of the utmost importance. To make sure we’re always as informed as possible, we gather data from our online databases very frequently. However, this has the unfortunate side effect of creating a large number of small files, which are inefficient to process in Hadoop.

The Small Files Problem

Typically, Hadoop creates a separate map task for each file in a job, so an excessive number of files creates a correspondingly excessive number of mappers. Further, when many small files each occupy their own HDFS block, the namenode incurs enormous overhead. The namenode tracks where every file is stored in the cluster and must be queried any time an application performs an action on a file. The smooth performance of the namenode is thus of critical importance, as it is a single point of failure for the cluster. To make matters worse, Hadoop is optimized for large files; small files cause many more seeks when reading.

This set of issues in Hadoop is collectively known as the small files problem. One good solution when pulling small files stored in S3 to a cluster is to use a tool such as S3DistCp, which can concatenate small files by means of a ‘group by’ operator before they are used in the Hadoop job. We, however, cannot use this tool for our data set. Our data is stored in Avro files, which cannot be directly concatenated to one another. Combining Avro files requires stripping the header, logic that S3DistCp does not provide.

A Consolidated Files Solution

To solve the small files problem, we periodically consolidate our Avro files, merging their information into a single file that is much more efficient to process. For data gathered hourly, we take the files for each hour of the day and merge them into a single file containing the data for the entire day. We can further merge these days into months. The monthly file contains the same data as the set of all hourly files falling within its span, but in a single location instead of many. By switching from the original hourly files to monthly ones, we cut the number of files by a factor of 720.

Hours combine to form a day, days combine to form a month*.
*Wealthfront is aware that there are more than 3 hours in a day and more than two days in a month; this is a simplified visualization

We must ensure that we do not take this too far, however. Consolidating already large files can begin to reduce performance again. To prevent this, the code only consolidates files if the combined size does not exceed a specified threshold. This threshold is chosen based on the HDFS block size; there is no gain to be had from a file that already fills a block completely.
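The size guard can be sketched as follows (a minimal illustration with hypothetical names, not Wealthfront's actual code; in practice the threshold would be derived from the cluster's HDFS block size):

```java
import java.util.List;

// Hypothetical sketch of the consolidation threshold check described above.
public class ConsolidationCheck {

    // Consolidate only if the merged file would not exceed the threshold;
    // a file that already fills an HDFS block gains nothing from merging.
    public static boolean shouldConsolidate(List<Long> fileSizeBytes, long thresholdBytes) {
        long combined = 0;
        for (long size : fileSizeBytes) {
            combined += size;
        }
        return combined <= thresholdBytes;
    }
}
```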

Selecting Files For Computation

This creates the new challenge of dealing with files that span different durations. In general, when requesting data across an interval of time, we want to choose the fewest files that will give us the desired dataset without any duplicates. Consider the following diagram representing a collection of files arranged chronologically. We wish to fetch only the data falling between the red lines, using as few files as possible.


Our approach is a greedy algorithm that takes files spanning the largest amount of time first, then considers progressively smaller intervals. In this case, we first consider the monthly intervals. We eliminate the first month because it includes data outside our requested timeframe.

We next consider the days. We first eliminate the day not fully in our timeframe. We also eliminate the days that overlap with our previously selected month.

Applying the same action to hours gives us our final choice of files to use.

Note that the entire interval is covered, there is no duplication of data, and a minimum number of files is used. We fetch the same data that would have been retrieved by taking each hourly file in our interval, but it arrives in a format far better suited to processing in Hadoop.
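The greedy selection can be sketched roughly as follows (a simplified illustration, not Wealthfront's code; the Interval class, end-exclusive hour units, and all names are our assumptions):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the greedy file-selection algorithm described above.
public class FileSelector {

    // A file's time span, measured in hours; end is exclusive.
    public static class Interval {
        final long start, end;
        public Interval(long start, long end) { this.start = start; this.end = end; }
        boolean within(long s, long e) { return start >= s && end <= e; }
        boolean overlaps(Interval o) { return start < o.end && o.start < end; }
    }

    // Considers the longest-spanning files first, keeping each candidate only
    // if it lies fully inside the requested range and overlaps nothing chosen so far.
    public static List<Interval> select(List<Interval> files, long start, long end) {
        List<Interval> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong((Interval i) -> i.end - i.start).reversed());
        List<Interval> chosen = new ArrayList<>();
        for (Interval f : sorted) {
            if (!f.within(start, end)) continue; // e.g. a month extending past the range
            boolean clash = false;
            for (Interval c : chosen) {
                if (f.overlaps(c)) { clash = true; break; } // already covered by a larger file
            }
            if (!clash) chosen.add(f);
        }
        return chosen;
    }
}
```

Given a monthly file, a daily file outside that month, and one trailing hourly file, the selection covers the range with three files instead of hundreds of hourly ones.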

Handling Outdated Files 

The danger in creating consolidated files is that the derived file becomes outdated if the source files beneath it change. We protect against this risk by validating consolidated files when they are requested for use. If a file spanning a smaller interval of time was updated more recently than the file meant to encapsulate it, the underlying data has changed since the consolidated file was created. We ignore the outdated file and go down to the next level of smaller files. We also log an error noting that the consolidated file is outdated so it can be recreated and made fresh again.
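A minimal sketch of this freshness check, assuming last-modified timestamps are available for both the consolidated file and the files it encapsulates (all names here are hypothetical):

```java
import java.util.List;

// Hypothetical sketch of the staleness validation described above.
public class ConsolidatedFileValidator {

    // A consolidated file is outdated if any file it encapsulates was
    // modified after the consolidated file was created.
    public static boolean isOutdated(long consolidatedModifiedAt,
                                     List<Long> underlyingModifiedAt) {
        for (long t : underlyingModifiedAt) {
            if (t > consolidatedModifiedAt) {
                return true; // fall back to the smaller files and log an error
            }
        }
        return false;
    }
}
```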


We find that this consolidation has an enormous impact on performance. Our first test case was a job that previously operated on a few years' worth of data, bucketed into files by hour. When attempting to use this multitude of small files, the cluster would fail after more than 2 hours when it ran out of memory. With the consolidated files, the same cluster successfully completed the job in 1 hour 15 minutes.

This strategy comes with the tradeoff of significantly increased disk space usage in our cloud storage system as we store a full copy of our data for each level of consolidation used. In our case, this is a very small penalty compared to the enormous gains in performance.

Our infrastructure is extremely important to us at Wealthfront. We are constantly working to ensure that our systems can support the rapid influx of new clients. File consolidation is just one of the many ways we keep our infrastructure as efficient and robust as possible. 

Thursday, September 18, 2014

The Unanticipated Intern Experience

Two hours after I walked through the front door at Wealthfront, I pushed code to production. Two weeks after that I took part in conference calls to outside business partners. Two weeks after that I planned a critical feature with product managers. Two weeks after that I debated UI elements with the lead designers at Wealthfront. Two weeks after that I wrote analytics code for the new features. It's more than I bargained for as a software engineer intern, and more than most would expect even as a full time engineer in Silicon Valley. But at Wealthfront it happens by design. Flat teams commissioned to self-organize as they see fit pull interns along simultaneously in the directions of engineering fundamentals, client-centric design and strategic business plans.

But as challenging and eye opening as it's been to sweep through the process of planning and designing a feature, that's only half the story of my time here. I worked as an engineer, after all, and perhaps the most memorable and valuable experience was the responsibility for prototyping, architecting, and building product-critical features. Sure, plenty of companies let interns take charge of projects and some companies let interns get their hands on critical products. Some even let interns build projects that may one day affect customers.

What sets Wealthfront apart is a willingness to give new employees full responsibility for projects that will immediately affect customers within days, if not hours. I spent much of the summer working on a feature to save clients time and effort as they set up their account. That's an obvious win for clients, and an equally obvious asset for Wealthfront. More importantly, the changes are a key enabler of future, even larger improvements to the whole client experience. Just as I was not siloed into a narrow role as a developer, I was also not siloed into a narrowly useful project.

Maximize your importance
Before thinking about specializing, every intern or full-time employee coming out of school is going to have to grapple with the gap between their experience and the new scale, breadth, and pace of a real software company. There are both technical and operational differences between how we learn to work in school and how employees work in Silicon Valley software companies.

It was immediately obvious that there were parts of the technology stack I was unfamiliar with and a couple of programming styles I hadn't seen before. But interestingly enough, I found the technical gap easy to bridge. Reading through the codebase and asking coworkers a handful of questions was more than sufficient to fill in the gaps, probably because I already had a mental framework for understanding software. The difficult part was adjusting to the fact that, for the first time, it took more to manage a project than an assignment handout, a group text thread, and a git repo. Knowing your own code and understanding a variety of job titles isn't enough; it takes observation and effort to understand how to integrate into and work with a highly horizontally mobile team.

Developing that framework is one of the largest benefits I did not expect to gain. It will pay dividends in my ability to evaluate companies, onboard onto new teams and contribute to their work processes.

The more exciting difference between school and real-world software projects is simpler: the stakes are higher. Instead of working for a letter grade and having one or two users, there are more than a billion dollars and tens of thousands of customers that depend on your code. Obviously, that changes both your mindset and your workflow. Not only is this an important lesson to learn in an environment surrounded by mentors and extensive testing, it's also satisfyingly meaningful. For those of us fresh out of the classroom, finding a place where our work genuinely matters will affect our mindset and productivity much more than any technology or workflow.

Learn faster, learn smarter
While the potential meaningfulness of your work may not always be feasible to evaluate as a prospective intern or employee, there are a couple factors that are both visible to interviewees and fundamental for a new employee’s learning.

The single most important driver of my technical development this summer was feedback from both code reviews and test results. Maximizing learning, then, necessitates maximizing my exposure to feedback. Short of demanding more feedback (which has obvious drawbacks), the most practical way of doing this is maximizing the speed of the feedback loop for my work. I have worked in tech companies with development cycles ranging in length from 6 weeks to, thanks to Wealthfront, about 6 minutes. Often faster, since robust test suites at every level give reliable feedback for code correctness within seconds. Access to team-wide code review and deployment within minutes is a fast track for not only code, but also skill development.

Students looking to intern often wisely look for an internship where they believe they’ll work under and learn from the best people they can. What we don’t often realize is that the amount you learn from leaders is not just a function of the quality of the leaders, but also of the transparency and communication between you and those you work for. Employees always know what decisions are made. In good organizations, they know why these decisions are made. At Wealthfront, I know how they are made. Data-driven culture and its child principle of data democratization certainly make this easier, but there’s also a human aspect to this culture. I speak daily with a mentor and more than weekly with our VP of Engineering. Sometimes we talk about a JavaScript implementation and sometimes about types of stock and funding rounds.

Structure speaks louder than words
It’s hard, especially as a prospective intern, to determine whether a company will offer you the kind of learning opportunity you seek. I do now recognize, though, that the potential for learning as an intern at Wealthfront didn’t come from a proclaimed focus on interns but instead is deliberate residue of the larger design of how Wealthfront engineering works. The flat and integrated team structure enabled the breadth and pace of my experience. The robust test structure and lack of hierarchy enabled the level of responsibility and ownership other interns and I had. The unrelenting focus on automated testing and continuous deployment enabled the feedback loop. These characteristics are the result of both intention and expertise, and the opportunities I had could not occur without them.

The knowledge of how to recognize these characteristics in future companies might just be the most valuable lesson I’ve learned at Wealthfront.

Tuesday, September 16, 2014

iOS UI Testing

I was initially attracted to Wealthfront by its great business model and engineering-driven culture. But after visiting the office, I was convinced that I belonged at Wealthfront because of the strong emphasis on test-driven development (TDD), continuous integration, and rapid deployment. Furthermore, a solid foundation had already been laid for the iOS initiatives with a great test suite and a stable continuous deployment environment. TDD in Objective-C is something I always wanted to do, but sadly it is not a common practice in our community, for a variety of reasons. Everyone agrees that unit testing is a good idea, but many developers shy away because they believe it is too hard.

For example, a common misconception about iOS applications is that they largely involve UI code, which is difficult to test in unit tests.

After working at Wealthfront for a few months I now know this is untrue. I am very glad that I made the right decision to join a great team dedicated to leading in this space. Of course there was some initial time investment to get used to the rigor of working with TDD and continuous integration. But after a short while, my productivity and code quality improved significantly. Moreover, I gained a much deeper understanding of how to anticipate the requirements of a class or method to allow much faster refactoring and iterative improvement. Our code base has grown significantly and we have pushed many major updates after our first launch without sacrificing code quality. We have also achieved over 93% code coverage with our test suite which we continue to improve.

I would like to share an example to demonstrate that UI testing is actually not that difficult. By decomposing the application into reusable, testable components, we can ensure the stability of the system without relying on large, cumbersome integration tests. To do this, we need to isolate the dependencies of individual components and leverage the compiler to guarantee a consistent API contract between these components. Finally, we can leverage mock objects or fake data as necessary to imitate the expected behavior of these dependencies. This approach allows us to eliminate test variability introduced by external dependencies such as the network, database, and UI animation. At Wealthfront, we use the Objective-C mock object framework OCMock (which is equivalent to JMock in Java) to facilitate this decomposition.

WFPinView onboarding page

The example we'll look at today is from our PIN on-boarding flow. When a new user launches the Wealthfront application for the first time they are shown an informative page (called WFPinView in this example) about our PIN feature. If the user wants to set up a PIN, they can either tap the right side “SET PIN” button or swipe the lock to the right until it touches the “SET PIN” button. Additionally the bar to the left of the lock changes color to visually reinforce the user’s choice. Alternatively the user can swipe the lock to the left or tap the left button to choose "NOT NOW" and dismiss the view.

There are a few things we want to do to test this UI:
  1. ensure each view is initially positioned at the correct location
  2. ensure all labels, colors, and text match the design specs
  3. confirm that all controls and gesture recognizers are properly configured for our expected user interactions
  4. ensure smooth animation for position and color changes in response to user interaction

Building the view with TDD

The first two steps are quite straightforward. Following TDD, we first lay out the expected subviews' positions, colors, and text in our test code according to our design specifications. Then we programmatically build the view. At the end, we just need to use XCTAssert to confirm that subviews such as labels, buttons, and images have been positioned correctly and that their properties are as expected:
In the above test, we confirm the following:
  • All of the subviews are positioned correctly (for simplicity, only a few lines of code are shown)
  • They are in the right view hierarchy
  • They are the correct type and their text and color are all as expected
When the test passes, we know the view is laid out correctly. Later if our design team wants to change something in WFPinView, one or more of the tests would fail and we will need to update the tests to make them pass again. This gives us a great built-in control to ensure any changes we make are the desired changes. Once we are certain the view is constructed correctly, we can start thinking about how the user interacts with the view.

Validating user interaction events

We need to make sure that user's interaction with views are handled correctly. On iOS, user interactions are dispatched by associating a target and action with a control (e.g. UIButton). At first glance, it seems there is no way for us to validate the button's target or action. Additionally it is unclear how to trigger these actions from our test. For example, by just looking at the UIButton class documentation there is no obvious method we can use to accomplish this. Fortunately, UIButton is a UIControl and when we look at UIControl there are a number of APIs available for us to use as shown in the following piece of test code:
The pinView is controlled by its view controller (vc). The first line of this test extracts all of setPinButton's actions targeted at vc. We can then call XCTAssert to confirm that its action selector is indeed the -setPinTapped: method by matching the selectorName.
Then we partially mock the view controller and set our expectation that the -setPinTapped: method would be called if and only if the setPinButton is tapped. Here the key is to fake the user tap event by calling UIControl’s -sendActionsForControlEvents: method.

Similarly, other UIControl subclasses such as UIPageControl, UISegmentedControl, UITextField can use a similar scheme to confirm the target-action configuration is as expected. Next we need to validate the animations for this view.

Unit testing animations with andDo:

One reason iOS developers shy away from unit testing is that UI is one of the major focuses of development. It is very typical for an iOS application to have animations. Some of them are pretty simple -- views that appear, disappear, or change color through a brief animated sequence. Other animations are more elaborate, involving groups of views and possibly completion blocks.

Let’s take a look at a simplified version of one of our animations in the WFPinView class:
This code adjusts the position of a UIView object (the lock) based on the velocity and distance of a finger movement that is captured by a UIPanGestureRecognizer. At the same time, another view (the slider view) adjusts its color accordingly. After the animation finishes, a completionBlock is called to do further work.

Even in this quite simple animation, we have many things to test to make sure it behaves as expected:
  • We need to confirm that the expected methods (e.g. +[UIView animateWithDuration:delay:options:animations:completion:]) are called with the proper parameters
  • We need to be sure the view moves to where we want it to be and the color of the slider view is changed correctly
  • We need to be certain that if there is a completionBlock, the block is called
The following is part of the test code:
  • Lines 2-4: We set up a mockGestureRecognizer to be used by WFPinView’s -animateLockWithGestureRecognizer:completion: method to calculate the animation duration and lock offset.
  • Lines 6-7: We use partialMockView to identify the method that should be called (-updateSliderColor:) with the expected delta value. If the expected value calculated in the test is different from the output in WFPinView, an assertion error is generated.
  • Lines 9-22: We then mock the +[UIView animateWithDuration:animations:completion:] method so that the mockView is used to confirm the calling parameters match what we expect.
    • If the real animation duration is different from expectedAnimationDuration, the test will fail.
    • Use andDo: to set return values and access call parameters (e.g. the animation() and completion() blocks).
    • We expect at least the animation() block to be supplied so we can use [OCMArg isNotNil] to check that it is present. Alternatively, we don’t always expect to have a completion() block, so we use OCMOCK_ANY to signify this constraint.
  • Lines 14-15: We use -getArgument:atIndex: from NSInvocation to get the calling method’s parameters we are interested in. Since we have already confirmed that animationDuration is as expected, we simply execute animation() immediately and if there is a completion() block, we execute it as well.
  • Lines 24-28: We set up a completion block that changes a BOOL value, to confirm it is called correctly.
  • Line 30: We confirm the completion block is actually executed by checking the BOOL value is changed from NO to YES. This would only happen if the completion block is executed.
  • Line 31: We further ensure that the animated view is located at its final expected position after animation block executes.
  • Lines 33-35: We call -verify on the mocked objects and classes to ensure they were called as expected.

Final thoughts

Unit testing in iOS is not always easy, but after a few months of actively writing tests, it has become second nature. With the development of many great tools such as OCMock, Kiwi, and Xcode Server, and more evangelism from Apple, it is getting a lot easier. As Wealthfront continues its hyper-growth, our new hires will onboard into this environment and quickly start contributing to our ongoing development. With this infrastructure in place, we are confident that we can rapidly add new features and scale our code base while ensuring the quality of our application meets our standards.

Thursday, September 4, 2014

From Wall Street to Silicon Valley

The financial crisis caught my attention when I was a Master's student of computer science at the University of Pennsylvania. I realized the great potential of applying computer science to the financial services industry, which prompted me to join Bloomberg, and then later Goldman Sachs. As time passed, I found myself more and more dissatisfied by the fact that my work was so far removed from the general public.

Most people cannot afford a Bloomberg terminal for use in actively managing their personal finances, and doing so would also demand a great deal of time and energy. They are likely to earn a much higher financial reward in the long run by investing that time and energy in their careers instead. And when it came to the work I did at Goldman Sachs helping to build a risk management system, I realized that it would be available mainly to the wealthy; the general public wouldn't benefit from it. Sadly, it is the non-wealthy who most need to invest their savings for tomorrow but don't know how to do it effectively.

In March I learned from a friend about Wealthfront, a Silicon Valley startup managing hundreds of millions of dollars for ordinary people—and it did it using software. Their service makes the practice of Nobel Prize winning Modern Portfolio Theory easily accessible to everyone at extremely low costs. This is exactly the type of work I knew I wanted to do. The Silicon Valley Career Guide by Wealthfront chairman Andy Rachleff got me excited about the idea of leaving Wall Street behind and joining a startup in Silicon Valley. And the successful track records of the senior management team at Wealthfront gave me the confidence to decline great offers from Google, LinkedIn, and some more established startups. So I convinced my wife to quit her job, sold our house in Long Island, moved my entire family to the Valley and started working for Wealthfront—all within two months!

I’ve now been at Wealthfront for two months, and the following are the things that have impressed me the most.

Level of Transparency

Most of the private companies I interviewed with would not explain to me how well their business was doing with real data. And a sad fact is that all too often these metrics are even hidden from their current employees. Wealthfront is the opposite. Hanging on our walls are widescreen LCD displays with our most important business metrics, all for employees and visitors to see. We have web applications to view metrics and these are accessible for every employee to view and use to improve our business. Even our balance sheet and income statement are shared with all employees.

In addition to the metrics, our executives sit together with everybody and are open to sharing their thoughts and vision for the company. We have our “all hands” meeting every Friday, where our CEO Adam Nash describes our ongoing efforts as well as future directions in business, engineering and product, and answers any questions that people may have. After every board meeting, Adam shares the presentations he makes to the company's board with the entire staff, going over it in detail and addressing questions too. He even went out of his way recently to give the presentation again for a group of new hires and interns to help us gain a deeper understanding of our business. A transparent culture motivates and keeps everyone focused on our mission.

Engineering-Driven Culture

When I first interviewed with Wealthfront, I was surprised to learn that we had less than 40 people yet were managing more than $700 million. After my first week though, I was no longer surprised.

The level of automation in our development, testing, deployment and production systems is unparalleled in the financial industry. We practice test-driven development enthusiastically and vigorously, making sure that our system is robust and difficult to break. People put a lot of thought and effort into code reviews to make our code base elegant and easy to maintain and grow. This in turn also helps people learn and grow. The system follows a service oriented architecture that is highly modularized with automatic dependency management. Our build, deployment and production management process is fully automated. Code is built automatically once checked in and deployment is one click away once the build is successful. We have a clear view of the status of all the services in our system and stakeholders are promptly notified of any exceptional conditions. This high level of automation enables us to do continuous deployment a dozen times a day, rolling out new features and making bug fixes quickly with confidence.

As time goes by, I realize that this high level of automation isn't a coincidence. It is a reflection of our engineering-driven culture.

In financial services companies on Wall Street, most engineers play a supporting role to the business. The technology department is generally considered as a cost center, and traders and financial analysts determine the priorities of engineers. Engineers are provided very little room for innovation and have very limited impact on business direction.

At Wealthfront, on the contrary, our engineers are the main drivers of the business. More than two-thirds of employees here are engineers. And the influence of our engineers impact all aspects of our business.

Every Monday engineers get together in cross-functional teams, including investment services, product, and client services. We review the performance and issues in each area, discuss progress of related projects, and determine the directions and priorities for everyone. The discussions are based on metrics coming from our data platform in the form of easy-to-understand tables and graphs that are visible to everyone. The general themes of these discussions are multidisciplinary and engineering-driven:
  • How do we better automate the customer onboarding process that satisfies regulatory requirements?
  • How do we maximize tax efficiency in our customer money transfer and withdrawal process?
  • How do we make the dashboard more intuitive, and what tools do we need to build for the client services team to support more customers without a linear increase in their head count?
The engineering-driven culture gives us a lot of leverage when scaling with software. We have an ever-increasing client to employee ratio. We don't need to hire more people to service more clients and manage more assets. Instead we are hiring more engineers to build more new features to better service our customers.

The onboarding process as an engineer was quite smooth. There is well-organized documentation to help you get started. My mentor Rija is proactive in helping me with any questions in setting up the development environment and getting started with projects. Everybody is ready to help.

In the past five years, I had only worked on systems built mostly with proprietary technology, even with a proprietary programming language in the case of my last job at Goldman Sachs. Wealthfront, on the other hand, uses a lot of open source technology, most of which I had only heard about previously. Moreover, this is the first time I have ever actually used a Mac, though I had long admired their beauty. Nevertheless, within a day I was able to check in code to production. Within the first week, I fixed the root cause of a production issue and started working on a project creating Cascading jobs to automate data quality checks on account performance metrics calculation.

Data quality check automation 

Data is at the heart of the financial services industry. Bloomberg generates more than $9 billion in revenue every year providing financial information. My past market risk technology team at Goldman Sachs had 60,000 dedicated CPUs crunching out numbers 24/7 to help the firm manage market risk. Bloomberg employs hundreds of people to manually input and verify data and fix data problems reported by clients. Junior market risk analysts spend at least half of their time visually comparing data provided by trading desks with the data in our market risk system. With the terabytes of data generated every day in the financial services industry, manual processes would never be able to scale and meet the demands. Managing data is a never ending struggle on Wall Street.

At Wealthfront, automation is in our DNA. We take the same approach to ensuring our data quality. My first project involved automating data quality checks for the account performance metrics calculation.

We have a service that calculates investment returns over different time spans for each asset class of every account on a daily basis. The service's output is stored permanently in Avro format in our online storage. Data is loaded into our online systems, providing metrics for our clients’ dashboards. It is also ETLed to our data platform and data warehouse for metrics dashboards and data exploration.

Although the calculation logic has solid unit test coverage, there could be problems in the input data fed into the calculation logic, and there could be edge cases in the data that unit tests don't cover. To catch these cases and ensure data quality, we add automated test coverage for the data itself.

As described in an earlier post, we use Cascading to run offline batch processing jobs on Hadoop. These MapReduce processes allow parallel processing of data, which scales horizontally to meet the needs of our exponential growth in business. On top of our basic Cascading framework, we built an extension dedicated to Avro data quality checks, which picks up data records from online storage and writes records that fail the check to a different folder in the storage.

In this framework, we have a class called AvroDataQualityRunner that uses reflection to discover all subclasses of AvroDataQualityTestExecutor and execute them.
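The original code listing for the runner isn't reproduced here. As a rough, self-contained sketch of the discovery step, the following filters a candidate list with the reflection API; the class names follow the post, but everything else (the `name()` method, the explicit candidate list rather than a classpath scan) is illustrative, not Wealthfront's actual code:

```java
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Sketch only: the real runner discovers subclasses on the classpath;
// here we filter an explicit candidate list with the reflection API.
abstract class AvroDataQualityTestExecutor {
    abstract String name();
}

// Hypothetical concrete executor for the example.
class AccountMetricsExecutor extends AvroDataQualityTestExecutor {
    String name() { return "account-metrics"; }
}

public class AvroDataQualityRunner {
    static List<AvroDataQualityTestExecutor> discover(List<Class<?>> candidates) {
        List<AvroDataQualityTestExecutor> executors = new ArrayList<>();
        for (Class<?> c : candidates) {
            // keep only concrete subclasses of the executor base class
            if (AvroDataQualityTestExecutor.class.isAssignableFrom(c)
                    && !Modifier.isAbstract(c.getModifiers())) {
                try {
                    executors.add(c.asSubclass(AvroDataQualityTestExecutor.class)
                            .getDeclaredConstructor().newInstance());
                } catch (ReflectiveOperationException e) {
                    throw new RuntimeException(e);
                }
            }
        }
        return executors;
    }

    public static void main(String[] args) {
        // String.class is filtered out; only the executor subclass survives
        for (AvroDataQualityTestExecutor e : discover(
                List.<Class<?>>of(AccountMetricsExecutor.class, String.class))) {
            System.out.println(e.name());  // each executor would run its flow here
        }
    }
}
```

In the real framework each discovered executor would then build and run its Cascading flow rather than just print its name.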
Each AvroDataQualityTestExecutor implements the getFlow() function, which connects the executor's input and output taps to a Flow object. That Flow spawns the Cascading jobs, which call the apply() function in AvroDataQualityTester on each input record to perform the data quality check.
To run quality checks on account metrics, I define an account metrics executor class that extends AvroDataQualityTestExecutor, providing my input data class type in getTestClass() and output data class type in getBadTestClass(). The apply() function in AvroDataQualityTester captures every input that fails the tests and outputs it along with the exception message. The only real work I need to do is provide a data quality tester for account metrics that extends AvroDataQualityTester, plug in the data quality check logic with JUnit assertions, and, of course, write the unit tests for the data quality tests themselves.
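The tester contract can be sketched as below. This is an illustrative reconstruction, not the actual framework: here apply() returns an error message for a failing record (null for a passing one), and the record type and the "return below -100%" check are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the tester contract; the real framework runs
// apply() inside a Cascading flow and routes failures to a bad-record folder.
abstract class AvroDataQualityTester<T> {
    // returns null when the record passes, an error message otherwise
    abstract String apply(T record);

    // collect the records that fail the quality check
    List<T> failures(List<T> records) {
        List<T> bad = new ArrayList<>();
        for (T record : records) {
            if (apply(record) != null) {
                bad.add(record);
            }
        }
        return bad;
    }
}

// Hypothetical account-metrics check: a periodic return can never be
// below -100%, so anything smaller must be a data error.
class AccountReturnTester extends AvroDataQualityTester<Double> {
    @Override
    String apply(Double periodReturn) {
        return periodReturn >= -1.0 ? null : "return below -100%: " + periodReturn;
    }
}

public class TesterSketch {
    public static void main(String[] args) {
        AccountReturnTester tester = new AccountReturnTester();
        // the -2.0 record (a -200% return) fails the check
        System.out.println(tester.failures(List.of(0.05, -2.0, 0.0)));
    }
}
```

The real tester expresses checks like this with JUnit assertions, with the framework catching the assertion failures and emitting the failing records.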

After the data quality tests finish, another Cascading job reads the output of the jobs above and performs statistical analysis of the errors. Summarized error reports are then sent to stakeholders for investigation.
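A toy version of that summary step: group the error messages and count occurrences, the kind of aggregate an error report might contain. The grouping here is plain in-memory Java for illustration; the real job does this as a Cascading flow:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory version of the error-summary step; the real job
// performs the same grouping as a Cascading MapReduce flow.
public class ErrorSummary {
    static Map<String, Integer> summarize(List<String> errorMessages) {
        // TreeMap keeps the report sorted by error message
        Map<String, Integer> counts = new TreeMap<>();
        for (String message : errorMessages) {
            counts.merge(message, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> errors = List.of(
                "return below -100%", "missing asset class", "return below -100%");
        System.out.println(summarize(errors));
        // {missing asset class=1, return below -100%=2}
    }
}
```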

Final Thoughts 

My first two months at Wealthfront have been wonderful. Although I had only worked at big companies before, I've had no problem at all adapting to this new environment. We exceed the level of transparency that Bloomberg is proud of and surpass the engineering quality that Goldman Sachs counts on for billion-dollar transactions every day. And I am working with people who are fun, smart and enthusiastic about our products and mission. Last but certainly not least, the weather in the Valley is fantastic!

Tuesday, September 2, 2014

Building and Testing a Custom UIDatePicker

When building the new Add Funds flow for iOS, we needed a date picker to select the start date for a recurring deposit. Apple's pre-built date picker (UIDatePicker) does not allow for visual customization, and because our app has mostly dark backgrounds and light foregrounds, we needed to build our own date picking component (WFDatePicker). We already had a great design to work with, shown above. The three-segment interface was similar enough to the standard UIDatePicker to be intuitive, but the design fit with the rest of the app. At Wealthfront, we build new UI elements entirely separate from the rest of the app, to ensure they are properly encapsulated, tested extensively, and reusable.

Friday, August 22, 2014

Summer Spent Learning and Experimenting

Though my decision of where to spend my 2014 summer may not carry the same drama or headlines as LeBron's, my experiences this summer will undoubtedly influence my pursuits after college. My priority was to find an internship program with supportive mentors and a company with an engineering backbone. In these respects, Wealthfront has gone beyond my expectations.

Data Week
Wealthfront's Data Week coincided with the start of my internship. Data Week is dedicated to giving the Wealthfront team the tools and skills necessary to be fluent with data and analytics. After brief introductions and setup, I found myself packed into a 20'x15' conference room with the whole engineering and product team. Daily workshops covered data methodologies and technologies essential to Wealthfront's data platform. The seminars encouraged people with any level of data expertise, or none at all, to attend; depth and difficulty ranged from an 'Intro to SQL' covering basic SELECT queries to using Cascading on EMR. For example, the MapReduce seminar consisted of a high-level overview of the MapReduce architecture and the Cascading framework, followed by exercises to implement a new Cascading job within the data platform.

After the workshops, we split into three teams, each covering a different domain of Wealthfront (brokerage operations, investment services, and consumer Internet data). Each team's mission was to create data sources and dashboards providing diverse metrics on Wealthfront's operations and services. The dashboards ranged from daily cash-flow metrics to statistics for Wealthfront's Money Ballers softball team. For example, the dashboard shown below is a time series of the number of pull requests per day. The data source is generated from multiple inputs from our online systems; the inputs are processed with Cascading and loaded into our data warehouse, from which the derived data is queried and loaded into the dashboard. By analyzing our engineering workflow, we can continually improve our development processes.

I am extremely impressed by Wealthfront's encouraging attitude toward experimentation and learning. In my first week alone, I had the opportunity to explore the data warehouse, hack on a Cascading job, and learn about different perspectives on data analytics and data quality.

Internship Project
Holding true to its core engineering values, the Wealthfront team continually searches for and prioritizes processes to automate. One such process was the manual procedure for approving new accounts and retaining client application information. Wealthfront retained physical copies of clients' new account applications in compliance with the Financial Industry Regulatory Authority (FINRA). This process does not scale with Wealthfront's current client and AUM growth, so we wanted to develop an automated procedure for storing electronic copies of client applications that would be compliant with SEC and FINRA regulations. In developing a solution, I adopted Wealthfront's design methodology: identify the goals and invariants, research the technologies and dependencies, and develop an implementation and migration plan. Of course, the solution had to be coupled with a generous serving of tests to help us sleep easy at night.

One project goal was to electronically sign and store each retained new account application in a third-party storage provider. In compliance with FINRA regulations, the client records would have to be stored in write once, read many (WORM) format. This essentially meant redundant data storage filers, restrictions of delete permissions, and full audit logs of storage activity. We would ensure the integrity and security of confidential client information by encrypting the records and properly storing all data.

Another equally important goal was to implement an automated procedure to process and store each new client's data. This automation substitutes the old print-and-sign procedure and runs in the background, requiring no interaction with our Client Services team. A queue implementation would perfectly satisfy these requirements. Essentially, the queue worker would poll a database table for the next unprocessed payload, in this case, the next client's data. The worker would then encrypt and upload the data.
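The poll-encrypt-upload loop described above might be sketched as follows. The actual storage provider, database polling, and encryption details aren't in the post, so they are stubbed out here: an in-memory queue stands in for the database table, Base64 stands in for real encryption, and the Uploader interface stands in for the WORM storage client. All names are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Base64;
import java.util.Queue;

// Sketch of the queue worker: poll for the next unprocessed payload,
// encrypt it, upload it. An in-memory queue stands in for the database
// table, and Base64 stands in for real encryption.
public class DocumentQueueWorker {
    // stand-in for the WORM storage provider's client
    interface Uploader {
        void upload(String encryptedPayload);
    }

    private final Queue<String> pending = new ArrayDeque<>();
    private final Uploader uploader;

    DocumentQueueWorker(Uploader uploader) {
        this.uploader = uploader;
    }

    // in the real system this row would be inserted at approval time
    void enqueue(String clientPayload) {
        pending.add(clientPayload);
    }

    // stand-in for real encryption (e.g. AES under a managed key)
    static String encrypt(String payload) {
        return Base64.getEncoder()
                .encodeToString(payload.getBytes(StandardCharsets.UTF_8));
    }

    // process everything currently queued; a real worker would poll
    // the table on a schedule instead of draining in one pass
    int drain() {
        int processed = 0;
        while (!pending.isEmpty()) {
            uploader.upload(encrypt(pending.poll()));
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        DocumentQueueWorker worker = new DocumentQueueWorker(
                payload -> System.out.println("uploaded: " + payload));
        worker.enqueue("client-application-1");
        worker.enqueue("client-application-2");
        System.out.println(worker.drain() + " documents processed");
    }
}
```

The key property of the design survives the stubbing: the worker never deletes or mutates a payload, it only reads the next unprocessed one and writes the encrypted copy onward, which is what makes the WORM and audit requirements tractable.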

The next concern was at what point in the client application approval process to push the data onto the queue for storage. The most sensible answer was to inject this feature at the point where the Client Services team approves an applicant to open an account. We could then push hundreds of clients' applications onto the queue for electronic storage rather than printing them out one by one. The automated service saves hours of printing and signing documents every week. As new signups increase every week, we can keep the delay between signing up and funding an account minimal, providing a better experience for our new clients.

Final Thoughts
What I value the most from this summer is the horizontal and vertical exposure to Wealthfront's many services. I have had the opportunity to explore and contribute to the user management service, the web service client, the mobile server, and the data platform. This is made possible by the mentors' willingness to offer advice and suggestions, while leaving ample room for personal struggles, failures, and successes. Another great aspect of this internship is witnessing the growth of Wealthfront. I continually meet new people from all walks of life every week, making this summer an incredible experience.

Thursday, August 21, 2014

Learning from Leaders

As a student looking for internships, two points that mattered most to me were finding a place with great mentors and finding a place where I could build strong relationships.

Two months into my internship at Wealthfront, I’ve looked back to evaluate whether I achieved my goals. To me, the most beneficial aspect of the intern program has been our discussions with the experienced executives at the company. These weekly Exec/Intern Q&A discussions have given interns the opportunity to gain some insight on topics ranging from finance and entrepreneurship to career choices and family. Here are some of the lessons I’ve learned from the past two months.

Importance of Learning

Learning on the job is a main focus of any successful internship. However, I learned this summer that learning shouldn’t stop there. In fact, learning is at the center of the decision-making process here and a central tenet for the whole company. When it comes to deciding the next big initiative, the most important question to answer is what can be learned from the project. This follows the principle of the scientific method where a hypothesis must be established. The hypothesis tells us what we want to learn from an experiment or initiative. According to Avery Moon, VP Research and Engineering, “the only failed project is the one where nothing was learned.”

In order to learn from a hypothesis, we need a way to measure the success of a project. Taking a Bayesian approach, if the hypothesis is the prior, then there must be a measurable observation that helps to form a stronger posterior. This approach makes tracking metrics and collecting data essential for successful learning. Every engineer at Wealthfront is encouraged to create dashboards and metrics to follow the impact of their projects on the overall growth and success of the company. Without keeping track of these metrics, it can be hard to know why something went wrong or why something went well.

The executives pointed out that the same approach can be applied to personal learning as well. Maintaining a journal is an effective way to qualitatively evaluate ourselves: it forces us to remember the events and thoughts of the day, so we can understand which areas of our lives have room for improvement and in which areas we are succeeding. Tracking quantitative metrics can be more difficult in our day-to-day lives, but there are certain situations where it makes a lot of sense. One simple example is logging workouts: keeping track of a workout can show us where we are improving and encourage continuous progress. If we keep track of the results of our personal experiments, whether quantitatively or qualitatively, we can better understand both our mistakes and successes.