Thursday, July 9, 2015

Unit Testing for Sass with Sassaby

At Wealthfront we use Sass to write all of our CSS stylesheets. Sass is a powerful CSS pre-processor that lets you leverage features that are common in programming languages but absent from native CSS. Using Sass variables, conditionals, loops, and functions, you can write extensible CSS that is easier to maintain across a large front-end codebase.

As you can tell from countless posts on this blog, we take testing very seriously at Wealthfront. As a precursor to leveraging all of these Sass features, especially its repeatable functions, we needed a way to ensure that our Sass was adequately tested. This led us to develop and open-source Sassaby, a unit testing library for Sass.

In this blog post I'm going to detail a bit of the thought process that led us here and then use some examples from Wealthfront's codebase to show some of the features of Sassaby.

What Brought us Here

Our requirements for testing Sass are similar to those for any testing library:

  1. Tests should be easy to write in a syntax that is familiar to its target audience.
  2. Assertions should be available for all features that should be tested.
  3. Tests should easily be integrated into our existing build system.
  4. Tests should run quickly in that build system.

While there are a few libraries already available for testing Sass, they are also written in Sass, which violated our first requirement. Sass is a great language for what it intends to be: a CSS pre-processor. Its syntax is not optimized for writing other types of scripts, such as testing libraries. And since Sass has to be compiled to CSS, writing testing code in Sass had the additional downside of compiling down to a file containing the test results, which then had to be parsed. We preferred a testing library that would log the failing tests and immediately exit the build process.

We quickly settled on JavaScript as the language of choice for Sassaby. Writing this library with server-side JavaScript allowed us to:

  1. Integrate with our existing Node testing framework, Mocha.
  2. Use any of the available Node assertion libraries.
  3. Easily integrate with our build system (because we already have Node tests running).
  4. Use node-sass for compilation. This package is the Node wrapper on the libsass library (significantly faster than normal Ruby Sass compilation).
  5. Use the excellent Rework CSS package for parsing compiled CSS into JSON.

Sassaby

We built Sassaby to be testing framework agnostic. It will work with whichever Node testing library you are currently using. For the purposes of the examples in this blog post, however, we will be using Mocha. Setting up Sassaby is easy, as you just tell it which file you want to test:
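With Mocha, a minimal setup might look like this (the file name here is illustrative):

    var path = require('path');
    var Sassaby = require('sassaby');

    describe('grid.scss', function() {
      // Point Sassaby at the Sass file under test.
      var sassaby = new Sassaby(path.resolve(__dirname, 'grid.scss'));
    });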

Sassaby breaks down its features into the main repeatable parts of Sass: mixins, functions, and imports.

Mixins

Mixins let you inject a repeated CSS rule declaration or a set of CSS styles into your stylesheet. Sassaby handles two categories of mixins: included mixins (which return a set of CSS styles) and standalone mixins (which return a full CSS rule declaration). Here is a simplified sketch of one of the standalone mixins from Wealthfront's grid (the mixin name and declaration are illustrative):
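    @mixin align-center($label) {
      .align-center-#{$label} {
        text-align: center;
      }
    }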

This mixin has two main features that we would want to test. It interpolates the given argument into a class selector and gives that selector a declaration for center-aligning items in the grid. We can test this with Sassaby by setting up the standalone mixin, calling it with an argument, and using some of the built-in assertions:
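    describe('align-center', function() {
      var sassaby = new Sassaby(path.resolve(__dirname, 'grid.scss'));

      it('interpolates the argument into a class selector', function() {
        sassaby.standaloneMixin('align-center')
          .calledWithArgs('md')
          .createsSelector('.align-center-md');
      });

      it('declares centered alignment', function() {
        sassaby.standaloneMixin('align-center')
          .calledWithArgs('md')
          .declares('text-align', 'center');
      });
    });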

Sassaby's calledWithArgs function takes the specified file, the mixin, and its arguments, and compiles them into the resulting CSS and an AST. Doing the CSS compilation at this step allows each mixin to be tested in isolation and with different arguments as needed.

Standalone mixins are different from included mixins, which return only CSS style declarations so that they can be called inside an already defined rule set. Here is a simplified sketch of one of our included mixins, which adds browser prefixes to the CSS filter declaration:
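    @mixin filter($value) {
      -webkit-filter: $value;
      filter: $value;
    }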

The testing interface for included mixins is very similar. For example, we would test this mixin like this:
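    describe('filter', function() {
      // 'mixins.scss' is an illustrative file name.
      var sassaby = new Sassaby(path.resolve(__dirname, 'mixins.scss'));

      it('adds the -webkit- prefix to the filter declaration', function() {
        sassaby.includedMixin('filter')
          .calledWithArgs('blur(2px)')
          .declares('-webkit-filter', 'blur(2px)');
      });
    });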

Functions

Functions are similar to mixins in Sass, but they do not deal with actual CSS selectors or rules. They are commonly used for unit conversions, such as this one that converts pixels to rems (a sketch assuming a 16px base font size):
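    @function rem($pixels) {
      @return ($pixels / 16px) * 1rem;
    }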

Sassaby supports a function testing interface similar to its mixin interface, but with different assertions. For this function, we want to test that the division is done correctly and that the correct unit is appended to the end. We can accomplish both of these goals with the equals assertion, but for the purposes of this example I have also shown the doesNotEqual assertion:
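    describe('rem', function() {
      // 'functions.scss' is an illustrative file name.
      var sassaby = new Sassaby(path.resolve(__dirname, 'functions.scss'));

      it('converts pixels to rems', function() {
        sassaby.func('rem').calledWithArgs('32px').equals('2rem');
      });

      it('does not leave the value in pixels', function() {
        sassaby.func('rem').calledWithArgs('32px').doesNotEqual('32px');
      });
    });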

Imports

The final testable feature in Sassaby is imports, which allow you to break out variables, mixins, and styles into a more organized file structure. Let's say our main file looks something like this (file names are illustrative):
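    @import 'variables';
    @import 'mixins';
    @import 'grid';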

We would obviously want to test that these files are imported. Sassaby allows this through the imports assertion:
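    describe('main.scss', function() {
      var sassaby = new Sassaby(path.resolve(__dirname, 'main.scss'));

      it('imports its partials', function() {
        sassaby.imports('variables');
        sassaby.imports('mixins');
        sassaby.imports('grid');
      });
    });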

Bringing it all Together

One thing we've noticed since we started testing our Sass mixins is that it has forced us to write better single-responsibility mixins. Here is a sketch of one of our old grid mixins, which handles the creation of rules for reordering columns (simplified for illustration):
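    @mixin order-columns() {
      @for $i from 1 through $grid-columns {
        .order-#{$i} {
          order: $i - 1;
        }
      }
    }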

Trying to test this would be a bit too complicated. We would have to test that the loop creates a new class for each column, that the label is interpolated correctly, and that the rule declaration is one less than the column number. It would be much easier to break it out into two single-responsibility mixins:
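    @mixin order-columns() {
      @for $i from 1 through $grid-columns {
        @include order-column($i);
      }
    }

    @mixin order-column($i) {
      .order-#{$i} {
        order: $i - 1;
      }
    }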

Doing this allows us to test both of these mixins in isolation. Also, notice how these mixins rely on an externally defined variable, $grid-columns. Sassaby provides a way to stub in these external dependencies, as shown in the tests below for the broken-out mixins:
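    describe('column ordering', function() {
      // Stub the externally defined $grid-columns variable.
      var sassaby = new Sassaby(path.resolve(__dirname, 'grid.scss'), {
        variables: {
          'grid-columns': '2'
        }
      });

      it('creates a class for each column', function() {
        sassaby.standaloneMixin('order-columns')
          .called()
          .createsSelector('.order-2');
      });

      it('declares an order one less than the column number', function() {
        sassaby.standaloneMixin('order-column')
          .calledWithArgs('2')
          .declares('order', '1');
      });
    });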

We hope that this has been a good guide to writing more coherent CSS with Sass and testing it with Sassaby. The assertions used in these examples are only a sample of what is available, and we encourage you to check out the full documentation and download the library on npm. We also take pull requests on GitHub.

Thursday, July 2, 2015

Testing with Optical Character Recognition (OCR)

Put not your trust in money, but put your money in trust.
-Oliver Wendell Holmes, Sr., The Autocrat of the Breakfast Table

At Wealthfront, we manage over $2.5 billion in assets entrusted to us by our clients. Trust and transparency are the foundations of a strong relationship between a financial institution and its clients. A lack of trust in an institution impedes its efficiency and innovation. At Wealthfront, we place a strong premium on establishing and maintaining trust with our clients.


One of the core ways in which we establish trust is by building robust and reliable software products for our clients. The essential component in ensuring high quality software is testing. Without testing, we cannot validate or have confidence in any software we build. This is how we, at Wealthfront, gain trust in our code.

Logic Validation vs. Data Validation

Software can be tested or validated at various levels using a variety of techniques. Two necessary forms of software testing are: (1) logic validation and (2) data validation. Logic validation is the process of determining whether the applied logic (in the form of code) exhibits the desired behaviour. Data validation, on the other hand, is the process of ensuring that the input data supplied to a system is valid and consistent with the data types the system expects. Many tools exist that facilitate and standardize practices for logic validation in the industry. The techniques for data validation, however, are particular to each use case.

At Wealthfront, we practice data validation in our continuous deployment model to ensure that each deployed service functions correctly with the new set of data. As described in the linked blog post, the technique employed is customized for the specific use case in question. Another scenario where we use data validation is in verifying the contents of document images we receive from external partners. Because these documents are images embedded in PDFs, we cannot use standard PDF parsing techniques to validate their content. Instead, we use a technique called optical character recognition (OCR).

Optical Character Recognition (OCR)

Optical character recognition (OCR) refers to the automated process of translating images of text into machine-encoded text, such as ASCII. It is widely used in commercial applications to store, edit, search, and analyze typewritten or printed text documents in a matter of seconds, a task that would otherwise be cumbersome and manual. OCR works by scanning your images, extracting the contained text, splitting the text into characters, and then recognizing those characters. It can be trained to recognize a variety of fonts, languages, and even handwritten text. In the open source world, Tesseract is perhaps the most accurate and leading OCR engine. Originally developed as a PhD research project at Hewlett-Packard (HP) in the 1980s, Tesseract was significantly enhanced by Google after it became open source. At Wealthfront, we use Tesseract to do OCR validation on scanned PDF documents.

Since Tesseract uses the Leptonica image processing libraries to perform OCR, it only works with image files such as PNGs or TIFFs and cannot work with PDFs directly. It needs to be combined with a PDF interpreter such as Ghostscript, an excellent interpreter of PostScript and PDF files that can render them to image files. To perform OCR in Java code, you need a Java Native Access (JNA) wrapper that provides simplified native library access to the Tesseract OCR engine. Tess4J is a JNA wrapper that combines the Tesseract DLLs with Ghostscript to provide feature support for PDF documents.

Following is a sketch of Java code that takes a scanned PDF document, converts it into PNGs, and then performs OCR using the Tess4J libraries (the wrapper class here is illustrative):
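    import java.io.File;

    import net.sourceforge.tess4j.Tesseract;
    import net.sourceforge.tess4j.TesseractException;
    import net.sourceforge.tess4j.util.PdfUtilities;

    public class ScannedPdfOcr {

      public static String extractText(File scannedPdf) throws TesseractException {
        // Convert each page of the scanned PDF into a PNG via Ghostscript.
        File[] pageImages = PdfUtilities.convertPdf2Png(scannedPdf);

        Tesseract tesseract = Tesseract.getInstance();

        // Run OCR over each page image and concatenate the recognized text.
        StringBuilder recognizedText = new StringBuilder();
        for (File pageImage : pageImages) {
          recognizedText.append(tesseract.doOCR(pageImage));
        }
        return recognizedText.toString();
      }
    }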

Sample OCR input (PNG file): a scanned image of the text below.

Sample OCR output (text):

    This is a lot of 12 point text to test the
    ocr code and see if it works on all types
    of file format.
    The quick brown dog jumped over the
    lazy fox. The quick brown dog jumped
    over the lazy fox. The quick brown dog
    jumped over the lazy fox. The quick
    brown dog jumped over the lazy fox.

Although Tesseract's accuracy for interpreting images to text is sufficient and compares well to commercial options, its execution speed is slow. From sample runs, it takes roughly 8-10 seconds to perform OCR on a small PDF document (3-4 pages). The immediate culprit here isn't Tesseract, though; it's Ghostscript. Tess4J's PdfUtilities internally uses Ghostscript to convert a PDF file to a set of PNG images.

Ghostscript Performance Enhancements

There are settings that can be tuned to increase the performance of Ghostscript. If you use the default convertPdf2Png method in Tess4J's PdfUtilities, these custom settings cannot be exercised. However, you can always write your own wrapper around Ghostscript and calibrate the settings to optimize the performance of your program, as in the following sketch, which shells out to the gs command-line binary (the flags are standard Ghostscript options; the class and file naming are illustrative):
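    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    public class GhostscriptWrapper {

      // Converts selected pages of a PDF to grayscale PNGs at the given resolution.
      public static void convertPdfToPng(File pdf, File outputDir, int dpi,
                                         int firstPage, int lastPage)
          throws IOException, InterruptedException {
        List<String> command = new ArrayList<String>();
        command.add("gs");                        // the Ghostscript executable
        command.add("-dBATCH");
        command.add("-dNOPAUSE");
        command.add("-dSAFER");
        command.add("-sDEVICE=pnggray");          // grayscale PNG output device
        command.add("-r" + dpi);                  // output resolution, e.g. 300
        command.add("-dFirstPage=" + firstPage);  // selective page conversion
        command.add("-dLastPage=" + lastPage);
        command.add("-sOutputFile="
            + new File(outputDir, "page-%03d.png").getPath());
        command.add(pdf.getPath());

        Process process = new ProcessBuilder(command).inheritIO().start();
        if (process.waitFor() != 0) {
          throw new IOException("Ghostscript conversion failed");
        }
      }
    }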

The Ghostscript documentation suggests enabling multithreaded rendering (which increases the number of rendering bands processed concurrently on multi-core systems) via -dNumRenderingThreads=n, or giving Ghostscript more memory, for performance improvements. However, in our experiments, these offered little to no improvement for our set of input data.

Output Resolution

The resolution at which you perform the document conversion has a direct impact on Ghostscript performance, albeit at the cost of the quality of the output image file. While converting documents at a lower DPI reduces the conversion time, it also increases the inaccuracy of the OCR interpretation, and vice versa. You can specify the output image resolution with the -r<res> argument (for example, -r300). By default, Ghostscript converts images at 72 DPI, which is quite low. Following is a comparison of conversion performance at different DPIs:

    Conversion resolution (DPI):  72 (default)   100      200     250     300     400
    Runtime increase:             --             ~1.25x   ~2.2x   ~2.8x   ~3.5x   ~13x

Selective Page Conversion

Another useful option is selective page conversion, for use cases where you only want to perform OCR on certain pages of a document. This significantly reduces runtime by not converting the entire document, which especially matters for larger documents. You can specify the range of pages you want to convert using the following two options: -dFirstPage=1 -dLastPage=n. Even if the document size is unknown prior to conversion, you can use any PDF reader (such as Apache PDFBox) to retrieve the page count.
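For example, with Apache PDFBox (a sketch; assumes the PDFBox dependency is available):

    import java.io.File;
    import java.io.IOException;

    import org.apache.pdfbox.pdmodel.PDDocument;

    public class PdfPageCounter {

      // Retrieve the page count so that -dLastPage can be bounded appropriately.
      public static int pageCount(File pdf) throws IOException {
        PDDocument document = PDDocument.load(pdf);
        try {
          return document.getNumberOfPages();
        } finally {
          document.close();
        }
      }
    }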

Converting a few pages selectively takes about as long as converting a small document in full, since there isn't any noticeable overhead associated with Ghostscript initialization. Significant performance improvements from selective page conversion start to kick in for documents over 20 pages. The following should provide a good relative comparison of conversion times for different document sizes.

    Pages in PDF:                               < 5      ~10      ~50       ~100
    Runtime (converting 3 pages individually):  ~1 sec   ~1 sec   ~1 sec    ~1 sec
    Runtime (converting entire document):       ~1 sec   ~2 sec   ~11 sec   ~20 sec

*Conversion times may vary across different machines.

Tesseract Performance Enhancements

The next bottleneck is the core Tesseract OCR process, which can also be tuned for performance. One optimization that can be applied with the Tess4J wrapper method for OCR (doOCR) is calling it in combination with a Rectangle. The Rectangle bounds the region of the image that is recognized while performing OCR. From test runs, the runtime improvement is about 4x when using a Rectangle of dimensions (0, 0, 1000, 1000) compared to not using a Rectangle.

Using Rectangle
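Here is a sketch of the bounded doOCR call (the wrapper class and image loading are illustrative):

    import java.awt.Rectangle;
    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.IOException;

    import javax.imageio.ImageIO;

    import net.sourceforge.tess4j.Tesseract;
    import net.sourceforge.tess4j.TesseractException;

    public class BoundedOcr {

      public static String ocrTopLeftRegion(File pageImageFile)
          throws IOException, TesseractException {
        BufferedImage pageImage = ImageIO.read(pageImageFile);
        // Restrict recognition to the top-left 1000x1000 pixel region of the page.
        return Tesseract.getInstance()
            .doOCR(pageImage, new Rectangle(0, 0, 1000, 1000));
      }
    }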

Following are the runtime improvements when using Rectangles of different sizes, from sample runs:

    Rectangle dimensions:   No Rectangle   (0, 0, 1000, 1000)   (0, 0, 1500, 1500)   (0, 0, 2000, 2000)
    Runtime improvement:    --             4x                   2x                   1.4x
    Runtime (seconds):      1.6 sec        0.4 sec              0.7 sec              1.1 sec

*Results were gathered after running doOCR on a PDF document image of US Letter size.

Although the relationship is not linear, there are still incremental improvements when using Rectangles of decreasing size. If your use case dictates performing OCR on the entire document, this will not be a good optimization candidate. Otherwise, the improvements can make a significant impact on your application's runtime.

In terms of usage, the entire OCR process is very CPU and memory intensive. Although the optimizations discussed above are advantageous, their benefits are eventually capped by this intensive OCR process. To perform OCR validations in bulk efficiently, you need to parallelize the process on multi-core systems. The caveat is that the Tess4J OCR APIs do not support multithreading. The limiting factor is the way Tess4J uses Ghostscript: it calls Ghostscript's low-level APIs, which do not support multithreading.

Accuracy of Tesseract OCR Process

In terms of accuracy, Tesseract's OCR is not completely precise and exhibits some level of variance when interpreting text images into ASCII. Common variances include:
  • Misinterpretation of letter case: interpreting uppercase for lowercase letters and vice versa
  • Mistaking letters, numbers, and symbols that have similar shapes, such as:

    Actual character    OCR-interpreted character
    0                   O
    0                   °
    I                   |
    I                   :
    5, 6 or 8           S

As stated previously, higher quality images will also help Tesseract analyze your image accurately. Following are the failure rates after performing OCR on the same set of images converted at different resolutions via Ghostscript. Diminishing returns surface after 250-300 DPI, but images below 200 DPI have poor quality and prove to be ineffective OCR candidates:

    Conversion resolution (DPI):  < 100   150    200    250          300          400
    Failure rate:                 > 99%   ~51%   ~18%   negligible   negligible   negligible

The Rectangles used to perform OCR can also impact the overall accuracy of the result. From experimentation, Rectangles of larger sizes tend to produce more accurate results than smaller ones.

For our purposes, we were only interested in optically recognizing numerical characters, so the noise due to this variance was easily overcome by building a map of common misinterpretations and doing character replacements on any misinterpretation. These variances occurred in roughly 20% or more of the documents we tested. However, if you are interested in performing OCR on alphanumeric characters, then you should explore improving accuracy by training Tesseract to do better image recognition of your documents' fonts and languages.
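A sketch of such a replacement map, assuming only numerical output is expected (ambiguous misreads such as S, which may stand for 5, 6, or 8, would need contextual handling):

    import java.util.HashMap;
    import java.util.Map;

    public final class OcrCorrections {

      // Map common OCR misreads back to the digits we expect; extend this
      // map based on the misinterpretations observed in your own documents.
      private static final Map<Character, Character> CORRECTIONS =
          new HashMap<Character, Character>();
      static {
        CORRECTIONS.put('O', '0');
        CORRECTIONS.put('°', '0');
      }

      public static String normalize(String ocrText) {
        StringBuilder normalized = new StringBuilder(ocrText.length());
        for (char c : ocrText.toCharArray()) {
          Character replacement = CORRECTIONS.get(c);
          normalized.append(replacement == null ? c : replacement.charValue());
        }
        return normalized.toString();
      }
    }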

Finding Optimal Settings

Despite the variances, inaccuracy, and performance overhead, Tesseract combined with Ghostscript still offers a reasonable, cost-effective way to perform optical character recognition. Ghostscript has a variety of options that can be explored to generate the best-suited input documents for your OCR process, and Tesseract can be tuned and trained to optically recognize your input documents with higher precision and accuracy. The core tradeoff is between performance and accuracy, since the two share an inverse relationship. The ideal options can only be discovered by experimenting with your own set of data!

Conclusion

Although not very common, optical character recognition was adopted as a testing technique for a unique scenario because of the strong premium we place on testing at Wealthfront. Software testing is emphasized here because we are a fully automated, software-based financial advisor, and testing is the key method we use to validate and gain trust in our overall processes. Needless to say, as a financial advisor that manages over $2.5 billion of client assets, we consider trust one of the fundamental and uncompromisable values of our core business.

Tuesday, June 30, 2015

Performant CSS Animations: Netflix Case Study

While going over performant web animations with our new batch of interns, Netflix's new redesign was brought up as an example of something that seemed to have DOM nodes that changed size and pushed other DOM nodes around.

As covered in my previous blog post on Performant Web Animations, animating properties such as width or top forces the browser to recalculate the size and location of every component that may have changed, a process called layout. Layouts are something you want to avoid when trying to build performant web animations. The two properties that can be animated cheaply are opacity and transform.

Netflix's main hover animation is a great example of how complex animations can be built using only opacity and transform.

Netflix built their animations by avoiding layouts and being clever with animating transform and opacity. In January, Jordanna Kwok from Netflix said:

To build our most visually-rich cinematic Netflix experience to date for the website and iOS platforms, efficient UI rendering is critical. While there are fewer hardware constraints on desktops (compared to TVs and set-top boxes), expensive operations can still compromise UI responsiveness. In particular, DOM manipulations that result in reflows and repaints are especially detrimental to user experience.

So how did they make this effect, using only transforms and opacity? There are three things happening here on hover. The hovered tile enlarges, the content changes to show a description, and the other tiles in the row move to the sides. Let's break this effect down one piece at a time in order to understand it.

Enlarging The Tile

We have learned that the way to make an element larger is with transform: scale(). We can simply apply that transformation on hover. If we have an image that is 100px by 100px and we ask the browser to grow it to 200px by 200px, we will end up with a blurry image. We can solve this by using an image that is twice as large and setting it to 50% of its width at scale(1).
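A minimal sketch of that approach (class names and timings are illustrative):

    /* The source image is 200x200 but displayed at 100x100, so it stays
       sharp when scaled up 2x on hover. */
    .tile img {
      width: 100px;
      height: 100px;
    }

    .tile {
      transition: transform 300ms ease;
    }

    .tile:hover {
      transform: scale(2);
    }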

Changing The Content

When you hover over one of the thumbnails on Netflix, it changes the image and text appears. This is done with an overlay the same size as the parent tile that contains the new image and the description. It starts with opacity: 0 and transitions to opacity: 1 on hover.
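In CSS, the overlay pattern might look something like this (selectors are illustrative):

    .tile {
      position: relative;
    }

    /* The overlay covers the tile exactly and fades in on hover. */
    .tile .overlay {
      position: absolute;
      top: 0;
      left: 0;
      width: 100%;
      height: 100%;
      opacity: 0;
      transition: opacity 300ms ease;
    }

    .tile:hover .overlay {
      opacity: 1;
    }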

Sliding The Rest Of The Row

This is one of the most interesting parts of their effect. Instead of letting the browser recalculate the position of every element on the page, or even just in the row that is being modified, Netflix is using JavaScript to manually move the appropriate elements in the appropriate directions using transform: translate3d().
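A simplified sketch of the idea (not Netflix's actual code; offsets and class names are illustrative):

    // Shift the hovered tile's siblings outward with translate3d so the
    // browser can composite the movement without ever running layout.
    var tiles = Array.prototype.slice.call(document.querySelectorAll('.tile'));

    tiles.forEach(function(tile, index) {
      tile.addEventListener('mouseenter', function() {
        tiles.forEach(function(other, otherIndex) {
          if (otherIndex < index) {
            other.style.transform = 'translate3d(-50px, 0, 0)';
          } else if (otherIndex > index) {
            other.style.transform = 'translate3d(50px, 0, 0)';
          }
        });
      });

      tile.addEventListener('mouseleave', function() {
        tiles.forEach(function(other) {
          other.style.transform = '';
        });
      });
    });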

Putting It All Together

We now have all the pieces we need to make the effect: swapping out the content, making the tile grow, and moving its neighbors out of the way. By using an overlay and animating opacity, we can swap out the content. By animating transform: scale, we can make an element grow. By animating transform: translate, we can move the neighbors. Once we put these effects together, we end up with a demo just like Netflix's effect.

Conclusion

Beautiful website UIs are using more and more subtle animations to delight the user. In order to build animations that truly feel spectacular, we need to understand how to keep the browser from doing computationally expensive operations. Layouts and repaints are particularly expensive. With this redesign, Netflix has given us a great example of how a little creativity can produce great experiences even when animating only transform and opacity.