
Thursday, July 29, 2010

I Can Has Invariant Mapz?

If I had to pick one major source of bugs in the large refactorings I recently went through, it would probably be the handful of methods in the java.util.Map interface that are contravariant on the key type. For instance, thanks to the signature of #get(Object), one can retrieve an element from a map using a supertype of its key type.

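// newHashMap() is a static factory such as Guava's Maps.newHashMap()
Map<Integer, String> map = newHashMap();
// Compiles because #get takes an Object; the lookup simply misses
assertNull(map.get("im in ur programmz, codin in ur dialect"));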

Now, I do understand that #get(K), #remove(K) and #containsKey(K) are indeed contravariant on K. After all, one can see these methods as functions from K to V, V and boolean, respectively, where K is the type of the keys and V the type of the values; and, in type-theoretic terms, functions are contravariant on their argument types (even though Java's method overriding does not allow contravariant parameter types). In other words, the function #get(Object) can be used wherever the function #get(Integer) is needed, as Object is a supertype of Integer. (Note that Map<K, V> itself is not contravariant on K, as there is a covariant occurrence of K in #keySet().)
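Java has no built-in function subtyping, but with java.util.function.Function (Java 8, well after this post) and a use-site wildcard the same contravariance becomes visible: a function of Object is accepted wherever a function of Integer is expected. This is only an illustration; ContravarianceDemo and applyToKey are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class ContravarianceDemo {
  // Accepts any function whose argument type is Integer or a supertype of it.
  static String applyToKey(Function<? super Integer, String> lookup, int key) {
    return lookup.apply(key);
  }

  static String demo() {
    Map<Integer, String> map = new HashMap<>();
    map.put(1, "one");
    // Map#get is declared as get(Object), so map::get fits Function<Object, String>...
    Function<Object, String> byObject = map::get;
    // ...and that function is accepted where a function of Integer is required.
    return applyToKey(byObject, 1);
  }
}
```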

However, I've never seen any code take advantage of this property in practice. On the other hand, refactoring a Map<A, V> into a Map<B, V>, where the least common supertype of A and B is Object, tends to lead to tricky bugs: the compiler cannot catch the invocations of #get(Object), #remove(Object) or #containsKey(Object) that were not updated. In the second snippet below, the key type changed from Symbol to Isin, yet the stale call cache.get(symbol) still compiles and now silently returns null:

// Before the refactoring
void doSomething(Map<Symbol, Quote> cache) {
  ...
  cache.get(symbol);
  ...
}

// After: the key type is now Isin, yet this call still compiles
void doSomething(Map<Isin, Quote> cache) {
  ...
  cache.get(symbol);
  ...
}
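To make the failure mode concrete, here is a minimal runnable sketch. Symbol, Isin and Quote are hypothetical stand-ins for the real domain classes, and the ISIN-like string is made up. Both the stale #get and the stale #remove compile without complaint; the first returns null and the second removes nothing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the real domain types.
final class Symbol {
  private final String ticker;

  Symbol(String ticker) { this.ticker = ticker; }

  @Override public boolean equals(Object o) {
    return o instanceof Symbol && ((Symbol) o).ticker.equals(ticker);
  }

  @Override public int hashCode() { return ticker.hashCode(); }
}

final class Isin {
  private final String code;

  Isin(String code) { this.code = code; }

  @Override public boolean equals(Object o) {
    return o instanceof Isin && ((Isin) o).code.equals(code);
  }

  @Override public int hashCode() { return code.hashCode(); }
}

final class Quote {
  final double price;

  Quote(double price) { this.price = price; }
}

class StaleLookupDemo {
  // The cache is now keyed by Isin, but this call site was not updated.
  // Map#get(Object) accepts the Symbol, so it compiles and returns null.
  static Quote lookup(Map<Isin, Quote> cache, Symbol symbol) {
    return cache.get(symbol);
  }

  // Same trap with Map#remove(Object): nothing gets invalidated.
  static Quote invalidate(Map<Isin, Quote> cache, Symbol symbol) {
    return cache.remove(symbol);
  }

  // Builds a one-entry cache keyed by a made-up ISIN.
  static Map<Isin, Quote> sampleCache() {
    Map<Isin, Quote> cache = new HashMap<Isin, Quote>();
    cache.put(new Isin("XX0000000001"), new Quote(100.0));
    return cache;
  }
}
```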

The caches I encountered in recent refactorings, implemented as ForwardingMaps with expiring entries, were particularly error-prone because they tend to be used in a lot of places. Breaking cache invalidation through an incorrect use of #remove(Object) can have catastrophic consequences. You had better have good test coverage, as it is your only safety net.

To overcome this problem, I now advocate against letting maps leak out of small, controlled scopes. As a replacement, one should either use a custom invariant interface such as

public interface QuoteCache {
  Quote get(Symbol symbol);
}

or wrap the map in the following interface:

import java.util.Collection;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;

public interface InvariantMap<K, V> {
  void clear();
  boolean containsKey(K key);
  boolean containsValue(Object value);
  Set<Entry<K, V>> entrySet();
  V get(K key);
  boolean isEmpty();
  Set<K> keySet();
  V put(K key, V value);
  void putAll(Map<? extends K, ? extends V> m);
  V remove(K key);
  int size();
  Collection<V> values();
}

Given a simple DelegatingInvariantMap implementation, the following factory is pretty useful:

import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class InvariantMaps {

  public static <K, V> InvariantMap<K, V> newHashMap() {
    return delegate(new HashMap<K, V>());
  }

  public static <K extends Comparable<? super K>, V> InvariantMap<K, V> newTreeMap() {
    return delegate(new TreeMap<K, V>());
  }

  public static <K, V> InvariantMap<K, V> newTreeMap(Comparator<? super K> comparator) {
    return delegate(new TreeMap<K, V>(comparator));
  }

  public static <K, V> InvariantMap<K, V> delegate(Map<K, V> delegate) {
    return new DelegatingInvariantMap<K, V>(delegate);
  }

}
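The post doesn't show DelegatingInvariantMap itself. Assuming it simply forwards every call to the wrapped java.util.Map, a minimal sketch could look like this (the InvariantMap interface is repeated so the block compiles on its own):

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;

interface InvariantMap<K, V> {
  void clear();
  boolean containsKey(K key);
  boolean containsValue(Object value);
  Set<Entry<K, V>> entrySet();
  V get(K key);
  boolean isEmpty();
  Set<K> keySet();
  V put(K key, V value);
  void putAll(Map<? extends K, ? extends V> m);
  V remove(K key);
  int size();
  Collection<V> values();
}

// Hypothetical sketch: every method forwards to the wrapped map. The
// invariance comes from the interface, whose key parameters are K, not Object.
final class DelegatingInvariantMap<K, V> implements InvariantMap<K, V> {
  private final Map<K, V> delegate;

  DelegatingInvariantMap(Map<K, V> delegate) {
    if (delegate == null) {
      throw new NullPointerException("delegate");
    }
    this.delegate = delegate;
  }

  @Override public void clear() { delegate.clear(); }
  @Override public boolean containsKey(K key) { return delegate.containsKey(key); }
  @Override public boolean containsValue(Object value) { return delegate.containsValue(value); }
  @Override public Set<Entry<K, V>> entrySet() { return delegate.entrySet(); }
  @Override public V get(K key) { return delegate.get(key); }
  @Override public boolean isEmpty() { return delegate.isEmpty(); }
  @Override public Set<K> keySet() { return delegate.keySet(); }
  @Override public V put(K key, V value) { return delegate.put(key, value); }
  @Override public void putAll(Map<? extends K, ? extends V> m) { delegate.putAll(m); }
  @Override public V remove(K key) { return delegate.remove(key); }
  @Override public int size() { return delegate.size(); }
  @Override public Collection<V> values() { return delegate.values(); }
  @Override public String toString() { return delegate.toString(); }
}
```

One caveat of this sketch: keySet(), entrySet() and values() return the underlying live views, whose contains(Object) and remove(Object) still accept any type, so the invariance only holds for the map's own methods. equals and hashCode are not forwarded either.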

Monday, July 26, 2010

We Need More Than One; Why Programming Languages Matter

When we started designing kaChing's architecture, we set one core design principle in stone: our systems would be language-neutral. We strive to combine the best ideas from all languages and use a good number of them day to day (Java, Scala, Ruby, Python, bash, Clojure, Erlang and so on). Sub-systems exchange information through simple data formats such as JSON and Protobuf.

I recently stumbled upon Dr. Kathleen Fisher's essay "We Need More Than One; Why students need a sophisticated understanding of programming languages". It perfectly captures the foundational role programming languages play in the study and practice of computer science.

Like natural languages, programming languages affect how we think. In particular, how we think when we communicate with computers. Each language comes with a natural domain of discourse. Ideas within or close to that domain are easy to express, while ideas further afield are harder. [...] A language is not just defined by what it includes, but by what it leaves out as well.


Just as excellent verbal and written communication is essential, mastery of multiple paradigms is crucial in software engineering. Programming languages must be front and centre. Their study must evolve from learning Java, C and Perl to learning computational theories, with the understanding that mainstream languages are mere cosmetic instances of greater concepts. We don't teach CS majors how to use Windows XP; we teach virtual memory, semaphores and context switching. Why should we teach Java and C++ rather than π-calculus and parametric polymorphism?

Prezi on Continuous Deployment at kaChing

Here is yet another presentation on Continuous Deployment at kaChing. The talk focused on continuous deployment and the lean startup, and was based on Pascal's and David's posts.


Wednesday, July 21, 2010

Elsewhere: Next Generation Java Programming Style

Vlad sent around an excellent article on Java coding style, which we all liked. Most of its suggestions are exactly what we do here, where our Java code has a heavy functional influence. A sample:
Final is your new love: More and more Java developers in the teams I’ve worked with, started to use final. Making variables final prevents changing those values. Most often, if you reassign a value, it’s a bug or you should use a new variable. Final prevents bugs and makes code easier to read and understand. Make everything immutable.
Read Stephen Schmidt's Next Generation Java Programming Style.
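As a small illustration of that advice (not taken from the article), here is a hypothetical immutable value class, Position: every field is final, there are no setters, and "modifying" a position returns a new instance.

```java
// Hypothetical immutable value class illustrating "final everywhere".
final class Position {
  private final String symbol;
  private final int shares;

  Position(String symbol, int shares) {
    this.symbol = symbol;
    this.shares = shares;
  }

  String symbol() {
    return symbol;
  }

  int shares() {
    return shares;
  }

  // No setter: "changing" a position yields a new instance and leaves
  // this one untouched.
  Position withShares(int newShares) {
    return new Position(symbol, newShares);
  }

  // Final parameter and local: a new value gets a new name instead of
  // reassigning an existing variable.
  Position add(final int delta) {
    final int updated = shares + delta;
    return withShares(updated);
  }
}
```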

Friday, July 16, 2010

Oakland eBIG hosting kaChing to talk about Continuous Deployment

Next Wednesday, July 21st, we will be presenting our continuous deployment system at eBIG in Oakland.

During this presentation, I'll focus on three aspects:

  • The business value of continuous deployment, and how it fits into the Lean Startup principles.
  • kaChing's implementation of continuous deployment: testing, deployment manager, immune system.
  • Lastly, our step-by-step history of getting to continuous deployment.


See you there!

Thursday, July 8, 2010

Silicon Valley Code Camp 2010

If you're interested in any of the following topics: Lean Startups, Extreme Testing, DevOps, Continuous Deployment, Scala, ZooKeeper and Practical Compiler Techniques, join the kaChing team at the next Silicon Valley Code Camp.

If you'd like to attend some of the talks, please log into the site and mark them "Interested".
See you there!

Wednesday, July 7, 2010

Murder Your Darlings

Lately I've been working on connectivity with NASDAQ. The protocols involve parsing fixed-offset messages of various types. We're not doing high-frequency trading, so we are optimizing for programmer efficiency -- that is, the API I expose to the rest of the system should make sense. So I'm representing the different types of messages, trading conditions, exchange identifiers and so on as enums. Each incoming message carries a one-character code, which I translate into the matching enum constant. I've seen programmers code this kind of thing using a big switch statement, but that leads to maintainability problems -- when you add an enum value, did you remember to add it to the case statement? Did that case statement get copied and pasted elsewhere? The better solution is to embed that logic in the enum itself, using a static map and a factory method.
import java.util.HashMap;
import java.util.Map;

enum SoupMessageType {
  LOGIN_REQUEST('L'),
  LOGIN_ACCEPT('A'),
  LOGIN_REJECT('J'),
  DATA('S'),
  LOGOUT_REQUEST('O');

  private final char code;

  private SoupMessageType(char code) {
    this.code = code;
  }

  // Built once, when the enum class is initialized (after the constants exist)
  private static final Map<Character, SoupMessageType> map;
  static {
    map = new HashMap<Character, SoupMessageType>(values().length);
    for (SoupMessageType v : values()) {
      map.put(v.code(), v);
    }
  }

  public char code() {
    return code;
  }

  // Static factory method: decodes without needing an instance.
  // Returns null for an unknown code.
  public static SoupMessageType from(char code) {
    return map.get(code);
  }
}

This is a simple pattern, and I found myself copying it from another enum. Since copy and paste is bad, I started looking for a way to turn this pattern into an abstraction. First, I moved the static code block into the constructor of a map-like class:
import java.util.HashMap;
import java.util.Map;

public class CodedEnumer<K, E extends Enum<E> & CodedEnum<K>> {
  private final Map<K, E> map;

  public CodedEnumer(Class<E> klass) {
    E[] enumConstants = klass.getEnumConstants();
    map = new HashMap<K, E>(enumConstants.length);
    for (E v : enumConstants) {
      map.put(v.code(), v);
    }
  }

  // Static factory so callers can rely on type inference
  public static <K, V extends Enum<V> & CodedEnum<K>>
      CodedEnumer<K, V> create(Class<V> klass) {
    return new CodedEnumer<K, V>(klass);
  }

  public E get(K key) {
    return map.get(key);
  }
}

The enum needs to implement a CodedEnum interface with one method.
public interface CodedEnum<K> {
  K code();
}

My first draft of this included another type parameter E for the enum, and a public E from(K key) method. But of course this method should be static, and declaring a static method in an interface would be meaningless (aside from the other detail of being a compiler error).

Now, rather than building and maintaining the map itself, the enum only needs to implement CodedEnum, create an instance of CodedEnumer, and use it to implement the one-line static method.
enum SoupMessageTypeCoded implements CodedEnum<String> {
  LOGIN_REQUEST("L"),
  LOGIN_ACCEPT("A"),
  LOGIN_REJECT("J"),
  DATA("S"),
  LOGOUT_REQUEST("O");

  private final String code;

  private SoupMessageTypeCoded(String code) {
    this.code = code;
  }

  // K = String is inferred from the declared field type
  private static final CodedEnumer<String, SoupMessageTypeCoded>
      map = CodedEnumer.create(SoupMessageTypeCoded.class);

  @Override
  public String code() {
    return code;
  }

  public static SoupMessageTypeCoded from(String code) {
    return map.get(code);
  }
}

Pretty slick, huh? I was quite pleased with myself for finally finding a use for an intersection type in a generic declaration. This is where the old writer's advice to "murder your darlings" comes into play. It's variously attributed to Fitzgerald, Hemingway and others (it is usually traced to Arthur Quiller-Couch), but the meaning is that whenever you write a particularly clever turn of phrase, whatever makes you smile at how smart you are, you should get out the red pencil or the delete key and get rid of it.

Realistically, this code sucks. I've created two extra types with complicated generics to save two or three lines of code. Anyone who opens up the class in the future will have to open two more files to understand how it works and what it's doing. So I hit delete and reverted to the version with those three horrible lines wastefully repeated in each and every enum I use this pattern in. I can only console myself that disk space is getting cheaper.
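Since the static-map version is the one that stays, here is one small hardening of it, offered as a suggestion rather than what the post does. Map#put returns the previous mapping, so the static block can fail fast if two constants are given the same code, a mistake the plain lookup would otherwise hide by silently keeping the last one. Side is a hypothetical two-value enum used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical enum using the same static-map pattern as SoupMessageType.
enum Side {
  BUY('B'),
  SELL('S');

  private final char code;

  Side(char code) {
    this.code = code;
  }

  private static final Map<Character, Side> BY_CODE = new HashMap<Character, Side>();
  static {
    for (Side s : values()) {
      // A non-null previous value means two constants share a code.
      if (BY_CODE.put(s.code, s) != null) {
        throw new IllegalStateException("duplicate code " + s.code);
      }
    }
  }

  char code() {
    return code;
  }

  // Returns null for an unknown code.
  static Side from(char code) {
    return BY_CODE.get(code);
  }
}
```

If a duplicate ever slips in, the IllegalStateException surfaces as an ExceptionInInitializerError the first time Side is used, so any test that touches the enum catches it.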

Saturday, July 3, 2010

Mimicking Abstract Classes with Objective-C

Objective-C doesn't have an abstract class construct. A common way to mimic one is NSObject's doesNotRecognizeSelector: method:
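#import <Foundation/Foundation.h>
@interface ShapeBase : NSObject
- (void)draw;
@end

@implementation ShapeBase
- (void)draw {
  [self doesNotRecognizeSelector:_cmd]; // raises NSInvalidArgumentException
}
@end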

@interface Circle : ShapeBase
@end

@implementation Circle
// missing draw override: compiles, fails only when draw is called
@end

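This nudges subclasses to override draw, but nothing enforces it at compile time: a subclass that forgets still compiles, and calling draw on it raises an exception at runtime. There's also another approach we can take using a formal protocol.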
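@protocol ShapeBehavior <NSObject>
- (void)draw;
@end

@interface ShapeBase : NSObject
@end

@implementation ShapeBase
@end

@interface Circle : ShapeBase <ShapeBehavior>
@end

@implementation Circle
// missing draw: the compiler warns that ShapeBehavior's draw is not implemented
@end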

Let's also add some sugar so we don't have to type ShapeBehavior every time we want to use a shape:
typedef ShapeBase<ShapeBehavior> Shape;

// Before typedef
ShapeBase<ShapeBehavior> *circle = [[Circle alloc] init];
[circle draw];

// After typedef
Shape *circle = [[Circle alloc] init];
[circle draw];

Using formal protocols reports the problem earlier in the development process, at compile time (as a warning, which -Werror turns into an error), but the trade-off is that it's not as lightweight as simply calling doesNotRecognizeSelector:.