Complement TDD with MDA

January 10, 2010

Test Driven Development (aka TDD) is on the rise. Good developers understand that code with no proper testing is dead code. You can’t trust it to do what you want and it’s hard to change. I’m a strong believer in Dijkstra’s observation that “Program testing can be a very effective way to show the presence… Read more

Actually Implementing Group Management Using ZooKeeper

January 09, 2010

ZooKeeper offers, in the words of its documentation, “off-the-shelf… group management”. The “off-the-shelf” part is inaccurate; it really offers the proper primitives to *implement* group management, but it’s up to you to fill in a few missing pieces. I’ll be describing one type of group management system I built at KaChing using ZooKeeper: A group… Read more

Flexible Log Monitoring with Scribe, Esper, and Nagios

January 05, 2010

If you have yourself a pretty decent-sized cluster, there’s a good chance that you’ve had the following experience: One day, while routinely browsing some server logs, you stumble upon some concerning entries that you wish you had been made aware of sooner. You could probably go back and write some custom scripts that… Read more

Subversion Backup

January 02, 2010

Yes, we’re using Subversion. I know that distributed version control systems (e.g. Git) are cool and we might get there sometime, but for miscellaneous reasons we’re still using SVN. For the record, some of us are using git-svn and we’re working and releasing from trunk (part of the lean startup methodology) so the branching… Read more

Attaching a Java debugger to the Scala REPL

December 29, 2009

I’m using the Scala REPL to play around with Java libraries and check their runtime behavior. One of the things I’m using it for is to check how Voldemort’s client behaves in different setups. For one of the checks I wanted to trace the client threads with an IDE debugger. To attach a debugger… Read more

Baking availability SLA into the code

December 09, 2009

Availability and Partition Tolerance are essential for many distributed systems. A simple (though not comprehensive) way to measure both is to use response time SLAs between services, as implied by Jeff Darcy’s observation: Lynch (referring to the 2002 SIGACT paper) also makes the point that unbounded delay is indistinguishable from failure. Time is therefore an essential… Read more