Actually Implementing Group Management Using ZooKeeper

January 09, 2010

ZooKeeper offers, in the words of its documentation, “off-the-shelf… group management”. The “off-the-shelf” part is inaccurate; it really offers the proper primitives to *implement* group management, but it’s up to you to fill in a few missing pieces.

I’ll be describing one type of group management system I built at KaChing using ZooKeeper:

  • A group contains some logical service. The *meaning* of belonging to a group is typically “the instance is available for use by clients over the network”.
  • Services can join and leave the group. The special case of a service crashing or a network outage needs to be handled as leaving the group.
  • Joined services share metadata about how to communicate with it, i.e., its IP address, base URL, etc.
  • Clients can ask what instances are in the group, i.e., available.
  • Clients are notified when group membership changes so they can mutate their local state.

These map onto ZooKeeper as:

  • A group is a (permanent) node in the ZooKeeper hierarchy. Clients and services must be told the path to this node.
  • A services joins the group by creating an ephemeral node whose parent is the group node. By using an ephemeral node, if the service dies then the service is automatically removed from the group.
  • The ephemeral node’s data contains the service metadata in some format like JSON, XML, Avro, Protobufs, Thrift, etc. ZooKeeper has no equivalent of HTTP’s “Content-Type” header to identify the metadata representation, so services and clients must agree upon the format in some manner.
  • Clients can query for the children of the group node to identify the members of the group.
  • Clients can place a watch on the group node to be notified if nodes have joined or left the group.

To help with development I use zkclient (pros: provides a much more natural interface to zk from Java compared to the actual ZooKeeper library, somewhat responsive committers; cons: some of the API semantics are difficult to understand, somewhat responsive committers). zkconf from Patrick Hunt makes it trivial to get zk running.

I’ve looked a bit at norbert from LinkedIn but the documentation is very slim (no README even!); from what I can make out from the code it seems to be well thought-out and provide a super-set of the system I’ve described. LinkedIn-ners, can you help a brother out?

Bonus section: Service Manifest

One downside to the system I’ve described is that it doesn’t avoid the “rogue service”: some service that shouldn’t be running actually is running. It’s happened to everyone before, don’t be shy; you retired service X but it wasn’t wiped from the servers and when the box rebooted some cron job restarted it…. oops.

To handle this you need a service “manifest” that lists all the services that *should* be running, so clients can filter out group members that are available but shouldn’t be. In ZooKeeper this can be a parallel tree to the group node that uses
permanent nodes rather than ephemeral nodes for group members, or just one permanent node whose data contains the group members, or some variation on that. And make sure you have your clients sound the alarm when a rogue service shows up.

.. Adam

P.S. the Blogger post editor really really sucks.