Cluster Aware via Gossip

Auto-discover cluster

A common problem in the cloud is, how to discover peers/nodes in a dynamic cluster environment. This is difficult because peers can come online and drop off without notice. Often peers on the network will be replaced rather than recovered (with new IPs). Also, many clouds don't allow broadcast messages either. Each cloud environment may have a particular hook or trick for determining peers, but likely these will not work in another cloud. Finally you probably want to be able to run your code in a local (unclustered) environment without changing too much.

I have a solution that covers these common problems. My solution use two pieces

  • A persistent data collection (ClusterSeed)
  • Gossip protocol.

Lets discuss ClusterSeeds first.

ClusterSeed

This simple collection only needs two fields. You could easily expand this with a third field to allow multiple clusters in a single collection. I am using MongoDB but everything but the TTL should port nicely to other solutions.

ClusterSeed Mongoose schema

The above defines lastSeen and name as well as the convenient expireAfterSeconds index using the popular Mongoose api (which adds data access methods to ClusterSeed)

Now that we have a collection we need to know how to use it.

Startup script

The Next thing that we must do is persist knowledge of a node in our cluster. We do this in the callback of our server.listen method of our app. It is quite simple and looks like this

Plant the seed

We are using several NodeJs and ExpressJs api's here, if you aren't' familiar, do some looking around later.

Notice we build up gHost and upsert a record. We call registerNode repeatedly so that we update lastSeen more frequently than the TTL interval. We don't require that stale records be removed, but it keeps our collection clean.

However we do require knowledge of the whole cluster. Otherwise we risk some nasty edge cases, where you end up with separated sub groups. You could recover from those but that would mess up the simplicity of this solution as seen in the next code sample...

Gossiping

The Gossip protocol itself doesn't indicate much itself about cluster identification, but a typical gossip implementation Gossipmonger has features like dead-Peer discovery using PHI, and knowledge of live-peers will be distributed as part of the gossip. Gossipmonger also helps us out by exposing a new peer event as well as the set of known livePeers via its Storage API.

Because of the Peer-to-Peer nature or our problem and the Gossip solution we also only need consult ClusterSeeds once and then Gossip takes care of the rest. As long as new nodes register themselves and dead-peers get weeded out overtime we have covered all our bases.

This code segment slides into the update callback to help ensure seeds are planted before reading, so that we can ensure that nodes who start simultaneously have a chance to finding each other (else we/they become unreachable).

Seed Gossipmonger from ClusterSeeds

Because you can only seed Gossipmonger once this code must be fenced so that it is only called once (Regardless future consultation of ClusterSeeds is not be necessary). Conveniently we have peerAccess that we set for later use. Once everything is turned on properly we will have a live node.

Once other nodes are online each node will know about then automatically using peerAccess.livePeers(), which exposes an optimistic list of peers who are most likely live. In general it will also contain suspectPeers (peers who might be dead). But, for most purposes this will be sufficient.

Observations

In practice nodes who startup sequentially will be made available almost immediately. If nodes come online later there is usually a couple second gap before all nodes are aware of each other (tunable via GOSSIP_INTERVAL option).

Future work

It would be interesting to see a more advanced use of Gossip, seeking consensus on livePeers before invoking server.listen. This would ensure that a new member of a cluster will not miss out on anything.

A final note on chatter

A common approach to gossip is to occasionally try gossiping with the dead. This will bring them back if they were only experiencing network difficulties. As a consequence there is a lot of unnecessary chatter at truly dead nodes (ones who have to be restarted or replace). A workaround is to forget about deadPeers. Although you aught consult ClusterSeeds first. On the same note, when fail to update ClusterSeeds they should also have to re-seed () Gossip, as will be out of the loop.

The following hack will delete dead peers to cut down on chatter. This would be appropriate if connectivity is strong but nodes are terminated permanently. If it is likely that node reachablilty is flakey, but nodes are robust doing so would be bad as it would isolate live nodes.

Delete Nodes via storage hack

No Comments Yet.

Leave a comment