With leader election, you begin with a set of candidates that wish to become the leader and each of these candidates race to see who will be the first to be declared the leader. Once a candidate has been elected the leader, it continually sends a heart beat signal to keep renewing their position as the leader. If that heart beat fails, the other candidates again race to become the new leader. Implementing a leader election algorithm usually requires either deploying software such as ZooKeeper, or etcd and using it to determine consensus, or alternately, implementing a consensus algorithm on your own. Neither of these are ideal: ZooKeeper and etcd are complicated pieces of software that can be difficult to operate, and implementing a consensus algorithm on your own is a road fraught with peril. Thankfully, Kubernetes already runs an etcd cluster that consistently stores Kubernetes cluster state, and we can leverage that cluster to perform leader election simply by leveraging the Kubernetes API server.
Kubernetes already uses the Endpoints resource to represent a replicated set of pods that comprise a service and we can re-use that same object to retrieve all the pods that make up your distributed system. Given this list of pods, we leverage two other properties of the Kubernetes API: ResourceVersions and Annotations. Annotations are arbitrary key/value pairs that can be used by Kubernetes clients, and ResourceVersions mark the unique version of every Kubernetes resource in the cluster. Given these two primitives, we can perform leader election in a fairly straightforward manner: query the Endpoints resource to get the list of all pods running your service, and set Annotations on those resources. Each change to an Annotation also updates the ResourceVersion metadata. Because the Kubernetes API server is backed by etcd, a strongly consistent datastore, you can use Annotations and the ResourceVersion metadata to implement a simple compare-and-swap algorithm.
Google has used this approach to implement leader election as a Kubernetes Service, and you can run that service as a sidecar to your application to perform leader election backed by etc. For more on running a leader election algorithm in Kubernetes, refer to this blog post.
This is one of the parts that container services like Docker are striving to answer, but I don’t think they have it quite nailed down yet.