What is Consistent Hashing?
Consistent Hashing is a distributed hashing technique that is loosely coupled with the number of servers or objects in a distributed hash table by positioning them in an abstract circle or hash ring. This allows servers and objects to scale without affecting the overall system.
Why would I need Consistent Hashing at all?
Assume that you have 10 servers and you need to distribute a thousand keys among these 10 servers. How would you do that? I know that the most impulsive decision is to assign the servers a unique number from 0 to 9 and then hash the keys using the modulo operation (
key % 10) to assign them to a server.
This obvious solution has a major drawback: it is not scalable in a horizontal fashion. This essentially means that whenever you add or a remove a server, you’d need to calculate the hashes again for the keys to assign them to a server.
Let’s dive into the explanation of Consistent Hashing
An explanation that uses an example does wonders, so let’s use one right away. We assume that we have three servers for the sake of simplicity. The steps involved in consistent hashing are described below:
- Decide a hash function that’s common for keys and servers. Let’s call this function
- We take the servers and hash them using
h(x)onto an abstract circle.
- The keys (the user IDs) in our example are then hashed using the hash function
h(x)and put onto the abstract circle.
- All the data related to a user is kept at a server that comes clockwise next to it. For example, only [email protected] and [email protected] have their details stored in servers
What advantages are offered when adding or removing a server?
- Contrary to traditional hashing, in consistent hashing, when a new server is added, all the keys need not be rehashed to be assigned to a server. In fact, in our example only the key [email protected] has to be moved from
- Similarly, assume that server
S2is removed. We would only need to remap [email protected] to
The keys may be haphazardly dispersed and in this manner may not be uniform. It might make the keys on servers lopsided. To deal with this issue, we include “virtual replicas” for servers. Rather than assigning each server to a solitary point on the ring, we map it to multiple points on the abstract ring and thus, make the system to better distribute the load.