To illustrate how this partitioning scheme balances keys across clusters, we took 4,450 email addresses from the Enron dataset as stand-ins for arbitrary keys and counted how many of them would be assigned to each of our 5 clusters, using the Python script below:

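N = 5  # number of clusters
counts = [0] * N  # number of keys assigned to each cluster
with open('chapter14/enron.txt') as f:
    for email in f:
        # Python 3 salts str hashes per process (PYTHONHASHSEED), so exact counts vary between runs
        counts[hash(email.rstrip()) % N] += 1
print(counts)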

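One caveat when porting this idea to a real partitioner: in Python 3 the built-in `hash()` of a string is randomized per process, so two processes (or two machines) could route the same email address to different clusters. The sketch below swaps in `zlib.crc32` as a deterministic hash; that choice is our assumption, and any stable hash (for example one from `hashlib`) would serve the same purpose. The function names `partition` and `cluster_counts` are hypothetical, and the generated addresses are placeholders for the lines of `chapter14/enron.txt`.

```python
import zlib
from typing import Iterable, List


def partition(key: str, n: int) -> int:
    """Map a key to one of n clusters with CRC-32.

    Unlike the built-in hash() for str in Python 3, zlib.crc32 returns the
    same value in every process, so a key always lands on the same cluster.
    """
    return zlib.crc32(key.encode("utf-8")) % n


def cluster_counts(keys: Iterable[str], n: int) -> List[int]:
    """Count how many keys each of the n clusters would receive."""
    counts = [0] * n
    for key in keys:
        counts[partition(key.rstrip(), n)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical placeholder keys; swap in the lines of chapter14/enron.txt
    keys = [f"user{i}@example.com" for i in range(4450)]
    print(cluster_counts(keys, 5))
```

Because the hash no longer depends on the process, rerunning the script, or running it on another machine, yields identical counts.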