Consistency Models

1. Consistency Models

In distributed systems, consistency models define the guarantees a system provides about the visibility and ordering of updates across multiple nodes. Choosing a consistency model fundamentally affects system performance, fault tolerance, and availability, creating trade-offs that architects must carefully consider.

1.1 Strong Consistency

Strong Consistency represents the most stringent consistency guarantee, ensuring that all nodes in the system have the latest data at any given time. When a write operation is completed, all subsequent read operations across any node return the most recent data.

How Strong Consistency Works

The mechanism behind strong consistency requires that every replica apply a write before the write is acknowledged to the client. This synchronous approach guarantees that reads never return stale data, but it adds latency to every write. A minimal code sketch of this flow follows the diagram below.

Strong Consistency Flow:
┌─────────┐    Write   ┌─────────────┐    Update All   ┌─────────────┐
│ Client  │ ─────────> │   Primary   │ ──────────────> │  Replicas   │
└─────────┘            │    Node     │                 │   (All)     │
     ▲                 └─────────────┘                 └─────────────┘
     │                        │                               │
     │                        ▼                               ▼
     └──── Acknowledgment ────────── Wait for All Updates ────┘
          (Only after all 
           replicas updated)
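
The following Python sketch illustrates this write path under simplified assumptions: in-memory replicas, no real network, and no failure handling. The class and method names are illustrative, not taken from any particular system.

# Minimal sketch of a strongly consistent write path (illustrative only):
# the primary applies the write to every replica before acknowledging it.

class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

class PrimaryNode:
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        # Apply locally first.
        self.data[key] = value
        # Synchronously update every replica before acknowledging.
        for replica in self.replicas:
            replica.apply(key, value)
        # Only now is the client acknowledged.
        return "ack"

    def read(self, key):
        return self.data.get(key)

replicas = [Replica() for _ in range(3)]
primary = PrimaryNode(replicas)
primary.write("balance:A", 900)

# Every replica already holds the latest value by the time the write returns.
assert all(r.data["balance:A"] == 900 for r in replicas)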

Banking System Example

Consider a banking system where a user transfers $100 from Account A to Account B. With strong consistency, the transfer proceeds as follows (a short code sketch follows these steps):

  1. The transfer request is received
  2. Account A’s balance is reduced by $100
  3. Account B’s balance is increased by $100
  4. All replica databases are updated with these changes
  5. Only then is the transaction confirmed to the user
  6. Any subsequent read from any node shows the updated balances
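
A hedged sketch of these steps, assuming the same simplified synchronous model as above (in-memory dictionaries stand in for the account database and its replicas):

# Sketch of a transfer under strong consistency (illustrative, no real
# database or network): both balance updates reach every replica before
# the transfer is confirmed to the user.

def transfer(primary, replicas, src, dst, amount):
    if primary[src] < amount:
        raise ValueError("insufficient funds")
    # Steps 2-3: apply both updates on the primary.
    primary[src] -= amount
    primary[dst] += amount
    # Step 4: synchronously push the new balances to every replica.
    for replica in replicas:
        replica[src] = primary[src]
        replica[dst] = primary[dst]
    # Step 5: only now is the transaction confirmed.
    return "confirmed"

primary = {"A": 500, "B": 200}
replicas = [dict(primary) for _ in range(2)]
print(transfer(primary, replicas, "A", "B", 100))  # confirmed
print(replicas[0]["A"], replicas[0]["B"])          # 400 300 on every replica (step 6)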

Advantages and Trade-offs

Advantages:

  • Data Accuracy: Guarantees that all nodes return the latest data, eliminating stale reads
  • Simplified Logic: Application developers can reason about data state more easily, as they don’t need to handle inconsistent states

Disadvantages:

  • Increased Latency: Updates must propagate to all nodes before completion, adding a significant delay
  • Reduced Availability: During network partitions, writes may be blocked to prevent inconsistent states, directly impacting system availability

Strong consistency is ideal for applications where data accuracy is critical, such as financial transactions, account management systems, or inventory management, where overselling must be prevented.

1.2 Eventual Consistency

Eventual Consistency represents a weaker consistency model where the system guarantees that all nodes will eventually converge to the latest state, given enough time and no new updates, but does not ensure immediate consistency across all nodes.

How Eventual Consistency Works

After a write operation, replicas are updated asynchronously. Clients may temporarily read stale data, but the system eventually synchronizes all nodes. This approach prioritizes availability and performance over immediate consistency. A code sketch of this flow follows the diagram below.

Eventual Consistency Flow:
┌─────────┐    Write   ┌─────────────┐
│ Client  │ ─────────> │   Primary   │
└─────────┘            │    Node     │
     ▲                 └─────────────┘
     │                        │
     └──── Quick Ack ──────────┘
     
     Async Updates:
     ┌─────────────┐    Eventually   ┌─────────────┐
     │   Primary   │ ──────────────> │  Replica 1  │
     │    Node     │                 └─────────────┘
     └─────────────┘                 ┌─────────────┐
            │        ──────────────> │  Replica 2  │
            │                        └─────────────┘
            │                        ┌─────────────┐
            └──────────────────────> │  Replica N  │
                                     └─────────────┘
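
The sketch below is again illustrative rather than drawn from any real system; for simplicity, asynchronous background replication is modeled as an explicit sync() call. The key difference from the strong consistency sketch is that the write is acknowledged before any replica is updated.

# Eventual consistency sketch (illustrative): the primary acknowledges a
# write immediately; replicas catch up later, so a read from a replica may
# briefly return stale data.

class LazyReplicaSet:
    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []                  # writes not yet propagated

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))
        return "ack"                       # acknowledged before replication

    def sync(self):
        # Stand-in for asynchronous background replication.
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()

store = LazyReplicaSet(replica_count=2)
store.write("x", 1)
print(store.replicas[0].get("x"))  # None  -> stale read before replication
store.sync()
print(store.replicas[0].get("x"))  # 1     -> all replicas have converged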

Social Media Example

Consider a social media application where a user updates their profile picture (a compact sketch of this timeline follows the steps):

  1. The user uploads a new profile picture
  2. The primary server acknowledges the update immediately
  3. The new picture is asynchronously replicated to other servers worldwide
  4. Some users may see the old picture for a short period (seconds to minutes)
  5. Eventually, all users globally will see the updated picture
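
As a compact illustration of that timeline, using the simplified model above (the region names and file names are made up for the example):

# Timeline sketch of the profile picture example (illustrative values):
# replicas apply the update at different times, so users routed to
# different replicas briefly see different pictures.

primary = {"picture": "old.jpg"}
us_replica = {"picture": "old.jpg"}
eu_replica = {"picture": "old.jpg"}

primary["picture"] = "new.jpg"      # steps 1-2: write acknowledged immediately

us_replica["picture"] = "new.jpg"   # step 3: replication reaches one region first
print(us_replica["picture"])        # new.jpg -> this user already sees the update
print(eu_replica["picture"])        # old.jpg -> this user still sees stale data (step 4)

eu_replica["picture"] = "new.jpg"   # step 5: all replicas eventually converge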

Advantages and Trade-offs

Advantages:

  • High Availability: Write operations can continue even during network partitions, ensuring the system remains responsive
  • Low Latency: Asynchronous replication allows for much quicker response times to clients

Disadvantages:

  • Temporary Inconsistencies: Users may experience stale data during the synchronization period
  • Complex Application Logic: Applications may need additional logic to handle potential inconsistencies gracefully

Eventual consistency is particularly suitable for applications with high read demands and tolerance for temporary inconsistencies, such as social media feeds, content delivery networks, cache systems, or DNS services.

2. Distributed Consistency Protocols

Distributed consistency protocols are sophisticated algorithms used to achieve consensus across nodes in a distributed system, ensuring data consistency despite node failures or network partitions. These protocols form the backbone of many distributed databases and coordination services.

2.1 Paxos Protocol

Paxos is a foundational consensus algorithm that enables a distributed system to agree on a single value even if some nodes fail. Despite its mathematical rigor and proven correctness, Paxos is notoriously difficult to understand and implement because of its many subtle edge cases.

Paxos Roles and Architecture

Paxos involves three main roles that nodes can assume:

Paxos Architecture:
┌─────────────┐    Propose    ┌─────────────┐    Accepted(n, v)  ┌─────────────┐
│  Proposer   │ ────────────> │  Acceptor   │ ─────────────────> │   Learner   │
│ (Initiates  │               │ (Votes on   │                    │ (Learns the │
│ proposals)  │               │ proposals)  │                    │ final value)│
└─────────────┘               └─────────────┘                    └─────────────┘

  1. Proposer: Initiates proposals and seeks to get a value accepted by a majority of acceptors
  2. Acceptor: Receives proposals and decides whether to accept or reject them based on proposal numbers
  3. Learner: Learns the agreed-upon value once consensus is reached among acceptors

Paxos Protocol Phases

The Paxos algorithm operates in three distinct phases:

Phase 1 - Prepare:

Prepare Phase:
Proposer                    Acceptors (Quorum)
   │                           │
   │─── Prepare(n) ───────────>│
   │                           │
   │<── Promise(n, v_acc) ─────│
   │                           │

Where:
- n = proposal number
- v_acc = highest accepted value (if any)

The proposer sends a prepare request with a unique proposal number to a quorum of acceptors. Each acceptor responds with a promise not to accept any proposal with a lower number and returns the highest-numbered proposal it has accepted.
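
A minimal Python sketch of the acceptor's side of this phase, under simplifying assumptions (a single value being decided, direct in-process calls, no persistence); the names are illustrative.

# Sketch of Paxos Phase 1 on the acceptor side (illustrative): the acceptor
# promises to ignore proposals numbered below n and reports the
# highest-numbered proposal it has already accepted, if any.

class Acceptor:
    def __init__(self):
        self.promised_n = -1      # highest proposal number promised so far
        self.accepted_n = -1      # number of the accepted proposal, if any
        self.accepted_v = None    # value of the accepted proposal, if any

    def prepare(self, n):
        if n > self.promised_n:
            self.promised_n = n
            # Promise(n, highest accepted proposal so far)
            return ("promise", self.accepted_n, self.accepted_v)
        return ("reject",)

acceptors = [Acceptor() for _ in range(3)]
responses = [a.prepare(1) for a in acceptors]
promises = [r for r in responses if r[0] == "promise"]
print(len(promises) > len(acceptors) // 2)  # True: quorum reached, proceed to Phase 2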

Phase 2 - Accept:

Accept Phase:
Proposer                    Acceptors (Quorum)
   │                           │
   │─── Accept(n, v) ─────────>│
   │                           │
   │<── Accepted(n, v) ────────│
   │                           │

Where:
- v = proposed value (based on Phase 1 responses)

Based on the promises received in Phase 1, the proposer sends an Accept request containing the proposal number and a value. If any acceptor reported a previously accepted value, the proposer must adopt the value from the highest-numbered such proposal; otherwise it is free to propose its own value. Acceptors accept the request unless doing so would violate a promise they have already made.
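
Continuing the same simplified in-process model, this sketch shows the proposer's value-selection rule and the acceptor's accept handling; it is illustrative, not a production Paxos implementation.

# Sketch of Paxos Phase 2 (illustrative). The proposer adopts the value of
# the highest-numbered proposal reported in the Phase 1 promises; only if no
# acceptor has accepted anything may it use its own value.

def choose_value(promises, my_value):
    # promises: list of (accepted_n, accepted_v) pairs from Phase 1
    accepted = [(n, v) for n, v in promises if v is not None]
    if accepted:
        return max(accepted)[1]   # value of the highest-numbered accepted proposal
    return my_value

class Acceptor:
    def __init__(self):
        self.promised_n = -1
        self.accepted_n = -1
        self.accepted_v = None

    def accept(self, n, v):
        # Accept only if this does not violate an earlier promise.
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n = n
            self.accepted_v = v
            return ("accepted", n, v)
        return ("reject",)

acceptors = [Acceptor() for _ in range(3)]
v = choose_value([(-1, None), (-1, None), (-1, None)], my_value="lock-owner:client-7")
acks = [a.accept(1, v) for a in acceptors]
accepted_count = sum(1 for r in acks if r[0] == "accepted")
print(accepted_count > len(acceptors) // 2)  # True: the value is chosen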

Phase 3 - Commit: Once a quorum of acceptors agrees on a proposal, the value is committed, and all learners are notified of the consensus.
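
A brief sketch of that final step in the same model: a learner treats a value as chosen once it has seen matching Accepted notifications from a majority of acceptors. The message values are illustrative.

# Sketch of the learn/commit step (illustrative): count Accepted(n, v)
# notifications and commit once a majority agrees on the same proposal.

from collections import Counter

def learn(accepted_msgs, cluster_size):
    # accepted_msgs: (n, v) notifications received from acceptors
    for (n, v), votes in Counter(accepted_msgs).items():
        if votes > cluster_size // 2:
            return v              # consensus reached on v
    return None                   # no value chosen yet

print(learn([(1, "lock-owner:client-7"), (1, "lock-owner:client-7")], cluster_size=3))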

Paxos in Practice

Google’s Chubby lock service exemplifies Paxos in action. Chubby coordinates distributed resources by achieving consensus on lock ownership, ensuring that only one client can hold a particular lock at any time, even in the presence of network failures or node crashes.

Paxos Trade-offs

Advantages:

  • Fault Tolerance: A cluster of n nodes can still reach consensus after the failure of up to ⌊(n-1)/2⌋ nodes
  • Strong Consistency: Ensures that once a value is chosen, it cannot be changed
  • Theoretical Foundation: Mathematically proven to be correct under specified assumptions

Disadvantages:

  • Implementation Complexity: The multi-phase structure and various edge cases make implementation extremely challenging
  • Performance Overhead: Multiple network round-trips can result in higher latency, especially in geographically distributed systems
  • Liveness Issues: Paxos can fail to make progress when competing proposers repeatedly preempt each other with higher proposal numbers (the dueling-proposers scenario)

2.2 Raft Protocol

Raft was designed explicitly to be a more understandable alternative to Paxos while providing similar consistency guarantees. It achieves consensus through a more intuitive approach based on leader election and log replication.

Raft Architecture and Concepts

Raft assigns each node one of three roles, leader, follower, or candidate, and splits the consensus problem into leader election and log replication. During normal operation, a single leader handles all writes and replicates its log to the followers:

Raft Cluster Structure:

┌─────────────┐    Log Entries    ┌─────────────┐
│   Leader    │ ────────────────> │  Follower   │
│ (Manages    │                   │ (Replicates │
│ all writes) │                   │ leader log) │
└─────────────┘                   └─────────────┘
       │                                 ▲
       │         Log Entries             │
       ▼                                 │
┌─────────────┐                   ┌─────────────┐
│  Follower   │                   │  Follower   │
│ (Replicates │                   │ (Replicates │
│ leader log) │                   │ leader log) │
└─────────────┘                   └─────────────┘

Raft Protocol Phases

Election Phase:

Leader Election Process:

Normal State:     Leader Failure:      New Election:
┌─────────┐      ┌─────────┐          ┌─────────┐
│ Leader  │ ──X  │ Leader  │          │Candidate│
└─────────┘      └─────────┘          └─────────┘
     │               │                      │
     ▼               ▼                      ▼
┌─────────┐      ┌─────────┐          ┌─────────┐
│Follower │      │Follower │ ───────> │ Voter   │
└─────────┘      └─────────┘          └─────────┘

When a follower receives no heartbeat from the leader within its election timeout, it transitions to the candidate state, increments its term, and requests votes from the other nodes; a candidate that gathers votes from a majority becomes the new leader.
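
The sketch below models the vote-granting rule in a simplified way (terms only, no log-freshness comparison, no RPC layer); it is illustrative rather than a complete Raft election implementation.

# Simplified Raft leader election sketch (illustrative): a candidate bumps
# its term and becomes leader if a majority of the cluster grants its vote.
# Real Raft also compares log freshness before granting a vote.

import random

class Node:
    def __init__(self):
        self.current_term = 0
        self.voted_for = None
        # Randomized timeouts reduce the chance of split votes.
        self.election_timeout_ms = random.randint(150, 300)

    def request_vote(self, term, candidate_id):
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
        if term == self.current_term and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False

cluster = [Node() for _ in range(5)]
candidate, others = cluster[0], cluster[1:]
term = candidate.current_term + 1                                # candidate increments its term
votes = 1 + sum(n.request_vote(term, "node-0") for n in others)  # it votes for itself
print("leader elected:", votes > len(cluster) // 2)              # True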

Replication Phase:

Log Replication:

Leader                           Followers
┌─────────────┐                 ┌─────────────┐
│Log: [1][2]  │─────Entry 3  ──>│Log: [1][2]  │
│    [3] NEW  │                 │    [3] NEW  │
└─────────────┘                 └─────────────┘
       │                               │
       ▼                               ▼
   Wait for majority                Send ACK
   acknowledgment                   when written

The leader appends entries to its log and sends them to followers. Entries are only considered committed when a majority of the cluster has acknowledged them.
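
A minimal sketch of this commit rule under the same simplifications (no terms attached to entries, no conflict or retry handling): an entry counts as committed once the leader plus enough followers to form a majority have stored it.

# Simplified Raft log replication sketch (illustrative): the leader appends
# an entry, sends it to followers, and commits it once it is stored on a
# majority of the cluster (the leader counts itself).

class Follower:
    def __init__(self):
        self.log = []

    def append_entries(self, entry):
        self.log.append(entry)
        return True                         # acknowledge once written

class Leader:
    def __init__(self, followers):
        self.log = []
        self.commit_index = -1
        self.followers = followers

    def replicate(self, entry):
        self.log.append(entry)
        acks = 1                            # the leader's own copy
        for follower in self.followers:
            if follower.append_entries(entry):
                acks += 1
        cluster_size = len(self.followers) + 1
        if acks > cluster_size // 2:
            self.commit_index = len(self.log) - 1   # the entry is committed
        return self.commit_index

leader = Leader([Follower() for _ in range(2)])
print(leader.replicate({"op": "set", "key": "x", "value": 1}))  # 0 -> committed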

Commit Phase: Once an entry is replicated to a majority of nodes, the leader commits it and notifies followers to apply it to their state machines.

Raft in Real Systems

Raft is widely adopted in production systems:

  • etcd: Kubernetes’ distributed key-value store uses Raft for cluster coordination
  • Consul: HashiCorp’s service discovery platform relies on Raft for consistent configuration storage
  • CockroachDB: uses Raft to replicate data across nodes in its distributed SQL database

Raft Trade-offs

Advantages:

  • Simplicity: Much easier to understand and implement compared to Paxos
  • Strong Consistency: Ensures strong consistency through log replication with majority quorum
  • Clear Leadership: The single leader model simplifies client interaction and system reasoning

Disadvantages:

  • Leader Bottleneck: All writes must go through the leader, potentially creating a performance bottleneck
  • Election Overhead: Leader failures require election processes that can temporarily impact availability
  • Communication Requirements: Log replication requires acknowledgment from a quorum, which can slow operations in high-latency networks
