1. Distributed Transactions
In a distributed system, a distributed transaction involves coordinating actions across multiple nodes or databases to maintain data consistency and integrity. The fundamental challenge is ensuring that either all operations succeed (commit) or none are applied (rollback), even when these operations span multiple independent systems that may fail independently.
Distributed transactions are essential in scenarios such as:
- Transferring money between accounts in different banks
- Updating inventory across multiple warehouses
- Maintaining referential integrity across microservices
- Coordinating updates in federated database systems
The complexity arises from the need to coordinate across systems that may experience network partitions, node failures, or varying processing speeds, while still maintaining the ACID properties (Atomicity, Consistency, Isolation, Durability) that are expected from traditional database transactions.
2. Two-Phase Commit (2PC)
Two-Phase Commit (2PC) is a foundational protocol that enables atomicity in distributed transactions. It coordinates commit and rollback actions across multiple nodes to ensure that either all nodes commit the transaction or all nodes abort, maintaining data consistency across the distributed system.
How 2PC Works
The protocol operates with a clear hierarchy: one Coordinator node manages the entire process, while multiple Participant nodes execute the actual transaction steps. The coordination happens in two distinct phases that give the protocol its name.
2PC Protocol Flow:
┌─────────────┐ ┌─────────────┐
│Coordinator │ │Participant 1│
└─────────────┘ └─────────────┘
│ │
│────── Prepare Request ──────────>│
│ │
│<───── Yes/No Response ───────────│
│ │
┌──▼──┐ │
│Check│ │
│All │ │
│Resp │ │
└──┬──┘ │
│ │
│───── Commit/Abort ──────────────>│
│ │
│<───── Acknowledgment ────────────│
Phase 1 - Prepare Phase:
The coordinator initiates the transaction by sending a prepare request to all participants. Each participant performs the following steps:
- Executes the transaction operations locally
- Writes the changes to a transaction log (without committing)
- Locks the affected resources
- Responds with either “Yes” (prepared to commit) or “No” (prepared to abort)
During this phase, participants are in a prepared state where they have done all the work necessary to commit but haven’t made the changes permanent yet.
Phase 2 - Commit Phase:
Based on the responses from Phase 1, the coordinator makes the final decision:
- If all participants voted “Yes”: The coordinator sends a commit request to all participants
- If any participant voted “No”: The coordinator sends an abort request to all participants
Participants then either commit their prepared changes or roll them back, and send an acknowledgment back to the coordinator.
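To make the two phases concrete, here is a minimal, single-process sketch of the protocol. The Coordinator and Participant classes, their method names, and the in-memory "transaction log" are illustrative assumptions for the example, not a production implementation.

# Minimal 2PC sketch: coordinator and participants run in one process,
# and "messages" are plain method calls. Names are illustrative.

class Participant:
    def __init__(self, name):
        self.name = name
        self.prepared_ops = None   # staged (logged) but uncommitted work
        self.state = "idle"

    def prepare(self, operations):
        """Phase 1: stage the work, lock resources, and vote Yes/No."""
        try:
            self.prepared_ops = list(operations)  # write to the "transaction log"
            self.state = "prepared"
            return "yes"
        except Exception:
            self.state = "aborted"
            return "no"

    def commit(self):
        """Phase 2: make the staged work permanent."""
        self.state = "committed"
        return "ack"

    def abort(self):
        """Phase 2: discard the staged work."""
        self.prepared_ops = None
        self.state = "aborted"
        return "ack"


class Coordinator:
    def __init__(self, participants):
        self.participants = participants

    def run_transaction(self, operations_by_participant):
        # Phase 1: send prepare requests and collect votes.
        votes = [
            p.prepare(operations_by_participant.get(p.name, []))
            for p in self.participants
        ]
        # Phase 2: commit only if every participant voted "yes".
        if all(v == "yes" for v in votes):
            for p in self.participants:
                p.commit()
            return "committed"
        for p in self.participants:
            p.abort()
        return "aborted"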
Real-World Example: Bank Transfer
Consider transferring $500 from Account A (Bank 1) to Account B (Bank 2):
Bank Transfer with 2PC:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│Transfer │ │ Bank 1 │ │ Bank 2 │
│Coordinator │ │(Account A) │ │(Account B) │
└─────────────┘ └─────────────┘ └─────────────┘
│ │ │
┌───▼───┐ │ │
│Phase 1│ │ │
│Prepare│ │ │
└───┬───┘ │ │
│──Prepare: Debit $500────>│ │
│ │ │
│──Prepare: Credit $500────────────────>│
│ │ │
│<─────Yes──────────────────────────────│
│<─────Yes──────────│ │
┌───▼───┐ │ │
│Phase 2│ │ │
│Commit │ │ │
└───┬───┘ │ │
│──────Commit──────>│ │
│──────Commit──────────────────────────>│
│ │ │
│<─────Ack──────────────────────────────│
│<─────Ack──────────│ │
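Using the classes from the sketch above, the transfer maps onto two participants; the operation payloads are purely illustrative:

# Bank transfer using the 2PC sketch above (names are illustrative).
bank1 = Participant("bank1")
bank2 = Participant("bank2")
coordinator = Coordinator([bank1, bank2])

outcome = coordinator.run_transaction({
    "bank1": [("debit", "Account A", 500)],
    "bank2": [("credit", "Account B", 500)],
})
print(outcome)  # "committed" if both banks voted yes, otherwise "aborted"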
Advantages and Disadvantages of 2PC
Advantages:
- Atomicity Guarantee: Provides strong atomicity across distributed systems
- Simplicity: Relatively straightforward protocol that’s easy to understand and implement
- Consistency: Ensures all participants reach the same decision (commit or abort)
Disadvantages:
- Blocking Nature: If the coordinator fails after participants have prepared but before sending commit/abort, participants remain blocked indefinitely
- Single Point of Failure: The coordinator represents a critical failure point that can halt all transactions
- Network Overhead: Multiple communication rounds significantly increase transaction latency
- Resource Locking: Participants hold locks for extended periods, reducing concurrency
3. Three-Phase Commit (3PC)
Three-Phase Commit (3PC) was designed to address the primary weakness of 2PC: the blocking problem. By introducing an additional phase, 3PC reduces the risk of participants being indefinitely blocked when the coordinator fails.
How 3PC Works
3PC adds a Pre-Commit phase between the traditional Prepare and Commit phases, creating a three-step process that provides better fault tolerance.
3PC Protocol Flow:
┌─────────────┐ ┌─────────────┐
│Coordinator │ │Participant 1│
└─────────────┘ └─────────────┘
│ │
┌───▼───┐ │
│Phase 1│ │
│Prepare│ │
└───┬───┘ │
│────── Prepare Request ──────────>│
│ │
│<───── Yes/No Response ───────────│
┌───▼───┐ │
│Phase 2│ │
│Pre- │ │
│Commit │ │
└───┬───┘ │
│───── Pre-Commit Request ────────>│
│ │
│<───── Acknowledgment ────────────│
┌───▼───┐ │
│Phase 3│ │
│Commit │ │
└───┬───┘ │
│────── Commit Request ───────────>│
│ │
│<───── Final Acknowledgment ──────│
Phase 1 - Prepare Phase: Identical to 2PC; the coordinator sends prepare requests and participants respond with their readiness.
Phase 2 - Pre-Commit Phase: If all participants voted “Yes” in Phase 1:
- The coordinator sends pre-commit requests to all participants
- Participants acknowledge they’re in a pre-commit state
- Participants are now guaranteed to be able to commit if requested
Phase 3 - Commit Phase: If all participants acknowledged the pre-commit:
- The coordinator sends commit requests
- Participants commit their transactions
- Participants send final acknowledgments
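Putting the three phases together, the coordinator side can be sketched as follows. As with the 2PC example, the class and method names (pre_commit and so on) are illustrative assumptions rather than a real implementation.

# Coordinator-side sketch of 3PC. Participants are assumed to expose
# prepare(), pre_commit(), commit(), and abort() methods (illustrative API).

class ThreePhaseCoordinator:
    def __init__(self, participants):
        self.participants = participants

    def run_transaction(self):
        # Phase 1 - Prepare: collect Yes/No votes.
        votes = [p.prepare() for p in self.participants]
        if not all(v == "yes" for v in votes):
            for p in self.participants:
                p.abort()
            return "aborted"

        # Phase 2 - Pre-Commit: every participant acknowledges that it is
        # now guaranteed to be able to commit.
        acks = [p.pre_commit() for p in self.participants]
        if not all(a == "ack" for a in acks):
            for p in self.participants:
                p.abort()
            return "aborted"

        # Phase 3 - Commit: the commit is sent only once everyone is known
        # to be in the pre-commit state.
        for p in self.participants:
            p.commit()
        return "committed"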
The Key Innovation: Non-Blocking Recovery
The critical improvement in 3PC is that if the coordinator fails after the pre-commit phase, participants can independently decide to commit the transaction, as they know all other participants have also reached the pre-commit state.
3PC Failure Recovery:
Scenario: Coordinator fails after Pre-Commit phase
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│Participant 1│ │Participant 2│ │Participant 3│
│(Pre-Commit) │ │(Pre-Commit) │ │(Pre-Commit) │
└─────────────┘ └─────────────┘ └─────────────┘
│ │ │
│<──────Elect New Coordinator──────────>│
│ │ │
│──────All in Pre-Commit State?────────>│
│<─────────Yes──────────────────────────│
│ │ │
│──────Proceed with Commit─────────────>│
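The termination rule that a newly elected coordinator (or the participants themselves) can apply is small enough to state as a decision function. This is a simplified sketch, and the state names are assumptions:

# Simplified recovery decision after the original coordinator fails.
# Input: the states of all reachable participants (illustrative names).

def recovery_decision(participant_states):
    if any(state == "committed" for state in participant_states):
        return "commit"   # the old coordinator already decided to commit
    if any(state == "aborted" for state in participant_states):
        return "abort"    # the old coordinator already decided to abort
    if all(state == "pre-committed" for state in participant_states):
        return "commit"   # everyone is guaranteed able to commit
    return "abort"        # otherwise stay on the safe side and abort

print(recovery_decision(["pre-committed", "pre-committed", "pre-committed"]))  # commit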
Advantages and Disadvantages of 3PC
Advantages:
- Non-Blocking: Significantly reduces the risk of indefinite blocking
- Better Fault Tolerance: Can recover from coordinator failures more gracefully
- Maintains Atomicity: Still provides the same consistency guarantees as 2PC
Disadvantages:
- Higher Complexity: The additional phase makes the protocol more complex to implement and debug
- Increased Latency: The extra communication round increases transaction time
- More Network Traffic: Additional messages increase network overhead
- Not Completely Non-Blocking: Can still block in certain network partition scenarios
4. Distributed Locking Mechanisms
Distributed locking mechanisms coordinate access to shared resources across multiple nodes in a distributed system. They ensure that only one process can access a critical resource at any given time, preventing race conditions and maintaining data consistency.
4.1 Types of Distributed Locks
Pessimistic Locking
Pessimistic locking takes a defensive approach by preventing other nodes from accessing a resource through strict lock enforcement.
Pessimistic Locking Flow:
┌─────────┐ Request Lock ┌─────────────┐
│Process A│ ─────────────────> │Lock Manager │
└─────────┘ └─────────────┘
│ │
│<──────Lock Granted─────────────┘
│
│ ┌─────────┐ Request Lock
│ │Process B│ ─────────────────>
│ └─────────┘
│ │
│ │<────Lock Denied────────
│ │
│ │<────Wait/Retry─────────
│
│──────Access Resource─────────────────
│
│──────Release Lock────────────────────>
│
┌─────────┐ │
│Process B│<────Lock Granted────┘
└─────────┘
Use Cases:
- Financial transactions where data conflicts are expensive
- Inventory management systems where overselling must be prevented
- Configuration management where inconsistent states are dangerous
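Within a single relational database this pattern is commonly expressed with SELECT ... FOR UPDATE, which keeps the selected rows locked until the transaction commits. A minimal sketch, assuming a PostgreSQL inventory table and the psycopg2 driver (both assumptions, not part of the original example):

import psycopg2

# Pessimistic locking sketch: the row stays locked for the duration of the
# transaction, so concurrent writers must wait. Table/column names and the
# connection string are assumptions for illustration.
conn = psycopg2.connect("dbname=shop user=app")
try:
    with conn:  # commits on success, rolls back on error
        with conn.cursor() as cur:
            # Lock the row; other transactions block here until we commit.
            cur.execute(
                "SELECT quantity FROM inventory WHERE item_id = %s FOR UPDATE",
                ("sku-123",),
            )
            (quantity,) = cur.fetchone()
            if quantity < 1:
                raise ValueError("out of stock")
            cur.execute(
                "UPDATE inventory SET quantity = quantity - 1 WHERE item_id = %s",
                ("sku-123",),
            )
finally:
    conn.close()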
Optimistic Locking
Optimistic locking allows concurrent access but checks for conflicts before committing changes.
Optimistic Locking Flow:
┌─────────┐ Read Resource ┌─────────────┐
│Process A│ ─────────────────> │ Resource │
└─────────┘ │ (Version=1)│
│ └─────────────┘
│<──────Data + Version─────────────────────
│
│ ┌─────────┐ Read Resource
│ │Process B│ ─────────────────>
│ └─────────┘
│ │<────Data + Version──────
│ │
│──────Modify Data─────────────────────
│
│──────Update (Version=1)─────────────>
│ │
│<─────Update Success (Version=2)──────┘
┌─────────┐
│Process B│──Update (Version=1)─>
└─────────┘ │
│<────Update Failed────────┘
│ (Version Mismatch)
│
│<────Retry with New Data──
Use Cases:
- Content management systems with infrequent conflicts
- Social media applications where occasional conflicts are acceptable
- Caching systems where performance is prioritized over strict consistency
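The version check in the diagram translates into a compare-and-set style UPDATE: the write only succeeds if the version read earlier is still current. A minimal sketch against the same assumed inventory table, again using psycopg2 and with simplified retry logic:

import psycopg2

# Optimistic locking sketch: read the current version, modify locally, and
# only apply the update if nobody else changed the row in the meantime.
# Table/column names are assumptions for illustration.
def update_with_optimistic_lock(conn, item_id, max_retries=3):
    for _ in range(max_retries):
        with conn:
            with conn.cursor() as cur:
                cur.execute(
                    "SELECT quantity, version FROM inventory WHERE item_id = %s",
                    (item_id,),
                )
                quantity, version = cur.fetchone()

                # The update only matches if the version is unchanged.
                cur.execute(
                    "UPDATE inventory "
                    "SET quantity = %s, version = version + 1 "
                    "WHERE item_id = %s AND version = %s",
                    (quantity - 1, item_id, version),
                )
                if cur.rowcount == 1:
                    return True  # no conflict: update applied
        # rowcount == 0 means another writer won; re-read and retry.
    return False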
4.2 Implementing Distributed Locks
Using Redis for Distributed Locking
Redis provides a solid foundation for distributed locking through its atomic SET command with the NX (set-if-not-exists) and EX (expiration) options.
import redis
import time
import uuid

class RedisDistributedLock:
    def __init__(self, redis_client, lock_name, timeout=10, retry_delay=0.1):
        self.redis = redis_client
        self.lock_name = f"lock:{lock_name}"
        self.timeout = timeout
        self.retry_delay = retry_delay
        self.lock_value = str(uuid.uuid4())  # Unique identifier for this lock instance

    def acquire_lock(self, max_retries=10):
        """
        Acquire the lock with automatic retry mechanism
        """
        for attempt in range(max_retries):
            # Try to acquire the lock with expiration
            if self.redis.set(self.lock_name, self.lock_value, nx=True, ex=self.timeout):
                return True
            # Wait before retrying
            time.sleep(self.retry_delay)
        return False

    def release_lock(self):
        """
        Safely release the lock only if we own it
        """
        # Lua script ensures atomicity of check-and-delete
        lua_script = """
        if redis.call("get", KEYS[1]) == ARGV[1] then
            return redis.call("del", KEYS[1])
        else
            return 0
        end
        """
        return self.redis.eval(lua_script, 1, self.lock_name, self.lock_value)

    def extend_lock(self, additional_time):
        """
        Extend the lock expiration time
        """
        lua_script = """
        if redis.call("get", KEYS[1]) == ARGV[1] then
            return redis.call("expire", KEYS[1], ARGV[2])
        else
            return 0
        end
        """
        return self.redis.eval(lua_script, 1, self.lock_name, self.lock_value, additional_time)
# Usage Example
redis_client = redis.StrictRedis(host='localhost', port=6379, db=0)
lock = RedisDistributedLock(redis_client, "critical_resource", timeout=30)

if lock.acquire_lock():
    try:
        print("Lock acquired. Processing critical section...")
        # Simulate long-running operation
        time.sleep(20)

        # Extend lock if needed
        if lock.extend_lock(15):
            print("Lock extended for additional 15 seconds")

        # More processing...
        time.sleep(10)
    finally:
        if lock.release_lock():
            print("Lock successfully released.")
        else:
            print("Lock was not owned by this process or had expired.")
else:
    print("Could not acquire lock after maximum retries.")
Advanced Redis Locking Pattern: Redlock
For higher reliability, the Redlock algorithm acquires the same lock on a majority of independent Redis instances:
import redis
import time
import random

class RedlockDistributedLock:
    def __init__(self, redis_instances, lock_name, timeout=10):
        self.redis_instances = redis_instances
        self.lock_name = lock_name
        self.timeout = timeout
        self.lock_value = f"{time.time()}:{random.randint(1, 1000000)}"
        self.quorum = len(redis_instances) // 2 + 1

    def acquire_lock(self):
        """
        Acquire the lock from a majority of Redis instances
        """
        acquired_count = 0
        start_time = time.time()

        for redis_instance in self.redis_instances:
            try:
                if redis_instance.set(self.lock_name, self.lock_value, nx=True, ex=self.timeout):
                    acquired_count += 1
            except redis.RedisError:
                continue  # Skip failed instances

        # Check if we have quorum and sufficient time remaining
        elapsed_time = time.time() - start_time
        remaining_time = self.timeout - elapsed_time

        if acquired_count >= self.quorum and remaining_time > 0:
            return True
        else:
            # Release any acquired locks
            self.release_lock()
            return False

    def release_lock(self):
        """
        Release the lock from all instances
        """
        lua_script = """
        if redis.call("get", KEYS[1]) == ARGV[1] then
            return redis.call("del", KEYS[1])
        else
            return 0
        end
        """
        for redis_instance in self.redis_instances:
            try:
                redis_instance.eval(lua_script, 1, self.lock_name, self.lock_value)
            except redis.RedisError:
                continue  # Skip failed instances
Using ZooKeeper for Distributed Locking
Apache ZooKeeper provides highly reliable distributed coordination through its ephemeral znodes mechanism:
ZooKeeper Lock Structure:
/locks
└── resource_name
├── lock-0000000001 (ephemeral sequential)
├── lock-0000000002 (ephemeral sequential)
└── lock-0000000003 (ephemeral sequential)
Lock Acquisition Process:
1. Client creates ephemeral sequential znode
2. Client checks if it has the lowest sequence number
3. If yes, lock is acquired
4. If no, client watches the previous znode and waits
ZooKeeper Locking Algorithm:
- Create Lock Node: Client creates an ephemeral sequential znode in the lock directory
- Check Position: Client lists all children and checks if its node has the lowest sequence number
- Acquire or Wait:
- If lowest: Lock acquired
- If not lowest: Set watch on the previous node and wait
- Automatic Cleanup: When the client disconnects, its ephemeral node is automatically deleted
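In Python this recipe is available off the shelf. The sketch below assumes the kazoo client library and a ZooKeeper ensemble at 127.0.0.1:2181 (both assumptions) and uses kazoo's built-in Lock recipe, which implements the ephemeral sequential algorithm described above.

from kazoo.client import KazooClient

# ZooKeeper lock sketch using kazoo's Lock recipe, which creates the
# ephemeral sequential znodes described above. The connection string,
# lock path, and identifier are assumptions for illustration.
zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

lock = zk.Lock("/locks/resource_name", identifier="worker-1")
with lock:  # blocks until this client's znode has the lowest sequence number
    print("Lock acquired. Processing critical section...")
    # ... critical section; the lock is released when the block exits ...

zk.stop()   # disconnecting also removes any remaining ephemeral nodes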
Advantages of ZooKeeper Locks:
- Automatic Cleanup: Ephemeral nodes ensure locks are released when clients disconnect
- Ordered Waiting: Sequential nodes provide fair ordering of lock requests
- High Reliability: ZooKeeper’s consensus algorithm ensures strong consistency
- Watch Mechanism: Efficient notification system for lock availability
Advantages and Disadvantages of Distributed Locks
Advantages:
- Consistency Guarantee: Ensures mutual exclusion across distributed systems
- Race Condition Prevention: Eliminates data corruption from concurrent access
- Fault Tolerance: Modern implementations handle node failures gracefully
- Scalability: Can coordinate access across thousands of nodes
Disadvantages:
- Performance Overhead: Lock acquisition and release add latency to operations
- Complexity: Distributed locks are more complex than local locks
- Deadlock Risk: Multiple locks can create deadlock situations
- Availability Impact: Lock service failures can halt critical operations