1. Distributed Transactions
In a distributed system, a distributed transaction involves coordinating actions across multiple nodes or databases to maintain data consistency and integrity. The fundamental challenge is ensuring that either all operations succeed (commit) or none are applied (rollback), even when these operations span multiple independent systems that may fail independently.
Distributed transactions are essential in scenarios such as:
- Transferring money between accounts in different banks
- Updating inventory across multiple warehouses
- Maintaining referential integrity across microservices
- Coordinating updates in federated database systems
The complexity arises from the need to coordinate across systems that may experience network partitions, node failures, or varying processing speeds, while still maintaining the ACID properties (Atomicity, Consistency, Isolation, Durability) that are expected from traditional database transactions.
2. Two-Phase Commit (2PC)
Two-Phase Commit (2PC) is a foundational protocol that enables atomicity in distributed transactions. It coordinates commit and rollback actions across multiple nodes to ensure that either all nodes commit the transaction or all nodes abort, maintaining data consistency across the distributed system.
How 2PC Works
The protocol operates with a clear hierarchy: one Coordinator node manages the entire process, while multiple Participant nodes execute the actual transaction steps. The coordination happens in two distinct phases that give the protocol its name.
2PC Protocol Flow:
┌─────────────┐ ┌─────────────┐
│Coordinator │ │Participant 1│
└─────────────┘ └─────────────┘
│ │
│────── Prepare Request ──────────>│
│ │
│<───── Yes/No Response ───────────│
│ │
┌──▼──┐ │
│Check│ │
│All │ │
│Resp │ │
└──┬──┘ │
│ │
│───── Commit/Abort ──────────────>│
│ │
│<───── Acknowledgment ────────────│
Phase 1 - Prepare Phase:
The coordinator initiates the transaction by sending a prepare request to all participants. Each participant performs the following steps:
- Executes the transaction operations locally
- Writes the changes to a transaction log (without committing)
- Locks the affected resources
- Responds with either “Yes” (prepared to commit) or “No” (prepared to abort)
During this phase, participants are in a prepared state where they have done all the work necessary to commit but haven’t made the changes permanent yet.
Phase 2 - Commit Phase:
Based on the responses from Phase 1, the coordinator makes the final decision:
- If all participants voted “Yes”: The coordinator sends a commit request to all participants
- If any participant voted “No”: The coordinator sends an abort request to all participants
Participants then either commit their prepared changes or roll them back, and send an acknowledgment back to the coordinator.
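To make the two phases concrete, here is a minimal, single-process sketch of the protocol. The Coordinator and Participant classes, their method names, and the in-memory "transaction log" are illustrative assumptions for the example, not a production implementation.

# Minimal 2PC sketch: coordinator and participants run in one process,
# and "messages" are plain method calls. Names are illustrative.

class Participant:
    def __init__(self, name):
        self.name = name
        self.prepared_ops = None   # staged (logged) but uncommitted work
        self.state = "idle"

    def prepare(self, operations):
        """Phase 1: stage the work, lock resources, and vote Yes/No."""
        try:
            self.prepared_ops = list(operations)  # write to the "transaction log"
            self.state = "prepared"
            return "yes"
        except Exception:
            self.state = "aborted"
            return "no"

    def commit(self):
        """Phase 2: make the staged work permanent."""
        self.state = "committed"
        return "ack"

    def abort(self):
        """Phase 2: discard the staged work."""
        self.prepared_ops = None
        self.state = "aborted"
        return "ack"


class Coordinator:
    def __init__(self, participants):
        self.participants = participants

    def run_transaction(self, operations_by_participant):
        # Phase 1: send prepare requests and collect votes.
        votes = [
            p.prepare(operations_by_participant.get(p.name, []))
            for p in self.participants
        ]
        # Phase 2: commit only if every participant voted "yes".
        if all(v == "yes" for v in votes):
            for p in self.participants:
                p.commit()
            return "committed"
        for p in self.participants:
            p.abort()
        return "aborted"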
Real-World Example: Bank Transfer
Consider transferring $500 from Account A (Bank 1) to Account B (Bank 2):
Bank Transfer with 2PC:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│Transfer │ │ Bank 1 │ │ Bank 2 │
│Coordinator │ │(Account A) │ │(Account B) │
└─────────────┘ └─────────────┘ └─────────────┘
│ │ │
┌───▼───┐ │ │
│Phase 1│ │ │
│Prepare│ │ │
└───┬───┘ │ │
│──Prepare: Debit $500────>│ │
│ │ │
│──Prepare: Credit $500────────────────>│
│ │ │
│<─────Yes──────────────────────────────│
│<─────Yes──────────│ │
┌───▼───┐ │ │
│Phase 2│ │ │
│Commit │ │ │
└───┬───┘ │ │
│──────Commit──────>│ │
│──────Commit──────────────────────────>│
│ │ │
│<─────Ack──────────────────────────────│
│<─────Ack──────────│ │
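Using the classes from the sketch above, the transfer maps onto two participants; the operation payloads are purely illustrative:

# Bank transfer using the 2PC sketch above (names are illustrative).
bank1 = Participant("bank1")
bank2 = Participant("bank2")
coordinator = Coordinator([bank1, bank2])

outcome = coordinator.run_transaction({
    "bank1": [("debit", "Account A", 500)],
    "bank2": [("credit", "Account B", 500)],
})
print(outcome)  # "committed" if both banks voted yes, otherwise "aborted"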
Advantages and Disadvantages of 2PC
Advantages:
- Atomicity Guarantee: Provides strong atomicity across distributed systems
- Simplicity: Relatively straightforward protocol that’s easy to understand and implement
- Consistency: Ensures all participants reach the same decision (commit or abort)
Disadvantages:
- Blocking Nature: If the coordinator fails after participants have prepared but before sending commit/abort, participants remain blocked indefinitely
- Single Point of Failure: The coordinator represents a critical failure point that can halt all transactions
- Network Overhead: Multiple communication rounds significantly increase transaction latency
- Resource Locking: Participants hold locks for extended periods, reducing concurrency
3. Three-Phase Commit (3PC)
Three-Phase Commit (3PC) was designed to address the primary weakness of 2PC: the blocking problem. By introducing an additional phase, 3PC reduces the risk of participants being indefinitely blocked when the coordinator fails.
How 3PC Works
3PC adds a Pre-Commit phase between the traditional Prepare and Commit phases, creating a three-step process that provides better fault tolerance.
3PC Protocol Flow:
┌─────────────┐ ┌─────────────┐
│Coordinator │ │Participant 1│
└─────────────┘ └─────────────┘
│ │
┌───▼───┐ │
│Phase 1│ │
│Prepare│ │
└───┬───┘ │
│────── Prepare Request ──────────>│
│ │
│<───── Yes/No Response ───────────│
┌───▼───┐ │
│Phase 2│ │
│Pre- │ │
│Commit │ │
└───┬───┘ │
│───── Pre-Commit Request ────────>│
│ │
│<───── Acknowledgment ────────────│
┌───▼───┐ │
│Phase 3│ │
│Commit │ │
└───┬───┘ │
│────── Commit Request ───────────>│
│ │
│<───── Final Acknowledgment ──────│
Phase 1 - Prepare Phase: Identical to 2PC; the coordinator sends prepare requests and participants respond with their readiness.
Phase 2 - Pre-Commit Phase: If all participants voted “Yes” in Phase 1:
- The coordinator sends pre-commit requests to all participants
- Participants acknowledge they’re in a pre-commit state
- Participants are now guaranteed to be able to commit if requested
Phase 3 - Commit Phase: If all participants acknowledged the pre-commit:
- The coordinator sends commit requests
- Participants commit their transactions
- Participants send final acknowledgments
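Putting the three phases together, the coordinator side can be sketched as follows. As with the 2PC example, the class and method names (pre_commit and so on) are illustrative assumptions rather than a real implementation.

# Coordinator-side sketch of 3PC. Participants are assumed to expose
# prepare(), pre_commit(), commit(), and abort() methods (illustrative API).

class ThreePhaseCoordinator:
    def __init__(self, participants):
        self.participants = participants

    def run_transaction(self):
        # Phase 1 - Prepare: collect Yes/No votes.
        votes = [p.prepare() for p in self.participants]
        if not all(v == "yes" for v in votes):
            for p in self.participants:
                p.abort()
            return "aborted"

        # Phase 2 - Pre-Commit: every participant acknowledges that it is
        # now guaranteed to be able to commit.
        acks = [p.pre_commit() for p in self.participants]
        if not all(a == "ack" for a in acks):
            for p in self.participants:
                p.abort()
            return "aborted"

        # Phase 3 - Commit: the commit is sent only once everyone is known
        # to be in the pre-commit state.
        for p in self.participants:
            p.commit()
        return "committed"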
The Key Innovation: Non-Blocking Recovery
The critical improvement in 3PC is that if the coordinator fails after the pre-commit phase, participants can independently decide to commit the transaction, as they know all other participants have also reached the pre-commit state.
3PC Failure Recovery:
Scenario: Coordinator fails after Pre-Commit phase
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│Participant 1│ │Participant 2│ │Participant 3│
│(Pre-Commit) │ │(Pre-Commit) │ │(Pre-Commit) │
└─────────────┘ └─────────────┘ └─────────────┘
│ │ │
│<──────Elect New Coordinator──────────>│
│ │ │
│──────All in Pre-Commit State?────────>│
│<─────────Yes──────────────────────────│
│ │ │
│──────Proceed with Commit─────────────>│
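The termination rule that a newly elected coordinator (or the participants themselves) can apply is small enough to state as a decision function. This is a simplified sketch, and the state names are assumptions:

# Simplified recovery decision after the original coordinator fails.
# Input: the states of all reachable participants (illustrative names).

def recovery_decision(participant_states):
    if any(state == "committed" for state in participant_states):
        return "commit"   # the old coordinator already decided to commit
    if any(state == "aborted" for state in participant_states):
        return "abort"    # the old coordinator already decided to abort
    if all(state == "pre-committed" for state in participant_states):
        return "commit"   # everyone is guaranteed able to commit
    return "abort"        # otherwise stay on the safe side and abort

print(recovery_decision(["pre-committed", "pre-committed", "pre-committed"]))  # commit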
Advantages and Disadvantages of 3PC
Advantages:
- Non-Blocking: Significantly reduces the risk of indefinite blocking
- Better Fault Tolerance: Can recover from coordinator failures more gracefully
- Maintains Atomicity: Still provides the same consistency guarantees as 2PC
Disadvantages:
- Higher Complexity: The additional phase makes the protocol more complex to implement and debug
- Increased Latency: The extra communication round increases transaction time
- More Network Traffic: Additional messages increase network overhead
- Not Completely Non-Blocking: Can still block in certain network partition scenarios
4. Distributed Locking Mechanisms
Distributed locking mechanisms coordinate access to shared resources across multiple nodes in a distributed system. They ensure that only one process can access a critical resource at any given time, preventing race conditions and maintaining data consistency.
4.1 Types of Distributed Locks
Pessimistic Locking
Pessimistic locking takes a defensive approach by preventing other nodes from accessing a resource through strict lock enforcement.
Pessimistic Locking Flow:
┌─────────┐ Request Lock ┌─────────────┐
│Process A│ ─────────────────> │Lock Manager │
└─────────┘ └─────────────┘
│ │
│<──────Lock Granted─────────────┘
│
│ ┌─────────┐ Request Lock
│ │Process B│ ─────────────────>
│ └─────────┘
│ │
│ │<────Lock Denied────────
│ │
│ │<────Wait/Retry─────────
│
│──────Access Resource─────────────────
│
│──────Release Lock────────────────────>
│
┌─────────┐ │
│Process B│<────Lock Granted────┘
└─────────┘
Use Cases:
- Financial transactions where data conflicts are expensive
- Inventory management systems where overselling must be prevented
- Configuration management where inconsistent states are dangerous
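Within a single relational database this pattern is commonly expressed with SELECT ... FOR UPDATE, which keeps the selected rows locked until the transaction commits. A minimal sketch, assuming a PostgreSQL inventory table and the psycopg2 driver (both assumptions, not part of the original example):

import psycopg2

# Pessimistic locking sketch: the row stays locked for the duration of the
# transaction, so concurrent writers must wait. Table/column names and the
# connection string are assumptions for illustration.
conn = psycopg2.connect("dbname=shop user=app")
try:
    with conn:  # commits on success, rolls back on error
        with conn.cursor() as cur:
            # Lock the row; other transactions block here until we commit.
            cur.execute(
                "SELECT quantity FROM inventory WHERE item_id = %s FOR UPDATE",
                ("sku-123",),
            )
            (quantity,) = cur.fetchone()
            if quantity < 1:
                raise ValueError("out of stock")
            cur.execute(
                "UPDATE inventory SET quantity = quantity - 1 WHERE item_id = %s",
                ("sku-123",),
            )
finally:
    conn.close()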
Optimistic Locking
Optimistic locking allows concurrent access but checks for conflicts before committing changes.
Optimistic Locking Flow:
┌─────────┐ Read Resource ┌─────────────┐
│Process A│ ─────────────────> │ Resource │
└─────────┘ │ (Version=1)│
│ └─────────────┘
│<──────Data + Version─────────────────────
│
│ ┌─────────┐ Read Resource
│ │Process B│ ─────────────────>
│ └─────────┘
│ │<────Data + Version──────
│ │
│──────Modify Data─────────────────────
│
│──────Update (Version=1)─────────────>
│ │
│<─────Update Success (Version=2)──────┘
┌─────────┐
│Process B│──Update (Version=1)─>
└─────────┘ │
│<────Update Failed────────┘
│ (Version Mismatch)
│
│<────Retry with New Data──
Use Cases:
- Content management systems with infrequent conflicts
- Social media applications where occasional conflicts are acceptable
- Caching systems where performance is prioritized over strict consistency
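The version check in the diagram translates into a compare-and-set style UPDATE: the write only succeeds if the version read earlier is still current. A minimal sketch against the same assumed inventory table, again using psycopg2 and with simplified retry logic:

import psycopg2

# Optimistic locking sketch: read the current version, modify locally, and
# only apply the update if nobody else changed the row in the meantime.
# Table/column names are assumptions for illustration.
def update_with_optimistic_lock(conn, item_id, max_retries=3):
    for _ in range(max_retries):
        with conn:
            with conn.cursor() as cur:
                cur.execute(
                    "SELECT quantity, version FROM inventory WHERE item_id = %s",
                    (item_id,),
                )
                quantity, version = cur.fetchone()

                # The update only matches if the version is unchanged.
                cur.execute(
                    "UPDATE inventory "
                    "SET quantity = %s, version = version + 1 "
                    "WHERE item_id = %s AND version = %s",
                    (quantity - 1, item_id, version),
                )
                if cur.rowcount == 1:
                    return True  # no conflict: update applied
        # rowcount == 0 means another writer won; re-read and retry.
    return False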
4.2 Implementing Distributed Locks
Using Redis for Distributed Locking
Redis provides a solid foundation for distributed locking through its atomic SET command with the NX (set-if-not-exists) and EX (expiration) options.
import redis
import time
import uuid

class RedisDistributedLock:
    def __init__(self, redis_client, lock_name, timeout=10, retry_delay=0.1):
        self.redis = redis_client
        self.lock_name = f"lock:{lock_name}"
        self.timeout = timeout
        self.retry_delay = retry_delay
        self.lock_value = str(uuid.uuid4())  # Unique identifier for this lock instance

    def acquire_lock(self, max_retries=10):
        """
        Acquire the lock with automatic retry mechanism
        """
        for attempt in range(max_retries):
            # Try to acquire the lock with expiration
            if self.redis.set(self.lock_name, self.lock_value, nx=True, ex=self.timeout):
                return True
            # Wait before retrying
            time.sleep(self.retry_delay)
        return False

    def release_lock(self):
        """
        Safely release the lock only if we own it
        """
        # Lua script ensures atomicity of check-and-delete
        lua_script = """
        if redis.call("get", KEYS[1]) == ARGV[1] then
            return redis.call("del", KEYS[1])
        else
            return 0
        end
        """
        return self.redis.eval(lua_script, 1, self.lock_name, self.lock_value)

    def extend_lock(self, additional_time):
        """
        Extend the lock expiration time
        """
        lua_script = """
        if redis.call("get", KEYS[1]) == ARGV[1] then
            return redis.call("expire", KEYS[1], ARGV[2])
        else
            return 0
        end
        """
        return self.redis.eval(lua_script, 1, self.lock_name, self.lock_value, additional_time)
# Usage Example
redis_client = redis.StrictRedis(host='localhost', port=6379, db=0)
lock = RedisDistributedLock(redis_client, "critical_resource", timeout=30)

if lock.acquire_lock():
    try:
        print("Lock acquired. Processing critical section...")
        # Simulate long-running operation
        time.sleep(20)

        # Extend lock if needed
        if lock.extend_lock(15):
            print("Lock extended for additional 15 seconds")

        # More processing...
        time.sleep(10)
    finally:
        if lock.release_lock():
            print("Lock successfully released.")
        else:
            print("Lock was not owned by this process or had expired.")
else:
    print("Could not acquire lock after maximum retries.")
Advanced Redis Locking Pattern: Redlock
For higher reliability, the Redlock algorithm acquires the same lock on a majority of independent Redis instances:
import redis
import time
import random

class RedlockDistributedLock:
    def __init__(self, redis_instances, lock_name, timeout=10):
        self.redis_instances = redis_instances
        self.lock_name = lock_name
        self.timeout = timeout
        self.lock_value = f"{time.time()}:{random.randint(1, 1000000)}"
        self.quorum = len(redis_instances) // 2 + 1

    def acquire_lock(self):
        """
        Acquire the lock from a majority of Redis instances
        """
        acquired_count = 0
        start_time = time.time()

        for redis_instance in self.redis_instances:
            try:
                if redis_instance.set(self.lock_name, self.lock_value, nx=True, ex=self.timeout):
                    acquired_count += 1
            except redis.RedisError:
                continue  # Skip failed instances

        # Check if we have quorum and sufficient time remaining
        elapsed_time = time.time() - start_time
        remaining_time = self.timeout - elapsed_time

        if acquired_count >= self.quorum and remaining_time > 0:
            return True
        else:
            # Release any acquired locks
            self.release_lock()
            return False

    def release_lock(self):
        """
        Release the lock from all instances
        """
        lua_script = """
        if redis.call("get", KEYS[1]) == ARGV[1] then
            return redis.call("del", KEYS[1])
        else
            return 0
        end
        """
        for redis_instance in self.redis_instances:
            try:
                redis_instance.eval(lua_script, 1, self.lock_name, self.lock_value)
            except redis.RedisError:
                continue  # Skip failed instances
Using ZooKeeper for Distributed Locking
Apache ZooKeeper provides highly reliable distributed coordination through its ephemeral znodes mechanism:
ZooKeeper Lock Structure:
/locks
└── resource_name
├── lock-0000000001 (ephemeral sequential)
├── lock-0000000002 (ephemeral sequential)
└── lock-0000000003 (ephemeral sequential)
Lock Acquisition Process:
1. Client creates ephemeral sequential znode
2. Client checks if it has the lowest sequence number
3. If yes, lock is acquired
4. If no, client watches the previous znode and waits
ZooKeeper Locking Algorithm:
- Create Lock Node: Client creates an ephemeral sequential znode in the lock directory
- Check Position: Client lists all children and checks if its node has the lowest sequence number
- Acquire or Wait:
- If lowest: Lock acquired
- If not lowest: Set watch on the previous node and wait
- Automatic Cleanup: When the client disconnects, its ephemeral node is automatically deleted
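In Python this recipe is available off the shelf. The sketch below assumes the kazoo client library and a ZooKeeper ensemble at 127.0.0.1:2181 (both assumptions) and uses kazoo's built-in Lock recipe, which implements the ephemeral sequential algorithm described above.

from kazoo.client import KazooClient

# ZooKeeper lock sketch using kazoo's Lock recipe, which creates the
# ephemeral sequential znodes described above. The connection string,
# lock path, and identifier are assumptions for illustration.
zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

lock = zk.Lock("/locks/resource_name", identifier="worker-1")
with lock:  # blocks until this client's znode has the lowest sequence number
    print("Lock acquired. Processing critical section...")
    # ... critical section; the lock is released when the block exits ...

zk.stop()   # disconnecting also removes any remaining ephemeral nodes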
Advantages of ZooKeeper Locks:
- Automatic Cleanup: Ephemeral nodes ensure locks are released when clients disconnect
- Ordered Waiting: Sequential nodes provide fair ordering of lock requests
- High Reliability: ZooKeeper’s consensus algorithm ensures strong consistency
- Watch Mechanism: Efficient notification system for lock availability
Advantages and Disadvantages of Distributed Locks
Advantages:
- Consistency Guarantee: Ensures mutual exclusion across distributed systems
- Race Condition Prevention: Eliminates data corruption from concurrent access
- Fault Tolerance: Modern implementations handle node failures gracefully
- Scalability: Can coordinate access across thousands of nodes
Disadvantages:
- Performance Overhead: Lock acquisition and release add latency to operations
- Complexity: Distributed locks are more complex than local locks
- Deadlock Risk: Multiple locks can create deadlock situations
- Availability Impact: Lock service failures can halt critical operations