Topics

Data Replication and Consistency

1. CAP Theorem in Distributed Systems

The CAP Theorem, formulated by Eric Brewer, represents one of the most fundamental principles governing distributed systems architecture. This theorem establishes that any distributed data store can simultaneously guarantee only two out of three critical properties during network partitions.

The Three Pillars of CAP

Consistency ©: This guarantee ensures that every read operation receives the most recent write or returns an error. Under consistency, all nodes in the distributed system see identical data simultaneously. When a write operation completes successfully, any subsequent read from any node will return that updated value.
Availability (A): This property ensures that every request, whether read or write, receives a non-error response within a reasonable time frame. The system remains operational and responsive even when some nodes experience failures or become unreachable.
Partition Tolerance (P): This characteristic enables the system to continue functioning despite network partitions—situations where communication between nodes breaks down due to network failures, high latency, or other connectivity issues.

The Fundamental Trade-off

Since network partitions are inevitable in real-world distributed systems, the CAP theorem forces architects to choose between consistency and availability when partitions occur. This creates three possible combinations:

       CAP Theorem Trade-offs            
         C (Consistency)
              /|\
             / | \
            /  |  \
           /   |   \
          /    |    \
         /     |     \
        /      |      \
       /       |       \
      /        |        \
     /         |         \
    +-----------+-----------+
   P             |            A
(Partition       |       (Availability)
 Tolerance)      |
                 |
          Network Partition
           (Inevitable)

CAP Combinations Explained

CP (Consistency + Partition Tolerance)

Systems choosing the CP approach prioritize data consistency over availability during network partitions. When a partition occurs, some nodes may become unavailable to maintain data accuracy across all accessible nodes.

Characteristics:

Guarantees that all nodes return identical data
May reject requests during partitions to maintain consistency
Suitable for applications where data accuracy is paramount

Example Implementation: Apache HBase exemplifies this approach by ensuring data consistency even if it means some regions become temporarily unavailable during network issues.

AP (Availability + Partition Tolerance)

AP systems maintain availability during partitions but may serve stale or inconsistent data until the network heals and nodes can synchronize.

Characteristics:

Ensures the system remains responsive during partitions
May return outdated information temporarily
Ideal for applications prioritizing user experience over perfect consistency

Example Implementation: Apache Cassandra and Amazon DynamoDB follow this model, allowing nodes to serve requests even when they cannot communicate with all replicas.

CA (Consistency + Availability)

This combination works only in environments without network partitions, typically in single-node systems or tightly coupled architectures within the same data center.

Characteristics:

Provides both consistency and availability
Cannot handle network partitions
Limited to non-distributed or co-located systems

Example Implementation: Traditional relational databases like MySQL in single-server configurations represent this approach.

Practical Application: E-commerce Platform

Consider an e-commerce platform’s inventory management system:

CP Approach: The system ensures accurate inventory counts across all nodes, preventing overselling by temporarily making inventory checks unavailable during network partitions. This approach prioritizes business logic correctness over user experience.
AP Approach: The system allows customers to browse products and make purchases even during partitions, potentially showing outdated inventory levels. The system accepts the risk of temporary inconsistencies to maintain customer engagement.

2. Conflict Resolution Strategies

In distributed systems, particularly those following the AP model, conflicts inevitably arise when multiple nodes process updates independently. Effective conflict resolution strategies are essential for maintaining data integrity while preserving system availability.

2.1 Last Write Wins (LWW)

Last Write Wins represents the simplest conflict resolution strategy, relying on timestamps or version numbers to determine which update should prevail.

Mechanism: Each write operation includes a timestamp or sequence number. During conflict resolution, the system retains only the version with the most recent timestamp, discarding all previous versions.

Implementation Flow:

Node A: Write(key=profile, value=photo1, timestamp=100)
Node B: Write(key=profile, value=photo2, timestamp=150)

Conflict Resolution:
timestamp(photo1) = 100 < timestamp(photo2) = 150
Result: photo2 wins, photo1 is discarded

Advantages:

Extremely simple to implement and understand
Minimal computational overhead
Low storage requirements for metadata

Disadvantages:

Potential for data loss when intermediate updates are overwritten
Dependency on synchronized clocks across nodes
Not suitable for applications where every change carries business value

Use Cases: Social media status updates, user preference settings, and other scenarios where only the final state matters more than the update history.

2.2 Version Vectors

Version vectors provide a sophisticated approach to conflict detection by tracking the causal relationships between updates across different nodes.

Mechanism: Each node maintains a vector of version numbers, with one entry for each node in the system. When an update occurs, the originating node increments its counter in the vector. This creates a causal history that helps determine update relationships.

Version Vector Structure:

Node A Vector: [A:3, B:1, C:2]  // Node A has seen 3 of its own updates,
                                // 1 from B, and 2 from C
Node B Vector: [A:2, B:4, C:1]  // Node B has seen 2 from A, 4 of its own,
                                // and 1 from C

Conflict Detection Logic:

Ancestor: Vector V1 is an ancestor of V2 if all elements in V1 are ≤ corresponding elements in V2
Descendant: Vector V2 is a descendant of V1 if V1 is an ancestor of V2
Concurrent: Vectors are concurrent if neither is an ancestor of the other (conflict detected)

Example Scenario:

Initial State: Document = "Hello World"
Node A Vector: [A:1, B:0]
Node B Vector: [A:0, B:1]

After Updates:
Node A: Document = "Hello Beautiful World", Vector: [A:2, B:0]
Node B: Document = "Hello Wonderful World", Vector: [A:0, B:2]
Comparison: Neither vector dominates the other → Concurrent updates → Conflict!

Advantages:

Precise conflict detection based on causality
Provides foundation for sophisticated merge strategies
Maintains an updated history for complex resolution logic

Disadvantages:

The vector size grows with the number of nodes
Increased storage and computational overhead
Requires additional logic actually to resolve detected conflicts

Use Cases: Collaborative document editing, distributed databases requiring strong consistency models, and systems where understanding update causality is crucial.

2.3 CRDTs (Conflict-Free Replicated Data Types)

Conflict-Free Replicated Data Types represent an elegant solution that eliminates conflicts by design rather than resolving them after they occur.

Core Principle: CRDTs are mathematically designed data structures that guarantee all replicas will converge to the same state regardless of the order in which operations are applied or network conditions.

CRDT Categories:

State-based CRDTs (CvRDTs):

Replicas exchange complete state information
Require a merge function that is commutative, associative, and idempotent
Suitable for scenarios with reliable state transfer

Operation-based CRDTs (CmRDTs):

Replicas exchange operations rather than states
Operations must be commutative
Require reliable operation delivery

Common CRDT Types:

G-Counter (Grow-only Counter):

Structure: Map of node_id → count
Operations: increment(node_id, amount)
Merge: sum all counts for each node
Example:
Node A: {A: 5, B: 3} = 8 total
Node B: {A: 3, B: 7} = 10 total
Merged: {A: 5, B: 7} = 12 total

OR-Set (Observed-Remove Set):

Tracks both additions and removals with unique identifiers
Addition wins over removal for concurrent operations
Ensures eventual consistency for set operations

Advantages:

Automatic conflict resolution without custom logic
Strong eventual consistency guarantees
Eliminates the complexity of manual conflict handling

Disadvantages:

Limited to specific data structure patterns
May require more storage for metadata
Complex to design for arbitrary data types

Use Cases: Collaborative editing applications, distributed counters, shopping carts in e-commerce, and any scenario requiring automatic conflict resolution.

2.4 Custom Application Logic

For complex business scenarios, custom conflict resolution strategies provide the flexibility to implement domain-specific rules that standard approaches cannot handle.

Design Considerations:

Business rule encoding: Translate business requirements into conflict resolution algorithms
User intervention: Allow manual resolution for critical conflicts
Contextual merging: Use application context to make intelligent merge decisions

Implementation Patterns:

Three-way Merge:

Original State: quantity = 100
Node A Update: quantity = 95  (sold 5 items)
Node B Update: quantity = 90  (sold 10 items)

Custom Logic: 
- Detect concurrent sales operations
- Apply both sales: 100 - 5 - 10 = 85
- Final quantity = 85

Priority-based Resolution:

Assign priorities to different types of operations
Higher priority operations override lower priority ones
Useful for hierarchical systems or role-based applications

User-mediated Resolution:

Present conflicts to users for manual resolution
Suitable for high-value data where automated resolution is insufficient
Combines human judgment with system automation

Advantages:

Tailored to specific business requirements
Can incorporate complex domain knowledge
Flexible and adaptable to changing requirements

Disadvantages:

Increased development and maintenance complexity
Potential for introducing bugs in the resolution logic
May require extensive testing for edge cases

Use Cases: Financial transaction systems, inventory management with complex business rules, document management systems, and applications where domain expertise is crucial for conflict resolution.

Choosing the Right Strategy

The selection of appropriate conflict resolution strategies depends on several factors:

Data criticality: How important is each piece of data to business operations?
Conflict frequency: How often do conflicts occur in the system?
User tolerance: Can users accept temporary inconsistencies?
Business rules: Are there specific domain requirements for handling conflicts?
System complexity: What level of implementation complexity is acceptable?

← Leader Election Performance Optimization →

Track your progress

Mark this subtopic as completed when you finish reading.