Scaling refers to the ability of a system to handle increased load. There are two primary approaches:
- Vertical Scaling (Scaling Up): Increase the power of a single machine.
- Horizontal Scaling (Scaling Out): Increase the number of machines handling the load.
1. Horizontal vs. Vertical Scaling
1.1 Vertical Scaling (Scaling Up)
Definition: Enhancing the capacity of a single server by adding more powerful hardware — CPU, RAM, storage, etc.
How Vertical Scaling Works:
- Upgrading Hardware: e.g., more RAM, a faster CPU.
- Cloud Scaling: e.g., moving from an AWS t2.medium to t2.2xlarge.
Diagram (Vertical Scaling):
+------------------+        +-------------------+
|      Server      |  --->  |  Upgraded Server  |
|    (4 GB RAM)    |        |    (16 GB RAM)    |
+------------------+        +-------------------+
Advantages:
- Simplicity: No need to modify code for distribution.
- Consistency: One machine, no distributed state.
Disadvantages:
- Hardware Limitations: Limited upgrade potential.
- Single Point of Failure: One machine = one failure point.
- Cost: High-end servers get expensive fast.
Use Cases:
- Small internal tools.
- Legacy applications.
- Databases that benefit from local high-speed memory access.
Python Example:
Here’s a conceptual simulation of how vertical scaling might improve performance in a CPU-bound task:
import time
from typing import Callable

def compute_heavy_task(multiplier: int) -> int:
    result = 0
    for i in range(10**6):
        result += i * multiplier
    return result

def run_on_scaled_machine(cpu_speed: float, task: Callable[[int], int]) -> None:
    start = time.perf_counter()
    result = task(2)
    # A faster CPU finishes the same work in proportionally less wall time;
    # we model the speed-up by dividing the measured duration by cpu_speed.
    duration = (time.perf_counter() - start) / cpu_speed
    print(f"Result: {result}, Duration: {duration:.4f}s")

# Simulating performance improvement due to vertical scaling (faster CPU)
run_on_scaled_machine(cpu_speed=1.0, task=compute_heavy_task)  # Normal
run_on_scaled_machine(cpu_speed=2.0, task=compute_heavy_task)  # Scaled up
1.2 Horizontal Scaling (Scaling Out)
Definition: Adding more servers (nodes) to the system to share the load.
How Horizontal Scaling Works:
- Load Balancing: Distributes requests among multiple servers.
- Replication: Duplicate data across nodes for redundancy.
- Partitioning (Sharding): Split data across multiple nodes.
- Microservices: Each service scales independently.
Diagram (Horizontal Scaling):
          +-------------------+
          |   Load Balancer   |
          +-------------------+
                    |
    +---------------+---------------+
    |               |               |
+-----------+ +-----------+ +-----------+
| Server 1  | | Server 2  | | Server 3  |
+-----------+ +-----------+ +-----------+
Advantages:
- Fault Tolerance: One server down doesn’t bring down the system.
- Elastic Scalability: Easy to add/remove servers.
- Cost-Effective: Many small servers vs. one massive one.
Disadvantages:
- Complex Setup: Load balancers, service discovery, etc.
- Consistency Trade-offs: CAP theorem applies.
- Code Changes: May require refactoring for statelessness.
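The statelessness point deserves a concrete illustration: once any server may receive any request, per-server session state breaks, and session data must move to a store all servers share. A minimal sketch, where the dict-backed `SharedSessionStore` is a stand-in for an external store such as Redis:

```python
from typing import Dict

class SharedSessionStore:
    """Stand-in for an external store (e.g. Redis) shared by all servers."""
    def __init__(self) -> None:
        self._sessions: Dict[str, dict] = {}

    def get(self, session_id: str) -> dict:
        # Return the session, creating it on first access.
        return self._sessions.setdefault(session_id, {})

def handle_request(server_id: int, session_id: str, store: SharedSessionStore) -> int:
    # Any server can serve the request because state lives in the shared store,
    # not on the server itself.
    session = store.get(session_id)
    session["hits"] = session.get("hits", 0) + 1
    return session["hits"]

store = SharedSessionStore()
# Three different servers handle the same user's requests in turn.
counts = [handle_request(server, "user-42", store) for server in (1, 2, 3)]
print(counts)  # [1, 2, 3] — the session survives across servers
```

Because the servers hold no state of their own, any of them can be added or removed without losing user sessions.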
Use Cases:
- Social media platforms.
- E-commerce websites.
- Microservices and distributed databases.
Python Example (Simulated Load Distribution):
import random
import time
from typing import List

def simulate_request(server_id: int, payload: str) -> None:
    time.sleep(random.uniform(0.01, 0.1))  # Simulate network latency
    print(f"Server {server_id} processed payload: {payload}")

def load_balancer(servers: List[int], requests: List[str]) -> None:
    for i, req in enumerate(requests):
        # Round-robin: the i-th request goes to the (i mod n)-th server
        chosen_server = servers[i % len(servers)]
        simulate_request(chosen_server, req)

server_pool = [1, 2, 3]  # Horizontal scale: three identical servers
incoming_requests = [f"Request-{i}" for i in range(10)]
load_balancer(server_pool, incoming_requests)
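Round-robin balancing spreads requests evenly but assumes any server can handle any request. Partitioning (sharding), mentioned above, instead routes each key to a fixed node so related data stays together. A minimal hash-based sketch (the three-shard pool and `user-N` keys are illustrative):

```python
import zlib
from typing import Dict, List

def shard_for(key: str, num_shards: int) -> int:
    # CRC32 is stable across runs; Python's built-in hash() for str is
    # randomized per process, so it cannot be used for routing.
    return zlib.crc32(key.encode()) % num_shards

def route_writes(keys: List[str], num_shards: int) -> Dict[int, List[str]]:
    # Each key always lands on the same shard, so lookups know where to go.
    shards: Dict[int, List[str]] = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards

placement = route_writes([f"user-{i}" for i in range(10)], num_shards=3)
for shard, keys in placement.items():
    print(f"Shard {shard}: {keys}")
```

Note the trade-off: changing `num_shards` remaps most keys, which is why real systems often use consistent hashing instead of a plain modulo.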
2. Scaling Strategies and When to Use Each
2.1 Vertical Scaling Strategy
Use Vertical Scaling When:
- The application runs on a monolithic architecture.
- You need low-latency, consistent performance.
- Cost or design constraints prevent a distributed model.
- You’re in early development with a predictable load.
Example Scenario:
A startup uses a Python Flask app to manage internal tasks like inventory and accounting. Traffic is predictable. Instead of re-architecting, they increase RAM from 4 GB to 16 GB and CPU cores from 2 to 8.
Before: 2-core CPU, 4 GB RAM
After: 8-core CPU, 16 GB RAM
2.2 Horizontal Scaling Strategy
Use Horizontal Scaling When:
- You expect or already have high/variable traffic.
- Application must be highly available.
- You are using a distributed or microservices-based architecture.
- You need elastic growth without downtime.
Example Scenario:
A microservices-based social media app:
Services:
- Auth Service
- Feed Service
- Notification Service
Each runs on its own cluster of containers and scales independently. Traffic to the feed service spikes during major events, so only that service is scaled out.
      +-----------------------------+
      |        Load Balancer        |
      +-----------------------------+
          |         |         |
      +--------+ +--------+ +--------+
      | Feed 1 | | Feed 2 | | Feed 3 |
      +--------+ +--------+ +--------+
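The "scale only what is hot" behavior above can be sketched as a simple threshold-based autoscaler. The service names, loads, and the 100-requests-per-replica target are illustrative, not taken from any real platform:

```python
import math
from typing import Dict

def autoscale(replicas: Dict[str, int], load: Dict[str, float],
              target_per_replica: float = 100.0) -> Dict[str, int]:
    """Pick a replica count per service so load per replica stays near target."""
    return {
        service: max(1, math.ceil(load[service] / target_per_replica))
        for service in replicas
    }

current = {"auth": 2, "feed": 3, "notification": 2}
# Requests per second during a major event: only the feed service spikes.
load = {"auth": 150.0, "feed": 900.0, "notification": 120.0}
print(autoscale(current, load))  # only "feed" is scaled out, to 9 replicas
```

This is the core idea behind production autoscalers: each service's replica count follows its own load signal, independent of the others.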
2.3 Combining Vertical and Horizontal Scaling (Hybrid Approach)
In practice, many systems use both:
- Vertical for databases needing I/O throughput and consistency.
- Horizontal for web/app layers needing elasticity.
Hybrid Scaling Example:
- Use a powerful server (vertical scaling) to host PostgreSQL.
- Deploy 5 small app servers (horizontal scaling) behind a load balancer for the frontend/API layer.
Frontend/API Layer (Horizontally Scaled)

          +-----------------+
          |  Load Balancer  |
          +-----------------+
           /       |       \
 +--------+  +--------+       +--------+
 | App 1  |  | App 2  |  ...  | App 5  |
 +--------+  +--------+       +--------+
           \       |       /

Database Layer (Vertically Scaled)

      +-------------------------+
      | PostgreSQL (64 GB RAM)  |
      +-------------------------+
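A minimal end-to-end sketch of this hybrid layout, where several horizontally scaled app workers all share one database (the in-memory `Database` class is a stand-in for the single PostgreSQL node):

```python
from typing import Dict, List

class Database:
    """Stand-in for the single, vertically scaled PostgreSQL node."""
    def __init__(self) -> None:
        self.rows: Dict[str, int] = {}

    def increment(self, key: str) -> int:
        self.rows[key] = self.rows.get(key, 0) + 1
        return self.rows[key]

def app_server(server_id: int, request_key: str, db: Database) -> str:
    # Every horizontally scaled app server talks to the same database.
    count = db.increment(request_key)
    return f"app-{server_id} stored {request_key} (count={count})"

db = Database()
app_pool = [1, 2, 3, 4, 5]  # horizontal: five small app servers
requests: List[str] = [f"order-{i % 2}" for i in range(6)]
for i, key in enumerate(requests):
    print(app_server(app_pool[i % len(app_pool)], key, db))
```

The app layer grows or shrinks freely, while the database stays a single strong node — which is also why the database eventually becomes the bottleneck and may need sharding of its own.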
By understanding both scaling strategies and when to use them, you can design systems that meet both short-term and long-term performance, availability, and cost goals.