Multithreading & Multiprocessing

In Python, multithreading and multiprocessing are two techniques for running tasks concurrently. Each suits a different kind of workload: threads for I/O-bound tasks and processes for CPU-bound tasks. Each also comes with its own behaviors and limitations, largely because of the Global Interpreter Lock (GIL) in CPython, the standard Python implementation, which affects how threads execute.

1. Multithreading in Python

Multithreading allows multiple threads to be created within a single process. Threads are lightweight and share the same memory space, which makes sharing data easy but requires synchronization to prevent data corruption. In Python, the GIL limits multithreading performance for CPU-bound tasks, but multithreading is still useful for I/O-bound tasks: a thread releases the GIL while it waits for an I/O operation to complete, so other threads can run in the meantime.

When to Use Multithreading

  • I/O-Bound Tasks: Ideal for tasks that spend a lot of time waiting for input/output, such as file operations, network requests, or database queries.
  • Lightweight Concurrency: Threads are lightweight compared to processes, making them faster to start and more memory-efficient.
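
As a rough illustration of the I/O-bound case, the sketch below runs several simulated downloads on a thread pool from the standard concurrent.futures module. The names fake_download and download_all are hypothetical, and time.sleep stands in for a real network or file call; like real blocking I/O, it lets other threads run while it waits.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(name, delay=0.2):
    """Hypothetical stand-in for an I/O-bound call such as a network request."""
    time.sleep(delay)  # the thread waits here, so other threads can run
    return f"{name}: done"


def download_all(names, max_workers=4):
    """Run fake_download for every name on a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_download, names))


if __name__ == "__main__":
    names = ["a.txt", "b.txt", "c.txt", "d.txt"]
    start = time.perf_counter()
    results = download_all(names)
    elapsed = time.perf_counter() - start
    print(results)
    # With four workers the four waits overlap, so the total should be
    # close to one delay (about 0.2s) rather than four delays (about 0.8s).
    print(f"Elapsed: {elapsed:.2f}s")
```

A pool is used here instead of individual Thread objects because it collects return values and caps the number of threads; either approach overlaps the waiting time in the same way.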

Example: Using threading Module

import threading
import time

def print_numbers():
    for i in range(1, 6):
        print(f"Number: {i}")
        time.sleep(1)

def print_letters():
    for letter in 'abcde':
        print(f"Letter: {letter}")
        time.sleep(1)

# Creating threads
thread1 = threading.Thread(target=print_numbers)
thread2 = threading.Thread(target=print_letters)

# Starting threads
thread1.start()
thread2.start()

# Wait for both threads to complete
thread1.join()
thread2.join()
print("Both threads have finished executing.")

Explanation

  1. Creating Threads: We create two threads, each running a different function.
  2. Starting Threads: We start the threads with .start(), which runs them concurrently.
  3. Joining Threads: We wait for both threads to complete with .join().

Output

  • Both functions run concurrently, so numbers and letters are printed interleaved. The exact order within each pair can vary between runs.

Limitations

  • GIL: CPython’s GIL allows only one thread at a time to execute Python bytecode. As a result, threads do not speed up CPU-bound tasks, and the GIL becomes a bottleneck for them.
  • Data Synchronization: Because threads share memory, care must be taken to avoid race conditions. The threading module provides synchronization primitives like Lock, RLock, Semaphore, and Event.
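
The data synchronization point can be sketched with a Lock guarding a shared counter. The Counter class and run_workers helper are hypothetical names used only for illustration. The statement self.value += 1 is a read-modify-write sequence, so without the lock two threads could read the same old value and lose an update.

```python
import threading


class Counter:
    """Hypothetical shared counter whose increment is protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may execute the block under the lock.
        with self._lock:
            self.value += 1


def run_workers(counter, n_threads=4, n_increments=10_000):
    """Increment the counter from several threads and return the final value."""

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run_workers(Counter()))  # prints 40000
```

Using the lock as a context manager (with self._lock:) releases it even if the protected code raises an exception, which is usually safer than calling acquire() and release() by hand.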

2. Multiprocessing in Python

Multiprocessing uses multiple processes instead of threads. Each process has its own memory space, and they run independently of each other. The GIL does not affect multiprocessing since each process has its own Python interpreter and memory space. Multiprocessing is ideal for CPU-bound tasks, as it can leverage multiple CPU cores to run tasks in parallel.

When to Use Multiprocessing

  • CPU-Bound Tasks: Tasks that require a lot of CPU processing, like mathematical computations, image processing, or data transformations.
  • Parallel Execution: Multiple processes run independently and in parallel, allowing true parallelism.
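
For the CPU-bound case, a minimal sketch using a process pool from the standard concurrent.futures module might look like the following. The function names sum_of_squares and parallel_sums are hypothetical, and the workload is deliberately small; a real speed-up only shows once each task is expensive enough to outweigh the cost of starting processes and passing data between them.

```python
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(n):
    """CPU-bound work: sum of i*i for i in range(n)."""
    return sum(i * i for i in range(n))


def parallel_sums(limits, max_workers=2):
    """Compute sum_of_squares for each limit in separate worker processes."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(sum_of_squares, limits))


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block when
    # they import the module.
    print(parallel_sums([10, 100, 1_000]))  # prints [285, 328350, 332833500]
```

The worker function is defined at module level because arguments and functions are pickled when sent to worker processes, and nested functions or lambdas cannot be pickled.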

Example: Using multiprocessing Module

import multiprocessing
import time

def calculate_square(numbers):
    for n in numbers:
        time.sleep(1)
        print(f"Square of {n}: {n * n}")

def calculate_cube(numbers):
    for n in numbers:
        time.sleep(1)
        print(f"Cube of {n}: {n * n * n}")

if __name__ == "__main__":
    # This guard is required on platforms that start child processes by
    # re-importing this module (the default on Windows and macOS).

    # Create process objects
    process1 = multiprocessing.Process(target=calculate_square, args=([1, 2, 3],))
    process2 = multiprocessing.Process(target=calculate_cube, args=([1, 2, 3],))

    # Start processes
    process1.start()
    process2.start()

    # Wait for both processes to complete
    process1.join()
    process2.join()
    print("Both processes have finished executing.")

Explanation

  1. Creating Processes: We create two processes, each with its own function and argument list.
  2. Starting Processes: We start both processes with .start().
  3. Joining Processes: We wait for both processes to complete with .join().

Output

  • The squares and cubes are calculated in separate processes running in parallel, so their output is interleaved. The exact order can vary between runs.

Advantages of Multiprocessing

  • True Parallelism: Multiprocessing avoids the GIL, so multiple CPU-bound tasks can run in parallel on multiple cores.
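  • Separate Memory Spaces: Each process has its own memory space, so one process cannot accidentally corrupt another’s data the way threads sharing memory can. The trade-off is that data must be passed between processes explicitly, for example with multiprocessing.Queue or multiprocessing.Pipe.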
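
The separate memory spaces can be seen in a small sketch: a child process changes a module-level variable, and the parent's copy stays untouched. The names counter, bump_and_report, and demo are hypothetical; a multiprocessing.Queue is used here as one way to send the child's value back to the parent.

```python
import multiprocessing

counter = 0  # module-level state; every process works on its own copy


def bump_and_report(queue):
    """Runs in the child: changes the child's copy of counter and sends it back."""
    global counter
    counter += 100
    queue.put(counter)


def demo():
    """Start one child process and return (value seen by child, value in parent)."""
    queue = multiprocessing.Queue()
    child = multiprocessing.Process(target=bump_and_report, args=(queue,))
    child.start()
    child_value = queue.get()  # read the result before joining the child
    child.join()
    return child_value, counter


if __name__ == "__main__":
    child_value, parent_value = demo()
    print(f"Child saw counter = {child_value}")           # prints 100
    print(f"Parent still has counter = {parent_value}")   # prints 0
```

If the same function ran in a thread instead, the parent would see counter change to 100, because threads share one memory space.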
