Greedy Algorithms

Greedy Algorithms are algorithms that build up a solution piece by piece, always choosing the next piece that offers the most immediate benefit. Greedy algorithms work by making locally optimal choices at each step with the hope of finding a global optimum. They are widely used in optimization problems but may not always provide the optimal solution for all types of problems. However, in certain problems, they guarantee optimality.

Two classic examples of greedy algorithms are Activity Selection and Huffman Coding.

Activity Selection Problem

The Activity Selection Problem is a classic example of a greedy algorithm used to select the maximum number of activities that do not overlap. Given a set of activities with their start and end times, the goal is to schedule the maximum number of activities without overlapping.

Greedy Strategy

Sort all activities by their end times.

  • Always select the activity that finishes the earliest and is compatible with previously selected activities.
  • This strategy ensures the maximum number of non-overlapping activities are selected, as choosing the earliest finishing activity leaves the most room for other activities.

Steps for Solving the Activity Selection Problem

Sort the activities by their end times.

  1. Initialize the last_selected_activity as the first activity.
  2. For each activity: If the start time of the current activity is greater than or equal to the end time of the last_selected_activity, select it.
  3. Update the last_selected_activity to the current activity each time one is selected.

Code Example

from typing import List, Tuple

def activity_selection(activities: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """
    Select the maximum number of non-overlapping activities.
    - activities: List of tuples where each tuple represents (start_time, end_time)
    Returns the list of selected activities.
    """
    # Sort activities by their end times
    sorted_activities = sorted(activities, key=lambda x: x[1])
    # Initialize the list of selected activities
    selected_activities = []
    last_end_time = -1
    
    for activity in sorted_activities:
        start, end = activity
        if start >= last_end_time:
            selected_activities.append(activity)
            last_end_time = end
    return selected_activities

# Example usage
activities = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9), (6, 10), (8, 11), (8, 12), (2, 13), (12, 14)]
selected = activity_selection(activities)

print("Selected activities:", selected)
# Output: Selected activities: [(1, 4), (5, 7), (8, 11), (12, 14)]

Explanation

Sorting: Activities are sorted by their end times to ensure the algorithm always considers the earliest finishing activity next.

  1. Selection: Activities are selected based on whether they start after or when the previous activity ends.
  2. Optimal Solution: By selecting the earliest ending activities, this greedy approach guarantees the maximum number of non-overlapping activities.

Huffman Coding

Huffman Coding is a greedy algorithm used for lossless data compression. It assigns variable-length codes to each symbol in such a way that symbols with higher frequencies are assigned shorter codes, and symbols with lower frequencies get longer codes. This minimizes the total number of bits required to represent a given set of data.

Greedy Strategy

  • Build a Huffman tree by repeatedly combining the two nodes with the lowest frequencies.
  • Assign codes based on the position of symbols in the tree: moving left represents a 0 and moving right represents a 1.

Steps for Huffman Coding

Calculate the frequency of each symbol in the data.

  1. Build a priority queue (min-heap) where each node represents a symbol and its frequency.
  2. While there is more than one node in the queue:
  • Remove the two nodes with the lowest frequency.
  • Create a new internal node with these two nodes as children, and set its frequency as the sum of their frequencies.
  • Add the new node back to the queue.
  1. The root of the final tree represents the Huffman tree.
  2. Traverse the tree to assign binary codes to each symbol based on the path from the root.

Code Example

import heapq
from typing import Dict, Optional

class HuffmanNode:
    def __init__(self, char: Optional[str], freq: int):
        self.char = char
        self.freq = freq
        self.left = None
        self.right = None
    
    # Define comparison operators for priority queue
    def __lt__(self, other):
        return self.freq < other.freq

def build_huffman_tree(freq_map: Dict[str, int]) -> HuffmanNode:
    """
    Build the Huffman Tree using the frequencies of each character.
    - freq_map: Dictionary where keys are characters and values are frequencies
    Returns the root of the Huffman Tree.
    """
    # Step 1: Initialize the priority queue with all characters
    heap = [HuffmanNode(char, freq) for char, freq in freq_map.items()]
    heapq.heapify(heap)
    # Step 2: Build the Huffman Tree
    while len(heap) > 1:
        node1 = heapq.heappop(heap)
        node2 = heapq.heappop(heap)
        merged = HuffmanNode(None, node1.freq + node2.freq)
        merged.left = node1
        merged.right = node2
        heapq.heappush(heap, merged)
    return heap[0]

def generate_huffman_codes(root: HuffmanNode, code: str, code_map: Dict[str, str]) -> None:
    """
    Generate Huffman Codes by traversing the Huffman Tree.
    - root: The root of the Huffman Tree
    - code: The current Huffman code (built recursively)
    - code_map: Dictionary to store character to Huffman code mappings
    """
    if root is None:
        return
    # If it's a leaf node, store the code for this character
    if root.char is not None:
        code_map[root.char] = code
        return
    # Traverse left and right subtrees
    generate_huffman_codes(root.left, code + "0", code_map)
    generate_huffman_codes(root.right, code + "1", code_map)

# Example usage
freq_map = {'a': 5, 'b': 9, 'c': 12, 'd': 13, 'e': 16, 'f': 45}
root = build_huffman_tree(freq_map)

code_map = {}
generate_huffman_codes(root, "", code_map)
print("Huffman Codes:", code_map)
# Output: Huffman Codes: {'f': '0', 'c': '100', 'd': '101', 'a': '1100', 'b': '1101', 'e': '111'}

Explanation

Building the Tree: We build the Huffman Tree by combining nodes with the lowest frequencies, which ensures that frequently used characters appear higher in the tree and get shorter codes.

  1. Generating Codes: The codes are assigned based on the path from the root to each character, with left moves represented by 0 and right moves by 1.
  2. Optimal Solution: This greedy approach minimizes the total bit-length of the encoded message, as symbols with higher frequencies get shorter codes.

Applications of Huffman Coding

  1. Data Compression: Widely used in file compression algorithms (e.g., ZIP files).
  2. Transmission: Reduces the size of data for transmission in networks, thus saving bandwidth.

Advantages and Limitations of Greedy Algorithms

Advantages:

  1. Efficiency: Greedy algorithms are usually faster because they make only one pass through the data.
  2. Simplicity: Easy to implement since they focus on a single, local optimum.

Limitations:

  1. May Not Be Globally Optimal: Greedy algorithms do not guarantee a globally optimal solution for all problems. They only work if the local optimum leads to the global optimum.
  2. Limited Applicability: Greedy algorithms are only applicable for problems where the greedy choice property holds (i.e., local optimum leads to global optimum).

Track your progress

Mark this subtopic as completed when you finish reading.