Data Storage Fundamentals

1. Types of Storage (Block, Object, File)

Understanding the differences between block, object, and file storage is crucial for designing systems that meet various data access and performance requirements.

1.1 Block Storage

Block storage divides data into fixed-size blocks and stores them in a storage device (e.g., HDD, SSD).

  • Usage: Typically used in SAN (Storage Area Networks) and preferred for databases and VMs.
  • Access: Data is stored in chunks (blocks) and retrieved using block addresses.
  • Example: AWS EBS (Elastic Block Store).
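
Block addressing can be sketched in a few lines of Python. This is a minimal illustration, not how any particular device driver works: an in-memory buffer stands in for a raw disk, and the 4096-byte BLOCK_SIZE and the helper names write_block and read_block are assumptions made for the example.

```python
import io

BLOCK_SIZE = 4096  # assumed block size in bytes; real devices vary


def write_block(device, block_number, data):
    """Write one block at the byte offset implied by its block number."""
    if len(data) > BLOCK_SIZE:
        raise ValueError("data does not fit in one block")
    device.seek(block_number * BLOCK_SIZE)
    # Pad short writes so every block occupies exactly BLOCK_SIZE bytes.
    device.write(data.ljust(BLOCK_SIZE, b"\x00"))


def read_block(device, block_number):
    """Read one full block by its block address."""
    device.seek(block_number * BLOCK_SIZE)
    return device.read(BLOCK_SIZE)


# An in-memory buffer of 8 zeroed blocks stands in for a raw disk.
device = io.BytesIO(bytes(BLOCK_SIZE * 8))
write_block(device, 3, b"hello")
block = read_block(device, 3)
print(block[:5])
```

The device itself knows nothing about files or names; anything above the block address (a file system, a database page layout) is layered on top by the client.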

1.2 Object Storage

Object storage manages data as individual objects, each containing data, metadata, and a unique identifier.

  • Usage: Ideal for unstructured data like images, videos, backups.
  • Access: Objects are retrieved by their unique key, usually over an HTTP API; the namespace is flat, with no real directory hierarchy.
  • Example: Amazon S3, Azure Blob Storage.

Example of Object Storage URL:

https://bucket-name.s3.amazonaws.com/object-key
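
The flat namespace can be illustrated with a toy in-memory store. ToyObjectStore and StoredObject are hypothetical names invented for this sketch and do not correspond to any cloud SDK; the point is only that a key such as photos/2024/cat.jpg is a single string, and "folders" are a prefix convention.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


class ToyObjectStore:
    """Hypothetical in-memory stand-in for a bucket: a flat key -> object map."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        self._objects[key] = StoredObject(data, metadata)

    def get(self, key):
        return self._objects[key]

    def list_keys(self, prefix=""):
        # "Folders" are only a naming convention: listing is a prefix match.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = ToyObjectStore()
bucket.put("photos/2024/cat.jpg", b"...", content_type="image/jpeg")
bucket.put("photos/2024/dog.jpg", b"...", content_type="image/jpeg")
bucket.put("backups/db.dump", b"...")

print(bucket.list_keys("photos/"))
print(bucket.get("photos/2024/cat.jpg").metadata)
```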

1.3 File Storage

File storage organizes data in a hierarchy of directories and files, similar to a traditional file system.

  • Usage: Suitable for applications needing a file structure, such as content management systems.
  • Access: Accessed using file paths; supports permissions and locking.
  • Example: Network Attached Storage (NAS) shared over NFS or SMB, Amazon EFS; Google Drive exposes a similar folder model to end users.
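
Path-based access looks like this in practice. The sketch uses a temporary local directory in place of a network share, and the directory and file names are made up for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Build a small directory hierarchy under a temporary root.
    article = Path(root) / "site" / "articles" / "intro.txt"
    article.parent.mkdir(parents=True)
    article.write_text("Hello, file storage")

    # Files are addressed by path and carry permission bits.
    article.chmod(0o644)
    relative = article.relative_to(root).as_posix()
    content = article.read_text()

    # The hierarchy can be walked to find files by pattern.
    found = [p.name for p in Path(root).rglob("*.txt")]

print(relative, content, found)
```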

2. SQL vs. NoSQL Databases

SQL and NoSQL databases have distinct use cases, strengths, and weaknesses. Understanding these is essential for designing systems that are scalable and flexible.

2.1 SQL Databases

SQL databases are relational databases that use Structured Query Language (SQL) to define, query, and modify data.

  • Structure: Relational (tables with rows and columns).
  • ACID Compliance: Ensures data reliability (Atomicity, Consistency, Isolation, Durability).
  • Example: MySQL, PostgreSQL, Oracle.

Example SQL Query:

SELECT name, age FROM users WHERE age > 25;
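
To try the query and see atomicity in action, the following sketch uses Python's built-in sqlite3 module with an in-memory database. The users table definition and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT NOT NULL, age INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("Alice", 30), ("Bob", 22), ("Carol", 41)],
)
conn.commit()

# Same query as above, with the threshold passed as a bound parameter.
rows = conn.execute(
    "SELECT name, age FROM users WHERE age > ? ORDER BY name", (25,)
).fetchall()
print(rows)

# Atomicity: if any statement in the transaction fails, none of it is kept.
try:
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO users (name, age) VALUES (?, ?)", ("Dave", 35))
        # Violates NOT NULL, so the whole transaction is rolled back.
        conn.execute("INSERT INTO users (name, age) VALUES (?, ?)", ("Eve", None))
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)
```

The insert of Dave succeeds on its own, but because it shares a transaction with the failing insert, the table still holds only the original three rows.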

2.2 NoSQL Databases

NoSQL databases are non-relational and are designed to handle large volumes of unstructured data. They provide flexibility and scalability, often at the expense of strict consistency.

  • Types: Key-value, Document, Wide-column, Graph.
  • BASE Model: Favors availability over immediate consistency (Basically Available, Soft state, Eventual consistency).
  • Example: MongoDB (Document), Cassandra (Wide-column), Redis (Key-value).
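
Eventual consistency can be illustrated with a toy model. ToyCluster below is a hypothetical two-replica store written for this sketch, not the replication protocol of any real database: a write is acknowledged once the primary has it, and the secondary catches up later.

```python
class Replica:
    def __init__(self):
        self.data = {}


class ToyCluster:
    """Hypothetical two-replica store with asynchronous replication."""

    def __init__(self):
        self.primary = Replica()
        self.secondary = Replica()
        self._pending = []  # writes not yet applied to the secondary

    def write(self, key, value):
        # Acknowledge as soon as the primary has the write.
        self.primary.data[key] = value
        self._pending.append((key, value))

    def read_from_secondary(self, key):
        return self.secondary.data.get(key)

    def replicate(self):
        # Apply queued writes in order; afterwards the replicas agree.
        for key, value in self._pending:
            self.secondary.data[key] = value
        self._pending.clear()


cluster = ToyCluster()
cluster.write("cart:42", ["book"])
stale = cluster.read_from_secondary("cart:42")  # replication has not run yet
cluster.replicate()
fresh = cluster.read_from_secondary("cart:42")
print(stale, fresh)
```

The first read returns nothing because the secondary has not yet seen the write; after replication both replicas return the same value.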

3. Key-value Stores, Document Stores, Graph Databases, Columnar Databases

3.1 Key-value Stores

Key-value stores hold data as key-value pairs and are optimized for fast lookups by key.

  • Usage: Ideal for caching and session management.
  • Example: Redis, DynamoDB.

Example Code (Using Redis in Python):

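import redis

# Connect to a local Redis server (assumes one is listening on the default port).
r = redis.Redis(host='localhost', port=6379, db=0)

# Store a JSON string under a namespaced key.
r.set('user:1000', '{"name": "Alice", "age": 30}')

# Values come back as bytes unless decode_responses=True is passed to Redis().
print(r.get('user:1000'))  # Output: b'{"name": "Alice", "age": 30}'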
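
Because the store treats the value as opaque bytes, the application has to decode it. A minimal sketch, using the bytes shown in the output above so that it runs without a Redis server:

```python
import json

# The raw bytes a get() call would return for the key stored above.
raw = b'{"name": "Alice", "age": 30}'

# json.loads accepts bytes directly and returns a regular dict.
user = json.loads(raw)
print(user["name"], user["age"])
```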

3.2 Document Stores

Document stores manage semi-structured data as JSON/BSON documents, providing flexibility in data schema.

  • Usage: Suitable for applications that store hierarchical data (e.g., user profiles).
  • Example: MongoDB, CouchDB.

Example JSON Document in MongoDB:

{
  "user_id": "1001",
  "name": "John Doe",
  "address": {
    "street": "123 Elm St",
    "city": "Somewhere"
  }
}
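
Schema flexibility and nested-field lookups can be sketched with plain Python dictionaries. This is an illustration of the data model only, not MongoDB's query API; the second document and the find_by_city helper are hypothetical.

```python
users = [
    {"user_id": "1001", "name": "John Doe",
     "address": {"street": "123 Elm St", "city": "Somewhere"}},
    # Documents in one collection need not share the same fields.
    {"user_id": "1002", "name": "Jane Roe", "email": "jane@example.com"},
]


def find_by_city(documents, city):
    """Return documents whose nested address.city equals the given city."""
    return [d for d in documents if d.get("address", {}).get("city") == city]


matches = find_by_city(users, "Somewhere")
print([d["name"] for d in matches])
```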

3.3 Graph Databases

Graph databases store data as nodes (entities) and edges (relationships), ideal for handling complex, interconnected data.

  • Usage: Social networks, recommendation engines.
  • Example: Neo4j, Amazon Neptune.

Example Cypher Query in Neo4j:

MATCH (user:Person {name: 'Alice'})-[:FRIEND]->(friend)
RETURN friend.name;
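
The same traversal can be sketched over an adjacency list in Python. The names and edges below are hypothetical sample data; a graph database stores and indexes these relationships natively so that multi-hop queries stay cheap.

```python
# Directed FRIEND edges, keyed by the node they start from (hypothetical data).
friend_edges = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave"],
    "Carol": [],
}


def friends_of(graph, name):
    """Follow outgoing FRIEND edges from one node, like the MATCH pattern above."""
    return list(graph.get(name, []))


def friends_of_friends(graph, name):
    """Two-hop traversal, excluding the start node and direct friends."""
    direct = set(graph.get(name, []))
    result = []
    for friend in graph.get(name, []):
        for fof in graph.get(friend, []):
            if fof != name and fof not in direct and fof not in result:
                result.append(fof)
    return result


print(friends_of(friend_edges, "Alice"))
print(friends_of_friends(friend_edges, "Alice"))
```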

3.4 Columnar Databases

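Columnar (column-oriented) databases store each column's values together rather than storing whole rows together, so analytical queries that read a few columns across many rows touch far less data.

  • Usage: Data warehousing and OLAP applications.
  • Example: Amazon Redshift, ClickHouse.
  • Note: Apache Cassandra and Google Bigtable are wide-column stores. They are often grouped under this heading, but they organize data by partition key and are tuned for high write throughput and key-based lookups rather than analytical scans.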

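Example CQL Query for Cassandra (assuming age is not part of the primary key, filtering on it requires ALLOW FILTERING, which scans every partition and suits small tables only):

SELECT name FROM users WHERE age > 25 ALLOW FILTERING;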
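
The difference between row and column layouts can be sketched in Python. The sample records are hypothetical, and real column stores add compression and vectorized execution on top of this basic idea.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"name": "Alice", "age": 30, "city": "Paris"},
    {"name": "Bob", "age": 22, "city": "Oslo"},
    {"name": "Carol", "age": 41, "city": "Lima"},
]

# Column-oriented layout: one contiguous list per column.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# An aggregate touches only the "age" column, not whole records.
average_age = sum(columns["age"]) / len(columns["age"])

# A filter scans one column, then fetches matching positions from another.
positions = [i for i, age in enumerate(columns["age"]) if age > 25]
names = [columns["name"][i] for i in positions]
print(average_age, names)
```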
