Data Manipulation Language

Data Manipulation Language (DML) is the subset of SQL used to retrieve, insert, update, and delete the data stored in database tables. While DDL (Data Definition Language) defines the structure of database objects, DML works with the actual data content stored within those structures.

Key Characteristics of DML:

  • Data-Focused Operations: Works with rows and columns of data rather than table structures
  • Transaction Support: DML operations can be grouped into transactions that can be committed or rolled back
  • Query Capabilities: Provides powerful filtering, sorting, and aggregation capabilities
  • Data Integrity: Maintains referential integrity and constraint validation during data modifications

Why DML Matters:

  • Business Operations: Every application interaction with data uses DML commands
  • Data Analysis: Enables complex data retrieval and analysis operations
  • Data Maintenance: Provides tools for keeping data accurate and up-to-date
  • Performance Impact: Well-written DML queries are crucial for application performance

DML vs DDL Quick Comparison:

  • DDL: Creates tables, DML uses those tables
  • DDL: Defines structure, DML manipulates content
  • DDL: Usually auto-commits, DML can be transaction-controlled
  • DDL: Changes schema, DML changes data (see the example below)
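
To make the comparison concrete, here is a minimal sketch pairing a DDL statement with the DML statements that operate on the data inside it (the table definition is simplified and illustrative compared to the customers table used in later examples):

-- DDL: defines the structure; typically auto-commits immediately
CREATE TABLE customers (
    customer_id   INT PRIMARY KEY,
    customer_name VARCHAR(100),
    city          VARCHAR(50)
);

-- DML: works with the rows inside that structure; can be transaction-controlled
START TRANSACTION;

INSERT INTO customers (customer_id, customer_name, city)
VALUES (1, 'Ada Lovelace', 'London');

UPDATE customers
SET city = 'Cambridge'
WHERE customer_id = 1;

COMMIT;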

SELECT Statement

The SELECT statement is the most frequently used SQL command, designed to retrieve data from one or more tables. Think of it as asking questions to your database and getting structured answers back.

Basic SELECT Structure

Purpose: Extract specific information from database tables based on defined criteria.

Fundamental SELECT Components: Every SELECT statement follows a logical flow that determines what data to retrieve, from where, and how to present it.
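
As a quick reference before each clause is introduced individually, the sketch below annotates a complete SELECT (reusing the customers table from later examples) with the order in which the database logically evaluates each clause; note that this differs from the order in which the clauses are written. GROUP BY and HAVING are covered in detail later in this section.

SELECT state, COUNT(*) AS customer_count   -- 5. choose the columns and expressions to return
FROM customers                             -- 1. identify the source table
WHERE total_purchases > 0                  -- 2. filter individual rows
GROUP BY state                             -- 3. collapse the surviving rows into groups
HAVING COUNT(*) > 10                       -- 4. filter the groups
ORDER BY customer_count DESC               -- 6. sort the final result
LIMIT 5;                                   -- 7. cap how many rows come back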

Simple Data Retrieval:

SELECT first_name, last_name, email
FROM customers
WHERE city = 'New York';

Understanding This Query:

  • SELECT clause: Specifies which columns to retrieve (first_name, last_name, email)
  • FROM clause: Identifies the source table (customers)
  • WHERE clause: Filters rows based on conditions (only customers in New York)

Retrieving All Columns:

SELECT * 
FROM products 
WHERE price > 100;

The asterisk (*) is a wildcard that retrieves all columns from the table. While convenient for exploration, it’s generally not recommended in production applications because it can impact performance and makes code less maintainable.

Advanced SELECT Features

Column Aliases and Calculations

Purpose: Rename columns in output or perform calculations on data.

Using Column Aliases:

SELECT 
    first_name AS 'First Name',
    last_name AS 'Last Name',
    email AS 'Email Address',
    CONCAT(first_name, ' ', last_name) AS 'Full Name'
FROM customers;

Mathematical Calculations:

SELECT 
    product_name,
    price AS 'Original Price',
    price * 0.10 AS 'Tax Amount',
    price + (price * 0.10) AS 'Total Price',
    ROUND(price * 0.85, 2) AS 'Discounted Price'
FROM products;

Business Calculations Example:

SELECT 
    employee_name,
    base_salary,
    commission_rate,
    base_salary + (base_salary * commission_rate) AS total_compensation,
    CASE 
        WHEN base_salary > 75000 THEN 'Senior Level'
        WHEN base_salary > 50000 THEN 'Mid Level'
        ELSE 'Entry Level'
    END AS salary_grade
FROM employees;

Filtering Data with WHERE

Purpose: Apply conditions to limit which rows are returned.

Comparison Operators in Action:

-- Find high-value customers
SELECT customer_name, total_purchases
FROM customers
WHERE total_purchases >= 10000;

-- Find recent orders
SELECT order_id, order_date, total_amount
FROM orders
WHERE order_date > '2024-01-01' AND total_amount < 500;

Pattern Matching with LIKE:

-- Find customers whose names start with 'John'
SELECT customer_name, email
FROM customers
WHERE customer_name LIKE 'John%';

-- Find products containing 'phone' anywhere in the name
SELECT product_name, price
FROM products
WHERE product_name LIKE '%phone%';

-- Find customers with exactly 5-character first names
SELECT first_name, last_name
FROM customers
WHERE first_name LIKE '_____';

Working with Lists using IN:

-- Find orders from specific states
SELECT customer_name, state, order_total
FROM customer_orders
WHERE state IN ('California', 'Texas', 'Florida', 'New York');

-- Find products in specific categories
SELECT product_name, category, price
FROM products
WHERE category IN ('Electronics', 'Books', 'Clothing');

Range Queries with BETWEEN:

-- Find orders within a date range
SELECT order_id, order_date, customer_name
FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31';

-- Find products in a price range
SELECT product_name, price, category
FROM products
WHERE price BETWEEN 50 AND 200;

Handling NULL Values

Understanding NULL: NULL represents missing, unknown, or inapplicable data. It’s not zero, not an empty string, but truly “no value.”
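
One pitfall worth seeing before the examples below: ordinary comparisons against NULL never evaluate to true, because any comparison with an unknown value is itself unknown. That is why the queries that follow use IS NULL and IS NOT NULL rather than = or <>. A minimal illustration against the customers table:

-- Returns zero rows, even when some customers have no phone number:
-- the condition 'phone = NULL' evaluates to unknown, never to true
SELECT customer_name, phone
FROM customers
WHERE phone = NULL;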

NULL-Aware Queries:

-- Find customers with missing phone numbers
SELECT customer_name, email, phone
FROM customers
WHERE phone IS NULL;

-- Find customers with complete contact information
SELECT customer_name, email, phone, address
FROM customers
WHERE phone IS NOT NULL 
  AND email IS NOT NULL 
  AND address IS NOT NULL;

Working with NULL in Calculations:

-- Handle NULL values in calculations
SELECT 
    employee_name,
    base_salary,
    COALESCE(bonus, 0) AS bonus_amount,
    base_salary + COALESCE(bonus, 0) AS total_pay
FROM employees;

Sorting Results with ORDER BY

Purpose: Control the sequence in which rows are returned.

Basic Sorting:

-- Sort customers alphabetically by last name
SELECT first_name, last_name, city
FROM customers
ORDER BY last_name;

-- Sort products by price from highest to lowest
SELECT product_name, price, category
FROM products
ORDER BY price DESC;

Multi-Column Sorting:

-- Sort by state first, then by city, then by customer name
SELECT customer_name, city, state, total_purchases
FROM customers
ORDER BY state, city, customer_name;

-- Sort by multiple criteria with different directions
SELECT product_name, category, price, stock_quantity
FROM products
ORDER BY category ASC, price DESC, stock_quantity ASC;

Sorting by Calculated Fields:

SELECT 
    customer_name,
    total_purchases,
    account_creation_date,
    total_purchases / NULLIF(TIMESTAMPDIFF(MONTH, account_creation_date, CURRENT_DATE), 0) AS avg_monthly_spending
FROM customers
ORDER BY avg_monthly_spending DESC;

Limiting Results with LIMIT/TOP

Purpose: Control how many rows are returned, useful for pagination and performance.

Basic Row Limiting:

-- Get top 10 highest-spending customers
SELECT customer_name, total_purchases
FROM customers
ORDER BY total_purchases DESC
LIMIT 10;

-- Get a sample of products for review
SELECT product_name, price, category
FROM products
LIMIT 25;

Pagination with OFFSET:

-- Get customers 41-60 for page 3 of results (20 per page)
SELECT customer_name, email, registration_date
FROM customers
ORDER BY registration_date DESC
LIMIT 20 OFFSET 40;
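
The LIMIT/OFFSET syntax above is the MySQL and PostgreSQL form. Other databases spell row limiting differently; as a rough guide (check your database's documentation), SQL Server uses TOP, while the SQL-standard FETCH FIRST form works in PostgreSQL, Oracle 12c and later, and others:

-- SQL Server style
SELECT TOP 10 customer_name, total_purchases
FROM customers
ORDER BY total_purchases DESC;

-- SQL standard style
SELECT customer_name, total_purchases
FROM customers
ORDER BY total_purchases DESC
FETCH FIRST 10 ROWS ONLY;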

Joining Multiple Tables

Purpose: Combine data from multiple related tables to create comprehensive result sets.

INNER JOIN

Definition: Returns only rows where there’s a match in both tables.

Basic INNER JOIN:

SELECT 
    c.customer_name,
    c.email,
    o.order_date,
    o.total_amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;

Multi-Table INNER JOIN:

SELECT 
    c.customer_name,
    o.order_date,
    p.product_name,
    oi.quantity,
    oi.unit_price
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN order_items oi ON o.order_id = oi.order_id
INNER JOIN products p ON oi.product_id = p.product_id;

LEFT JOIN (LEFT OUTER JOIN)

Definition: Returns all rows from the left table and matching rows from the right table. NULL values appear for non-matching right table columns.

Finding Customers Without Orders:

SELECT 
    c.customer_name,
    c.email,
    c.registration_date,
    COUNT(o.order_id) AS total_orders
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name, c.email, c.registration_date
HAVING COUNT(o.order_id) = 0;

RIGHT JOIN and FULL OUTER JOIN

RIGHT JOIN: Returns all rows from the right table and matching rows from the left table.

FULL OUTER JOIN: Returns all rows from both tables, with NULLs where there’s no match.
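
A brief sketch of both, using the same customers and orders tables. Note that MySQL does not support FULL OUTER JOIN directly; the usual workaround is to combine a LEFT and a RIGHT join with UNION:

-- RIGHT JOIN: every order appears, even if its customer record is missing
SELECT c.customer_name, o.order_id, o.total_amount
FROM customers c
RIGHT JOIN orders o ON c.customer_id = o.customer_id;

-- FULL OUTER JOIN emulation for MySQL: union of the left and right joins
SELECT c.customer_name, o.order_id, o.total_amount
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
UNION
SELECT c.customer_name, o.order_id, o.total_amount
FROM customers c
RIGHT JOIN orders o ON c.customer_id = o.customer_id;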

Aggregate Functions and GROUP BY

Purpose: Perform calculations across multiple rows to produce summary information.

Common Aggregate Functions

COUNT, SUM, AVG, MIN, MAX:

-- Sales summary by category
SELECT 
    category,
    COUNT(*) AS total_products,
    SUM(stock_quantity) AS total_inventory,
    AVG(price) AS average_price,
    MIN(price) AS lowest_price,
    MAX(price) AS highest_price
FROM products
GROUP BY category;

Customer Analysis:

-- Customer purchasing behavior
SELECT 
    c.customer_name,
    COUNT(o.order_id) AS total_orders,
    SUM(o.total_amount) AS lifetime_value,
    AVG(o.total_amount) AS average_order_value,
    MIN(o.order_date) AS first_order_date,
    MAX(o.order_date) AS last_order_date
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name;

HAVING Clause

Purpose: Filter groups created by GROUP BY, similar to how WHERE filters individual rows.

Finding High-Value Customer Segments:

SELECT 
    state,
    COUNT(*) AS customer_count,
    AVG(total_purchases) AS avg_customer_value
FROM customers
GROUP BY state
HAVING COUNT(*) >= 100 AND AVG(total_purchases) > 5000
ORDER BY avg_customer_value DESC;

Subqueries and Advanced SELECT

Subqueries in WHERE Clause

Purpose: Use the result of one query as a condition for another query.

Finding Above-Average Performers:

-- Find products priced above the average
SELECT product_name, price, category
FROM products
WHERE price > (
    SELECT AVG(price) 
    FROM products
);

-- Find customers who placed orders in the last month
SELECT customer_name, email, total_purchases
FROM customers
WHERE customer_id IN (
    SELECT DISTINCT customer_id
    FROM orders
    WHERE order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 1 MONTH)
);

Correlated Subqueries

Definition: Subqueries that reference columns from the outer query.

Finding Each Customer’s Largest Order:

SELECT 
    c.customer_name,
    o.order_date,
    o.total_amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.total_amount = (
    SELECT MAX(o2.total_amount)
    FROM orders o2
    WHERE o2.customer_id = c.customer_id
);

INSERT Statement

The INSERT statement adds new rows of data to existing tables. It’s like adding new records to a filing system – you need to provide all the required information in the correct format.

Basic INSERT Operations

Purpose: Create new records in database tables with specified data values.

Simple Single-Row Insert:

INSERT INTO customers (customer_name, email, phone, city, state)
VALUES ('John Smith', 'john.smith@email.com', '555-0123', 'Seattle', 'WA');

Understanding This Insert:

  • Table specification: customers table is the target
  • Column list: Explicitly names which columns will receive data
  • VALUES clause: Provides the actual data in the same order as columns
  • Data types: Each value must match the expected data type for its column

Insert with All Columns:

INSERT INTO products 
VALUES (1001, 'Wireless Headphones', 'Electronics', 149.99, 25, '2024-03-15', TRUE);

When providing values for all columns in table order, the column list can be omitted, but this practice is discouraged because it makes code fragile to table structure changes.

Multiple Row Insertions

Purpose: Efficiently add multiple records in a single statement.

Batch Insert Example:

INSERT INTO employees (first_name, last_name, department, hire_date, salary)
VALUES 
    ('Alice', 'Johnson', 'Engineering', '2024-01-15', 85000),
    ('Bob', 'Williams', 'Marketing', '2024-01-20', 65000),
    ('Carol', 'Davis', 'Sales', '2024-02-01', 55000),
    ('David', 'Miller', 'Engineering', '2024-02-10', 90000);

Benefits of Batch Inserts:

  • Performance: Faster than multiple single-row inserts
  • Transaction Efficiency: All rows inserted in one transaction
  • Reduced Network Overhead: Single round-trip to database
  • Atomicity: Either all rows insert successfully or none do

INSERT with SELECT (Data Migration)

Purpose: Copy data from existing tables or transform data during insertion.

Copying Customer Data to Archive:

INSERT INTO customers_archive (customer_id, customer_name, email, total_purchases, archive_date)
SELECT 
    customer_id,
    customer_name,
    email,
    total_purchases,
    CURRENT_DATE
FROM customers
WHERE last_order_date < '2022-01-01';

Creating Summary Records:

-- `year_month` is backtick-quoted because YEAR_MONTH is a reserved word in MySQL
INSERT INTO monthly_sales_summary (`year_month`, total_orders, total_revenue, avg_order_value)
SELECT 
    DATE_FORMAT(order_date, '%Y-%m') AS `year_month`,
    COUNT(*) AS total_orders,
    SUM(total_amount) AS total_revenue,
    AVG(total_amount) AS avg_order_value
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_FORMAT(order_date, '%Y-%m');

Handling Default Values and Auto-Increment

Working with Auto-Increment Columns:

-- Let database generate the ID automatically
INSERT INTO orders (customer_id, order_date, total_amount, status)
VALUES (1023, '2024-03-15', 299.99, 'Pending');

Using Default Values:

-- Explicit default value usage
INSERT INTO products (product_name, category, price, in_stock, created_date)
VALUES ('New Product', 'Electronics', 199.99, DEFAULT, DEFAULT);

-- Columns with defaults can be omitted
INSERT INTO products (product_name, category, price)
VALUES ('Another Product', 'Books', 29.99);

INSERT with Duplicate Handling

ON DUPLICATE KEY UPDATE (MySQL):

INSERT INTO product_inventory (product_id, quantity_on_hand, last_updated)
VALUES (1001, 50, NOW())
ON DUPLICATE KEY UPDATE 
    quantity_on_hand = quantity_on_hand + VALUES(quantity_on_hand),
    last_updated = NOW();

INSERT IGNORE (MySQL):

-- Skip rows that would cause duplicate key errors
INSERT IGNORE INTO customers (customer_id, customer_name, email)
VALUES 
    (1, 'John Doe', 'john@email.com'),
    (2, 'Jane Smith', 'jane@email.com');

UPDATE Statement

The UPDATE statement modifies existing data in database tables. It’s like editing information in existing files – you can change specific pieces of information while keeping the rest intact.

Basic UPDATE Operations

Purpose: Modify existing records by changing values in specified columns.

Simple Column Update:

UPDATE customers 
SET phone = '555-0199' 
WHERE customer_id = 1001;

Understanding This Update:

  • Target table: customers table contains the records to modify
  • SET clause: Specifies which columns to change and their new values
  • WHERE clause: Identifies which specific rows to update (crucial for preventing unintended changes)

Multiple Column Update:

UPDATE employees 
SET salary = 92000, 
    department = 'Senior Engineering',
    last_promotion_date = '2024-03-15'
WHERE employee_id = 2025;

Conditional Updates

Using CASE Statements for Complex Logic:

UPDATE products 
SET discount_percentage = 
    CASE 
        WHEN category = 'Electronics' THEN 15
        WHEN category = 'Clothing' THEN 25
        WHEN category = 'Books' THEN 10
        ELSE 5
    END,
    sale_end_date = '2024-04-30'
WHERE in_stock = TRUE;

Calculated Updates:

-- Apply percentage-based salary increases
UPDATE employees 
SET salary = salary * 1.05,  -- 5% increase
    last_review_date = CURRENT_DATE
WHERE performance_rating >= 4.0 
  AND last_promotion_date < '2023-01-01';

Updates Based on Other Tables (JOIN Updates)

Purpose: Update records based on information from related tables.

Updating Customer Status Based on Order History:

UPDATE customers c
INNER JOIN (
    SELECT customer_id, 
           SUM(total_amount) as lifetime_value,
           COUNT(*) as order_count
    FROM orders 
    WHERE order_date >= '2023-01-01'
    GROUP BY customer_id
) order_stats ON c.customer_id = order_stats.customer_id
SET c.customer_tier = 
    CASE 
        WHEN order_stats.lifetime_value > 10000 THEN 'Platinum'
        WHEN order_stats.lifetime_value > 5000 THEN 'Gold'
        WHEN order_stats.lifetime_value > 1000 THEN 'Silver'
        ELSE 'Bronze'
    END,
    c.total_orders = order_stats.order_count;

Inventory Updates Based on Sales:

UPDATE products p
INNER JOIN (
    SELECT product_id, SUM(quantity) as total_sold
    FROM order_items oi
    INNER JOIN orders o ON oi.order_id = o.order_id
    WHERE o.order_date = CURRENT_DATE
    GROUP BY product_id
) daily_sales ON p.product_id = daily_sales.product_id
SET p.stock_quantity = p.stock_quantity - daily_sales.total_sold,
    p.last_sale_date = CURRENT_DATE;

Bulk Updates and Data Maintenance

Batch Status Updates:

-- Mark old orders as archived
UPDATE orders 
SET status = 'Archived',
    archived_date = CURRENT_DATE
WHERE order_date < '2023-01-01' 
  AND status = 'Delivered';

Data Cleanup Operations:

-- Standardize phone number formatting
UPDATE customers 
SET phone = CONCAT(
    '(',
    SUBSTRING(REGEXP_REPLACE(phone, '[^0-9]', ''), 1, 3),
    ') ',
    SUBSTRING(REGEXP_REPLACE(phone, '[^0-9]', ''), 4, 3),
    '-',
    SUBSTRING(REGEXP_REPLACE(phone, '[^0-9]', ''), 7, 4)
)
WHERE phone IS NOT NULL 
  AND LENGTH(REGEXP_REPLACE(phone, '[^0-9]', '')) = 10;

DELETE Statement

The DELETE statement removes existing rows from database tables. It’s like removing specific files from a filing system – once deleted, the information is gone (unless you have backups).

Basic DELETE Operations

Purpose: Remove unwanted or outdated records from database tables.

Simple Row Deletion:

DELETE FROM customers 
WHERE customer_id = 1001;

Understanding This Delete:

  • Target table: customers table contains records to be removed
  • WHERE clause: Absolutely critical – specifies which rows to delete
  • No column specification: DELETE removes entire rows, not individual column values

Conditional Deletion:

-- Remove inactive accounts
DELETE FROM user_accounts 
WHERE last_login_date < '2022-01-01' 
  AND account_status = 'Inactive';

DELETE with Joins (Complex Conditions)

Purpose: Remove records based on conditions involving multiple tables.

Removing Orphaned Records:

-- Delete order items for cancelled orders
DELETE oi
FROM order_items oi
INNER JOIN orders o ON oi.order_id = o.order_id
WHERE o.status = 'Cancelled' 
  AND o.order_date < '2024-01-01';

Cleanup Based on Related Data:

-- Remove customer preferences for customers who haven't placed orders
DELETE cp
FROM customer_preferences cp
LEFT JOIN orders o ON cp.customer_id = o.customer_id
WHERE o.customer_id IS NULL;

Cascading Deletes and Referential Integrity

Understanding Foreign Key Constraints: When tables have foreign key relationships, deletions must respect referential integrity.

Parent-Child Relationship Example:

-- This might fail if there are related order_items
DELETE FROM orders WHERE customer_id = 1001;

-- Safe approach: delete children first
DELETE FROM order_items WHERE order_id IN 
    (SELECT order_id FROM orders WHERE customer_id = 1001);
DELETE FROM orders WHERE customer_id = 1001;

CASCADE DELETE Setup (DDL for context):

-- Foreign key with cascade delete
ALTER TABLE order_items
ADD CONSTRAINT fk_order_cascade
FOREIGN KEY (order_id) REFERENCES orders(order_id)
ON DELETE CASCADE;

Bulk Deletion and Archival

Large-Scale Data Removal:

-- Remove old log entries (be careful with large tables)
DELETE FROM system_logs 
WHERE log_date < DATE_SUB(CURRENT_DATE, INTERVAL 90 DAY);

-- Delete in smaller batches for large tables
DELETE FROM large_transaction_table 
WHERE transaction_date < '2023-01-01' 
LIMIT 1000;

Archive Before Delete Pattern:

-- Step 1: Archive old data
INSERT INTO orders_archive 
SELECT * FROM orders 
WHERE order_date < '2023-01-01';

-- Step 2: Delete archived data
DELETE FROM orders 
WHERE order_date < '2023-01-01';

DELETE vs TRUNCATE vs DROP

Understanding the Differences:

DELETE:

  • Removes specific rows based on conditions
  • Can be rolled back in transactions
  • Fires triggers
  • Slower for large datasets
  • Maintains table structure and relationships

TRUNCATE:

  • Removes all rows quickly
  • Cannot use WHERE clause
  • Resets auto-increment counters
  • Faster than DELETE for entire tables
  • May not be rolled back (in MySQL, for example, it causes an implicit commit)

DROP:

  • Removes entire table structure and data
  • Cannot be undone without backups
  • Breaks relationships with other tables
  • DDL operation, not DML (see the side-by-side sketch below)
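
A side-by-side sketch of the three statements against a hypothetical staging_orders table (the table name is illustrative):

-- DML: removes only the rows matching the condition; can be rolled back
DELETE FROM staging_orders WHERE load_date < '2023-01-01';

-- Removes every row, resets auto-increment counters, keeps the table
TRUNCATE TABLE staging_orders;

-- DDL: removes the table itself, structure and data together
DROP TABLE staging_orders;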

DML Best Practices and Performance Considerations

Transaction Management

Understanding Transactions in DML:

-- Grouping related operations
START TRANSACTION;

UPDATE accounts 
SET balance = balance - 1000 
WHERE account_id = 'ACC001';

UPDATE accounts 
SET balance = balance + 1000 
WHERE account_id = 'ACC002';

INSERT INTO transaction_log (from_account, to_account, amount, transaction_date)
VALUES ('ACC001', 'ACC002', 1000, NOW());

COMMIT;

Rollback on Error:

START TRANSACTION;

UPDATE inventory SET quantity = quantity - 10 WHERE product_id = 1001;

-- Check if the update was successful
-- (IF/THEN control flow like this only runs inside a stored procedure or
--  function in MySQL; application code would check the result and issue
--  COMMIT or ROLLBACK from the client instead)
IF (SELECT quantity FROM inventory WHERE product_id = 1001) < 0 THEN
    ROLLBACK;
ELSE
    COMMIT;
END IF;

Performance Optimization

Efficient WHERE Clauses:

  • Use indexed columns in WHERE conditions
  • Avoid functions on columns in WHERE clauses (see the example after this list)
  • Use specific conditions rather than broad ranges when possible
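
For instance, wrapping an indexed column in a function forces the database to evaluate the function for every row and usually prevents the index from being used; rewriting the condition so the bare column is compared against a range keeps it index-friendly. A sketch using the orders table:

-- Harder to optimize: YEAR() is applied to every row's order_date,
-- so an index on order_date typically cannot be used
SELECT order_id, total_amount
FROM orders
WHERE YEAR(order_date) = 2024;

-- Index-friendly rewrite: compare the bare column against a date range
SELECT order_id, total_amount
FROM orders
WHERE order_date >= '2024-01-01'
  AND order_date <  '2025-01-01';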

Batch Processing:

-- Instead of many single-row operations
-- Process in batches for better performance
UPDATE products 
SET last_updated = NOW() 
WHERE category = 'Electronics' 
  AND product_id BETWEEN 1000 AND 2000;

Data Integrity and Safety

Always Use WHERE in UPDATE/DELETE:

-- Dangerous: Updates all rows
UPDATE products SET price = 0;

-- Safe: Updates specific rows
UPDATE products SET price = 0 WHERE discontinued = TRUE;

Validate Before Bulk Operations:

-- Check what will be affected before deleting
SELECT COUNT(*) FROM old_records WHERE last_access < '2022-01-01';

-- Then perform the actual delete
DELETE FROM old_records WHERE last_access < '2022-01-01';

Monitoring and Auditing

Tracking Data Changes:

-- Update with audit trail
UPDATE customer_profiles 
SET email = 'new.email@example.com',
    last_modified = NOW(),
    modified_by = 'admin_user'
WHERE customer_id = 1001;

Understanding DML operations is fundamental to working effectively with databases. These commands form the core of most database interactions and require careful consideration of performance, data integrity, and business logic to implement successfully.
