Data Manipulation Language (DML) is the subset of SQL that deals with accessing, retrieving, and modifying data within database tables. While DDL (Data Definition Language) works with the structure of database objects, DML works with the actual data content stored within those structures.
Key Characteristics of DML:
- Data-Focused Operations: Works with rows and columns of data rather than table structures
- Transaction Support: DML operations can be grouped into transactions that can be committed or rolled back
- Query Capabilities: Provides powerful filtering, sorting, and aggregation capabilities
- Data Integrity: Maintains referential integrity and constraint validation during data modifications
Why DML Matters:
- Business Operations: Every application interaction with data uses DML commands
- Data Analysis: Enables complex data retrieval and analysis operations
- Data Maintenance: Provides tools for keeping data accurate and up-to-date
- Performance Impact: Well-written DML queries are crucial for application performance
DML vs DDL Quick Comparison:
- DDL: Creates tables, DML uses those tables
- DDL: Defines structure, DML manipulates content
- DDL: Usually auto-commits, DML can be transaction-controlled
- DDL: Changes schema, DML changes data
SELECT Statement
The SELECT statement is the most frequently used SQL command, designed to retrieve data from one or more tables. Think of it as asking questions to your database and getting structured answers back.
Basic SELECT Structure
Purpose: Extract specific information from database tables based on defined criteria.
Fundamental SELECT Components: Every SELECT statement follows a logical flow that determines what data to retrieve, from where, and how to present it.
Simple Data Retrieval:
SELECT first_name, last_name, email
FROM customers
WHERE city = 'New York';
Understanding This Query:
- SELECT clause: Specifies which columns to retrieve (first_name, last_name, email)
- FROM clause: Identifies the source table (customers)
- WHERE clause: Filters rows based on conditions (only customers in New York)
Retrieving All Columns:
SELECT *
FROM products
WHERE price > 100;
The asterisk (*) is a wildcard that retrieves all columns from the table. While convenient for exploration, it’s generally not recommended in production applications because it can impact performance and makes code less maintainable.
Advanced SELECT Features
Column Aliases and Calculations
Purpose: Rename columns in output or perform calculations on data.
Using Column Aliases:
SELECT
first_name AS 'First Name',
last_name AS 'Last Name',
email AS 'Email Address',
CONCAT(first_name, ' ', last_name) AS 'Full Name'
FROM customers;
Mathematical Calculations:
SELECT
product_name,
price AS 'Original Price',
price * 0.10 AS 'Tax Amount',
price + (price * 0.10) AS 'Total Price',
ROUND(price * 0.85, 2) AS 'Discounted Price'
FROM products;
Business Calculations Example:
SELECT
employee_name,
base_salary,
commission_rate,
base_salary + (base_salary * commission_rate) AS total_compensation,
CASE
WHEN base_salary > 75000 THEN 'Senior Level'
WHEN base_salary > 50000 THEN 'Mid Level'
ELSE 'Entry Level'
END AS salary_grade
FROM employees;
Filtering Data with WHERE
Purpose: Apply conditions to limit which rows are returned.
Comparison Operators in Action:
-- Find high-value customers
SELECT customer_name, total_purchases
FROM customers
WHERE total_purchases >= 10000;
-- Find recent orders
SELECT order_id, order_date, total_amount
FROM orders
WHERE order_date > '2024-01-01' AND total_amount < 500;
Pattern Matching with LIKE:
-- Find customers whose names start with 'John'
SELECT customer_name, email
FROM customers
WHERE customer_name LIKE 'John%';
-- Find products containing 'phone' anywhere in the name
SELECT product_name, price
FROM products
WHERE product_name LIKE '%phone%';
-- Find customers with exactly 5-character first names
SELECT first_name, last_name
FROM customers
WHERE first_name LIKE '_____';
Working with Lists using IN:
-- Find orders from specific states
SELECT customer_name, state, order_total
FROM customer_orders
WHERE state IN ('California', 'Texas', 'Florida', 'New York');
-- Find products in specific categories
SELECT product_name, category, price
FROM products
WHERE category IN ('Electronics', 'Books', 'Clothing');
Range Queries with BETWEEN:
-- Find orders within a date range
SELECT order_id, order_date, customer_name
FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31';
-- Find products in a price range
SELECT product_name, price, category
FROM products
WHERE price BETWEEN 50 AND 200;
Handling NULL Values
Understanding NULL: NULL represents missing, unknown, or inapplicable data. It’s not zero, not an empty string, but truly “no value.”
NULL-Aware Queries:
-- Find customers with missing phone numbers
SELECT customer_name, email, phone
FROM customers
WHERE phone IS NULL;
-- Find customers with complete contact information
SELECT customer_name, email, phone, address
FROM customers
WHERE phone IS NOT NULL
AND email IS NOT NULL
AND address IS NOT NULL;
Working with NULL in Calculations:
-- Handle NULL values in calculations
SELECT
employee_name,
base_salary,
COALESCE(bonus, 0) AS bonus_amount,
base_salary + COALESCE(bonus, 0) AS total_pay
FROM employees;
Sorting Results with ORDER BY
Purpose: Control the sequence in which rows are returned.
Basic Sorting:
-- Sort customers alphabetically by last name
SELECT first_name, last_name, city
FROM customers
ORDER BY last_name;
-- Sort products by price from highest to lowest
SELECT product_name, price, category
FROM products
ORDER BY price DESC;
Multi-Column Sorting:
-- Sort by state first, then by city, then by customer name
SELECT customer_name, city, state, total_purchases
FROM customers
ORDER BY state, city, customer_name;
-- Sort by multiple criteria with different directions
SELECT product_name, category, price, stock_quantity
FROM products
ORDER BY category ASC, price DESC, stock_quantity ASC;
Sorting by Calculated Fields:
SELECT
customer_name,
total_purchases,
account_creation_date,
total_purchases / DATEDIFF(CURRENT_DATE, account_creation_date) AS avg_monthly_spending
FROM customers
ORDER BY avg_monthly_spending DESC;
Limiting Results with LIMIT/TOP
Purpose: Control how many rows are returned, useful for pagination and performance.
Basic Row Limiting:
-- Get top 10 highest-spending customers
SELECT customer_name, total_purchases
FROM customers
ORDER BY total_purchases DESC
LIMIT 10;
-- Get a sample of products for review
SELECT product_name, price, category
FROM products
LIMIT 25;
Pagination with OFFSET:
-- Get customers 21-40 for page 3 of results (20 per page)
SELECT customer_name, email, registration_date
FROM customers
ORDER BY registration_date DESC
LIMIT 20 OFFSET 40;
Joining Multiple Tables
Purpose: Combine data from multiple related tables to create comprehensive result sets.
INNER JOIN
Definition: Returns only rows where there’s a match in both tables.
Basic INNER JOIN:
SELECT
c.customer_name,
c.email,
o.order_date,
o.total_amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;
Multi-Table INNER JOIN:
SELECT
c.customer_name,
o.order_date,
p.product_name,
oi.quantity,
oi.unit_price
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN order_items oi ON o.order_id = oi.order_id
INNER JOIN products p ON oi.product_id = p.product_id;
LEFT JOIN (LEFT OUTER JOIN)
Definition: Returns all rows from the left table and matching rows from the right table. NULL values appear for non-matching right table columns.
Finding Customers Without Orders:
SELECT
c.customer_name,
c.email,
c.registration_date,
COUNT(o.order_id) AS total_orders
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name, c.email, c.registration_date
HAVING COUNT(o.order_id) = 0;
RIGHT JOIN and FULL OUTER JOIN
RIGHT JOIN: Returns all rows from the right table and matching rows from the left table. FULL OUTER JOIN: Returns all rows from both tables, with NULLs where there’s no match.
Aggregate Functions and GROUP BY
Purpose: Perform calculations across multiple rows to produce summary information.
Common Aggregate Functions
COUNT, SUM, AVG, MIN, MAX:
-- Sales summary by category
SELECT
category,
COUNT(*) AS total_products,
SUM(stock_quantity) AS total_inventory,
AVG(price) AS average_price,
MIN(price) AS lowest_price,
MAX(price) AS highest_price
FROM products
GROUP BY category;
Customer Analysis:
-- Customer purchasing behavior
SELECT
c.customer_name,
COUNT(o.order_id) AS total_orders,
SUM(o.total_amount) AS lifetime_value,
AVG(o.total_amount) AS average_order_value,
MIN(o.order_date) AS first_order_date,
MAX(o.order_date) AS last_order_date
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name;
HAVING Clause
Purpose: Filter groups created by GROUP BY, similar to how WHERE filters individual rows.
Finding High-Value Customer Segments:
SELECT
state,
COUNT(*) AS customer_count,
AVG(total_purchases) AS avg_customer_value
FROM customers
GROUP BY state
HAVING COUNT(*) >= 100 AND AVG(total_purchases) > 5000
ORDER BY avg_customer_value DESC;
Subqueries and Advanced SELECT
Subqueries in WHERE Clause
Purpose: Use the result of one query as a condition for another query.
Finding Above-Average Performers:
-- Find products priced above the average
SELECT product_name, price, category
FROM products
WHERE price > (
SELECT AVG(price)
FROM products
);
-- Find customers who placed orders in the last month
SELECT customer_name, email, total_purchases
FROM customers
WHERE customer_id IN (
SELECT DISTINCT customer_id
FROM orders
WHERE order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 1 MONTH)
);
Correlated Subqueries
Definition: Subqueries that reference columns from the outer query.
Finding Each Customer’s Largest Order:
SELECT
c.customer_name,
o.order_date,
o.total_amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.total_amount = (
SELECT MAX(o2.total_amount)
FROM orders o2
WHERE o2.customer_id = c.customer_id
);
INSERT Statement
The INSERT statement adds new rows of data to existing tables. It’s like adding new records to a filing system – you need to provide all the required information in the correct format.
Basic INSERT Operations
Purpose: Create new records in database tables with specified data values.
Simple Single-Row Insert:
INSERT INTO customers (customer_name, email, phone, city, state)
VALUES ('John Smith', 'john.smith@email.com', '555-0123', 'Seattle', 'WA');
Understanding This Insert:
- Table specification: customers table is the target
- Column list: Explicitly names which columns will receive data
- VALUES clause: Provides the actual data in the same order as columns
- Data types: Each value must match the expected data type for its column
Insert with All Columns:
INSERT INTO products
VALUES (1001, 'Wireless Headphones', 'Electronics', 149.99, 25, '2024-03-15', TRUE);
When providing values for all columns in table order, the column list can be omitted, but this practice is discouraged because it makes code fragile to table structure changes.
Multiple Row Insertions
Purpose: Efficiently add multiple records in a single statement.
Batch Insert Example:
INSERT INTO employees (first_name, last_name, department, hire_date, salary)
VALUES
('Alice', 'Johnson', 'Engineering', '2024-01-15', 85000),
('Bob', 'Williams', 'Marketing', '2024-01-20', 65000),
('Carol', 'Davis', 'Sales', '2024-02-01', 55000),
('David', 'Miller', 'Engineering', '2024-02-10', 90000);
Benefits of Batch Inserts:
- Performance: Faster than multiple single-row inserts
- Transaction Efficiency: All rows inserted in one transaction
- Reduced Network Overhead: Single round-trip to database
- Atomicity: Either all rows insert successfully or none do
INSERT with SELECT (Data Migration)
Purpose: Copy data from existing tables or transform data during insertion.
Copying Customer Data to Archive:
INSERT INTO customers_archive (customer_id, customer_name, email, total_purchases, archive_date)
SELECT
customer_id,
customer_name,
email,
total_purchases,
CURRENT_DATE
FROM customers
WHERE last_order_date < '2022-01-01';
Creating Summary Records:
INSERT INTO monthly_sales_summary (year_month, total_orders, total_revenue, avg_order_value)
SELECT
DATE_FORMAT(order_date, '%Y-%m') AS year_month,
COUNT(*) AS total_orders,
SUM(total_amount) AS total_revenue,
AVG(total_amount) AS avg_order_value
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_FORMAT(order_date, '%Y-%m');
Handling Default Values and Auto-Increment
Working with Auto-Increment Columns:
-- Let database generate the ID automatically
INSERT INTO orders (customer_id, order_date, total_amount, status)
VALUES (1023, '2024-03-15', 299.99, 'Pending');
Using Default Values:
-- Explicit default value usage
INSERT INTO products (product_name, category, price, in_stock, created_date)
VALUES ('New Product', 'Electronics', 199.99, DEFAULT, DEFAULT);
-- Columns with defaults can be omitted
INSERT INTO products (product_name, category, price)
VALUES ('Another Product', 'Books', 29.99);
INSERT with Duplicate Handling
ON DUPLICATE KEY UPDATE (MySQL):
INSERT INTO product_inventory (product_id, quantity_on_hand, last_updated)
VALUES (1001, 50, NOW())
ON DUPLICATE KEY UPDATE
quantity_on_hand = quantity_on_hand + VALUES(quantity_on_hand),
last_updated = NOW();
INSERT IGNORE (MySQL):
-- Skip rows that would cause duplicate key errors
INSERT IGNORE INTO customers (customer_id, customer_name, email)
VALUES
(1, 'John Doe', 'john@email.com'),
(2, 'Jane Smith', 'jane@email.com');
UPDATE Statement
The UPDATE statement modifies existing data in database tables. It’s like editing information in existing files – you can change specific pieces of information while keeping the rest intact.
Basic UPDATE Operations
Purpose: Modify existing records by changing values in specified columns.
Simple Column Update:
UPDATE customers
SET phone = '555-0199'
WHERE customer_id = 1001;
Understanding This Update:
- Target table: customers table contains the records to modify
- SET clause: Specifies which columns to change and their new values
- WHERE clause: Identifies which specific rows to update (crucial for preventing unintended changes)
Multiple Column Update:
UPDATE employees
SET salary = 92000,
department = 'Senior Engineering',
last_promotion_date = '2024-03-15'
WHERE employee_id = 2025;
Conditional Updates
Using CASE Statements for Complex Logic:
UPDATE products
SET discount_percentage =
CASE
WHEN category = 'Electronics' THEN 15
WHEN category = 'Clothing' THEN 25
WHEN category = 'Books' THEN 10
ELSE 5
END,
sale_end_date = '2024-04-30'
WHERE in_stock = TRUE;
Calculated Updates:
-- Apply percentage-based salary increases
UPDATE employees
SET salary = salary * 1.05, -- 5% increase
last_review_date = CURRENT_DATE
WHERE performance_rating >= 4.0
AND last_promotion_date < '2023-01-01';
Updates Based on Other Tables (JOIN Updates)
Purpose: Update records based on information from related tables.
Updating Customer Status Based on Order History:
UPDATE customers c
INNER JOIN (
SELECT customer_id,
SUM(total_amount) as lifetime_value,
COUNT(*) as order_count
FROM orders
WHERE order_date >= '2023-01-01'
GROUP BY customer_id
) order_stats ON c.customer_id = order_stats.customer_id
SET c.customer_tier =
CASE
WHEN order_stats.lifetime_value > 10000 THEN 'Platinum'
WHEN order_stats.lifetime_value > 5000 THEN 'Gold'
WHEN order_stats.lifetime_value > 1000 THEN 'Silver'
ELSE 'Bronze'
END,
c.total_orders = order_stats.order_count;
Inventory Updates Based on Sales:
UPDATE products p
INNER JOIN (
SELECT product_id, SUM(quantity) as total_sold
FROM order_items oi
INNER JOIN orders o ON oi.order_id = o.order_id
WHERE o.order_date = CURRENT_DATE
GROUP BY product_id
) daily_sales ON p.product_id = daily_sales.product_id
SET p.stock_quantity = p.stock_quantity - daily_sales.total_sold,
p.last_sale_date = CURRENT_DATE;
Bulk Updates and Data Maintenance
Batch Status Updates:
-- Mark old orders as archived
UPDATE orders
SET status = 'Archived',
archived_date = CURRENT_DATE
WHERE order_date < '2023-01-01'
AND status = 'Delivered';
Data Cleanup Operations:
-- Standardize phone number formatting
UPDATE customers
SET phone = CONCAT(
'(',
SUBSTRING(REGEXP_REPLACE(phone, '[^0-9]', ''), 1, 3),
') ',
SUBSTRING(REGEXP_REPLACE(phone, '[^0-9]', ''), 4, 3),
'-',
SUBSTRING(REGEXP_REPLACE(phone, '[^0-9]', ''), 7, 4)
)
WHERE phone IS NOT NULL
AND LENGTH(REGEXP_REPLACE(phone, '[^0-9]', '')) = 10;
DELETE Statement
The DELETE statement removes existing rows from database tables. It’s like removing specific files from a filing system – once deleted, the information is gone (unless you have backups).
Basic DELETE Operations
Purpose: Remove unwanted or outdated records from database tables.
Simple Row Deletion:
DELETE FROM customers
WHERE customer_id = 1001;
Understanding This Delete:
- Target table: customers table contains records to be removed
- WHERE clause: Absolutely critical – specifies which rows to delete
- No column specification: DELETE removes entire rows, not individual column values
Conditional Deletion:
-- Remove inactive accounts
DELETE FROM user_accounts
WHERE last_login_date < '2022-01-01'
AND account_status = 'Inactive';
DELETE with Joins (Complex Conditions)
Purpose: Remove records based on conditions involving multiple tables.
Removing Orphaned Records:
-- Delete order items for cancelled orders
DELETE oi
FROM order_items oi
INNER JOIN orders o ON oi.order_id = o.order_id
WHERE o.status = 'Cancelled'
AND o.order_date < '2024-01-01';
Cleanup Based on Related Data:
-- Remove customer preferences for customers who haven't placed orders
DELETE cp
FROM customer_preferences cp
LEFT JOIN orders o ON cp.customer_id = o.customer_id
WHERE o.customer_id IS NULL;
Cascading Deletes and Referential Integrity
Understanding Foreign Key Constraints: When tables have foreign key relationships, deletions must respect referential integrity.
Parent-Child Relationship Example:
-- This might fail if there are related order_items
DELETE FROM orders WHERE customer_id = 1001;
-- Safe approach: delete children first
DELETE FROM order_items WHERE order_id IN
(SELECT order_id FROM orders WHERE customer_id = 1001);
DELETE FROM orders WHERE customer_id = 1001;
CASCADE DELETE Setup (DDL for context):
-- Foreign key with cascade delete
ALTER TABLE order_items
ADD CONSTRAINT fk_order_cascade
FOREIGN KEY (order_id) REFERENCES orders(order_id)
ON DELETE CASCADE;
Bulk Deletion and Archival
Large-Scale Data Removal:
-- Remove old log entries (be careful with large tables)
DELETE FROM system_logs
WHERE log_date < DATE_SUB(CURRENT_DATE, INTERVAL 90 DAY);
-- Delete in smaller batches for large tables
DELETE FROM large_transaction_table
WHERE transaction_date < '2023-01-01'
LIMIT 1000;
Archive Before Delete Pattern:
-- Step 1: Archive old data
INSERT INTO orders_archive
SELECT * FROM orders
WHERE order_date < '2023-01-01';
-- Step 2: Delete archived data
DELETE FROM orders
WHERE order_date < '2023-01-01';
DELETE vs TRUNCATE vs DROP
Understanding the Differences:
DELETE:
- Removes specific rows based on conditions
- Can be rolled back in transactions
- Fires triggers
- Slower for large datasets
- Maintains table structure and relationships
TRUNCATE:
- Removes all rows quickly
- Cannot use WHERE clause
- Resets auto-increment counters
- Faster than DELETE for entire tables
- May not be rollbackable
DROP:
- Removes entire table structure and data
- Cannot be undone without backups
- Breaks relationships with other tables
- DDL operation, not DML
DML Best Practices and Performance Considerations
Transaction Management
Understanding Transactions in DML:
-- Grouping related operations
START TRANSACTION;
UPDATE accounts
SET balance = balance - 1000
WHERE account_id = 'ACC001';
UPDATE accounts
SET balance = balance + 1000
WHERE account_id = 'ACC002';
INSERT INTO transaction_log (from_account, to_account, amount, transaction_date)
VALUES ('ACC001', 'ACC002', 1000, NOW());
COMMIT;
Rollback on Error:
START TRANSACTION;
UPDATE inventory SET quantity = quantity - 10 WHERE product_id = 1001;
-- Check if update was successful
IF (SELECT quantity FROM inventory WHERE product_id = 1001) < 0 THEN
ROLLBACK;
ELSE
COMMIT;
END IF;
Performance Optimization
Efficient WHERE Clauses:
- Use indexed columns in WHERE conditions
- Avoid functions on columns in WHERE clauses
- Use specific conditions rather than broad ranges when possible
Batch Processing:
-- Instead of many single-row operations
-- Process in batches for better performance
UPDATE products
SET last_updated = NOW()
WHERE category = 'Electronics'
AND product_id BETWEEN 1000 AND 2000;
Data Integrity and Safety
Always Use WHERE in UPDATE/DELETE:
-- Dangerous: Updates all rows
UPDATE products SET price = 0;
-- Safe: Updates specific rows
UPDATE products SET price = 0 WHERE discontinued = TRUE;
Validate Before Bulk Operations:
-- Check what will be affected before deleting
SELECT COUNT(*) FROM old_records WHERE last_access < '2022-01-01';
-- Then perform the actual delete
DELETE FROM old_records WHERE last_access < '2022-01-01';
Monitoring and Auditing
Tracking Data Changes:
-- Update with audit trail
UPDATE customer_profiles
SET email = 'new.email@example.com',
last_modified = NOW(),
modified_by = 'admin_user'
WHERE customer_id = 1001;
Understanding DML operations is fundamental to working effectively with databases. These commands form the core of most database interactions and require careful consideration of performance, data integrity, and business logic to implement successfully.