Database Indexes - UNIQUE, CLUSTERED, NONCLUSTERED

4 minute read

Types of Indexes

1. Single-Column Index

An index created on a single column. It is the simplest form of indexing.

Advantages:

Speeds up searches and sorting based on the indexed column.
Reduces query execution time for queries with a WHERE clause.

Limitations:

Doesn’t improve performance for queries involving multiple columns.

SQL Example:

CREATE INDEX idx_user_id ON users(user_id);

2. Composite Index

An index on two or more columns, primarily used for queries that filter or sort by multiple columns.

Advantages:

Useful for queries with multiple columns in the WHERE clause.
Enhances performance for queries that sort or filter by the indexed columns.

Limitations:

The order of columns in the index matters. For example, an index on (A, B) will not - efficiently handle a query filtering only on B.

SQL Example:

CREATE INDEX idx_city_age ON users(city, age);

3. Unique Index

A unique index enforces that all values in the indexed column(s) must be unique.

Advantages:

Guarantees data integrity by preventing duplicate entries.
Often used for columns like email, username, etc.

Limitations:

Slightly slower inserts/updates due to uniqueness checks.

SQL Example:

CREATE UNIQUE INDEX idx_unique_email ON users(email);

4. Full-Text Index

A special type of index for searching text within columns.

Advantages:

Optimized for text search.
Ideal for applications like search engines or document retrieval systems.

Limitations:

High storage overhead.
Requires database-specific support (e.g., FTS in SQLite or FULLTEXT in MySQL).

Example:

CREATE FULLTEXT INDEX idx_fulltext_content ON articles(title, content);

5. Clustered Index

In a clustered index, the rows of the table are physically sorted to match the index. The table itself becomes the index. Each table can have only one clustered index.

Advantages:

Faster for Range Queries: Queries like BETWEEN, ORDER BY, and GROUP BY are highly efficient because data is stored sequentially.
Efficient Retrieval of Full Rows: Since the data is the index, fetching the entire row is faster.

Limitations:

Only One Per Table: Since the table is sorted physically, only one clustered index can exist.
Slower Inserts/Updates: Maintaining physical order means higher overhead for data modifications.

When to Use Clustered Index

Tables with many range-based queries.
Primary key columns or frequently sorted columns.
Tables that are more read-heavy than write-heavy.

SQL Example (MySQL): Suppose you create a clustered index on a table’s primary key:

CREATE TABLE employees (
    emp_id INT PRIMARY KEY,
    name VARCHAR(50),
    age INT
);

Here, the primary key emp_id is a clustered index by default. The table’s rows are stored in ascending order of emp_id.

Querying SELECT * FROM employees WHERE emp_id BETWEEN 10 AND 20; is fast because the data is physically ordered.

6. Non-Clustered Index

In a non-clustered index, a separate structure is created to store index data and pointers to the actual rows in the table. The table’s data is not physically sorted.

Advantages:

Multiple Indexes Allowed: You can create multiple non-clustered indexes on a table, optimizing multiple query patterns.
Faster for Narrow Queries: Ideal for queries that only need to access a few columns.

Limitations:

Additional Storage Required: Indexes require extra space to store the index structure.
Extra Lookups: Requires pointer chasing to fetch actual rows from the table, making it slower for queries needing the full row.

When to Use Non-Clustered Index

Columns frequently used in WHERE, JOIN, or ORDER BY clauses but not the primary key.
Tables where multiple query patterns exist.
Write-heavy tables where physical reordering would be costly.

SQL Example: Suppose you add a non-clustered index on the name column:

CREATE NONCLUSTERED INDEX idx_name ON employees(name);

This creates a separate structure mapping each name to the corresponding emp_id.

Querying SELECT * FROM employees WHERE name = 'Alice'; uses the non-clustered index to quickly locate the row.

7. Bitmap Index

Uses a bitmap for each unique value of the indexed column. Ideal for low-cardinality columns (columns with few unique values).

Advantages:

Efficient for low-cardinality columns (e.g., gender, status).
Combines bitmaps for multiple conditions.

Limitations:

High overhead for updates.
Mostly used in read-heavy workloads.

8. Hash Index

Uses a hash table to store index data, making equality lookups (=) extremely fast.

Advantages:

Optimized for exact-match queries.
Simple and fast.

Limitations:

Not suitable for range queries (>, <, BETWEEN).

SQL Example (PostgreSQL):

CREATE INDEX idx_hash_user_id ON users USING HASH(user_id);

Optimizing Queries in Python-Based Applications

Use Indexes Appropriately:
- Ensure indexes exist on columns frequently used in WHERE, JOIN, ORDER BY, or GROUP BY clauses.
- Avoid indexing columns with high write/update frequency unnecessarily.
Avoid Full Table Scans:
- Structure queries to leverage indexed columns.
- Use specific column names in SELECT instead of SELECT *.

Database Connection Pooling:

Use libraries like SQLAlchemy or psycopg2 with connection pooling to reduce overhead.
Example:

 from psycopg2.pool import SimpleConnectionPool

 pool = SimpleConnectionPool(1, 10, database="test", user="postgres", password="secret")
 conn = pool.getconn()
 cursor = conn.cursor()
 cursor.execute("SELECT * FROM users WHERE age > 25")
 pool.putconn(conn)

Batch Queries:

Process bulk data in batches instead of executing multiple queries sequentially.
Example:

 from psycopg2.extras import execute_batch

 data = [(1, 'Alice', 25), (2, 'Bob', 30)]
 query = "INSERT INTO users (user_id, name, age) VALUES (%s, %s, %s)"
 execute_batch(cursor, query, data)

Caching:

Use caching mechanisms like Redis for frequently accessed data.
Cache query results in memory where applicable.
Example:

 import redis

 cache = redis.StrictRedis(host="localhost", port=6379, decode_responses=True)
 cache.set("user_1", '{"user_id": 1, "name": "Alice"}')

Use Query Optimizers:
- Analyze query plans using EXPLAIN or EXPLAIN ANALYZE to identify bottlenecks.
- Example:
```
 EXPLAIN ANALYZE SELECT * FROM users WHERE age > 25;
```

Share on

Twitter Facebook Google+ LinkedIn

Manju S V

Database Indexes - UNIQUE, CLUSTERED, NONCLUSTERED

Types of Indexes

1. Single-Column Index

2. Composite Index

3. Unique Index

4. Full-Text Index

5. Clustered Index

6. Non-Clustered Index

7. Bitmap Index

8. Hash Index

Optimizing Queries in Python-Based Applications

Share on

Leave a comment

You may also enjoy

Parsed, Analyzed, and Optimized Logical Plans in Spark

Cache vs Persist in Spark

Managed vs external table in Spark

Repartition vs Coalesce in Spark