Hash index and TTL index
Hash Index
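A hash index (MongoDB's documentation calls it a hashed index) stores a hash of the indexed field's value instead of the value itself. Because hashing discards the ordering of the original values, a hashed index cannot serve range queries; it serves equality matches, and its main use is spreading keys evenly, most notably for hashed sharding. The hashes are still kept in MongoDB's regular B-tree structure, so an equality lookup is not inherently faster than with an ordinary ascending index.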
Key features of hash indexes include:
- Supports only exact match queries ($eq), not range queries ($gt, $lt, etc.)
- Does not support multikey indexes (array fields)
- Hash values are evenly distributed, reducing hotspot issues
- Fixed index size since all hash values have the same length
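To see why hashing rules out range queries, here is a minimal Node.js sketch. It does not use MongoDB's internal hash function: `hash64` is a hypothetical stand-in built on MD5 from Node's `crypto` module, and the sorted array only imitates an index.

```javascript
const crypto = require("node:crypto");

// Hypothetical stand-in for the index's hash function (not MongoDB's own):
// take the first 8 bytes of an MD5 digest as a signed 64-bit integer.
function hash64(value) {
  const digest = crypto.createHash("md5").update(String(value)).digest();
  return digest.readBigInt64LE(0);
}

// A toy "hashed index": entries are ordered by hash, not by username.
const usernames = ["alice", "bob", "carol", "dave"];
const hashedIndex = usernames
  .map((username) => ({ hash: hash64(username), username }))
  .sort((a, b) => (a.hash < b.hash ? -1 : a.hash > b.hash ? 1 : 0));

// Equality lookup: hash the search value, then match on the hash.
// The stored value is re-checked to guard against hash collisions.
function findExact(username) {
  const target = hash64(username);
  return hashedIndex
    .filter((entry) => entry.hash === target && entry.username === username)
    .map((entry) => entry.username);
}

// The index order follows the hashes, so it generally differs from
// alphabetical order. A range predicate such as { $gt: "b" } therefore
// cannot be answered by walking a contiguous slice of the index.
const indexOrder = hashedIndex.map((entry) => entry.username);
console.log(indexOrder, findExact("bob"));
```

An equality lookup only needs the hash of the search value, while a range lookup needs neighbouring values to sit next to each other, which the hash order does not guarantee.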
Syntax for creating a hash index:
db.collection.createIndex({ field: "hashed" })
Example: Creating a hash index for the username field in a users collection:
db.users.createIndex({ username: "hashed" })
Hash indexes are particularly suitable for:
- Fields with frequent equality queries and uneven data distribution
- Scenarios requiring write load distribution
- Fields where range queries are unnecessary
Note that before MongoDB 4.4 a hashed index had to be a single-field index. From 4.4 onward, a compound index may contain one hashed field alongside non-hashed fields. A hashed index cannot carry a unique constraint. The query planner can select a hashed index automatically for exact match queries.
TTL Index
TTL (Time To Live) indexes are a special type of index in MongoDB that allow automatic deletion of expired documents from a collection. These indexes are particularly useful for managing temporary data (e.g., session information, logs, caches) by automatically cleaning up expired data, reducing manual maintenance efforts.
Key features of TTL indexes:
- Based on date-type fields
- Background thread for automatic deletion runs every minute by default
- Can specify document lifetime in seconds
- Can only be created on a single field
Basic syntax for creating a TTL index:
db.collection.createIndex({ field: 1 }, { expireAfterSeconds: 3600 })
Example 1: Creating a TTL index that deletes log entries 24 hours after they are created:
db.logs.createIndex({ createdAt: 1 }, { expireAfterSeconds: 86400 })
Example 2: Creating a TTL index that deletes sessions 1 hour after they were last accessed:
db.sessions.createIndex({ lastAccessed: 1 }, { expireAfterSeconds: 3600 })
How TTL indexes work:
- MongoDB periodically checks TTL indexes (default every 60 seconds)
- For documents where the indexed field is a date, MongoDB compares it with the current time
- If the current time exceeds the field value plus expireAfterSeconds, the document is deleted
Important considerations:
- Documents with non-date fields or non-date values will not be automatically deleted
- Documents without the indexed field will not be automatically deleted
- Deletion operations are asynchronous and may be delayed
- In replica sets, deletions occur only on the primary node and are synced to secondaries via oplog
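The rules above can be sketched as a small predicate. This is an illustration in plain JavaScript, not MongoDB's implementation: `isExpired` is a hypothetical helper, it only looks at top-level fields, and the exact boundary handling (strictly greater than) is an assumption taken from the wording above. It also reflects the documented behavior that an array of dates expires based on its earliest date.

```javascript
// Returns true when a TTL pass running at `now` would treat `doc` as expired.
// `field` and `expireAfterSeconds` mirror the TTL index definition.
function isExpired(doc, field, expireAfterSeconds, now = new Date()) {
  const value = doc[field];
  // An array of dates is judged by its earliest date.
  const candidates = Array.isArray(value) ? value : [value];
  const dates = candidates.filter(
    (v) => v instanceof Date && !Number.isNaN(v.getTime())
  );
  // A missing field or a non-date value means the document never expires.
  if (dates.length === 0) return false;
  const earliest = Math.min(...dates.map((d) => d.getTime()));
  return now.getTime() > earliest + expireAfterSeconds * 1000;
}

// Usage: a log entry created at midnight with a one-hour lifetime.
const created = new Date("2024-01-01T00:00:00Z");
console.log(isExpired({ createdAt: created }, "createdAt", 3600,
  new Date("2024-01-01T01:00:01Z")));
```

Because deletion is asynchronous, a document for which this predicate is true may still be present in the collection for a while.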
Advanced usage: Dynamic TTL
Different expiration times can be achieved by storing them in documents:
// Document contains an expireAt field specifying exact expiration time
db.events.insertOne({
  message: "System maintenance",
  expireAt: new Date("2023-12-31T23:59:59Z")
})
// Creating a TTL index with expireAfterSeconds set to 0 uses the exact date in the field
db.events.createIndex({ expireAt: 1 }, { expireAfterSeconds: 0 })
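With expireAfterSeconds set to 0, the application decides each document's lifetime when it writes the expireAt field. A minimal sketch of that step follows; the `LIFETIMES` table, the `type` field, and the `withExpiry` helper are hypothetical names chosen for illustration.

```javascript
// Hypothetical per-type lifetimes, in seconds.
const LIFETIMES = { notice: 3600, maintenance: 7 * 24 * 3600 };

// Returns a copy of `event` with an expireAt date derived from its type.
function withExpiry(event, now = new Date()) {
  const ttlSeconds = LIFETIMES[event.type];
  if (ttlSeconds === undefined) {
    throw new Error(`No lifetime configured for type "${event.type}"`);
  }
  return { ...event, expireAt: new Date(now.getTime() + ttlSeconds * 1000) };
}

// In mongosh this could be used as:
//   db.events.insertOne(withExpiry({ type: "notice", message: "Back soon" }))
console.log(withExpiry({ type: "notice", message: "Back soon" }));
```

Keeping the lifetime calculation in one helper makes it harder to insert a document without expireAt, which would never be cleaned up.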
Comparison of Hash Indexes and TTL Indexes
Although both hash indexes and TTL indexes are special index types in MongoDB, they address completely different problems:
Feature | Hash Index | TTL Index |
---|---|---|
Primary purpose | Equality lookups with evenly distributed keys (e.g., hashed sharding) | Automatically deletes expired documents |
Index field type | Any type except arrays | A date, or an array of dates |
Query support | Only exact matches | Serves queries like a regular single-field index |
Automatic maintenance | None | Automatically deletes expired documents |
Compound indexes | One hashed field per compound index (MongoDB 4.4 and later) | Not supported (single field only) |
Performance impact | Hash computation during writes | Periodic background scans and deletions |
These indexes can be combined in practice. For example, in a message queue system:
// Create hash index on message ID for faster lookups
db.messages.createIndex({ messageId: "hashed" })
// Create TTL index on expiration time for automatic cleanup
db.messages.createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 })
Performance Considerations and Best Practices
Hash index performance considerations:
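- Hash collisions: Although rare, different values may produce the same hash. In particular, floating-point numbers are truncated to 64-bit integers before hashing, so values such as 2.2, 2.3, and 2.9 hash identically
- Index size: Each entry stores a fixed-size hash, so the index does not grow with the length of the indexed values; whether it ends up larger or smaller than a regular index depends on the data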
- Write overhead: Hash computation slightly reduces write performance compared to regular indexes
TTL index performance considerations:
- Deletion operations increase system load, especially when many documents expire simultaneously
- Frequent small-batch deletions impact the system less than single large-batch deletions
- Expiration times should avoid business peak hours to prevent mass deletions during high traffic
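One way to avoid a single large wave of deletions is to spread expiration times with a little random jitter when using the expireAt pattern. The sketch below is an assumption about how an application might do this, not a MongoDB feature; `expiryWithJitter` and its parameters are hypothetical.

```javascript
// Returns an expiration date of now + ttlSeconds plus a random offset of
// 0..jitterSeconds (inclusive), so documents written together do not all
// expire in the same TTL pass. `random` is injectable for testing.
function expiryWithJitter(ttlSeconds, jitterSeconds, now = new Date(), random = Math.random) {
  const offsetSeconds = Math.floor(random() * (jitterSeconds + 1));
  return new Date(now.getTime() + (ttlSeconds + offsetSeconds) * 1000);
}

// Usage: a one-hour lifetime spread over an extra five minutes.
console.log(expiryWithJitter(3600, 300));
```

The trade-off is that documents may live slightly longer than the nominal lifetime, which is usually acceptable for caches and logs.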
Best practices:
- Hash indexes are suitable for high-cardinality fields with exact match query patterns
- TTL index expiration times should be carefully set based on business needs
- For TTL indexes, prefer specific expiration time fields (expireAt pattern) over fixed lifetimes
- Monitor TTL index deletion performance, especially with large datasets
Example: Monitoring TTL index performance
// View TTL index deletion statistics
db.serverStatus().metrics.ttl
// Sample output might include:
{
  "deletedDocuments" : NumberLong(1250),
  "passes" : NumberLong(42)
}
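Both counters are cumulative, so a single reading says little on its own. A minimal sketch of comparing two readings taken some time apart follows; `ttlDelta` is a hypothetical helper, and it assumes the counter values can be converted with `Number()`.

```javascript
// Compares two snapshots of db.serverStatus().metrics.ttl and reports how
// many documents were deleted, and in how many passes, between them.
function ttlDelta(previous, current) {
  const deleted = Number(current.deletedDocuments) - Number(previous.deletedDocuments);
  const passes = Number(current.passes) - Number(previous.passes);
  return { deleted, passes, deletedPerPass: passes > 0 ? deleted / passes : 0 };
}

// Usage with two hand-written snapshots:
const before = { deletedDocuments: 1250, passes: 42 };
const after = { deletedDocuments: 1850, passes: 44 };
console.log(ttlDelta(before, after));
```

A steadily rising deletedPerPass value suggests that expirations are clustering and that each pass has more work to do.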
Practical Application Examples
Case 1: User session management system
// Create session collection
db.sessions.createIndex({ sessionToken: "hashed" }) // Fast session lookup
db.sessions.createIndex({ lastActivity: 1 }, { expireAfterSeconds: 1800 }) // Expires after 30 minutes of inactivity
// Update session activity time
db.sessions.updateOne(
  { sessionToken: "abc123" },
  { $set: { lastActivity: new Date() } }
)
Case 2: Temporary verification code storage
// Create verification code collection
db.verificationCodes.createIndex({ phone: "hashed" }) // Fast phone number lookup
db.verificationCodes.createIndex({ createdAt: 1 }, { expireAfterSeconds: 300 }) // Expires after 5 minutes
// Insert verification code
db.verificationCodes.insertOne({
  phone: "+1234567890",
  code: "123456",
  createdAt: new Date()
})
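Because TTL deletion is asynchronous, a code older than 5 minutes can still be present in the collection for a short time. A cautious application re-checks the age itself. The sketch below assumes the document shape shown above; `isCodeValid` and `CODE_TTL_SECONDS` are hypothetical names.

```javascript
const CODE_TTL_SECONDS = 300; // same lifetime as the TTL index

// Returns true only if the stored code matches and is not older than the
// configured lifetime, regardless of whether the TTL monitor has run yet.
function isCodeValid(doc, submittedCode, now = new Date()) {
  if (!doc || !(doc.createdAt instanceof Date)) return false;
  const ageMs = now.getTime() - doc.createdAt.getTime();
  return ageMs <= CODE_TTL_SECONDS * 1000 && doc.code === submittedCode;
}

// In mongosh this could be used as:
//   isCodeValid(db.verificationCodes.findOne({ phone: "+1234567890" }), "123456")
console.log(isCodeValid({ code: "123456", createdAt: new Date() }, "123456"));
```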
Case 3: Cache system implementation
// Create cache collection
db.cache.createIndex({ key: "hashed" }) // Fast cache lookups
db.cache.createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 }) // Dynamic expiration
// Set cache value
function setCache(key, value, ttlSeconds) {
  db.cache.updateOne(
    { key: key },
    {
      $set: {
        value: value,
        expiresAt: new Date(Date.now() + ttlSeconds * 1000)
      }
    },
    { upsert: true }
  )
}
// Get cache value
function getCache(key) {
  // TTL deletion is delayed, so filter out entries that have already expired
  const doc = db.cache.findOne({ key: key, expiresAt: { $gt: new Date() } })
  return doc ? doc.value : null
}
Advanced Topic: Internal Mechanics of TTL Indexes
MongoDB's TTL indexes are essentially special single-field indexes with internal implementation relying on these key components:
- Background thread: MongoDB has a dedicated TTL thread that wakes up every minute by default
- Index scanning: The thread scans TTL indexes to identify documents eligible for deletion
- Batch deletion: Executes deletions in batches to avoid prolonged resource usage
- Load control: If too many documents expire, deletions are split across multiple passes to prevent system overload
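The scan, batch, and load-control steps above can be imitated with an in-memory array. This is a simulation under stated assumptions, not MongoDB's implementation: `runTtlPass` and its `batchLimit` parameter are hypothetical, and a real pass works against the index rather than scanning every document.

```javascript
// Simulates one TTL pass over `docs`: removes at most `batchLimit` documents
// whose date in `field` is older than now - expireAfterSeconds.
// Documents with a missing or non-date field are always kept.
function runTtlPass(docs, field, expireAfterSeconds, now, batchLimit) {
  const cutoff = now.getTime() - expireAfterSeconds * 1000;
  const kept = [];
  let deleted = 0;
  for (const doc of docs) {
    const value = doc[field];
    const expired = value instanceof Date && value.getTime() < cutoff;
    if (expired && deleted < batchLimit) {
      deleted += 1; // "delete" the document by not keeping it
    } else {
      kept.push(doc); // not expired, or left for a later pass
    }
  }
  return { kept, deleted };
}

// Usage: five expired documents are removed over several passes of two each.
const now = new Date("2024-01-01T02:00:00Z");
let docs = Array.from({ length: 5 }, () => ({ ts: new Date("2024-01-01T00:00:00Z") }));
let pass = runTtlPass(docs, "ts", 3600, now, 2);
console.log(pass.deleted, pass.kept.length);
```

Capping the work per pass keeps each pass short, at the cost of expired documents lingering until a later pass reaches them.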
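The TTL thread interval is controlled by the ttlMonitorSleepSecs server parameter, which belongs under setParameter rather than storage in the configuration file. A change made in the configuration file takes effect after a mongod restart. This is an internal tuning parameter, so confirm that your MongoDB version supports it before relying on it:
# Set TTL monitor run interval (seconds) in the configuration file
setParameter:
  ttlMonitorSleepSecs: 30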
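For very large collections, the amount of work done per TTL pass may need adjustment. The parameters that control this differ between MongoDB versions, and the ttlMonitorBatchSize name shown here should be treated as unverified; consult the server parameter reference for your release before using it:
# Set the number of documents deleted per TTL pass in the configuration file
setParameter:
  ttlMonitorBatchSize: 500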
Limitations and Considerations
Hash index limitations:
- Cannot enforce a unique constraint (hashed indexes can, however, back hashed shard keys)
- Does not support multikey indexes (array fields)
- Can be part of a compound index only in MongoDB 4.4 and later, with at most one hashed field
- Does not support sorting operations
TTL index limitations:
- Cannot be used with capped collections
- Cannot be used on the _id field
- In sharded clusters, the shard key places no restriction on the TTL field
- Deletions are not real-time and may be delayed
Special scenario handling:
- For sharded collections, each shard's primary removes its own expired documents; the TTL field does not need to appear in the shard key
- During failover, TTL deletions may be delayed
- When system load is high, TTL deletions may be postponed
Example: TTL index on a sharded collection
// First enable sharding
sh.enableSharding("test")
// Shard the collection (the shard key does not have to include the TTL field)
sh.shardCollection("test.events", { timestamp: 1, _id: 1 })
// Then create the TTL index
db.events.createIndex({ timestamp: 1 }, { expireAfterSeconds: 3600 })