
Hash Index and TTL Index

Author: Chuan Chen | Views: 43,895 | Category: MongoDB

Hash Index

A hash index (MongoDB calls it a hashed index) is a special type of single-field index that stores a hash of the indexed field's value rather than the value itself. The index is ordered by hash instead of by the original values, so it cannot serve range queries or sorts on that field. It does support equality matches, and its evenly distributed keys are the basis of hashed sharding.
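The following sketch shows why ordering by hash rules out range scans. It is plain Node.js and needs no MongoDB connection. MD5 from Node's `crypto` module stands in for MongoDB's internal hash function, which is an assumption; the real function differs, but the loss of ordering is the same. `toyHash` is a hypothetical helper.

```javascript
// Sketch: an index ordered by hash does not keep neighbouring values together.
// Assumption: MD5 stands in for MongoDB's real hashed-index function.
const crypto = require("crypto");

// Hypothetical helper: first 32 bits of the MD5 digest as an unsigned number.
function toyHash(value) {
  const hex = crypto.createHash("md5").update(String(value)).digest("hex");
  return parseInt(hex.slice(0, 8), 16);
}

// Values 1..20 in natural order, as a regular ascending index stores them.
const values = Array.from({ length: 20 }, (_, i) => i + 1);

// The same values ordered by their hashes, as a hashed index stores them.
const hashOrder = [...values].sort((a, b) => toyHash(a) - toyHash(b));

// A range such as 5..10 is generally scattered through the hash order,
// so it cannot be read as one contiguous slice of the index.
console.log("natural order:", values.join(" "));
console.log("hash order:   ", hashOrder.join(" "));
```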

Key features of hash indexes include:

  1. Supports only exact match queries ($eq), not range queries ($gt, $lt, etc.)
  2. Does not support multikey indexes (array fields)
  3. Hash values are evenly distributed, reducing hotspot issues
  4. Index keys have a fixed size, since every value is stored as a 64-bit hash
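Item 3 (even distribution) is what lets hashed shard keys absorb monotonically increasing inserts. The Node.js sketch below illustrates the effect. It rests on two assumptions: MD5 stands in for MongoDB's hash, and a "bucket" is a simplified stand-in for a shard or chunk. This is not MongoDB's balancing algorithm.

```javascript
// Sketch: hashing spreads sequential keys evenly across buckets.
// Assumptions: MD5 from Node's crypto module stands in for MongoDB's internal
// hash, and a "bucket" is a simplified stand-in for a shard or chunk.
const crypto = require("crypto");

const BUCKETS = 4;

// Map a key to a bucket using the first 32 bits of its MD5 digest.
function bucketFor(key) {
  const digest = crypto.createHash("md5").update(String(key)).digest();
  return digest.readUInt32BE(0) % BUCKETS;
}

// Sequential IDs, like auto-incrementing counters or timestamps.
const counts = new Array(BUCKETS).fill(0);
for (let id = 0; id < 10000; id++) {
  counts[bucketFor(id)]++;
}

// Range partitioning would send the newest IDs to one bucket;
// hashing gives each bucket roughly a quarter of them.
console.log(counts);
```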

Syntax for creating a hash index:

db.collection.createIndex({ field: "hashed" })

Example: Creating a hash index for the username field in a users collection:

db.users.createIndex({ username: "hashed" })

Hash indexes are particularly suitable for:

  • Fields queried mainly by equality whose values increase monotonically (e.g., ObjectId or timestamps), especially when used as shard keys
  • Scenarios requiring write load distribution
  • Fields where range queries are unnecessary

Note that before MongoDB 4.4 a hash index could not be part of a compound index. Starting in MongoDB 4.4, a compound index may contain one hashed field alongside ascending or descending fields. The query planner can use a hash index for equality matches on the field, but never for range queries or sorts on it.

TTL Index

TTL (Time To Live) indexes are a special type of index in MongoDB that allow automatic deletion of expired documents from a collection. These indexes are particularly useful for managing temporary data (e.g., session information, logs, caches) by automatically cleaning up expired data, reducing manual maintenance efforts.

Key features of TTL indexes:

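  1. Expiration works only on fields that hold a date (or an array of dates)
  2. A background thread removes expired documents, running every 60 seconds by default
  3. Document lifetime is specified in seconds with expireAfterSeconds
  4. Must be a single-field index (compound indexes ignore expireAfterSeconds)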

Basic syntax for creating a TTL index:

db.collection.createIndex({ field: 1 }, { expireAfterSeconds: 3600 })

Example 1: Expiring log documents 24 hours after their createdAt time:

db.logs.createIndex({ createdAt: 1 }, { expireAfterSeconds: 86400 })

Example 2: Expiring session documents 1 hour after their lastAccessed time:

db.sessions.createIndex({ lastAccessed: 1 }, { expireAfterSeconds: 3600 })

How TTL indexes work:

  1. MongoDB periodically checks TTL indexes (default every 60 seconds)
  2. For each document whose indexed field holds a date, MongoDB compares that date with the current time (for an array of dates, the earliest date is used)
  3. If the current time exceeds the field value plus expireAfterSeconds, the document is deleted
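The three steps above amount to a simple predicate. The sketch below restates that rule in plain JavaScript. It is an illustration, not MongoDB's source code; `isExpired` is a hypothetical helper, and the sample documents are made up.

```javascript
// Sketch of the TTL expiration rule (not MongoDB's implementation).
// Expired when now > indexedDate + expireAfterSeconds; arrays use their
// earliest date; missing or non-date values never expire.
function isExpired(doc, field, expireAfterSeconds, now) {
  const value = doc[field];
  const candidates = Array.isArray(value) ? value : [value];
  const dates = candidates.filter((v) => v instanceof Date);
  if (dates.length === 0) return false; // missing or non-date: never expires

  const earliestMs = Math.min(...dates.map((d) => d.getTime()));
  return now.getTime() > earliestMs + expireAfterSeconds * 1000;
}

const now = new Date("2024-01-01T12:00:00Z");
const logs = [
  { _id: 1, createdAt: new Date("2023-12-31T11:00:00Z") }, // 25 hours old
  { _id: 2, createdAt: new Date("2024-01-01T11:00:00Z") }, // 1 hour old
  { _id: 3, createdAt: "2023-01-01" },                      // string, not a Date
  { _id: 4 },                                               // field missing
  { _id: 5, createdAt: [new Date("2024-01-01T11:30:00Z"), new Date("2023-12-30T00:00:00Z")] },
];

// With the 24-hour TTL from Example 1 (86400 seconds):
const expiredIds = logs
  .filter((d) => isExpired(d, "createdAt", 86400, now))
  .map((d) => d._id);
console.log(expiredIds); // [ 1, 5 ]
```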

Important considerations:

  • Documents whose indexed field holds a non-date value (and not an array containing dates) are never deleted automatically
  • Documents without the indexed field will not be automatically deleted
  • Deletion operations are asynchronous and may be delayed
  • In replica sets, deletions occur only on the primary node and are synced to secondaries via oplog
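The delay mentioned above comes largely from the monitor's schedule. The sketch below models it under a simplifying assumption: a monitor that wakes exactly every 60 seconds (the default interval) and removes everything expired at that moment. Real passes also take time to run, so actual lag can be longer. `deletedAtPass` is a hypothetical helper.

```javascript
// Sketch: scheduling lag of a periodic TTL monitor.
// Assumption: passes run at exact multiples of the interval and finish instantly.
const MONITOR_INTERVAL_S = 60;

// Time (seconds since start) of the first pass strictly after expiresAtS,
// i.e. the first pass that sees the document as expired.
function deletedAtPass(expiresAtS) {
  return (Math.floor(expiresAtS / MONITOR_INTERVAL_S) + 1) * MONITOR_INTERVAL_S;
}

for (const expiresAtS of [0, 59, 60, 61, 119]) {
  const deletedAt = deletedAtPass(expiresAtS);
  console.log(`expires at ${expiresAtS}s -> deleted at ${deletedAt}s (lag ${deletedAt - expiresAtS}s)`);
}
```

In this model a document can linger for up to one full interval after it expires, which is why readers should not assume expired data is already gone.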

Advanced usage: dynamic TTL. To give each document its own expiration time, store the exact expiration date in the document and set expireAfterSeconds to 0:

// Document contains an expireAt field specifying exact expiration time
db.events.insertOne({
  message: "System maintenance",
  expireAt: new Date("2023-12-31T23:59:59Z")
})

// Creating a TTL index with expireAfterSeconds set to 0 uses the exact date in the field
db.events.createIndex({ expireAt: 1 }, { expireAfterSeconds: 0 })
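With expireAfterSeconds set to 0, the application decides each document's deadline when it writes the document. Below is a small sketch of that step. `withExpireAt` is a hypothetical helper, and the lifetimes are example values.

```javascript
// Sketch: turn a per-document lifetime into an absolute expireAt date,
// suitable for an insert into a collection with a { expireAt: 1 } TTL index
// and expireAfterSeconds: 0. Hypothetical helper.
function withExpireAt(doc, lifetimeSeconds, now = new Date()) {
  return { ...doc, expireAt: new Date(now.getTime() + lifetimeSeconds * 1000) };
}

const now = new Date("2024-06-01T00:00:00Z");
const shortLived = withExpireAt({ message: "Flash sale" }, 15 * 60, now);          // 15 minutes
const longLived = withExpireAt({ message: "System maintenance" }, 7 * 86400, now); // 7 days

console.log(shortLived.expireAt.toISOString()); // 2024-06-01T00:15:00.000Z
console.log(longLived.expireAt.toISOString());  // 2024-06-08T00:00:00.000Z
```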

Comparison of Hash Indexes and TTL Indexes

Although both hash indexes and TTL indexes are special index types in MongoDB, they address completely different problems:

| Feature | Hash Index | TTL Index |
|---|---|---|
| Primary purpose | Equality lookups and hashed sharding | Automatically deletes expired documents |
| Indexed field type | Any type except arrays | Date or array of dates (other values never expire) |
| Query support | Equality matches only | Same as a regular single-field index, including range queries |
| Automatic maintenance | None | Deletes expired documents |
| Compound indexes | One hashed field allowed (MongoDB 4.4+) | Not supported |
| Performance impact | Hash computation during writes | Periodic background scans and deletions |

These indexes can be combined in practice. For example, in a message queue system:

// Create hash index on message ID for faster lookups
db.messages.createIndex({ messageId: "hashed" })

// Create TTL index on expiration time for automatic cleanup
db.messages.createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 })

Performance Considerations and Best Practices

Hash index performance considerations:

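  1. Hash collisions: different values can, rarely, produce the same 64-bit hash, so MongoDB re-checks matches against the documents themselves; this is also why a hash index cannot be unique
  2. Memory usage: each key is a fixed 64-bit value, so index size depends mainly on the number of documents rather than on the length of the original values
  3. Write overhead: hash computation adds a small cost to each write compared with a regular index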
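The collision point can be made concrete with a toy index. The sketch below makes collisions certain by using a deliberately tiny 3-bit hash, `tinyHash`, which is hypothetical; a real 64-bit hash makes them rare, but the lookup logic is the same. With nine users and eight buckets, at least two users must share a bucket, so a lookup has to filter on the real value.

```javascript
// Sketch: a hash-bucket lookup must re-check the actual field value.
// Assumption: a hypothetical 3-bit hash forces collisions for illustration.
function tinyHash(str) {
  let h = 0;
  for (const ch of str) h = (h * 31 + ch.codePointAt(0)) % 8;
  return h;
}

// Build a toy "hashed index": hash -> list of documents.
const users = ["alice", "bob", "carol", "dave", "erin", "frank", "grace", "heidi", "ivan"]
  .map((username) => ({ username }));
const index = new Map();
for (const doc of users) {
  const h = tinyHash(doc.username);
  if (!index.has(h)) index.set(h, []);
  index.get(h).push(doc);
}

// Equality lookup: jump to the bucket, then filter on the real value
// so colliding documents are not returned.
function findByUsername(name) {
  const bucket = index.get(tinyHash(name)) || [];
  return bucket.filter((doc) => doc.username === name);
}

console.log(findByUsername("carol"));
```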

TTL index performance considerations:

  1. Deletion operations increase system load, especially when many documents expire simultaneously
  2. Frequent small-batch deletions impact the system less than single large-batch deletions
  3. Expiration times should avoid business peak hours to prevent mass deletions during high traffic
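One way to act on points 1 and 3 is to compute expireAt values (for the expireAfterSeconds: 0 pattern) that add random jitter and move any deadline that falls inside a peak window to just after it. This sketch rests on hypothetical settings: peak hours of 09:00 to 18:00 UTC and up to 10 minutes of jitter. Tune both to your own traffic. `spreadExpireAt` is a hypothetical helper.

```javascript
// Sketch: spread expirations and keep them out of a peak window.
// Assumptions (hypothetical): peak is 09:00-18:00 UTC, jitter up to 10 minutes.
const PEAK_START_HOUR = 9;
const PEAK_END_HOUR = 18;
const MAX_JITTER_MS = 10 * 60 * 1000;

function spreadExpireAt(base, random = Math.random) {
  // Random jitter so documents created together do not all expire together.
  const jitterMs = Math.floor(random() * MAX_JITTER_MS);
  const t = new Date(base.getTime() + jitterMs);
  const hour = t.getUTCHours();
  if (hour >= PEAK_START_HOUR && hour < PEAK_END_HOUR) {
    // Inside the peak window: move to the window's end, keeping the jitter
    // so the moved documents do not all expire at exactly 18:00.
    t.setUTCHours(PEAK_END_HOUR, 0, 0, 0);
    return new Date(t.getTime() + jitterMs);
  }
  return t;
}

console.log(spreadExpireAt(new Date("2024-01-01T12:00:00Z"), () => 0.5).toISOString()); // 2024-01-01T18:05:00.000Z
console.log(spreadExpireAt(new Date("2024-01-01T03:00:00Z"), () => 0).toISOString());   // 2024-01-01T03:00:00.000Z
```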

Best practices:

  1. Hash indexes are suitable for high-cardinality fields with exact match query patterns
  2. TTL index expiration times should be carefully set based on business needs
  3. For TTL indexes, prefer specific expiration time fields (expireAt pattern) over fixed lifetimes
  4. Monitor TTL index deletion performance, especially with large datasets

Example: Monitoring TTL index performance

// View TTL index deletion statistics
db.serverStatus().metrics.ttl

// Sample output might include:
{
  "deletedDocuments" : NumberLong(1250),
  "passes" : NumberLong(42)
}
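Both counters are cumulative, so two samples taken some time apart are more informative than one. The sketch below assumes the samples have already been converted to plain numbers. In mongosh these counters appear as Long values, which can be converted with their toNumber() method. The sample figures and `ttlDelta` are hypothetical.

```javascript
// Sketch: per-interval TTL activity from two cumulative samples of
// serverStatus().metrics.ttl. Assumes numeric fields (convert Longs first).
function ttlDelta(before, after, elapsedSeconds) {
  const deleted = after.deletedDocuments - before.deletedDocuments;
  const passes = after.passes - before.passes;
  return {
    deleted,
    passes,
    deletedPerPass: passes > 0 ? deleted / passes : 0,
    deletedPerSecond: elapsedSeconds > 0 ? deleted / elapsedSeconds : 0,
  };
}

// Two hypothetical samples taken 10 minutes (600 s) apart.
const before = { deletedDocuments: 1250, passes: 42 };
const after = { deletedDocuments: 4250, passes: 52 };
console.log(ttlDelta(before, after, 600));
// deleted 3000, passes 10, deletedPerPass 300, deletedPerSecond 5
```

A sudden jump in deletedPerPass is a hint that many documents are expiring at once.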

Practical Application Examples

Case 1: User session management system

// Create session collection
db.sessions.createIndex({ sessionToken: "hashed" })  // Fast session lookup
db.sessions.createIndex({ lastActivity: 1 }, { expireAfterSeconds: 1800 })  // Expires after 30 minutes of inactivity

// Update session activity time
db.sessions.updateOne(
  { sessionToken: "abc123" },
  { $set: { lastActivity: new Date() } }
)

Case 2: Temporary verification code storage

// Create verification code collection
db.verificationCodes.createIndex({ phone: "hashed" })  // Fast phone number lookup
db.verificationCodes.createIndex({ createdAt: 1 }, { expireAfterSeconds: 300 })  // Expires after 5 minutes

// Insert verification code
db.verificationCodes.insertOne({
  phone: "+1234567890",
  code: "123456",
  createdAt: new Date()
})
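TTL deletion can lag, so a code that is past its 5 minutes may still be in the collection. A safer check verifies the code's age explicitly instead of relying on the document being gone. In this sketch, `doc` is assumed to come from a lookup such as `db.verificationCodes.findOne({ phone })`. `isCodeValid` and `CODE_TTL_SECONDS` are hypothetical names.

```javascript
// Sketch: validate a verification code without relying on TTL deletion.
// Assumption: doc comes from a findOne on the phone number; the lifetime
// mirrors the 300-second TTL index.
const CODE_TTL_SECONDS = 300;

function isCodeValid(doc, submittedCode, now = new Date()) {
  if (!doc || !(doc.createdAt instanceof Date)) return false;
  const ageMs = now.getTime() - doc.createdAt.getTime();
  return doc.code === submittedCode && ageMs >= 0 && ageMs <= CODE_TTL_SECONDS * 1000;
}

const doc = { phone: "+1234567890", code: "123456", createdAt: new Date("2024-01-01T00:00:00Z") };
console.log(isCodeValid(doc, "123456", new Date("2024-01-01T00:04:00Z"))); // true
console.log(isCodeValid(doc, "123456", new Date("2024-01-01T00:05:30Z"))); // false: too old
console.log(isCodeValid(doc, "000000", new Date("2024-01-01T00:01:00Z"))); // false: wrong code
```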

Case 3: Cache system implementation

// Create cache collection
db.cache.createIndex({ key: "hashed" })  // Fast cache lookups
db.cache.createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 })  // Dynamic expiration

// Set cache value
function setCache(key, value, ttlSeconds) {
  db.cache.updateOne(
    { key: key },
    { 
      $set: { 
        value: value,
        expiresAt: new Date(Date.now() + ttlSeconds * 1000)
      }
    },
    { upsert: true }
  )
}

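// Get cache value, ignoring entries that have expired but not yet been removed by the TTL monitor
function getCache(key) {
  const doc = db.cache.findOne({ key: key, expiresAt: { $gt: new Date() } })
  return doc ? doc.value : null
}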
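For unit tests that should not need a running mongod, the same cache semantics can be mimicked in memory. `MemoryTtlCache` is a hypothetical class, not part of any library. Its `get` hides expired entries the way an `expiresAt > now` filter does. Its `sweep` plays the role of one TTL monitor pass, and an injectable clock keeps the tests deterministic.

```javascript
// Sketch: in-memory stand-in for setCache/getCache with TTL semantics.
// Hypothetical class; the clock is injectable for deterministic tests.
class MemoryTtlCache {
  constructor(clock = () => Date.now()) {
    this.clock = clock;
    this.entries = new Map(); // key -> { value, expiresAt (ms) }
  }

  set(key, value, ttlSeconds) {
    this.entries.set(key, { value, expiresAt: this.clock() + ttlSeconds * 1000 });
  }

  // Expired entries are invisible even before they are physically removed.
  get(key) {
    const entry = this.entries.get(key);
    return entry && entry.expiresAt > this.clock() ? entry.value : null;
  }

  // One "monitor pass": physically remove expired entries, return the count.
  sweep() {
    const now = this.clock();
    let removed = 0;
    for (const [key, entry] of this.entries) {
      if (entry.expiresAt <= now) {
        this.entries.delete(key);
        removed++;
      }
    }
    return removed;
  }
}

let fakeNow = 0;
const cache = new MemoryTtlCache(() => fakeNow);
cache.set("user:1", { name: "Alice" }, 60);
fakeNow = 30000;
console.log(cache.get("user:1")); // { name: 'Alice' }
fakeNow = 61000;
console.log(cache.get("user:1")); // null (expired, though still stored)
console.log(cache.sweep());       // 1
```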

Advanced Topic: Internal Mechanics of TTL Indexes

Internally, a TTL index is an ordinary single-field index with an expireAfterSeconds option. The expiration work is handled by these components:

  1. Background thread: MongoDB has a dedicated TTL thread that wakes up every minute by default
  2. Index scanning: The thread scans TTL indexes to identify documents eligible for deletion
  3. Batch deletion: Executes deletions in batches to avoid prolonged resource usage
  4. Load control: If too many documents expire, deletions are split across multiple passes to prevent system overload
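Points 3 and 4 can be pictured as a loop that deletes in fixed-size batches and stops when it hits a per-pass cap, leaving the rest for later passes. The limits below (3 documents per batch, 2 batches per pass) are hypothetical and deliberately tiny. MongoDB's real limits and scheduling differ by version. `runPass` is a hypothetical helper.

```javascript
// Sketch: batched deletion with a per-pass cap (hypothetical limits).
const BATCH_SIZE = 3;
const MAX_BATCHES_PER_PASS = 2;

// One pass over expired IDs: returns the batches deleted now and the IDs left.
function runPass(expiredIds) {
  const batches = [];
  let i = 0;
  while (i < expiredIds.length && batches.length < MAX_BATCHES_PER_PASS) {
    batches.push(expiredIds.slice(i, i + BATCH_SIZE));
    i += BATCH_SIZE;
  }
  return { batches, remaining: expiredIds.slice(i) };
}

// Ten expired documents take two passes: 6 deleted in the first, 4 in the second.
let pending = Array.from({ length: 10 }, (_, n) => n + 1);
let pass = 0;
while (pending.length > 0) {
  const { batches, remaining } = runPass(pending);
  pass++;
  console.log(`pass ${pass}:`, batches.map((b) => b.join(",")).join(" | "));
  pending = remaining;
}
```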

The TTL thread interval can be adjusted (requires mongod restart):

# Set the TTL monitor interval (seconds) in the mongod configuration file
setParameter:
  ttlMonitorSleepSecs: 30

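For very large collections, you may also want to tune how much the TTL monitor deletes per pass. The batch-related parameters vary by MongoDB version, so confirm the parameter name against the server parameter documentation for your release before adding it; mongod will not start with an unrecognized parameter.

# Batch-size setting; verify that your MongoDB version supports this parameter
setParameter:
  ttlMonitorBatchSize: 500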

Limitations and Considerations

Hash index limitations:

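  1. Can be used as a shard key (hashed sharding), but compound hashed shard keys require MongoDB 4.4 or later
  2. Does not support multikey indexes (array fields)
  3. An index can contain only one hashed field, and compound indexes that include a hashed field require MongoDB 4.4 or later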
  4. Does not support sorting operations

TTL index limitations:

  1. Cannot be used with capped collections
  2. Cannot be used on the _id field
  3. expireAfterSeconds cannot be changed by calling createIndex() again; use the collMod command instead
  4. Deletions are not real-time and may be delayed

Special scenario handling:

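  1. For sharded collections, the TTL field does not need to be part of the shard key; each shard's TTL monitor deletes its own expired documents (the shard-key-prefix rule applies to unique indexes, not TTL indexes)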
  2. During failover, TTL deletions may be delayed
  3. When system load is high, TTL deletions may be postponed

Example: TTL index on a sharded collection

// First enable sharding
sh.enableSharding("test")

// Shard the collection (the TTL field is not required to be part of the shard key)
sh.shardCollection("test.events", { timestamp: 1, _id: 1 })

// Then create the TTL index in the test database
db.getSiblingDB("test").events.createIndex({ timestamp: 1 }, { expireAfterSeconds: 3600 })
