Index selection and sorting optimization

Author: Chuan Chen | Views: 51620 | Category: MongoDB

Basic Principles of Index Selection

The core of MongoDB index selection lies in understanding query patterns and data distribution. MongoDB's default B-tree indexes suit both equality and range queries, while hashed indexes support only equality matches and cannot serve range queries. Field order in a compound index follows the ESR rule: Equality fields come first, Sort fields are placed in the middle, and Range fields are placed last.

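// Poor index design for queries that filter by user_id and sort by order_date
// (assumed workload): the index has no user_id prefix
db.orders.createIndex({ status: 1, order_date: -1 })

// Optimized index design (following the ESR principle):
// equality on user_id, sort on order_date, remaining filter on status
db.orders.createIndex({ user_id: 1, order_date: -1, status: 1 })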
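
The ESR ordering can be sketched as a small helper in plain JavaScript. This is only an illustration: `buildEsrIndex`, its argument shape, and the `amount` field are hypothetical names, not part of the MongoDB API. The returned object is the kind of key document you would pass to createIndex.

```javascript
// Hypothetical helper: assemble a compound index key in ESR order.
// equality: field names matched against exact values
// sort:     { field: 1 | -1 } in the desired sort order
// range:    field names filtered with $gt, $lt and similar operators
function buildEsrIndex({ equality = [], sort = {}, range = [] }) {
  const key = {};
  for (const field of equality) {
    key[field] = 1; // E: direction does not matter for equality matches
  }
  for (const [field, direction] of Object.entries(sort)) {
    if (!(field in key)) key[field] = direction; // S: keep the requested direction
  }
  for (const field of range) {
    if (!(field in key)) key[field] = 1; // R: range fields go last
  }
  return key;
}

const orderKey = buildEsrIndex({
  equality: ["user_id"],
  sort: { order_date: -1 },
  range: ["amount"], // hypothetical field used only for this example
});
console.log(orderKey); // { user_id: 1, order_date: -1, amount: 1 }
```

The helper relies on JavaScript preserving insertion order for string keys, which is also why field order matters when you write an index key by hand.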

When queries involve multiple filtering conditions, fields with high selectivity should be prioritized for indexing. Selectivity refers to the ratio of distinct values in a field to the total number of documents. For example, user IDs typically have higher selectivity than gender fields.
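
The selectivity ratio described above is easy to compute on a sample of documents. The sketch below uses plain JavaScript and made-up sample data; `selectivity` is a hypothetical helper name.

```javascript
// Selectivity = number of distinct values of a field / total number of documents.
function selectivity(docs, field) {
  if (docs.length === 0) return 0;
  const distinct = new Set(docs.map((doc) => doc[field]));
  return distinct.size / docs.length;
}

// Made-up sample documents for illustration only.
const sampleUsers = [
  { user_id: 1, gender: "F" },
  { user_id: 2, gender: "M" },
  { user_id: 3, gender: "F" },
  { user_id: 4, gender: "M" },
];

console.log(selectivity(sampleUsers, "user_id")); // 1
console.log(selectivity(sampleUsers, "gender"));  // 0.5
```

A value close to 1 means almost every document has its own value, so an index on that field narrows a query quickly; a value close to 0 means the index eliminates few documents.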

Performance Impact of Sorting Operations

Sorting operations in MongoDB can be memory-intensive when no index can supply the sort order. A blocking sort may use at most 100 MB of memory; beyond that, the operation fails unless disk use is allowed (allowDiskUse: true, which is the default from MongoDB 6.0), in which case it spills to temporary files on disk. An index that matches the sort order avoids the blocking sort altogether.

// Sorting requiring a full collection scan
db.products.find().sort({ price: 1 })

// Optimization using a compound index: equality on category, then price in index order
db.products.createIndex({ category: 1, price: 1 })
db.products.find({ category: "electronics" }).sort({ price: 1 })
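
One way to confirm that an index supplies the sort is to look for a SORT stage in the winning plan reported by explain(). The sketch below walks a plan tree in plain JavaScript. It assumes the classic plan shape (`stage`, `inputStage`, `inputStages`), and the two plan objects are hand-written samples, not output captured from a server.

```javascript
// Walk a winningPlan tree and report whether it contains a blocking SORT stage.
function hasBlockingSort(stage) {
  if (!stage) return false;
  if (stage.stage === "SORT") return true;
  const children = [stage.inputStage, ...(stage.inputStages || [])];
  return children.some((child) => hasBlockingSort(child));
}

// Hand-written plan fragments shaped like explain() output (assumed samples).
const unindexedPlan = { stage: "SORT", inputStage: { stage: "COLLSCAN" } };
const indexedPlan = {
  stage: "FETCH",
  inputStage: { stage: "IXSCAN", indexName: "category_1_price_1" },
};

console.log(hasBlockingSort(unindexedPlan)); // true
console.log(hasBlockingSort(indexedPlan));   // false
```

In mongosh, the same function could be applied to the `queryPlanner.winningPlan` field of an explain() result.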

For sharded collections, sorting operations are more complex. If the query cannot be targeted by the shard key, every shard sorts its own portion and mongos must merge the results from all shards. In such cases, using the $merge stage to write results to a separate collection may be more efficient.

Design Strategies for Compound Indexes

Designing compound indexes requires balancing query coverage and write performance. The order of conditions inside a query document does not matter to the planner; what matters is that the order of index fields follows the ESR rule and accounts for sorting requirements. Index intersection can sometimes replace large compound indexes, but compound indexes are generally more efficient.

// Compound index for multi-dimensional queries
db.sales.createIndex({
  region: 1,
  category: 1,
  sale_date: -1,
  amount: 1
})

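// This index serves the filter of the following query. The sort on amount is still
// performed in memory, because the range on sale_date precedes amount in the index;
// under ESR, placing amount before sale_date would let the index supply the sort.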
db.sales.find({
  region: "North",
  category: "Furniture",
  sale_date: { $gte: ISODate("2023-01-01") }
}).sort({ amount: 1 })
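
A compound index also supports queries on any of its prefixes, which is part of what "query coverage" means when weighing one large index against several small ones. The plain JavaScript sketch below lists the prefixes of the key used above; `indexPrefixes` is a hypothetical helper name.

```javascript
// List the prefixes of a compound index key.
// A query that filters on the fields of any prefix can use the index.
function indexPrefixes(key) {
  const fields = Object.keys(key);
  return fields.map((_, i) => fields.slice(0, i + 1));
}

const salesKey = { region: 1, category: 1, sale_date: -1, amount: 1 };
for (const prefix of indexPrefixes(salesKey)) {
  console.log(prefix.join(", "));
}
```

A query on category alone is not in the list, so it would not benefit from this index.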

Partial indexes can significantly reduce index storage space and maintenance overhead, especially for querying specific subsets. For example, creating an index only for active users:

db.users.createIndex(
  { last_login: -1 },
  { partialFilterExpression: { status: "active" } }
)
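
A partial index is only considered when the query itself restricts results to the indexed subset. The sketch below is a deliberately simplified check in plain JavaScript that handles equality-only filter expressions; `canUsePartialIndex` is a hypothetical name, and the real planner's matching rules are more general.

```javascript
// Simplified check (equality-only filter expressions): the query must pin every
// field of partialFilterExpression to the same value for the index to apply.
function canUsePartialIndex(query, partialFilter) {
  return Object.entries(partialFilter).every(
    ([field, value]) => query[field] === value
  );
}

const activeFilter = { status: "active" };
const since = new Date("2023-01-01T00:00:00Z");

// Includes status: "active", so the partial index is a candidate.
console.log(canUsePartialIndex({ status: "active", last_login: { $gt: since } }, activeFilter)); // true

// No status condition: the query could match inactive users, so the index cannot be used.
console.log(canUsePartialIndex({ last_login: { $gt: since } }, activeFilter)); // false
```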

Index Maintenance and Monitoring

Regularly analyzing index usage is crucial. The $indexStats aggregation stage reports how often each index has been used since the server last started, while the explain() method reveals query execution plans. Removing unused indexes can reduce write overhead and storage usage.

// View index usage statistics
db.collection.aggregate([ { $indexStats: {} } ])

// Analyze query execution plan
db.orders.find({
  user_id: "12345",
  order_date: { $gt: ISODate("2023-01-01") }
}).explain("executionStats")
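
The $indexStats output can be filtered to shortlist removal candidates. The sketch below works on a made-up array shaped like that output (`name` and `accesses.ops` fields); `unusedIndexes` is a hypothetical helper name.

```javascript
// Return the names of indexes with zero recorded accesses,
// skipping the mandatory _id index.
function unusedIndexes(indexStats) {
  return indexStats
    .filter((s) => s.name !== "_id_" && Number(s.accesses.ops) === 0)
    .map((s) => s.name);
}

// Made-up sample shaped like $indexStats output.
const sampleStats = [
  { name: "_id_", accesses: { ops: 0 } },
  { name: "user_id_1_order_date_-1", accesses: { ops: 1520 } },
  { name: "status_1", accesses: { ops: 0 } },
];

console.log(unusedIndexes(sampleStats)); // [ 'status_1' ]
```

Treat the result as a shortlist rather than a verdict: an index that looks idle may still serve a monthly report or a rarely triggered code path.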

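The index build process can impact production environments. For large collections, consider building indexes during off-peak hours. Before MongoDB 4.2, the background option kept the collection available during the build; from 4.2 onward that option is ignored, because every build uses an optimized process that holds an exclusive lock only briefly at the start and end. An in-progress build can also be stopped: from MongoDB 4.4, dropping the index aborts a build that is still running.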

Application Scenarios for Special Index Types

Geospatial indexes (2dsphere) support location data queries, text indexes enable full-text search, and wildcard indexes are suitable for documents with flexible schemas. Each special index type has specific optimization considerations.

// Geospatial index example
db.places.createIndex({ location: "2dsphere" })
db.places.find({
  location: {
    $near: {
      $geometry: {
        type: "Point",
        coordinates: [ -73.9667, 40.78 ]
      },
      $maxDistance: 1000
    }
  }
})

// Wildcard index example
db.products.createIndex({ "specs.$**": 1 })

TTL indexes automatically expire data, making them ideal for temporary data like logs and sessions. Note that TTL cleanup tasks run every minute and do not guarantee precise expiration times.

// Create a log index that expires after 24 hours
db.log_events.createIndex(
  { created_at: 1 },
  { expireAfterSeconds: 86400 }
)
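
The expiration arithmetic is simple: a document becomes eligible for deletion once the indexed date plus expireAfterSeconds has passed, and it is actually removed on a later pass of the cleanup task. A minimal sketch in plain JavaScript, with `ttlEligibleAt` as a hypothetical helper name:

```javascript
// Earliest moment a document becomes eligible for TTL deletion.
function ttlEligibleAt(createdAt, expireAfterSeconds) {
  return new Date(createdAt.getTime() + expireAfterSeconds * 1000);
}

const createdAt = new Date("2023-01-01T00:00:00Z");
console.log(ttlEligibleAt(createdAt, 86400).toISOString()); // 2023-01-02T00:00:00.000Z
```

Application code that must never see expired data should therefore also filter on the date field instead of relying on the TTL task alone.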

Query Patterns and Index Matching

Understanding query selectivity is critical for index efficiency. Low-selectivity queries (e.g., status="active") may not warrant standalone indexes. In such cases, they can be included as suffix fields in compound indexes.

// Handling low-selectivity fields in indexes
db.articles.createIndex({
  category: 1,
  tags: 1,
  status: 1  // Low-selectivity field placed last
})

Covered queries can be satisfied entirely by indexes without examining the documents themselves. This requires the index to include all query and return fields.

// Creating an index for covered queries
// (the filtered fields lead, so the query below can use the index prefix)
db.orders.createIndex({
  customer_id: 1,
  order_date: 1,
  order_id: 1,
  total_amount: 1
})

// Example of a covered query
db.orders.find(
  { customer_id: "cust123", order_date: { $gt: ISODate("2023-01-01") } },
  { order_id: 1, order_date: 1, total_amount: 1, _id: 0 }
)
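
Whether a query is actually covered can be checked from explain("executionStats"): index keys are examined but no documents are fetched. The sketch below applies that heuristic in plain JavaScript to made-up statistics; `isCoveredQuery` is a hypothetical helper name.

```javascript
// Heuristic: a covered query reads index keys but fetches no documents.
function isCoveredQuery(explainResult) {
  const stats = explainResult.executionStats;
  return stats.totalKeysExamined > 0 && stats.totalDocsExamined === 0;
}

// Made-up executionStats values for illustration.
const coveredResult = {
  executionStats: { nReturned: 12, totalKeysExamined: 12, totalDocsExamined: 0 },
};
const fetchedResult = {
  executionStats: { nReturned: 12, totalKeysExamined: 12, totalDocsExamined: 12 },
};

console.log(isCoveredQuery(coveredResult)); // true
console.log(isCoveredQuery(fetchedResult)); // false
```

A common reason for losing coverage is forgetting `_id: 0` in the projection, since `_id` is returned by default and is not part of the index.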

Index Strategies in Sharded Environments

In a sharded cluster, each shard maintains its own indexes. Choosing an appropriate shard key is critical, as it determines data distribution and query routing. A good shard key should have sufficient cardinality, avoid hotspots, and align with common query patterns.

// Poor shard key choice (low cardinality)
db.customers.createIndex({ country: 1 })  // Only a limited number of country values

// Better shard key choice
db.customers.createIndex({ customer_id: "hashed" })  // High cardinality and even distribution

When queries include the shard key, mongos can route them to specific shards (targeted queries). Otherwise, queries must be broadcast to all shards (scatter-gather queries). Compound shard keys can balance distribution and locality.
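
The routing decision can be approximated with a simple rule: a query is targetable when it constrains the shard key or the leading field of a compound shard key. The sketch below encodes only that rule in plain JavaScript; `routingFor` is a hypothetical name, and the real mongos logic also depends on details ignored here, such as range conditions on hashed keys.

```javascript
// Simplified routing rule: targeted when the query constrains the leading
// shard key field, otherwise broadcast to every shard.
// Ignores subtleties such as range conditions on hashed shard keys.
function routingFor(shardKey, query) {
  const [leadingField] = Object.keys(shardKey);
  return leadingField in query ? "targeted" : "scatter-gather";
}

const customerShardKey = { customer_id: "hashed" };

console.log(routingFor(customerShardKey, { customer_id: "cust123" })); // targeted
console.log(routingFor(customerShardKey, { country: "DE" }));          // scatter-gather
```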

Trade-offs Between Indexes and Write Performance

Each additional index increases write operation overhead, as every document change requires updating all related indexes. For write-intensive collections, indexes should be added cautiously.

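// Assessing the impact of indexes on writes:
// sample these counters before and after adding an index and compare
const status = db.serverStatus()
printjson(status.opLatencies.writes)  // cumulative write latency (microseconds) and operation count
printjson(status.metrics.document)    // documents inserted, updated, deleted, and returned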

Index size also affects memory usage. The working set (frequently accessed data and indexes) should fit in memory for optimal performance. The collStats command can monitor index size.

// View collection and index sizes
db.collection.stats()
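
The stats output includes an `indexSizes` document that maps each index name to its size in bytes, alongside `totalIndexSize`. The sketch below ranks indexes by size in plain JavaScript, using made-up numbers; `largestIndexes` is a hypothetical helper name.

```javascript
// Rank indexes by size, largest first, and convert bytes to megabytes.
function largestIndexes(collStats, limit = 3) {
  return Object.entries(collStats.indexSizes)
    .sort(([, a], [, b]) => b - a)
    .slice(0, limit)
    .map(([name, bytes]) => ({ name, megabytes: bytes / (1024 * 1024) }));
}

// Made-up sample shaped like the relevant part of collStats output.
const sampleCollStats = {
  totalIndexSize: 7340032,
  indexSizes: {
    _id_: 1048576,
    "user_id_1_order_date_-1": 4194304,
    status_1: 2097152,
  },
};

console.log(largestIndexes(sampleCollStats, 2));
```

Comparing the largest entries against available memory gives a quick sense of whether the working set is likely to fit.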
