The role and principle of indexes

Author: Chuan Chen | Views: 36,834 | Category: MongoDB

The Role of Indexes

Indexes are data structures that databases use to speed up queries. In MongoDB they serve three main purposes: improving query performance, enforcing uniqueness, and supporting sorted results. Without a suitable index, a query must perform a collection scan, reading every document in the collection, which is slow when the collection is large. For example, finding the documents that match a condition in a collection of millions of documents may mean scanning all of them, whereas with an index only a few dozen documents might need to be examined.

Indexes can also enforce field uniqueness. After creating a unique index, MongoDB will reject insert or update operations that result in duplicate values. This is particularly useful for scenarios requiring uniqueness, such as user emails or phone numbers. Additionally, when a query includes sorting operations, an appropriate index can avoid in-memory sorting and directly return results in the order of the index.

// Create a regular index
db.users.createIndex({ username: 1 });

// Create a unique index
db.users.createIndex({ email: 1 }, { unique: true });

// Compound index
db.orders.createIndex({ customerId: 1, orderDate: -1 });
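
The uniqueness guarantee can be pictured with a toy in-memory model. The sketch below is plain JavaScript rather than MongoDB's implementation; ToyUniqueIndex and its insert method are hypothetical names used only for illustration. On a real server, a write that violates a unique index fails with a duplicate key error (E11000).

```javascript
// Toy model of a unique index: maps an indexed value to a document id.
// Hypothetical illustration only; MongoDB enforces uniqueness inside the server.
class ToyUniqueIndex {
  constructor(field) {
    this.field = field;
    this.entries = new Map(); // indexed value -> document _id
  }

  // Rejects the insert when another document already holds the same value.
  insert(doc) {
    const key = doc[this.field];
    if (this.entries.has(key)) {
      throw new Error(`duplicate key: ${this.field} = ${JSON.stringify(key)}`);
    }
    this.entries.set(key, doc._id);
  }
}

const emailIndex = new ToyUniqueIndex("email");
emailIndex.insert({ _id: 1, email: "a@example.com" });

let rejected = false;
try {
  emailIndex.insert({ _id: 2, email: "a@example.com" });
} catch (err) {
  rejected = true; // the second insert collides with the first
}
console.log(rejected);
```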

How Indexes Work

MongoDB indexes use a B-tree data structure (the WiredTiger implementation is often described as a B+ tree variant). This structure keeps keys ordered while allowing efficient insert, delete, and search operations. When executing a query, the query optimizer evaluates the available indexes and selects the plan it judges most efficient.

The working principle of an index can be compared to a book's table of contents: instead of flipping through the entire book page by page, you can quickly locate a specific chapter using the table of contents. For example, for a query like { age: 25 }, MongoDB first checks if there is an index on the age field. If so, it uses the index to quickly locate the positions of all documents where age = 25 and then retrieves those documents directly.

The B-tree index is characterized by maintaining data balance, ensuring that the path length from the root node to any leaf node is the same. Each node contains multiple keys and pointers, significantly reducing disk I/O operations. For range queries (e.g., { age: { $gt: 20 } }), the B-tree can efficiently locate the first matching key and then sequentially traverse subsequent keys.
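
The "locate the first matching key, then traverse sequentially" behavior can be sketched with a sorted array standing in for the leaf level of the tree. This is a simplified model under that assumption, not MongoDB's actual B-tree; the names people, ageIndex, firstGreaterThan, and rangeGreaterThan are illustrative.

```javascript
// Toy index on `age`: entries sorted by key, each pointing at a document position.
// A sorted array stands in for the B-tree leaf level (an assumption for illustration).
const people = [
  { name: "Ann", age: 31 },
  { name: "Bob", age: 19 },
  { name: "Cid", age: 25 },
  { name: "Dee", age: 25 },
  { name: "Eve", age: 42 },
];

const ageIndex = people
  .map((doc, pos) => ({ key: doc.age, pos }))
  .sort((a, b) => a.key - b.key);

// Binary search for the first entry whose key is strictly greater than `bound`.
function firstGreaterThan(index, bound) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid].key <= bound) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Similar in spirit to { age: { $gt: bound } }: seek once, then walk forward.
function rangeGreaterThan(index, docs, bound) {
  const out = [];
  for (let i = firstGreaterThan(index, bound); i < index.length; i++) {
    out.push(docs[index[i].pos]);
  }
  return out;
}

const olderThan20 = rangeGreaterThan(ageIndex, people, 20).map((d) => d.name);
console.log(olderThan20); // [ 'Cid', 'Dee', 'Ann', 'Eve' ]
```

Only the seek uses a search; every later entry is read in key order, which is why range queries on an indexed field are cheap.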

Types of Indexes

MongoDB supports various types of indexes to accommodate different scenarios. A single-field index is the most basic type, created on a single field. A compound index is created on multiple fields, and the order of fields significantly impacts index efficiency. For example, an index created with db.users.createIndex({ lastName: 1, firstName: 1 }) is effective for queries on lastName or combined queries on lastName and firstName, but ineffective for queries on firstName alone.
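
The field-order rule can be expressed as a small helper that counts how many leading index fields a query filter touches. This is a simplified rule of thumb rather than the real query planner's logic, and usablePrefixLength is a hypothetical name.

```javascript
// Counts how many leading fields of a compound index a query filter can use.
// Simplified rule of thumb; the real planner considers more cases.
function usablePrefixLength(indexFields, queryFilter) {
  let n = 0;
  for (const field of indexFields) {
    if (!(field in queryFilter)) break; // the prefix ends at the first missing field
    n++;
  }
  return n;
}

const nameIndex = ["lastName", "firstName"];
console.log(usablePrefixLength(nameIndex, { lastName: "Chen" })); // 1
console.log(usablePrefixLength(nameIndex, { lastName: "Chen", firstName: "Chuan" })); // 2
console.log(usablePrefixLength(nameIndex, { firstName: "Chuan" })); // 0
```

A result of 0 corresponds to the firstName-only query described above: the filter skips the leading field, so the index order does not help.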

Multikey indexes are used for array fields and create an index entry for each element in the array. Geospatial indexes support location-based queries, and text indexes support text search. Hashed indexes store a hash of the field value, which makes them suitable for equality queries but not for range queries.

// Multikey index example
db.products.createIndex({ tags: 1 });

// Geospatial index
db.places.createIndex({ location: "2dsphere" });

// Text index
db.articles.createIndex({ content: "text" });
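
To make the "one entry per array element" idea concrete, here is a minimal sketch in plain JavaScript. It is a conceptual model only; multikeyEntries and sampleProducts are hypothetical names, and the real index format is internal to the storage engine.

```javascript
// Produces one index entry per array element, as a multikey index does conceptually.
function multikeyEntries(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    const keys = Array.isArray(value) ? value : [value];
    for (const key of keys) entries.push({ key, id: doc._id });
  }
  // Keep entries ordered by key, like any other index.
  return entries.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const sampleProducts = [
  { _id: 1, tags: ["red", "cotton"] },
  { _id: 2, tags: ["red"] },
];
const tagEntries = multikeyEntries(sampleProducts, "tags");
console.log(tagEntries.map((e) => `${e.key}:${e.id}`)); // [ 'cotton:1', 'red:1', 'red:2' ]
```

Two documents produce three entries here, which is why long arrays make a multikey index grow quickly.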

Storage Structure of Indexes

MongoDB indexes are stored on disk in a B-tree structure, separate from the collection's data files. Each index entry contains the indexed field's value and a reference to the corresponding document (with WiredTiger, an internal record identifier that is used to look the document up). For large collections, indexes can occupy a significant amount of storage space.

Index entries are stored sorted by field values, making range queries and sorting operations highly efficient. For compound indexes, combinations of field values are sorted according to the order defined in the index. For example, an index { a: 1, b: -1 } is first sorted by a in ascending order, and for documents with the same a value, they are then sorted by b in descending order.
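
The ordering of a compound index can be sketched as a comparator built from the index specification. This is an illustrative model for simple scalar values, not MongoDB's key encoding; compareByIndexSpec and rows are hypothetical names.

```javascript
// Builds a comparator from an index spec such as { a: 1, b: -1 }.
// Handles only plain comparable values (numbers, strings); illustration only.
function compareByIndexSpec(spec) {
  const fields = Object.entries(spec); // string keys keep their insertion order
  return (x, y) => {
    for (const [field, direction] of fields) {
      if (x[field] < y[field]) return -direction;
      if (x[field] > y[field]) return direction;
    }
    return 0; // equal on every indexed field
  };
}

const rows = [
  { a: 2, b: 1 },
  { a: 1, b: 1 },
  { a: 1, b: 3 },
  { a: 2, b: 5 },
];
const ordered = [...rows].sort(compareByIndexSpec({ a: 1, b: -1 }));
console.log(ordered.map((r) => `${r.a}/${r.b}`)); // [ '1/3', '1/1', '2/5', '2/1' ]
```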

The WiredTiger storage engine (MongoDB's default engine) applies prefix compression to indexes by default to reduce their size. Index pages are cached in memory as they are used, and a frequently used index can reside entirely in memory, which greatly improves query speed. Storage details, including the size of each index, can be viewed with db.collection.stats().

Index Selection and Optimization

Choosing the right index requires considering query patterns. The explain() method shows how a query uses indexes. A covered query is one that MongoDB can answer from the index alone, without reading the documents themselves, which makes it the most efficient kind of indexed query. This requires that the index contain every field in the query filter and every field returned; because _id is returned by default, the projection must exclude it unless the index includes it.
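
The conditions for a covered query can be written down as a rough check. This sketch ignores cases such as multikey indexes and embedded documents, and isCovered is a hypothetical helper rather than a MongoDB API.

```javascript
// Rough check for whether a query could be covered by an index:
// every filtered field and every returned field must be part of the index key.
function isCovered(indexSpec, filter, projection) {
  const indexed = new Set(Object.keys(indexSpec));
  const filterOk = Object.keys(filter).every((f) => indexed.has(f));
  // _id is returned by default, so it counts as returned unless explicitly excluded.
  const returned = Object.keys(projection).filter(
    (f) => f !== "_id" && projection[f] === 1
  );
  if (projection._id !== 0) returned.push("_id");
  // An empty inclusion list means whole documents are returned, which is never covered.
  return filterOk && returned.length > 0 && returned.every((f) => indexed.has(f));
}

const userIndex = { username: 1, age: 1 };
console.log(isCovered(userIndex, { username: "john" }, { age: 1, _id: 0 })); // true
console.log(isCovered(userIndex, { username: "john" }, { age: 1 })); // false
console.log(isCovered(userIndex, { username: "john" }, { email: 1, _id: 0 })); // false
```

The second call fails only because _id is still returned, a common reason a query that looks covered is not.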

More indexes are not always better, because each index adds overhead to write operations. Inserting a document requires updating every index on the collection, and updating an indexed field also triggers an index update. Collections that are written heavily and read rarely should be indexed sparingly. The $indexStats aggregation stage reports how often each index has been used, while tools such as mongotop and mongostat show overall read and write activity.

// Analyze query execution plan
db.users.find({ age: { $gt: 30 } }).explain("executionStats");

// Force the use of a specific index
db.users.find({ username: "john", age: 30 }).hint({ username: 1 });
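
When reading explain("executionStats") output, the fields that matter most are usually the winning plan's stages and the counts of keys and documents examined. The helper below is a minimal sketch that assumes the classic explain shape (queryPlanner.winningPlan with nested inputStage, plus executionStats); newer server versions may structure the plan differently. summarizeExplain and planStages are hypothetical names, and sampleExplain is hand-written rather than real server output.

```javascript
// Walks the winning plan and collects stage names (for example FETCH, then IXSCAN).
function planStages(plan) {
  const stages = [];
  for (let node = plan; node; node = node.inputStage) stages.push(node.stage);
  return stages;
}

function summarizeExplain(explain) {
  const stats = explain.executionStats;
  const stages = planStages(explain.queryPlanner.winningPlan);
  return {
    stages,
    usesCollectionScan: stages.includes("COLLSCAN"),
    // Documents examined per document returned; values near 1 suggest a well-targeted index.
    docsPerResult: stats.nReturned > 0 ? stats.totalDocsExamined / stats.nReturned : null,
  };
}

// Hand-written sample for illustration, not real server output.
const sampleExplain = {
  queryPlanner: {
    winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "age_1" } },
  },
  executionStats: { nReturned: 40, totalKeysExamined: 40, totalDocsExamined: 40 },
};
const summary = summarizeExplain(sampleExplain);
console.log(summary);
```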

Limitations of Indexes

Although indexes are powerful, they have limitations. Indexes consume additional storage space, which can be a significant proportion of the data size for large collections. Indexes also degrade write performance, as every insert, update, or delete operation requires updating all related indexes.

Some queries cannot effectively use indexes, such as regular expression queries (non-prefix matches), negation queries ($ne, $not), etc. Index selectivity is also important; indexes on low-selectivity fields (e.g., gender) are less effective. When indexed fields are frequently updated, the overhead of maintaining the index may outweigh the query benefits.
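
Selectivity can be estimated as the ratio of distinct values to total documents. The sketch below computes it over an in-memory sample; selectivity and sampleUsers are hypothetical names, and on a real collection the figures would come from the data itself.

```javascript
// Fraction of distinct values for a field: near 1 is highly selective, near 0 is not.
function selectivity(docs, field) {
  if (docs.length === 0) return 0;
  return new Set(docs.map((d) => d[field])).size / docs.length;
}

const sampleUsers = [
  { email: "a@example.com", gender: "f" },
  { email: "b@example.com", gender: "m" },
  { email: "c@example.com", gender: "f" },
  { email: "d@example.com", gender: "m" },
];
console.log(selectivity(sampleUsers, "email")); // 1
console.log(selectivity(sampleUsers, "gender")); // 0.5
```

In a large collection the gender ratio would fall toward zero, since the number of distinct values stays fixed while the document count grows.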

The efficiency of index usage for certain operators like $exists and $type depends on the specific scenario. Multikey indexes on array fields can lead to index bloat, especially when arrays contain many elements. Index fragmentation can also degrade performance over time, requiring regular maintenance.
