Aggregation operations (count, distinct)
Count in Aggregation Operations
count is one of the most fundamental aggregation operations in MongoDB, used to count the number of documents in a collection that meet specified criteria. This operation is particularly useful for data analysis and report generation. The basic syntax is as follows:
db.collection.count(query)
The query
parameter is optional and used to specify filtering conditions. For example, to count all documents in the users
collection:
db.users.count()
To count the number of users older than 30:
db.users.count({age: {$gt: 30}})
After MongoDB 4.0, the count()
method was deprecated, and it is recommended to use countDocuments()
or estimatedDocumentCount()
instead:
// Precise count, supports query conditions
db.users.countDocuments({age: {$gt: 30}})
// Fast estimation, but does not support query conditions
db.users.estimatedDocumentCount()
Distinct Operation
The distinct
operation is used to retrieve all unique values of a specific field in a collection. This is practical when you need to obtain a list of categories or enumerated values. The basic syntax is:
db.collection.distinct(field, query)
For example, to get all distinct cities in the users
collection:
db.users.distinct("address.city")
You can also add query conditions, such as retrieving distinct cities where users older than 30 reside:
db.users.distinct("address.city", {age: {$gt: 30}})
The distinct
operation returns an array containing all unique values. For large collections, this operation may consume significant resources.
Count in Aggregation Pipeline
In more complex aggregation scenarios, you can use the $count
stage in the aggregation pipeline. This approach allows combining with other aggregation stages:
db.users.aggregate([
{$match: {age: {$gt: 30}}},
{$count: "over30Count"}
])
This example will return a result like:
{"over30Count": 42}
The $count
stage must be placed at the end of the pipeline because it consumes all input documents and outputs a single document.
Distinct in Aggregation Pipeline
Although distinct
can be used as a standalone operation, similar functionality in the aggregation pipeline is typically achieved using the $group
stage:
db.users.aggregate([
{$group: {_id: "$address.city"}}
])
This method is more flexible than distinct
and can be combined with other aggregation stages. For example, to count the number of users in each city:
db.users.aggregate([
{$group: {
_id: "$address.city",
userCount: {$sum: 1}
}}
])
Performance Considerations
The performance of count
and distinct
operations is influenced by several factors:
- Index Usage: Ensure query fields have appropriate indexes.
- Collection Size: Large collections may require more time.
- Sharded Clusters: Operations across shards incur additional overhead.
For countDocuments()
, MongoDB executes a full query plan, while estimatedDocumentCount()
uses collection metadata, which is faster but less precise.
The distinct
operation builds all unique values in memory, which may consume significant memory for high-cardinality fields. An alternative is to use the $group
stage in the aggregation pipeline with the allowDiskUse
option.
Practical Application Examples
Assume an e-commerce order collection orders
with fields: orderDate
, customerId
, products
(array), totalAmount
, etc.
Count the number of orders in 2023:
db.orders.countDocuments({
orderDate: {
$gte: ISODate("2023-01-01"),
$lt: ISODate("2024-01-01")
}
})
Retrieve all customer IDs who have purchased products in a specific category:
db.orders.distinct("customerId", {
"products.category": "Electronics"
})
Use the aggregation pipeline to count monthly orders and unique customers:
db.orders.aggregate([
{$match: {orderDate: {$gte: ISODate("2023-01-01")}}},
{$project: {
month: {$month: "$orderDate"},
customerId: 1
}},
{$group: {
_id: "$month",
orderCount: {$sum: 1},
uniqueCustomers: {$addToSet: "$customerId"}
}},
{$project: {
month: "$_id",
orderCount: 1,
customerCount: {$size: "$uniqueCustomers"}
}}
])
Advanced Usage
For more complex requirements, you can combine multiple aggregation stages:
- Start with
$match
to filter documents. - Use
$unwind
to expand arrays. - Apply
$group
for grouping and calculations. - Finally, use
$sort
to order results.
For example, to calculate sales and order counts by product category:
db.orders.aggregate([
{$unwind: "$products"},
{$group: {
_id: "$products.category",
totalSales: {$sum: "$products.price"},
orderCount: {$sum: 1}
}},
{$sort: {totalSales: -1}}
])
Limitations and Alternatives
count
and distinct
have some limitations:
- Result set size limit (16MB).
- Memory usage constraints.
- Performance issues in sharded clusters.
Alternatives include:
- Using the
$facet
stage in the aggregation pipeline to execute multiple operations in parallel. - Using
$sample
for statistical sampling. - Using Map-Reduce for very large datasets.
For example, using $facet
to compute multiple statistics simultaneously:
db.orders.aggregate([
{$match: {status: "completed"}},
{$facet: {
"totalOrders": [{$count: "count"}],
"byMonth": [
{$group: {
_id: {$month: "$orderDate"},
count: {$sum: 1}
}}
],
"topCustomers": [
{$group: {
_id: "$customerId",
orderCount: {$sum: 1}
}},
{$sort: {orderCount: -1}},
{$limit: 5}
]
}}
])
本站部分内容来自互联网,一切版权均归源网站或源作者所有。
如果侵犯了你的权益请来信告知我们删除。邮箱:cc@cccx.cn
下一篇:Vite.js核心知识点