Common aggregation stages ($match, $project, $group, $sort, etc.)
MongoDB's aggregation pipeline is a powerful data processing tool that allows for complex transformations and analysis of documents in a collection by chaining multiple stages together. Aggregation stages are the core components of the pipeline, where each stage processes the input documents and passes the results to the next stage. Below are several commonly used aggregation stages and their typical applications.
$match
$match
is a commonly used filtering stage in the aggregation pipeline, similar to the find
method in queries. It allows us to filter documents based on conditions, passing only those that meet the criteria to the next stage. $match
should be placed as early as possible in the pipeline to reduce the amount of data processed in subsequent stages.
db.orders.aggregate([
{ $match: { status: "completed", total: { $gt: 100 } } }
])
This example filters orders where status
is "completed" and total
is greater than 100. $match
supports all standard query operators, such as $eq
, $gt
, $in
, etc.
$project
$project
is used to reshape document structures, allowing the inclusion or exclusion of fields or the creation of new computed fields. It is one of the most flexible stages in the aggregation pipeline and is often used for data projection and transformation.
db.users.aggregate([
{
$project: {
name: 1,
email: 1,
yearOfBirth: { $year: "$birthDate" },
fullName: { $concat: ["$firstName", " ", "$lastName"] }
}
}
])
This example extracts the name
and email
fields from the users
collection while computing yearOfBirth
and concatenating fullName
. $project
also supports conditional expressions (e.g., $cond
) and array operations (e.g., $slice
).
$group
$group
is the core aggregation stage, used to group documents by specified fields and compute aggregated values. It can be used with accumulators like $sum
, $avg
, $max
, and $min
.
db.sales.aggregate([
{
$group: {
_id: "$product",
totalSold: { $sum: "$quantity" },
averagePrice: { $avg: "$price" },
maxQuantity: { $max: "$quantity" }
}
}
])
This example groups by product
, calculating the total quantity sold, average price, and maximum single-sale quantity for each product. The _id
field in $group
determines the grouping criteria, which can be a single field, a combination of fields, or a computed expression.
$sort
$sort
sorts documents based on one or more fields, specifying ascending (1) or descending (-1) order. It typically appears later in the pipeline to sort the final results.
db.employees.aggregate([
{ $sort: { department: 1, salary: -1 } }
])
This example sorts first by department
in ascending order and then by salary
in descending order. $sort
can consume significant memory, especially with large datasets, and may require the allowDiskUse
option.
$limit and $skip
$limit
restricts the number of output documents, while $skip
skips a specified number of documents. They are often used together to implement pagination.
db.products.aggregate([
{ $skip: 20 },
{ $limit: 10 }
])
This example skips the first 20 documents and returns the next 10. Note that the order of $skip
and $limit
affects the results.
$unwind
$unwind
expands array fields, creating a separate document for each element in the array. It is an essential tool for working with array data.
db.blogPosts.aggregate([
{ $unwind: "$comments" }
])
This example expands the comments
array, generating an independent document for each comment. $unwind
can also handle empty arrays or missing fields with the preserveNullAndEmptyArrays
option.
$lookup
$lookup
implements a left outer join similar to SQL, querying and merging data from other collections.
db.orders.aggregate([
{
$lookup: {
from: "products",
localField: "productId",
foreignField: "_id",
as: "productDetails"
}
}
])
This example joins the orders
collection with products
using productId
as the matching field, storing the results in the productDetails
array. $lookup
also supports complex join operations using pipelines.
$addFields and $set
$addFields
and $set
(MongoDB 4.2+) add new fields or modify existing ones without altering other fields. They are more concise than $project
and are suitable for incremental modifications.
db.students.aggregate([
{
$addFields: {
finalGrade: { $avg: ["$test1", "$test2", "$test3"] },
passed: { $gte: [{ $avg: ["$test1", "$test2", "$test3"] }, 60] }
}
}
])
This example calculates the average grade and adds a flag indicating whether the student passed. $set
is an alias for $addFields
and functions identically.
$count
$count
returns the number of documents in the current pipeline stage, often used to measure the size of a result set.
db.customers.aggregate([
{ $match: { city: "New York" } },
{ $count: "nyCustomers" }
])
This example counts the number of customers in New York, producing a result like { nyCustomers: 123 }
. $count
is a shortcut for a combination of $group
and $project
.
$facet
$facet
allows executing multiple sub-pipelines within a single aggregation pipeline, generating multiple sets of results. It is ideal for scenarios requiring simultaneous computation of various aggregation metrics.
db.sales.aggregate([
{
$facet: {
"totalSales": [{ $group: { _id: null, total: { $sum: "$amount" } } }],
"topProducts": [
{ $group: { _id: "$product", count: { $sum: 1 } } },
{ $sort: { count: -1 } },
{ $limit: 5 }
],
"monthlyTrend": [
{ $project: { month: { $month: "$date" } } },
{ $group: { _id: "$month", count: { $sum: 1 } } }
]
}
}
])
This example simultaneously calculates total sales, the top 5 best-selling products, and monthly sales trends. The output of $facet
is a document containing the results of all sub-pipelines.
$bucket and $bucketAuto
$bucket
and $bucketAuto
group documents into specified range intervals (buckets). $bucket
requires explicit boundary definitions, while $bucketAuto
automatically determines boundaries.
db.products.aggregate([
{
$bucket: {
groupBy: "$price",
boundaries: [0, 50, 100, 200, 500],
default: "Other",
output: {
count: { $sum: 1 },
averagePrice: { $avg: "$price" }
}
}
}
])
This example groups products by price range, counting the number of products and calculating the average price for each range. $bucketAuto
is particularly useful when the data distribution is unknown.
$graphLookup
$graphLookup
performs recursive queries, often used for hierarchical or graph-like data, such as organizational structures or social networks.
db.employees.aggregate([
{
$graphLookup: {
from: "employees",
startWith: "$reportsTo",
connectFromField: "reportsTo",
connectToField: "_id",
as: "managementChain",
maxDepth: 3
}
}
])
This example finds the management chain (up to 3 levels) for each employee. $graphLookup
supports depth control (maxDepth
) and conditional filtering (restrictSearchWithMatch
).
$merge and $out
$merge
(MongoDB 4.2+) and $out
write aggregation results to a collection. $out
completely replaces the target collection, while $merge
offers more flexible write options (merge, replace, retain, etc.).
db.sales.aggregate([
{ $group: { _id: "$product", total: { $sum: "$amount" } } },
{ $merge: { into: "productTotals", on: "_id", whenMatched: "merge" } }
])
This example calculates total sales by product and merges the results into the productTotals
collection. $merge
supports complex conflict resolution strategies.
本站部分内容来自互联网,一切版权均归源网站或源作者所有。
如果侵犯了你的权益请来信告知我们删除。邮箱:cc@cccx.cn
上一篇:目录操作