Common aggregation stages ($match, $project, $group, $sort, etc.)

Author：Chuan Chen 阅读数：63245人阅读分类： MongoDB

MongoDB's aggregation pipeline is a powerful data processing tool that allows for complex transformations and analysis of documents in a collection by chaining multiple stages together. Aggregation stages are the core components of the pipeline, where each stage processes the input documents and passes the results to the next stage. Below are several commonly used aggregation stages and their typical applications.

$match

$match is a commonly used filtering stage in the aggregation pipeline, similar to the find method in queries. It allows us to filter documents based on conditions, passing only those that meet the criteria to the next stage. $match should be placed as early as possible in the pipeline to reduce the amount of data processed in subsequent stages.

db.orders.aggregate([
  { $match: { status: "completed", total: { $gt: 100 } } }
])

This example filters orders where status is "completed" and total is greater than 100. $match supports all standard query operators, such as $eq, $gt, $in, etc.

$project

$project is used to reshape document structures, allowing the inclusion or exclusion of fields or the creation of new computed fields. It is one of the most flexible stages in the aggregation pipeline and is often used for data projection and transformation.

db.users.aggregate([
  {
    $project: {
      name: 1,
      email: 1,
      yearOfBirth: { $year: "$birthDate" },
      fullName: { $concat: ["$firstName", " ", "$lastName"] }
    }
  }
])

This example extracts the name and email fields from the users collection while computing yearOfBirth and concatenating fullName. $project also supports conditional expressions (e.g., $cond) and array operations (e.g., $slice).

$group

$group is the core aggregation stage, used to group documents by specified fields and compute aggregated values. It can be used with accumulators like $sum, $avg, $max, and $min.

db.sales.aggregate([
  {
    $group: {
      _id: "$product",
      totalSold: { $sum: "$quantity" },
      averagePrice: { $avg: "$price" },
      maxQuantity: { $max: "$quantity" }
    }
  }
])

This example groups by product, calculating the total quantity sold, average price, and maximum single-sale quantity for each product. The _id field in $group determines the grouping criteria, which can be a single field, a combination of fields, or a computed expression.

$sort

$sort sorts documents based on one or more fields, specifying ascending (1) or descending (-1) order. It typically appears later in the pipeline to sort the final results.

db.employees.aggregate([
  { $sort: { department: 1, salary: -1 } }
])

This example sorts first by department in ascending order and then by salary in descending order. $sort can consume significant memory, especially with large datasets, and may require the allowDiskUse option.

$limit and $skip

$limit restricts the number of output documents, while $skip skips a specified number of documents. They are often used together to implement pagination.

db.products.aggregate([
  { $skip: 20 },
  { $limit: 10 }
])

This example skips the first 20 documents and returns the next 10. Note that the order of $skip and $limit affects the results.

$unwind

$unwind expands array fields, creating a separate document for each element in the array. It is an essential tool for working with array data.

db.blogPosts.aggregate([
  { $unwind: "$comments" }
])

This example expands the comments array, generating an independent document for each comment. $unwind can also handle empty arrays or missing fields with the preserveNullAndEmptyArrays option.

$lookup

$lookup implements a left outer join similar to SQL, querying and merging data from other collections.

db.orders.aggregate([
  {
    $lookup: {
      from: "products",
      localField: "productId",
      foreignField: "_id",
      as: "productDetails"
    }
  }
])

This example joins the orders collection with products using productId as the matching field, storing the results in the productDetails array. $lookup also supports complex join operations using pipelines.

$addFields and $set

$addFields and $set (MongoDB 4.2+) add new fields or modify existing ones without altering other fields. They are more concise than $project and are suitable for incremental modifications.

db.students.aggregate([
  {
    $addFields: {
      finalGrade: { $avg: ["$test1", "$test2", "$test3"] },
      passed: { $gte: [{ $avg: ["$test1", "$test2", "$test3"] }, 60] }
    }
  }
])

This example calculates the average grade and adds a flag indicating whether the student passed. $set is an alias for $addFields and functions identically.

$count

$count returns the number of documents in the current pipeline stage, often used to measure the size of a result set.

db.customers.aggregate([
  { $match: { city: "New York" } },
  { $count: "nyCustomers" }
])

This example counts the number of customers in New York, producing a result like { nyCustomers: 123 }. $count is a shortcut for a combination of $group and $project.

$facet

$facet allows executing multiple sub-pipelines within a single aggregation pipeline, generating multiple sets of results. It is ideal for scenarios requiring simultaneous computation of various aggregation metrics.

db.sales.aggregate([
  {
    $facet: {
      "totalSales": [{ $group: { _id: null, total: { $sum: "$amount" } } }],
      "topProducts": [
        { $group: { _id: "$product", count: { $sum: 1 } } },
        { $sort: { count: -1 } },
        { $limit: 5 }
      ],
      "monthlyTrend": [
        { $project: { month: { $month: "$date" } } },
        { $group: { _id: "$month", count: { $sum: 1 } } }
      ]
    }
  }
])

This example simultaneously calculates total sales, the top 5 best-selling products, and monthly sales trends. The output of $facet is a document containing the results of all sub-pipelines.

$bucket and $bucketAuto

$bucket and $bucketAuto group documents into specified range intervals (buckets). $bucket requires explicit boundary definitions, while $bucketAuto automatically determines boundaries.

db.products.aggregate([
  {
    $bucket: {
      groupBy: "$price",
      boundaries: [0, 50, 100, 200, 500],
      default: "Other",
      output: {
        count: { $sum: 1 },
        averagePrice: { $avg: "$price" }
      }
    }
  }
])

This example groups products by price range, counting the number of products and calculating the average price for each range. $bucketAuto is particularly useful when the data distribution is unknown.

$graphLookup

$graphLookup performs recursive queries, often used for hierarchical or graph-like data, such as organizational structures or social networks.

db.employees.aggregate([
  {
    $graphLookup: {
      from: "employees",
      startWith: "$reportsTo",
      connectFromField: "reportsTo",
      connectToField: "_id",
      as: "managementChain",
      maxDepth: 3
    }
  }
])

This example finds the management chain (up to 3 levels) for each employee. $graphLookup supports depth control (maxDepth) and conditional filtering (restrictSearchWithMatch).

$merge and $out

$merge (MongoDB 4.2+) and $out write aggregation results to a collection. $out completely replaces the target collection, while $merge offers more flexible write options (merge, replace, retain, etc.).

db.sales.aggregate([
  { $group: { _id: "$product", total: { $sum: "$amount" } } },
  { $merge: { into: "productTotals", on: "_id", whenMatched: "merge" } }
])

This example calculates total sales by product and merges the results into the productTotals collection. $merge supports complex conflict resolution strategies.

做个网站！

本站部分内容来自互联网,一切版权均归源网站或源作者所有。

如果侵犯了你的权益请来信告知我们删除。邮箱：cc@cccx.cn

下一篇：表达式与运算符（算术、比较、日期、字符串等）