阿里云主机折上折
  • 微信号
Current Site:Index > Aggregation operations (count, distinct)

Aggregation operations (count, distinct)

Author:Chuan Chen 阅读数:43795人阅读 分类: MongoDB

Count in Aggregation Operations

count is one of the most fundamental aggregation operations in MongoDB, used to count the number of documents in a collection that meet specified criteria. This operation is particularly useful for data analysis and report generation. The basic syntax is as follows:

db.collection.count(query)

The query parameter is optional and used to specify filtering conditions. For example, to count all documents in the users collection:

db.users.count()

To count the number of users older than 30:

db.users.count({age: {$gt: 30}})

After MongoDB 4.0, the count() method was deprecated, and it is recommended to use countDocuments() or estimatedDocumentCount() instead:

// Precise count, supports query conditions
db.users.countDocuments({age: {$gt: 30}})

// Fast estimation, but does not support query conditions
db.users.estimatedDocumentCount()

Distinct Operation

The distinct operation is used to retrieve all unique values of a specific field in a collection. This is practical when you need to obtain a list of categories or enumerated values. The basic syntax is:

db.collection.distinct(field, query)

For example, to get all distinct cities in the users collection:

db.users.distinct("address.city")

You can also add query conditions, such as retrieving distinct cities where users older than 30 reside:

db.users.distinct("address.city", {age: {$gt: 30}})

The distinct operation returns an array containing all unique values. For large collections, this operation may consume significant resources.

Count in Aggregation Pipeline

In more complex aggregation scenarios, you can use the $count stage in the aggregation pipeline. This approach allows combining with other aggregation stages:

db.users.aggregate([
  {$match: {age: {$gt: 30}}},
  {$count: "over30Count"}
])

This example will return a result like:

{"over30Count": 42}

The $count stage must be placed at the end of the pipeline because it consumes all input documents and outputs a single document.

Distinct in Aggregation Pipeline

Although distinct can be used as a standalone operation, similar functionality in the aggregation pipeline is typically achieved using the $group stage:

db.users.aggregate([
  {$group: {_id: "$address.city"}}
])

This method is more flexible than distinct and can be combined with other aggregation stages. For example, to count the number of users in each city:

db.users.aggregate([
  {$group: {
    _id: "$address.city",
    userCount: {$sum: 1}
  }}
])

Performance Considerations

The performance of count and distinct operations is influenced by several factors:

  1. Index Usage: Ensure query fields have appropriate indexes.
  2. Collection Size: Large collections may require more time.
  3. Sharded Clusters: Operations across shards incur additional overhead.

For countDocuments(), MongoDB executes a full query plan, while estimatedDocumentCount() uses collection metadata, which is faster but less precise.

The distinct operation builds all unique values in memory, which may consume significant memory for high-cardinality fields. An alternative is to use the $group stage in the aggregation pipeline with the allowDiskUse option.

Practical Application Examples

Assume an e-commerce order collection orders with fields: orderDate, customerId, products (array), totalAmount, etc.

Count the number of orders in 2023:

db.orders.countDocuments({
  orderDate: {
    $gte: ISODate("2023-01-01"),
    $lt: ISODate("2024-01-01")
  }
})

Retrieve all customer IDs who have purchased products in a specific category:

db.orders.distinct("customerId", {
  "products.category": "Electronics"
})

Use the aggregation pipeline to count monthly orders and unique customers:

db.orders.aggregate([
  {$match: {orderDate: {$gte: ISODate("2023-01-01")}}},
  {$project: {
    month: {$month: "$orderDate"},
    customerId: 1
  }},
  {$group: {
    _id: "$month",
    orderCount: {$sum: 1},
    uniqueCustomers: {$addToSet: "$customerId"}
  }},
  {$project: {
    month: "$_id",
    orderCount: 1,
    customerCount: {$size: "$uniqueCustomers"}
  }}
])

Advanced Usage

For more complex requirements, you can combine multiple aggregation stages:

  1. Start with $match to filter documents.
  2. Use $unwind to expand arrays.
  3. Apply $group for grouping and calculations.
  4. Finally, use $sort to order results.

For example, to calculate sales and order counts by product category:

db.orders.aggregate([
  {$unwind: "$products"},
  {$group: {
    _id: "$products.category",
    totalSales: {$sum: "$products.price"},
    orderCount: {$sum: 1}
  }},
  {$sort: {totalSales: -1}}
])

Limitations and Alternatives

count and distinct have some limitations:

  1. Result set size limit (16MB).
  2. Memory usage constraints.
  3. Performance issues in sharded clusters.

Alternatives include:

  1. Using the $facet stage in the aggregation pipeline to execute multiple operations in parallel.
  2. Using $sample for statistical sampling.
  3. Using Map-Reduce for very large datasets.

For example, using $facet to compute multiple statistics simultaneously:

db.orders.aggregate([
  {$match: {status: "completed"}},
  {$facet: {
    "totalOrders": [{$count: "count"}],
    "byMonth": [
      {$group: {
        _id: {$month: "$orderDate"},
        count: {$sum: 1}
      }}
    ],
    "topCustomers": [
      {$group: {
        _id: "$customerId",
        orderCount: {$sum: 1}
      }},
      {$sort: {orderCount: -1}},
      {$limit: 5}
    ]
  }}
])

本站部分内容来自互联网,一切版权均归源网站或源作者所有。

如果侵犯了你的权益请来信告知我们删除。邮箱:cc@cccx.cn

Front End Chuan

Front End Chuan, Chen Chuan's Code Teahouse 🍵, specializing in exorcising all kinds of stubborn bugs 💻. Daily serving baldness-warning-level development insights 🛠️, with a bonus of one-liners that'll make you laugh for ten years 🐟. Occasionally drops pixel-perfect romance brewed in a coffee cup ☕.