阿里云主机折上折
  • 微信号
Current Site:Index > Data grouping and statistics ($group, $sum, $avg, etc.)

Data grouping and statistics ($group, $sum, $avg, etc.)

Author:Chuan Chen 阅读数:31161人阅读 分类: MongoDB

MongoDB's aggregation framework provides powerful data processing capabilities, with the $group stage being one of its core operations. By combining it with accumulators like $sum and $avg, complex data grouping and statistical tasks can be efficiently accomplished. Flexible use of these operators enables common analytical needs such as data summarization, average value calculation, and maximum value extraction.

Basics of Data Grouping ($group)

The $group stage defines grouping criteria by specifying the _id field, which groups input documents based on its value. After grouping, accumulators can be used to perform statistical calculations on each group. For example, grouping by city and counting the number of documents per city:

db.sales.aggregate([
  {
    $group: {
      _id: "$city",
      count: { $sum: 1 }
    }
  }
])

This operation produces results like:

{ "_id" : "Beijing", "count" : 42 }
{ "_id" : "Shanghai", "count" : 38 }

Common Accumulator Operators

Sum Calculation ($sum)

$sum supports numerical accumulation, allowing both document counting (by passing a constant 1) and summing specified fields:

// Calculate total sales by product category
db.orders.aggregate([
  {
    $group: {
      _id: "$category",
      totalSales: { $sum: "$amount" },
      orderCount: { $sum: 1 }
    }
  }
])

Average Calculation ($avg)

Calculates the average of numerical fields within a group, automatically ignoring non-numeric types:

// Calculate average salary by department
db.employees.aggregate([
  {
    $group: {
      _id: "$department",
      avgSalary: { $avg: "$salary" }
    }
  }
])

Extremes Extraction ($max/$min)

Retrieves extreme values within a group, with support for comparing dates, strings, etc.:

// Find the earliest sale date for each product
db.sales.aggregate([
  {
    $group: {
      _id: "$productId",
      firstSaleDate: { $min: "$saleDate" }
    }
  }
])

Composite Grouping and Advanced Techniques

Multi-Field Grouping

_id can accept an object for multi-level grouping, such as grouping by year and month:

db.orders.aggregate([
  {
    $group: {
      _id: {
        year: { $year: "$orderDate" },
        month: { $month: "$orderDate" }
      },
      total: { $sum: "$amount" }
    }
  }
])

Array Operations ($push/$addToSet)

Collects specific field values within a group, with $addToSet automatically deduplicating:

// Collect product IDs purchased by each customer
db.orders.aggregate([
  {
    $group: {
      _id: "$customerId",
      purchasedItems: { $addToSet: "$productId" }
    }
  }
])

Performance Optimization Practices

Index Utilization Strategy

When using $match and $sort stages before $group, ensure these operations leverage indexes:

db.sales.aggregate([
  { $match: { status: "completed" } },  // Uses status index
  { $sort: { productId: 1 } },         // Uses productId index
  {
    $group: {
      _id: "$productId",
      total: { $sum: "$quantity" }
    }
  }
])

Memory Control

For large collections, the allowDiskUse option may be required:

db.largeCollection.aggregate(
  [/* pipeline stages */],
  { allowDiskUse: true }
)

Real-World Business Scenarios

E-Commerce Data Analysis

Calculate each user's purchase frequency and average order value over the last 3 months:

db.orders.aggregate([
  {
    $match: {
      orderDate: { $gte: new Date("2023-01-01") }
    }
  },
  {
    $group: {
      _id: "$userId",
      orderCount: { $sum: 1 },
      avgAmount: { $avg: "$total" },
      lastPurchase: { $max: "$orderDate" }
    }
  },
  {
    $project: {
      userId: "$_id",
      _id: 0,
      orderCount: 1,
      avgAmount: { $round: ["$avgAmount", 2] },
      daysSinceLastPurchase: {
        $divide: [
          { $subtract: [new Date(), "$lastPurchase"] },
          86400000
        ]
      }
    }
  }
])

IoT Device Monitoring

Calculate hourly average sensor readings for each device:

db.sensorReadings.aggregate([
  {
    $project: {
      deviceId: 1,
      value: 1,
      hour: { $hour: "$timestamp" }
    }
  },
  {
    $group: {
      _id: {
        device: "$deviceId",
        hour: "$hour"
      },
      avgValue: { $avg: "$value" },
      minValue: { $min: "$value" },
      maxValue: { $max: "$value" }
    }
  }
])

本站部分内容来自互联网,一切版权均归源网站或源作者所有。

如果侵犯了你的权益请来信告知我们删除。邮箱:cc@cccx.cn

Front End Chuan

Front End Chuan, Chen Chuan's Code Teahouse 🍵, specializing in exorcising all kinds of stubborn bugs 💻. Daily serving baldness-warning-level development insights 🛠️, with a bonus of one-liners that'll make you laugh for ten years 🐟. Occasionally drops pixel-perfect romance brewed in a coffee cup ☕.