Data grouping and statistics ($group, $sum, $avg, etc.)
MongoDB's aggregation framework provides powerful data processing capabilities, with the $group
stage being one of its core operations. By combining it with accumulators like $sum
and $avg
, complex data grouping and statistical tasks can be efficiently accomplished. Flexible use of these operators enables common analytical needs such as data summarization, average value calculation, and maximum value extraction.
Basics of Data Grouping ($group
)
The $group
stage defines grouping criteria by specifying the _id
field, which groups input documents based on its value. After grouping, accumulators can be used to perform statistical calculations on each group. For example, grouping by city and counting the number of documents per city:
db.sales.aggregate([
{
$group: {
_id: "$city",
count: { $sum: 1 }
}
}
])
This operation produces results like:
{ "_id" : "Beijing", "count" : 42 }
{ "_id" : "Shanghai", "count" : 38 }
Common Accumulator Operators
Sum Calculation ($sum
)
$sum
supports numerical accumulation, allowing both document counting (by passing a constant 1) and summing specified fields:
// Calculate total sales by product category
db.orders.aggregate([
{
$group: {
_id: "$category",
totalSales: { $sum: "$amount" },
orderCount: { $sum: 1 }
}
}
])
Average Calculation ($avg
)
Calculates the average of numerical fields within a group, automatically ignoring non-numeric types:
// Calculate average salary by department
db.employees.aggregate([
{
$group: {
_id: "$department",
avgSalary: { $avg: "$salary" }
}
}
])
Extremes Extraction ($max
/$min
)
Retrieves extreme values within a group, with support for comparing dates, strings, etc.:
// Find the earliest sale date for each product
db.sales.aggregate([
{
$group: {
_id: "$productId",
firstSaleDate: { $min: "$saleDate" }
}
}
])
Composite Grouping and Advanced Techniques
Multi-Field Grouping
_id
can accept an object for multi-level grouping, such as grouping by year and month:
db.orders.aggregate([
{
$group: {
_id: {
year: { $year: "$orderDate" },
month: { $month: "$orderDate" }
},
total: { $sum: "$amount" }
}
}
])
Array Operations ($push
/$addToSet
)
Collects specific field values within a group, with $addToSet
automatically deduplicating:
// Collect product IDs purchased by each customer
db.orders.aggregate([
{
$group: {
_id: "$customerId",
purchasedItems: { $addToSet: "$productId" }
}
}
])
Performance Optimization Practices
Index Utilization Strategy
When using $match
and $sort
stages before $group
, ensure these operations leverage indexes:
db.sales.aggregate([
{ $match: { status: "completed" } }, // Uses status index
{ $sort: { productId: 1 } }, // Uses productId index
{
$group: {
_id: "$productId",
total: { $sum: "$quantity" }
}
}
])
Memory Control
For large collections, the allowDiskUse
option may be required:
db.largeCollection.aggregate(
[/* pipeline stages */],
{ allowDiskUse: true }
)
Real-World Business Scenarios
E-Commerce Data Analysis
Calculate each user's purchase frequency and average order value over the last 3 months:
db.orders.aggregate([
{
$match: {
orderDate: { $gte: new Date("2023-01-01") }
}
},
{
$group: {
_id: "$userId",
orderCount: { $sum: 1 },
avgAmount: { $avg: "$total" },
lastPurchase: { $max: "$orderDate" }
}
},
{
$project: {
userId: "$_id",
_id: 0,
orderCount: 1,
avgAmount: { $round: ["$avgAmount", 2] },
daysSinceLastPurchase: {
$divide: [
{ $subtract: [new Date(), "$lastPurchase"] },
86400000
]
}
}
}
])
IoT Device Monitoring
Calculate hourly average sensor readings for each device:
db.sensorReadings.aggregate([
{
$project: {
deviceId: 1,
value: 1,
hour: { $hour: "$timestamp" }
}
},
{
$group: {
_id: {
device: "$deviceId",
hour: "$hour"
},
avgValue: { $avg: "$value" },
minValue: { $min: "$value" },
maxValue: { $max: "$value" }
}
}
])
本站部分内容来自互联网,一切版权均归源网站或源作者所有。
如果侵犯了你的权益请来信告知我们删除。邮箱:cc@cccx.cn
下一篇:多表关联($lookup)