The window function ($setWindowFields)
Window Functions ($setWindowFields)
Window functions are powerful aggregation operators introduced in MongoDB 5.0, allowing access to adjacent document data when performing calculations on document collections. Unlike traditional aggregation operations, window functions can partition and sort data based on defined window frames and then perform calculations within each window.
Basic Syntax Structure
The basic syntax of $setWindowFields
is as follows:
{
$setWindowFields: {
partitionBy: <expression>, // Partition field
sortBy: { <field1>: <sortOrder>, ... }, // Sorting rules
output: { // Output field definitions
<outputField1>: {
<windowOperator>: <expression>,
window: {
documents: [ <lowerBound>, <upperBound> ], // Document window
range: [ <lowerBound>, <upperBound> ], // Value range window
unit: <timeUnit> // Time unit
}
},
...
}
}
}
Partitioning and Sorting
Partitioning (partitionBy
) divides data into multiple groups, with window functions performing independent calculations within each partition. Sorting (sortBy
) determines the processing order of documents within a partition:
// Partition by department and sort by salary in descending order
db.employees.aggregate([
{
$setWindowFields: {
partitionBy: "$department",
sortBy: { salary: -1 },
output: {
rank: {
$rank: {},
window: { documents: [ "unbounded", "current" ] }
}
}
}
}
])
Window Range Definitions
Window ranges determine which documents are included in calculations:
-
Document Window: Based on document position
["unbounded", "unbounded"]
: The entire partition["unbounded", "current"]
: From the start of the partition to the current row[ -1, 1 ]
: One row before and after the current row
-
Value Range Window: Based on field values
[ -100, 100 ]
: Documents within ±100 of the current value
// Calculate the average sales amount over a 3-month sliding window
db.sales.aggregate([
{
$setWindowFields: {
partitionBy: "$product",
sortBy: { date: 1 },
output: {
threeMonthAvg: {
$avg: "$amount",
window: {
range: [ -3, 0 ],
unit: "month"
}
}
}
}
}
])
Common Window Functions
Ranking Functions
$rank
: Standard ranking (skips subsequent ranks for ties)$denseRank
: Dense ranking (does not skip ranks for ties)$rowNumber
: Simple row number
// Calculate salary rankings within departments
db.employees.aggregate([
{
$setWindowFields: {
partitionBy: "$department",
sortBy: { salary: -1 },
output: {
rank: { $rank: {} },
denseRank: { $denseRank: {} },
rowNumber: { $rowNumber: {} }
}
}
}
])
Aggregation Functions
Standard aggregation functions like $sum
, $avg
, $min
, $max
, etc., can all be used as window functions:
// Calculate cumulative sales for each product
db.sales.aggregate([
{
$setWindowFields: {
partitionBy: "$product",
sortBy: { date: 1 },
output: {
cumulativeSales: {
$sum: "$amount",
window: { documents: [ "unbounded", "current" ] }
}
}
}
}
])
Offset Functions
$shift
: Access rows relative to the current row$first
/$last
: Get the first/last value within the window
// Calculate month-over-month sales growth rate
db.monthly_sales.aggregate([
{
$setWindowFields: {
sortBy: { yearMonth: 1 },
output: {
prevMonthSales: {
$shift: {
output: "$amount",
by: -1,
default: null
}
},
growthRate: {
$divide: [
{ $subtract: [ "$amount", { $shift: { output: "$amount", by: -1 } } ] },
{ $shift: { output: "$amount", by: -1 } }
]
}
}
}
}
])
Advanced Application Examples
Moving Average Calculation
// Calculate 7-day moving average temperature
db.temperature_readings.aggregate([
{
$setWindowFields: {
partitionBy: "$sensor_id",
sortBy: { timestamp: 1 },
output: {
sevenDayAvg: {
$avg: "$temperature",
window: {
range: [ -6, 0 ], // Current day + previous 6 days
unit: "day"
}
}
}
}
}
])
Session Segmentation
// Segment user sessions based on 30 minutes of inactivity
db.user_clicks.aggregate([
{
$setWindowFields: {
partitionBy: "$user_id",
sortBy: { timestamp: 1 },
output: {
newSession: {
$function: {
body: function(timestamp, prevTimestamp) {
return (!prevTimestamp ||
(timestamp - prevTimestamp) > 30*60*1000) ? 1 : 0;
},
args: ["$timestamp", { $shift: { output: "$timestamp", by: -1 } }],
lang: "js"
}
},
sessionId: {
$sum: {
$function: {
body: function(timestamp, prevTimestamp) {
return (!prevTimestamp ||
(timestamp - prevTimestamp) > 30*60*1000) ? 1 : 0;
},
args: ["$timestamp", { $shift: { output: "$timestamp", by: -1 } }],
lang: "js"
}
},
window: { documents: [ "unbounded", "current" ] }
}
}
}
}
])
Performance Optimization Recommendations
- Use Partitioning Wisely: Partition fields should have appropriate cardinality to avoid creating too many small partitions or a single oversized partition.
- Leverage Indexes: Ensure
partitionBy
andsortBy
fields are indexed. - Limit Window Size: Avoid unbounded windows (
unbounded
) unless necessary. - Pipeline Order: Use
$match
and$project
early to reduce data volume.
// Optimized query example
db.large_collection.aggregate([
{ $match: { status: "active" } }, // Filter first
{ $project: { _id: 1, value: 1, category: 1 } }, // Reduce fields
{
$setWindowFields: {
partitionBy: "$category",
sortBy: { value: -1 },
output: {
rank: { $denseRank: {} }
}
}
}
])
Combining with Other Aggregation Stages
Window functions can be flexibly combined with other aggregation stages:
// First group and aggregate, then calculate moving average
db.sales.aggregate([
{
$group: {
_id: { product: "$product", month: { $month: "$date" } },
totalSales: { $sum: "$amount" }
}
},
{
$setWindowFields: {
partitionBy: "$_id.product",
sortBy: { "_id.month": 1 },
output: {
threeMonthAvg: {
$avg: "$totalSales",
window: { documents: [ -2, 0 ] } // Current month and previous two months
}
}
}
}
])
Real-World Business Scenarios
E-commerce Analysis
// Calculate customer purchase frequency rankings
db.orders.aggregate([
{
$group: {
_id: "$customer_id",
orderCount: { $sum: 1 },
totalSpent: { $sum: "$amount" }
}
},
{
$setWindowFields: {
sortBy: { orderCount: -1 },
output: {
frequencyRank: { $rank: {} },
top10Pct: {
$percentRank: {},
window: { documents: [ "unbounded", "unbounded" ] }
}
}
}
}
])
Financial Time Series
// Calculate Bollinger Bands for stocks
db.stock_prices.aggregate([
{
$setWindowFields: {
partitionBy: "$symbol",
sortBy: { date: 1 },
output: {
ma20: {
$avg: "$close",
window: { documents: [ -19, 0 ] }
},
std20: {
$stdDevPop: "$close",
window: { documents: [ -19, 0 ] }
},
upperBand: {
$add: [
{ $avg: "$close", window: { documents: [ -19, 0 ] } },
{ $multiply: [
2,
{ $stdDevPop: "$close", window: { documents: [ -19, 0 ] } }
]}
]
}
}
}
}
])
Limitations and Considerations
- Window functions cannot be used in
$facet
stages. - Memory limits: Large windows may require significant memory.
- Sharded collections: Partition fields should generally align with the shard key for optimal performance.
- Result size: Output documents cannot exceed the 16MB limit.
本站部分内容来自互联网,一切版权均归源网站或源作者所有。
如果侵犯了你的权益请来信告知我们删除。邮箱:cc@cccx.cn