阿里云主机折上折
  • 微信号
Current Site:Index > The window function ($setWindowFields)

The window function ($setWindowFields)

Author:Chuan Chen 阅读数:24603人阅读 分类: MongoDB

Window Functions ($setWindowFields)

Window functions are powerful aggregation operators introduced in MongoDB 5.0, allowing access to adjacent document data when performing calculations on document collections. Unlike traditional aggregation operations, window functions can partition and sort data based on defined window frames and then perform calculations within each window.

Basic Syntax Structure

The basic syntax of $setWindowFields is as follows:

{
  $setWindowFields: {
    partitionBy: <expression>,  // Partition field
    sortBy: { <field1>: <sortOrder>, ... },  // Sorting rules
    output: {  // Output field definitions
      <outputField1>: {
        <windowOperator>: <expression>,
        window: {
          documents: [ <lowerBound>, <upperBound> ],  // Document window
          range: [ <lowerBound>, <upperBound> ],  // Value range window
          unit: <timeUnit>  // Time unit
        }
      },
      ...
    }
  }
}

Partitioning and Sorting

Partitioning (partitionBy) divides data into multiple groups, with window functions performing independent calculations within each partition. Sorting (sortBy) determines the processing order of documents within a partition:

// Partition by department and sort by salary in descending order
db.employees.aggregate([
  {
    $setWindowFields: {
      partitionBy: "$department",
      sortBy: { salary: -1 },
      output: {
        rank: {
          $rank: {},
          window: { documents: [ "unbounded", "current" ] }
        }
      }
    }
  }
])

Window Range Definitions

Window ranges determine which documents are included in calculations:

  1. Document Window: Based on document position

    • ["unbounded", "unbounded"]: The entire partition
    • ["unbounded", "current"]: From the start of the partition to the current row
    • [ -1, 1 ]: One row before and after the current row
  2. Value Range Window: Based on field values

    • [ -100, 100 ]: Documents within ±100 of the current value
// Calculate the average sales amount over a 3-month sliding window
db.sales.aggregate([
  {
    $setWindowFields: {
      partitionBy: "$product",
      sortBy: { date: 1 },
      output: {
        threeMonthAvg: {
          $avg: "$amount",
          window: {
            range: [ -3, 0 ],
            unit: "month"
          }
        }
      }
    }
  }
])

Common Window Functions

Ranking Functions

  1. $rank: Standard ranking (skips subsequent ranks for ties)
  2. $denseRank: Dense ranking (does not skip ranks for ties)
  3. $rowNumber: Simple row number
// Calculate salary rankings within departments
db.employees.aggregate([
  {
    $setWindowFields: {
      partitionBy: "$department",
      sortBy: { salary: -1 },
      output: {
        rank: { $rank: {} },
        denseRank: { $denseRank: {} },
        rowNumber: { $rowNumber: {} }
      }
    }
  }
])

Aggregation Functions

Standard aggregation functions like $sum, $avg, $min, $max, etc., can all be used as window functions:

// Calculate cumulative sales for each product
db.sales.aggregate([
  {
    $setWindowFields: {
      partitionBy: "$product",
      sortBy: { date: 1 },
      output: {
        cumulativeSales: {
          $sum: "$amount",
          window: { documents: [ "unbounded", "current" ] }
        }
      }
    }
  }
])

Offset Functions

  1. $shift: Access rows relative to the current row
  2. $first/$last: Get the first/last value within the window
// Calculate month-over-month sales growth rate
db.monthly_sales.aggregate([
  {
    $setWindowFields: {
      sortBy: { yearMonth: 1 },
      output: {
        prevMonthSales: {
          $shift: {
            output: "$amount",
            by: -1,
            default: null
          }
        },
        growthRate: {
          $divide: [
            { $subtract: [ "$amount", { $shift: { output: "$amount", by: -1 } } ] },
            { $shift: { output: "$amount", by: -1 } }
          ]
        }
      }
    }
  }
])

Advanced Application Examples

Moving Average Calculation

// Calculate 7-day moving average temperature
db.temperature_readings.aggregate([
  {
    $setWindowFields: {
      partitionBy: "$sensor_id",
      sortBy: { timestamp: 1 },
      output: {
        sevenDayAvg: {
          $avg: "$temperature",
          window: {
            range: [ -6, 0 ],  // Current day + previous 6 days
            unit: "day"
          }
        }
      }
    }
  }
])

Session Segmentation

// Segment user sessions based on 30 minutes of inactivity
db.user_clicks.aggregate([
  {
    $setWindowFields: {
      partitionBy: "$user_id",
      sortBy: { timestamp: 1 },
      output: {
        newSession: {
          $function: {
            body: function(timestamp, prevTimestamp) {
              return (!prevTimestamp || 
                     (timestamp - prevTimestamp) > 30*60*1000) ? 1 : 0;
            },
            args: ["$timestamp", { $shift: { output: "$timestamp", by: -1 } }],
            lang: "js"
          }
        },
        sessionId: {
          $sum: {
            $function: {
              body: function(timestamp, prevTimestamp) {
                return (!prevTimestamp || 
                       (timestamp - prevTimestamp) > 30*60*1000) ? 1 : 0;
              },
              args: ["$timestamp", { $shift: { output: "$timestamp", by: -1 } }],
              lang: "js"
            }
          },
          window: { documents: [ "unbounded", "current" ] }
        }
      }
    }
  }
])

Performance Optimization Recommendations

  1. Use Partitioning Wisely: Partition fields should have appropriate cardinality to avoid creating too many small partitions or a single oversized partition.
  2. Leverage Indexes: Ensure partitionBy and sortBy fields are indexed.
  3. Limit Window Size: Avoid unbounded windows (unbounded) unless necessary.
  4. Pipeline Order: Use $match and $project early to reduce data volume.
// Optimized query example
db.large_collection.aggregate([
  { $match: { status: "active" } },  // Filter first
  { $project: { _id: 1, value: 1, category: 1 } },  // Reduce fields
  {
    $setWindowFields: {
      partitionBy: "$category",
      sortBy: { value: -1 },
      output: {
        rank: { $denseRank: {} }
      }
    }
  }
])

Combining with Other Aggregation Stages

Window functions can be flexibly combined with other aggregation stages:

// First group and aggregate, then calculate moving average
db.sales.aggregate([
  {
    $group: {
      _id: { product: "$product", month: { $month: "$date" } },
      totalSales: { $sum: "$amount" }
    }
  },
  {
    $setWindowFields: {
      partitionBy: "$_id.product",
      sortBy: { "_id.month": 1 },
      output: {
        threeMonthAvg: {
          $avg: "$totalSales",
          window: { documents: [ -2, 0 ] }  // Current month and previous two months
        }
      }
    }
  }
])

Real-World Business Scenarios

E-commerce Analysis

// Calculate customer purchase frequency rankings
db.orders.aggregate([
  {
    $group: {
      _id: "$customer_id",
      orderCount: { $sum: 1 },
      totalSpent: { $sum: "$amount" }
    }
  },
  {
    $setWindowFields: {
      sortBy: { orderCount: -1 },
      output: {
        frequencyRank: { $rank: {} },
        top10Pct: {
          $percentRank: {},
          window: { documents: [ "unbounded", "unbounded" ] }
        }
      }
    }
  }
])

Financial Time Series

// Calculate Bollinger Bands for stocks
db.stock_prices.aggregate([
  {
    $setWindowFields: {
      partitionBy: "$symbol",
      sortBy: { date: 1 },
      output: {
        ma20: {
          $avg: "$close",
          window: { documents: [ -19, 0 ] }
        },
        std20: {
          $stdDevPop: "$close",
          window: { documents: [ -19, 0 ] }
        },
        upperBand: {
          $add: [
            { $avg: "$close", window: { documents: [ -19, 0 ] } },
            { $multiply: [
              2,
              { $stdDevPop: "$close", window: { documents: [ -19, 0 ] } }
            ]}
          ]
        }
      }
    }
  }
])

Limitations and Considerations

  1. Window functions cannot be used in $facet stages.
  2. Memory limits: Large windows may require significant memory.
  3. Sharded collections: Partition fields should generally align with the shard key for optimal performance.
  4. Result size: Output documents cannot exceed the 16MB limit.

本站部分内容来自互联网,一切版权均归源网站或源作者所有。

如果侵犯了你的权益请来信告知我们删除。邮箱:cc@cccx.cn

Front End Chuan

Front End Chuan, Chen Chuan's Code Teahouse 🍵, specializing in exorcising all kinds of stubborn bugs 💻. Daily serving baldness-warning-level development insights 🛠️, with a bonus of one-liners that'll make you laugh for ten years 🐟. Occasionally drops pixel-perfect romance brewed in a coffee cup ☕.