Monitoring Metrics (CPU, Memory, Disk, Network)

Database performance monitoring is a critical focus for operations and development teams. As a popular NoSQL database, MongoDB's operational status directly impacts business stability. By monitoring core metrics such as CPU, memory, disk, and network, potential issues can be identified early, and performance can be optimized.

CPU Usage Monitoring

MongoDB's CPU usage reflects the workload of query processing, index building, and other operations. High CPU usage may lead to increased query latency.

Key Metrics:

  • cpu_usage.user: Percentage of CPU usage in user space
  • cpu_usage.system: Percentage of CPU usage in system space
  • cpu_usage.nice: CPU usage by low-priority processes
  • globalLock.activeClients: Number of clients actively performing read or write operations

Example Code (Node.js to Fetch CPU Metrics):

const { exec } = require('child_process');

// Run mongostat once; the column layout depends on the mongostat version and options,
// so adjust the destructured positions to match your output
exec('mongostat --host=localhost --rowcount=1 --noheaders', (error, stdout) => {
  if (error) return console.error('mongostat failed:', error.message);
  const [, usr, sys] = stdout.trim().split(/\s+/);
  console.log(`User CPU: ${usr}%, System CPU: ${sys}%`);
});

Common Issue Scenarios:

  1. Prolonged usage above 80% may indicate the need for query optimization or additional indexes (see the profiler sketch after this list)
  2. High system space usage may be caused by disk I/O waits
  3. Sudden CPU spikes are often related to complex aggregation queries
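
The query-optimization advice in point 1 can be acted on by enabling the slow-query profiler. A minimal mongo shell sketch (the 100 ms slowms threshold is an assumed example value):

// Capture operations slower than 100 ms into db.system.profile
db.setProfilingLevel(1, { slowms: 100 })

// List the five most recent slow operations with their duration
db.system.profile.find({ millis: { $gt: 100 } })
  .sort({ ts: -1 })
  .limit(5)
  .pretty()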

Memory Usage Analysis

Older MongoDB storage engines (MMAPv1) relied on memory-mapped files, while WiredTiger combines an internal cache with the filesystem cache, so memory usage patterns differ significantly from traditional databases.

Core Memory Metrics:

  • mem.resident: Resident physical memory size (MB)
  • mem.virtual: Virtual memory usage (MB)
  • mem.mapped: Size of memory-mapped files (MB)
  • wiredTiger.cache["bytes currently in the cache"]: WiredTiger cache usage (bytes)

Memory Optimization Recommendations:

  • The working set should fit within the configured WiredTiger cache size (storage.wiredTiger.engineConfig.cacheSizeGB)
  • Monitor the extra_info.page_faults metric to detect frequent page faults
  • For datasets larger than 4GB, configure at least 1GB of WiredTiger cache

Example (Mongo Shell to Check Memory):

db.serverStatus().mem
// Sample Output:
{
  "resident" : 1456,
  "virtual" : 3254,
  "mapped" : 1024,
  "supported" : true
}
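
To act on the recommendations above, the WiredTiger cache usage and page-fault counters can also be read from serverStatus(). A minimal mongo shell sketch (field names as exposed in the wiredTiger.cache and extra_info sections):

// Compare current WiredTiger cache usage against the configured maximum
var cache = db.serverStatus().wiredTiger.cache;
var used = cache["bytes currently in the cache"];
var max = cache["maximum bytes configured"];
print("Cache usage: " + (used / max * 100).toFixed(1) + "%");

// Page faults are cumulative; watch the growth rate rather than the absolute value
print("Page faults: " + db.serverStatus().extra_info.page_faults);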

Disk I/O Performance

Disk performance directly impacts write throughput and data persistence speed, particularly in write-intensive scenarios.

Key Disk Metrics:

  • disk.io.wait: Percentage of time spent waiting for I/O
  • backgroundFlushing.average_ms: Average disk flush time in ms (MMAPv1 engine only)
  • wiredTiger.log["log sync operations"]: Number of journal sync operations
  • storage.free: Free disk space (GB)

Typical Troubleshooting:

  1. When disk.io.wait consistently exceeds 50%, consider:

    • Upgrading disks (replace HDD with SSD)
    • Adjusting journal.commitIntervalMs
    • Checking for excessive random writes
  2. Use iostat for additional diagnostics:

iostat -xm 1  # View device-level I/O statistics
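
The flush and journal-sync counters listed above can be read directly from serverStatus(). A minimal mongo shell sketch (backgroundFlushing is reported only by the legacy MMAPv1 engine, so it is guarded here):

var status = db.serverStatus();
// MMAPv1 only: average time per background flush
if (status.backgroundFlushing) {
  print("Average flush time: " + status.backgroundFlushing.average_ms + " ms");
}
// WiredTiger journal sync count
print("Log syncs: " + status.wiredTiger.log["log sync operations"]);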

Network Traffic Monitoring

Network bottlenecks can cause replication delays and client timeouts, especially in sharded cluster environments.

Core Network Metrics:

  • network.bytesIn: Inbound data volume (bytes)
  • network.bytesOut: Outbound data volume (bytes)
  • network.numRequests: Requests per second
  • metrics.repl.network.getmores: Number of oplog fetch (getmore) operations issued to the sync source

Network Optimization Example:

// Configure connection pool size (Node.js driver example)
// Note: poolSize applies to the 3.x driver; drivers 4.x and later use maxPoolSize
const MongoClient = require('mongodb').MongoClient;
const url = 'mongodb://localhost:27017/?poolSize=20&socketTimeoutMS=360000';

// Monitor current connections
db.serverStatus().connections
// Sample Output:
{
  "current" : 42,
  "available" : 818,
  "totalCreated" : 291
}
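
Because network.bytesIn and network.bytesOut are cumulative counters, throughput has to be derived from the difference between two samples. A minimal mongo shell sketch:

// Sample the cumulative network counters twice, one second apart
var first = db.serverStatus().network;
sleep(1000);  // shell helper, argument in milliseconds
var second = db.serverStatus().network;
print("In: " + (Number(second.bytesIn) - Number(first.bytesIn)) + " bytes/s, " +
      "Out: " + (Number(second.bytesOut) - Number(first.bytesOut)) + " bytes/s");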

Metric Aggregation and Visualization

Combine Prometheus and Grafana for professional-grade monitoring:

Prometheus Configuration Example:

scrape_configs:
  - job_name: 'mongodb'
    static_configs:
      - targets: ['mongodb-exporter:9216']
    metrics_path: /metrics

Key Grafana Dashboard Charts:

  1. Combined CPU/Memory/Disk trend graph
  2. QPS statistics by operation type
  3. Replica set member latency heatmap
  4. Connection pool utilization dashboard

Alert Rule Configuration

Set reasonable threshold-based alerts based on actual business needs:

Typical Alert Rules:

  • CPU > 90% for 5 consecutive minutes
  • Memory working set exceeds 90% of available cache
  • Disk space remaining less than 20%
  • Primary-secondary replication delay > 30 seconds
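
The replication-delay rule in the last item can also be checked ad hoc from the shell. A minimal sketch that derives per-secondary lag from rs.status() (run on a replica set member):

// Compare each secondary's optimeDate with the primary's to estimate lag in seconds
var status = rs.status();
var primary = status.members.filter(function (m) { return m.stateStr === "PRIMARY"; })[0];
status.members
  .filter(function (m) { return m.stateStr === "SECONDARY"; })
  .forEach(function (m) {
    print(m.name + " lag: " + (primary.optimeDate - m.optimeDate) / 1000 + " s");
  });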

MongoDB Atlas Alert Example:

{
  "eventTypeName": "OUTSIDE_METRIC_THRESHOLD",
  "metricName": "ASSERT_REGULAR",
  "operator": "GREATER_THAN",
  "threshold": 10,
  "units": "RAW",
  "notifications": [
    {
      "typeName": "SMS",
      "intervalMin": 5
    }
  ]
}

Performance Benchmarking

Establishing performance baselines helps identify abnormal changes:

Using sysbench for Testing:

# Assumes a sysbench build/fork with MongoDB support; upstream sysbench only ships SQL drivers
sysbench --test=oltp --mongodb-db=test \
         --mongodb-collection=bench \
         --num-threads=8 --max-requests=100000 \
         run

Key Benchmark Metrics to Record:

  • 95th percentile query latency
  • Transactions per second (TPS)
  • Error rate
  • Resource utilization curves
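
If the profiler is enabled (as sketched in the CPU section), the 95th-percentile latency can be approximated straight from system.profile. A minimal mongo shell sketch:

// Collect op durations (millis) from the profiler collection and take the 95th percentile
var durations = db.system.profile.find({}, { millis: 1, _id: 0 }).toArray()
  .map(function (d) { return d.millis; })
  .sort(function (a, b) { return a - b; });
if (durations.length > 0) {
  print("p95 latency: " + durations[Math.floor(durations.length * 0.95)] + " ms");
}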

Real-World Troubleshooting Case

Scenario: An e-commerce platform experiences slow MongoDB responses during a major sales event.

Diagnosis Steps:

  1. Discover globalLock.currentQueue.total consistently > 50
  2. Check db.currentOp() to identify slow queries:
    db.currentOp({
      "active": true,
      "secs_running": {"$gt": 3}
    })
    
  3. Use explain() to analyze the slow queries and find that an index on order status is missing
  4. Performance improves 6x after adding the index:
    db.orders.createIndex({status: 1, createTime: -1})
    

Special Considerations for Containerized Environments

Monitoring differences when deploying in Kubernetes:

  1. Distinguish between container-internal and external metrics
  2. Monitor the impact of resource limits:
    resources:
      limits:
        cpu: "2"
        memory: "4Gi"
      requests:
        cpu: "1"
        memory: "2Gi"
    
  3. Use the sidecar pattern for metric collection (an exporter container in the same pod); for ad-hoc access, forward its metrics port:
    kubectl port-forward pod/mongodb-0 9216:9216
    

Historical Data Analysis

Use $out to aggregate historical monitoring data:

db.metrics.aggregate([
  {
    $match: {timestamp: {$gte: ISODate("2023-01-01")}}
  },
  {
    $group: {
      _id: {$dateToString: {format: "%Y-%m-%d", date: "$timestamp"}},
      avgCPU: {$avg: "$cpu.usage"},
      peakConn: {$max: "$connections.current"}
    }
  },
  {$out: "daily_stats"}
])
