Election Mechanism and Failover

Author: Chuan Chen | Views: 1158 | Category: MongoDB

The election mechanism and failover in MongoDB are core components of its high availability, achieved through replica sets that enable automatic failure detection and primary node switching. Nodes in a replica set use heartbeat detection and an election protocol to ensure data consistency and service continuity. When the primary node becomes unavailable, the remaining nodes trigger an election process to select a new primary node. This mechanism not only ensures system fault tolerance but also supports read-write separation and data redundancy.

Basic Structure of a Replica Set

A MongoDB replica set typically includes the following roles:

Primary (Primary Node): Handles all write operations and records them in its oplog, which secondary nodes replicate.
Secondary (Secondary Node): Replicates the primary's oplog and can serve read requests when clients use a secondary read preference.
Arbiter (Arbiter Node): Does not store data and only participates in election voting.

For example, a three-node replica set configuration might look like this:

// Initialize replica set configuration
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo1:27017", priority: 2 },
    { _id: 1, host: "mongo2:27017", priority: 1 },
    { _id: 2, host: "mongo3:27017", arbiterOnly: true }
  ]
});

Here, nodes with higher priority values are more likely to become primary, and the arbiter node is marked with arbiterOnly.
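To make the roles concrete, the sketch below derives the election-relevant facts from a configuration object shaped like the one passed to rs.initiate(). It is plain JavaScript, not a MongoDB API: `summarizeConfig` is a hypothetical helper, and it assumes MongoDB's member defaults (votes: 1; priority: 1 for data-bearing members and 0 for arbiters).

```javascript
// Hypothetical helper: derive election facts from an rs.initiate()-style config.
// Assumed defaults: votes = 1; priority = 1 for data-bearing members, 0 for arbiters.
function summarizeConfig(config) {
  const members = config.members.map(m => ({
    host: m.host,
    arbiter: m.arbiterOnly === true,
    votes: m.votes ?? 1,
    priority: m.priority ?? (m.arbiterOnly ? 0 : 1)
  }));
  const voting = members.filter(m => m.votes > 0);
  return {
    votingMembers: voting.length,
    majority: Math.floor(voting.length / 2) + 1,
    // Only data-bearing members with priority > 0 can become primary,
    // listed from most to least preferred.
    electable: members
      .filter(m => !m.arbiter && m.priority > 0)
      .sort((a, b) => b.priority - a.priority)
      .map(m => m.host)
  };
}

const summary = summarizeConfig({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo1:27017", priority: 2 },
    { _id: 1, host: "mongo2:27017", priority: 1 },
    { _id: 2, host: "mongo3:27017", arbiterOnly: true }
  ]
});
console.log(summary);
```

For the example set it reports 3 voting members, a majority of 2, and mongo1 ahead of mongo2 as the preferred primary. Losing either data-bearing member therefore still leaves 2 of 3 votes, enough to elect a primary; losing both leaves only the arbiter, which cannot become primary.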

Election Trigger Conditions

Elections are automatically triggered in the following scenarios:

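The primary node crashes or becomes unreachable (e.g., because of a network partition), and the secondaries lose contact with it for longer than the election timeout.
The rs.stepDown() command is manually executed to demote the primary node.
The replica set is initiated or reconfigured (e.g., a member is added or node priorities are adjusted).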

Example: Simulating a Primary Node Crash

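// Connect to the primary node and shut it down. force: true shuts the primary down
// even if the implicit step-down fails; killing the mongod process is closer to a real crash.
use admin
db.shutdownServer({ force: true })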

Detailed Election Process

1. Heartbeat Detection Failure

Each member sends a heartbeat to every other member every 2 seconds. If a secondary does not hear from the primary for longer than electionTimeoutMillis (10 seconds by default), it calls an election.
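The timing rule can be simulated in a few lines. This is a hypothetical model (`PrimaryMonitor` is not a MongoDB class) that assumes the defaults quoted above, a 2-second heartbeat interval and a 10-second electionTimeoutMillis, and checks the timer only on heartbeat ticks.

```javascript
// Hypothetical failure detector for one secondary (not MongoDB's actual code).
// Assumed values: heartbeats every 2000 ms, electionTimeoutMillis = 10000.
const HEARTBEAT_INTERVAL_MS = 2000;
const ELECTION_TIMEOUT_MS = 10000;

class PrimaryMonitor {
  constructor(now) {
    this.lastPrimaryContact = now; // time of the last heartbeat reply from the primary
  }
  onHeartbeatReply(now) {
    this.lastPrimaryContact = now;
  }
  // True once the primary has been silent for longer than the timeout.
  shouldCallElection(now) {
    return now - this.lastPrimaryContact > ELECTION_TIMEOUT_MS;
  }
}

// Simulation: the primary answers heartbeats until t = 6000 ms, then goes silent.
const monitor = new PrimaryMonitor(0);
let electionAt = null;
for (let t = HEARTBEAT_INTERVAL_MS; t <= 30000; t += HEARTBEAT_INTERVAL_MS) {
  if (t <= 6000) monitor.onHeartbeatReply(t);
  if (electionAt === null && monitor.shouldCallElection(t)) electionAt = t;
}
console.log(`Election called at t = ${electionAt} ms`);
```

The primary's last reply arrives at 6000 ms, so its silence exceeds 10 seconds only after 16000 ms; because this simulation checks only at 2-second ticks, the election is called at 18000 ms. A real member is not tied to heartbeat ticks, so its exact reaction time differs slightly.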

2. Candidate Node Eligibility Check

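A node standing for election (the candidate) must meet the following criteria:

Be able to reach a majority of the voting members.
Have an oplog at least as recent as that of each member it asks for a vote: a voter rejects a candidate whose last applied optime is behind its own.
Have a non-zero priority (arbiters and priority 0 members can never become primary).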
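The first and last criteria can be expressed as a single check. The sketch below is a hypothetical helper, not a MongoDB API; it assumes every member has votes: 1 and leaves the oplog-freshness rule to the voters, which apply it when deciding whether to grant a vote.

```javascript
// Hypothetical eligibility check for a would-be candidate.
// Assumes every member has votes: 1; the optime (freshness) rule is applied
// later by each voter, so it is not checked here.
function canStandForElection(member, reachableVoters, totalVoters) {
  const majority = Math.floor(totalVoters / 2) + 1;
  if (member.arbiterOnly) {
    return { ok: false, reason: "arbiters hold no data and cannot become primary" };
  }
  if ((member.priority ?? 1) === 0) {
    return { ok: false, reason: "priority 0 members cannot become primary" };
  }
  // reachableVoters includes the candidate itself
  if (reachableVoters < majority) {
    return { ok: false, reason: `reaches ${reachableVoters} of ${totalVoters} voters, needs ${majority}` };
  }
  return { ok: true, reason: "eligible" };
}

console.log(canStandForElection({ host: "mongo2:27017", priority: 1 }, 2, 3));
console.log(canStandForElection({ host: "mongo2:27017", priority: 1 }, 1, 3));
console.log(canStandForElection({ host: "mongo3:27017", arbiterOnly: true }, 2, 3));
```

A secondary cut off from everyone else (1 of 3 voters reachable) is rejected before any votes are requested, which is what keeps an isolated node from disrupting the set.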

3. Voting and Majority Decision

The replica set uses a Raft-based election protocol (protocolVersion: 1). Each member's votes setting is either 1 (voting) or 0 (non-voting), and a set can have at most 7 voting members. A candidate wins only with votes from a majority of the voting members (e.g., at least 2 of 3).

Example of Viewing Voting Settings:

// votes is part of the replica set configuration, so read it from rs.conf()
rs.conf().members.forEach(member => {
  console.log(`Node ${member._id} (${member.host}): votes = ${member.votes}`);
});
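The voting rules can be sketched as a simplified Raft-style election. This is an illustrative model, not MongoDB's implementation, which has additional checks. Assumptions: optimes are plain numbers, each member grants at most one vote per term, a voter refuses a candidate whose optime is behind its own, and the arbiter is modeled with optime 0 because it holds no data.

```javascript
// Simplified Raft-style voting (a sketch, not MongoDB's implementation).
// Each member: { name, votes (0 or 1), votedTerm, optime, reachable }.
// Optimes are plain numbers here; higher means a more recent oplog entry.
function grantVote(voter, term, candidateOptime) {
  if (voter.votes === 0) return false;              // non-voting member
  if (term <= voter.votedTerm) return false;        // at most one vote per term
  if (candidateOptime < voter.optime) return false; // candidate is behind this voter
  voter.votedTerm = term;                           // remember the vote
  return true;
}

function runElection(candidate, others) {
  const term = candidate.votedTerm + 1;
  candidate.votedTerm = term;                       // candidate votes for itself
  let received = candidate.votes;
  for (const voter of others) {
    if (voter.reachable && grantVote(voter, term, candidate.optime)) {
      received += voter.votes;
    }
  }
  const totalVotes = [candidate, ...others].reduce((sum, m) => sum + m.votes, 0);
  const majority = Math.floor(totalVotes / 2) + 1;
  return { term, received, majority, won: received >= majority };
}

// The example set after mongo1 (the primary) fails: mongo2 stands for election.
const mongo1 = { name: "mongo1", votes: 1, votedTerm: 5, optime: 100, reachable: false };
const mongo2 = { name: "mongo2", votes: 1, votedTerm: 5, optime: 100, reachable: true };
const mongo3 = { name: "mongo3 (arbiter)", votes: 1, votedTerm: 5, optime: 0, reachable: true };
const result = runElection(mongo2, [mongo1, mongo3]);
console.log(result); // { term: 6, received: 2, majority: 2, won: true }
```

With mongo1 unreachable, mongo2 votes for itself and receives the arbiter's vote: 2 of 3 votes, which meets the majority of 2. Had a reachable voter held a newer optime than the candidate, it would have refused, and a candidate short of a majority simply loses.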

Typical Failover Scenarios

Scenario 1: Physical Failure of the Primary Node

Remaining nodes detect the primary node's unavailability.
An eligible secondary calls an election; higher-priority members are more likely to call it and to win it.
The new primary takes over write requests. The application may briefly see errors during the switchover, so it needs a retry mechanism.

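Application-Level Retry Logic Example:

// Error codes seen while the primary is unavailable or stepping down
const RETRYABLE_CODES = new Set([
  10107, // NotWritablePrimary (formerly NotMaster)
  13435, // NotPrimaryNoSecondaryOk
  11602, // InterruptedDueToReplStateChange
  189,   // PrimarySteppedDown
  91     // ShutdownInProgress
]);

async function writeWithRetry(operation, maxRetries = 3, delayMs = 1000) {
  let lastErr;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (!RETRYABLE_CODES.has(err.code)) throw err; // not a failover error
      lastErr = err;
      if (attempt < maxRetries) {
        // Give the replica set time to elect a new primary
        await new Promise(resolve => setTimeout(resolve, delayMs));
      }
    }
  }
  throw new Error("Max retries reached", { cause: lastErr });
}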

Scenario 2: Data Center Network Partition

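Without a safeguard, both sides of the partition could elect their own primary (split-brain).
MongoDB prevents this with the majority rule: only a node that receives votes from a majority of voting members can become primary, and at most one side of a partition can hold a majority.
An old primary that can no longer reach a majority steps down automatically. Until it does, it may still acknowledge w: 1 writes, which can later be rolled back; writes acknowledged with w: "majority" are not lost this way.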
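The majority rule explains why at most one side of a partition can keep a primary: the sides are disjoint, so only one of them can hold more than half the votes. The hypothetical helper below (not a MongoDB API) checks each side of a partition, assuming every member has votes: 1; the five host names are made up.

```javascript
// Hypothetical check: which side of a partition can elect (or keep) a primary?
// Assumes every listed member has votes: 1.
function analyzePartition(allMembers, sides, oldPrimary) {
  const majority = Math.floor(allMembers.length / 2) + 1;
  return sides.map(side => {
    const hasMajority = side.length >= majority;
    return {
      side,
      canHavePrimary: hasMajority,
      // An old primary stranded on a minority side steps down.
      oldPrimaryStepsDown: side.includes(oldPrimary) && !hasMajority
    };
  });
}

const members = ["dc1-a", "dc1-b", "dc2-a", "dc2-b", "dc3-a"];
const report = analyzePartition(
  members,
  [["dc1-a", "dc1-b"], ["dc2-a", "dc2-b", "dc3-a"]],
  "dc1-a"
);
report.forEach(r => console.log(r));
```

With a 2/3 split, the side holding the old primary dc1-a has only 2 of 5 votes, so that primary steps down, while the 3-member side can elect a new one.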

Configuration Optimization Practices

Adjusting Election Timeout

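Adjust the replica set setting settings.electionTimeoutMillis (default: 10000 ms) to balance failure-detection speed against the risk of unnecessary elections during brief network hiccups. It is part of the replica set configuration, not a mongod.conf option:

// Run in mongosh on the primary
cfg = rs.conf();
cfg.settings.electionTimeoutMillis = 5000;
rs.reconfig(cfg);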

Hidden and Delayed Nodes

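Hidden Node: Must have priority 0, so it can never become primary, and is invisible to client applications; it still votes in elections. Typically used for dedicated backups or reporting.

// Hide mongo2 (members[1] in the example configuration)
cfg = rs.conf();
cfg.members[1].priority = 0; // hidden members must have priority 0
cfg.members[1].hidden = true;
rs.reconfig(cfg);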

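Delayed Node: Deliberately lags behind the primary by a set time (here one hour), giving a window to recover from mistakes such as an accidentally dropped collection.
```
// MongoDB 5.0+ uses secondaryDelaySecs; earlier versions call this field slaveDelay
{ _id: 2, host: "mongo3:27017", priority: 0, hidden: true, secondaryDelaySecs: 3600 }
```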
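The constraints on these special members are easy to check before calling rs.reconfig(). The sketch below is a hypothetical helper, not part of mongosh; it encodes two documented rules (hidden and delayed members must have priority 0) and one recommendation (delayed members should be hidden), and it assumes the default priority of 1 when the field is absent.

```javascript
// Hypothetical pre-flight check for hidden/delayed member settings.
// Reads secondaryDelaySecs (MongoDB 5.0+) or slaveDelay (earlier versions).
function checkMember(member) {
  const problems = [];
  const priority = member.priority ?? 1;                  // default priority is 1
  const delay = member.secondaryDelaySecs ?? member.slaveDelay ?? 0;
  if (member.hidden && priority !== 0) {
    problems.push("hidden members must have priority 0");
  }
  if (delay > 0 && priority !== 0) {
    problems.push("delayed members must have priority 0");
  }
  if (delay > 0 && !member.hidden) {
    problems.push("delayed members should be hidden");
  }
  return problems;
}

console.log(checkMember({ _id: 1, host: "mongo2:27017", priority: 1, hidden: true }));
console.log(checkMember({ _id: 2, host: "mongo3:27017", priority: 0, slaveDelay: 3600 }));
console.log(checkMember({ _id: 2, host: "mongo3:27017", priority: 0, hidden: true, secondaryDelaySecs: 3600 }));
```

An empty array means the member passes all three checks; any message points at the field to change before reconfiguring.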

Monitoring and Diagnostics

Key metrics to monitor:

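Replication Lag: rs.printSecondaryReplicationInfo(), or the difference between the primary's and each secondary's optimeDate in rs.status().members
Election Metrics: db.serverStatus().electionMetrics
Node Status: db.hello() (db.isMaster() on older versions) or rs.status().members[n].stateStr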
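The lag figure can also be computed directly from rs.status() output. The helper below is hypothetical; it relies only on the members[].name, stateStr and optimeDate fields, and the sample document is made up so the sketch runs without a server. In mongosh the same function could be called as replicationLag(rs.status()).

```javascript
// Hypothetical helper: per-secondary replication lag from rs.status()-shaped data.
// Uses only members[].name, .stateStr and .optimeDate.
function replicationLag(status) {
  const primary = status.members.find(m => m.stateStr === "PRIMARY");
  if (!primary) return null; // no primary, e.g. during an election
  return status.members
    .filter(m => m.stateStr === "SECONDARY")
    .map(m => ({
      name: m.name,
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000
    }));
}

// Made-up sample shaped like rs.status() output.
const sampleStatus = {
  members: [
    { name: "mongo1:27017", stateStr: "PRIMARY", optimeDate: new Date("2024-01-01T00:00:10Z") },
    { name: "mongo2:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:07Z") },
    { name: "mongo3:27017", stateStr: "ARBITER" }
  ]
};
console.log(replicationLag(sampleStatus)); // [ { name: 'mongo2:27017', lagSeconds: 3 } ]
```

Arbiters are skipped because they hold no data, and the function returns null when no member is primary, since lag relative to a primary is then undefined.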

Diagnostic Command Examples:

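// Metrics for the most recent election this member stood in (run on the primary)
db.adminCommand({ replSetGetStatus: 1 }).electionCandidateMetrics

// Force reconfiguration: only when a majority of members is permanently lost (use with caution)
rs.reconfig(newCfg, { force: true })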

Client Connection Handling

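List several members in the connection string (the seed list) so the driver can connect even if one of them is down; the driver then discovers the replica set topology and routes writes to whichever member is currently primary:

// Node.js driver connection string
const { MongoClient } = require("mongodb");
const uri = "mongodb://mongo1:27017,mongo2:27017,mongo3:27017/?replicaSet=rs0";
const client = new MongoClient(uri, {
  retryWrites: true,             // retry a failed write once, e.g. after failover (default in current drivers)
  retryReads: true,
  serverSelectionTimeoutMS: 5000 // how long to wait for a suitable server, such as a new primary
});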

Performance Impact of Elections

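Write Unavailability: No writes succeed while there is no primary. With default settings, the median time from losing the primary to electing a new one should typically not exceed 12 seconds, including the 10-second detection timeout.
Read Availability: Clients using readPreference: secondary (or secondaryPreferred) can keep reading from secondaries during an election, at the cost of possibly stale data.
Index Consistency: Ensure all data-bearing nodes have the same indexes so that a newly elected primary does not suffer degraded query performance.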
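How read preference affects availability during an election can be sketched as a simple routing function. This is a hypothetical simplification, not driver code: it uses the standard mode names but ignores tag sets, maxStalenessSeconds and the latency window that real drivers apply, and it always picks the first eligible secondary.

```javascript
// Hypothetical read routing by read preference mode (simplified; real drivers
// also apply tag sets, maxStalenessSeconds and a latency window).
function pickReadMember(members, mode) {
  const primary = members.find(m => m.state === "PRIMARY");
  const secondaries = members.filter(m => m.state === "SECONDARY");
  switch (mode) {
    case "primary":
      return primary ?? null;
    case "primaryPreferred":
      return primary ?? secondaries[0] ?? null;
    case "secondary":
      return secondaries[0] ?? null;
    case "secondaryPreferred":
      return secondaries[0] ?? primary ?? null;
    default:
      throw new Error(`Unknown read preference mode: ${mode}`);
  }
}

// During an election there is no primary yet.
const duringElection = [
  { host: "mongo1:27017", state: "DOWN" },
  { host: "mongo2:27017", state: "SECONDARY" }
];
for (const mode of ["primary", "primaryPreferred", "secondary", "secondaryPreferred"]) {
  const m = pickReadMember(duringElection, mode);
  console.log(`${mode}: ${m ? m.host : "no eligible member (read fails or waits)"}`);
}
```

With no primary, primary-mode reads have nowhere to go, while the other three modes fall back to mongo2.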
