Election Mechanism and Failover
Elections and automatic failover are the core of MongoDB's high availability, and replica sets provide both. Replica set members exchange heartbeats to detect failures and follow an election protocol that keeps the data consistent and the service running. When the primary becomes unavailable, the remaining members hold an election to choose a new primary. Beyond fault tolerance, replica sets also keep redundant copies of the data and can offload reads to secondaries.
Basic Structure of a Replica Set
A MongoDB replica set typically includes the following roles:
- Primary (Primary Node): Handles all write operations and records them in its oplog, from which the secondaries replicate.
- Secondary (Secondary Node): Replicates the primary's oplog and applies the operations to its own copy of the data. It can serve read requests if the client's read preference allows it.
- Arbiter (Arbiter Node): Does not store data and only participates in election voting.
For example, a three-node replica set configuration might look like this:
// Initialize the replica set configuration
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo1:27017", priority: 2 },
    { _id: 1, host: "mongo2:27017", priority: 1 },
    { _id: 2, host: "mongo3:27017", arbiterOnly: true }
  ]
});
Here, members with a higher priority value are more likely to become primary (mongo1, with priority 2, is preferred), and the arbiter is marked with arbiterOnly: true.
Election Trigger Conditions
Elections are automatically triggered in the following scenarios:
- The primary crashes or becomes unreachable from a majority of the set (e.g., because of a network partition).
- The rs.stepDown() command is run manually to demote the primary.
- The replica set is initiated, or a reconfiguration changes which member should be primary (e.g., a member is given a higher priority than the current primary).
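Example: Simulating a Primary Node Failure
// In mongosh, connected to the current primary
use admin
db.shutdownServer()  // the primary steps down and shuts down; the remaining members elect a new primary
// To simulate an abrupt crash instead, kill the mongod process on that host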
Detailed Election Process
1. Heartbeat Detection Failure
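Members send heartbeats to each other every 2 seconds, and a member whose heartbeat gets no reply within 10 seconds is marked inaccessible. If a secondary has not heard from the primary for longer than electionTimeoutMillis (10 seconds by default), an eligible secondary calls an election.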
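The detection rule can be sketched as a small simulation. This is a simplified model, not mongod's implementation: the constant and function names are hypothetical, heartbeats are assumed to arrive on exact 2-second ticks, and details such as randomized election timeouts are ignored.

```javascript
// Simplified model of how a secondary decides the primary is gone.
// HEARTBEAT_INTERVAL_MS and the function names are illustrative assumptions;
// mongod's real logic (e.g., randomized timeouts) is more involved.
const HEARTBEAT_INTERVAL_MS = 2000;         // heartbeats are sent every 2 seconds
const DEFAULT_ELECTION_TIMEOUT_MS = 10000;  // default electionTimeoutMillis

// True when the primary has been silent for longer than the election timeout.
function shouldCallElection(lastPrimaryContactMs, nowMs,
                            electionTimeoutMs = DEFAULT_ELECTION_TIMEOUT_MS) {
  return nowMs - lastPrimaryContactMs > electionTimeoutMs;
}

// Replays heartbeat outcomes (true = primary replied) at 2-second ticks and
// returns the time at which an election would first be called, or null.
function firstElectionTime(heartbeatReplies,
                           electionTimeoutMs = DEFAULT_ELECTION_TIMEOUT_MS) {
  let lastContact = 0;
  for (let i = 0; i < heartbeatReplies.length; i++) {
    const now = (i + 1) * HEARTBEAT_INTERVAL_MS;
    if (heartbeatReplies[i]) {
      lastContact = now;
    } else if (shouldCallElection(lastContact, now, electionTimeoutMs)) {
      return now;
    }
  }
  return null;
}

// The primary answers twice (t = 2 s and 4 s), then goes silent.
const replies = [true, true, false, false, false, false, false, false];
console.log(firstElectionTime(replies));        // 16000
console.log(firstElectionTime(replies, 5000));  // 10000
```

In this model, halving the timeout starts the election 6 seconds earlier, but a network hiccup lasting just over 5 seconds would then also trigger an unnecessary election.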
2. Candidate Node Eligibility Check
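A member can become primary only if it meets the following criteria:
- It can reach a majority of the voting members.
- It is sufficiently up to date: voters refuse to vote for a candidate whose last applied optime is older than their own.
- It is a voting member with a non-zero priority (arbiters, hidden members, and delayed members can never become primary).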
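The eligibility and vote-granting rules can be sketched as follows. This is a simplified model under stated assumptions: optimes are represented as hypothetical { term, ts } objects and compared Raft-style (term first, then timestamp), the helper names are illustrative, and mongod performs further checks that are not modeled here.

```javascript
// Simplified model of candidate eligibility and vote granting (assumptions:
// optimes are { term, ts } objects compared Raft-style, term first; mongod
// performs further checks not modeled here).
function compareOptime(a, b) {
  if (a.term !== b.term) return a.term - b.term;
  return a.ts - b.ts;
}

// Members mirror rs.conf().members; votes and priority default to 1 when absent.
// reachableVoters counts the voting members the candidate can reach, itself included.
function canStandForElection(member, reachableVoters, totalVoters) {
  const votes = member.votes ?? 1;
  const priority = member.priority ?? 1;
  const majority = Math.floor(totalVoters / 2) + 1;
  return !member.arbiterOnly && votes === 1 && priority > 0 &&
         reachableVoters >= majority;
}

// A voter grants its vote only if the candidate is at least as up to date.
function wouldGrantVote(voterOptime, candidateOptime) {
  return compareOptime(candidateOptime, voterOptime) >= 0;
}

const voter = { term: 3, ts: 1200 };
console.log(wouldGrantVote(voter, { term: 3, ts: 1100 })); // false: candidate is behind
console.log(wouldGrantVote(voter, { term: 3, ts: 1250 })); // true
```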
3. Voting and Majority Decision
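MongoDB's replication protocol (protocolVersion: 1) is a variant of Raft. Each member's votes property is either 1 (voting) or 0 (non-voting), and a replica set can have at most seven voting members. A candidate becomes primary only if it receives votes from a strict majority of the voting members (e.g., at least 2 of 3, or 3 of 5).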
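The majority arithmetic can be shown with a short sketch. The helper names are hypothetical, and the member objects only mimic the shape of rs.conf().members, where votes defaults to 1.

```javascript
// Sketch: majority threshold and vote tally for a replica set configuration.
// Member objects mirror rs.conf().members, where votes is 0 or 1 (default 1).
function votingIds(members) {
  return new Set(members.filter(m => (m.votes ?? 1) === 1).map(m => m._id));
}

function majorityOf(members) {
  return Math.floor(votingIds(members).size / 2) + 1;
}

// grantedBy: _ids of members that voted for the candidate (including itself).
function wonElection(members, grantedBy) {
  const voters = votingIds(members);
  const yes = [...new Set(grantedBy)].filter(id => voters.has(id)).length;
  return yes >= majorityOf(members);
}

const members = [
  { _id: 0, host: "mongo1:27017", priority: 2 },
  { _id: 1, host: "mongo2:27017", priority: 1 },
  { _id: 2, host: "mongo3:27017", arbiterOnly: true }
];
console.log(majorityOf(members));          // 2
console.log(wonElection(members, [0, 2])); // true: candidate plus arbiter
console.log(wonElection(members, [0]));    // false
```

The same arithmetic shows why an odd number of voting members is usually preferred: four voters need a majority of 3 and still tolerate only one failure, the same as three voters.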
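Example: Viewing Each Member's Votes
// votes and priority are part of the replica set configuration (rs.conf()), not rs.status()
rs.conf().members.forEach(member => {
  print(`Member ${member._id} (${member.host}): votes = ${member.votes}, priority = ${member.priority}`);
});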
Typical Failover Scenarios
Scenario 1: Physical Failure of the Primary Node
- The remaining members stop receiving heartbeats from the primary and mark it unavailable.
- An eligible secondary calls an election; members with a higher priority are more likely to call and win it.
- The new primary takes over write requests. The application may briefly see errors, so it should retry failed operations.
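Application Retry Logic Example (Node.js):
// Retries an async write while the primary is unavailable (e.g., during an election)
const RETRYABLE_CODES = new Set([
  10107, // NotWritablePrimary (formerly NotMaster)
  189,   // PrimarySteppedDown
  91,    // ShutdownInProgress
]);

async function writeWithRetry(operation, maxRetries = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (!RETRYABLE_CODES.has(err.code)) throw err;
      lastError = err;
      if (attempt < maxRetries) {
        // Back off to give the replica set time to elect a new primary
        await new Promise(resolve => setTimeout(resolve, 1000 * attempt));
      }
    }
  }
  throw new Error(`Max retries reached: ${lastError.message}`);
}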
Scenario 2: Data Center Network Partition
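- Without safeguards, both sides of a partition could each elect their own primary (split-brain).
- MongoDB prevents this with the majority rule: a candidate becomes primary only if it receives votes from a majority of the voting members, and at most one side of a partition can hold such a majority.
- The old primary automatically steps down when it can no longer reach a majority of members. Until it notices, it may briefly still accept writes, and writes not acknowledged by a majority (e.g., with write concern w: 1) can later be rolled back.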
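The majority rule under a partition can be illustrated with a small sketch. The function sideThatCanElect is hypothetical and assumes the partition splits the members into disjoint groups.

```javascript
// Sketch: which side of a network partition can elect a primary?
// sides: disjoint arrays of member _ids; voting: _ids of members with votes: 1.
// sideThatCanElect is a hypothetical helper for illustration.
function sideThatCanElect(sides, voting) {
  const voters = new Set(voting);
  const majority = Math.floor(voters.size / 2) + 1;
  // Two disjoint sides cannot both hold a majority, so at most one index matches.
  const index = sides.findIndex(
    side => side.filter(id => voters.has(id)).length >= majority
  );
  return index === -1 ? null : index;
}

// Five voters split 2 + 3 across two data centers.
console.log(sideThatCanElect([[0, 1], [2, 3, 4]], [0, 1, 2, 3, 4])); // 1
// Four voters split 2 + 2: neither side reaches 3 votes.
console.log(sideThatCanElect([[0, 1], [2, 3]], [0, 1, 2, 3]));       // null
```

In the 2 + 2 case neither side can elect a primary, so no writes succeed until the partition heals; this is one reason to spread voting members so that one location cannot hold exactly half of them.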
Configuration Optimization Practices
Adjusting Election Timeout
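Modify the replica set setting settings.electionTimeoutMillis (default: 10000 ms, i.e., 10 seconds) to balance failure-detection speed against the risk of unnecessary elections. A lower value detects a failed primary sooner but makes elections more likely during brief network glitches. The setting belongs to the replica set configuration, not to mongod.conf:
// Run in mongosh, connected to the primary
cfg = rs.conf();
cfg.settings.electionTimeoutMillis = 5000;
rs.reconfig(cfg);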
Hidden and Delayed Nodes
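- Hidden Node: Can never become primary (its priority must be 0) and is invisible to client applications, but it still votes in elections. It is typically used for dedicated backups or reporting.
// Hide member 1 by editing the current config (rs.reconfig replaces the entire member list)
cfg = rs.conf();
cfg.members[1].priority = 0;
cfg.members[1].hidden = true;
rs.reconfig(cfg);
- Delayed Node: Intentionally applies operations a fixed time behind the primary, giving a window to recover from human error such as an accidentally dropped collection. It must have priority 0 and should also be hidden.
{ _id: 2, host: "mongo3:27017", priority: 0, hidden: true, secondaryDelaySecs: 3600 }  // named slaveDelay before MongoDB 5.0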
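Before calling rs.reconfig, a config can be checked against the hidden/delayed rules. The helper below is a hypothetical sketch, not a MongoDB shell function; it accepts both secondaryDelaySecs and the older slaveDelay field name, and treats a missing priority as the default of 1.

```javascript
// Sketch: check hidden/delayed member settings against the documented rules
// (hidden members need priority 0; delayed members need priority 0 and should
// be hidden). validateMember is a hypothetical helper, not a shell function.
function validateMember(member) {
  const problems = [];
  const delay = member.secondaryDelaySecs ?? member.slaveDelay ?? 0;
  const priority = member.priority ?? 1; // data-bearing members default to 1
  if (member.hidden && priority !== 0) {
    problems.push(`member ${member._id}: hidden members must have priority 0`);
  }
  if (delay > 0 && priority !== 0) {
    problems.push(`member ${member._id}: delayed members must have priority 0`);
  }
  if (delay > 0 && !member.hidden) {
    problems.push(`member ${member._id}: delayed members should be hidden`);
  }
  return problems;
}

console.log(validateMember({ _id: 1, host: "mongo2:27017", priority: 1, hidden: true }));
console.log(validateMember({ _id: 2, host: "mongo3:27017", priority: 0, hidden: true, secondaryDelaySecs: 3600 }));
```

In mongosh, the same helper could be applied to every member with rs.conf().members.flatMap(validateMember).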
Monitoring and Diagnostics
Key metrics to monitor:
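- Replication Lag (how far each secondary is behind the primary):
rs.printSecondaryReplicationInfo()
- Election Metrics (elections called and won, broken down by reason):
db.serverStatus().electionMetrics
- Node Status (whether this member is primary, and the current members):
db.hello()  // db.isMaster() on older versions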
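rs.status() reports each member's optimeDate and stateStr but has no lag field, so lag can also be derived from those values. A minimal sketch, assuming the input has the shape of rs.status() output; replicationLagSeconds is a hypothetical helper.

```javascript
// Sketch: compute each secondary's replication lag from rs.status()-shaped data,
// as the primary's optimeDate minus the secondary's optimeDate.
// replicationLagSeconds is a hypothetical helper, not a shell function.
function replicationLagSeconds(status) {
  const primary = status.members.find(m => m.stateStr === "PRIMARY");
  if (!primary) return null; // no primary, e.g. during an election
  return status.members
    .filter(m => m.stateStr === "SECONDARY")
    .map(m => ({ name: m.name, lagSeconds: (primary.optimeDate - m.optimeDate) / 1000 }));
}

// Hand-written sample data; in mongosh you would pass rs.status() instead.
const sample = {
  members: [
    { name: "mongo1:27017", stateStr: "PRIMARY", optimeDate: new Date("2024-01-01T00:00:10Z") },
    { name: "mongo2:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:07Z") },
    { name: "mongo3:27017", stateStr: "ARBITER" }
  ]
};
console.log(replicationLagSeconds(sample)); // [ { name: 'mongo2:27017', lagSeconds: 3 } ]
```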
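Diagnostic Command Examples:
// Details of the election in which this node became primary (run on the primary)
db.adminCommand({ replSetGetStatus: 1 }).electionCandidateMetrics
// Force reconfiguration only when a majority of members is permanently lost (use with caution)
newCfg = rs.conf();  // edit newCfg.members to keep only the surviving members
rs.reconfig(newCfg, { force: true })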
Client Connection Handling
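Give the driver several member addresses (a seed list) so it can still reach the replica set if one seed is down. The driver then discovers all members and routes operations to the new primary after a failover:
// Node.js driver connection string
const { MongoClient } = require("mongodb");
const uri = "mongodb://mongo1:27017,mongo2:27017,mongo3:27017/?replicaSet=rs0";
const client = new MongoClient(uri, {
  retryWrites: true,             // retry a failed write once (default in current drivers)
  retryReads: true,              // retry a failed read once (default in current drivers)
  serverSelectionTimeoutMS: 5000 // how long to wait for a suitable server, e.g. a new primary
});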
Performance Impact of Elections
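- Write Unavailability: No writes can succeed until a new primary is elected. With default settings, the median time from primary failure to a new primary, including failure detection, typically does not exceed 12 seconds.
- Read Availability: Clients using the secondary or secondaryPreferred read preference can spread reads across secondaries and keep reading while no primary exists, although such reads may return slightly stale data.
- Index Consistency: Ensure all data-bearing members have the same indexes so that queries do not slow down after a secondary becomes primary.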