Scaling a Sharded Cluster Out and In
MongoDB sharded clusters address single-machine performance bottlenecks through horizontal scaling, but the cluster must be resized dynamically as business volume fluctuates. Scaling out means adding new shards and letting data migrate onto them, while scaling in means safely draining and removing shards and rebalancing the remaining data.
Scaling Out a Sharded Cluster
Adding Config Servers
Config servers store the cluster metadata and must be deployed as a replica set (CSRS) in MongoDB 3.4 and later. When scaling out, first expand the config server replica set:
// Connect to the PRIMARY of the config server replica set
rs.add("cfg3.example.net:27019")
// Verify that the new member appears and reaches SECONDARY state
rs.status()
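Before it can be added, the new config server process has to be running as a member of the same replica set. A minimal startup command might look like the following sketch; the replica set name configReplSet, data path, and port are assumptions for illustration:
// Start the new config server member before running rs.add()
mongod --configsvr --replSet configReplSet --dbpath /data/configdb --port 27019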
Adding a New Shard Replica Set
- Deploy and initialize a new replica set (see the initiation sketch after this list):
mongod --shardsvr --replSet shardC --dbpath /data/shardC --port 27018
- Add the replica set to the cluster:
sh.addShard("shardC/rs1.example.net:27018,rs2.example.net:27018")
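The new replica set must be initiated before sh.addShard() will accept it. A minimal initiation, reusing the example hostnames above, could look like this:
// Run against one of the new shardC members
rs.initiate({
  _id: "shardC",
  members: [
    {_id: 0, host: "rs1.example.net:27018"},
    {_id: 1, host: "rs2.example.net:27018"}
  ]
})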
Triggering Data Rebalancing
After adding a new shard, make sure the balancer is enabled so chunks start migrating to it:
sh.startBalancer()
// Jumbo chunks cannot be migrated automatically and will stay behind
use config
db.chunks.find({jumbo: true}) // Inspect oversized chunks
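To watch the rebalance converge, you can count chunks per shard for a collection directly in the config database. The namespace db.orders is an example; note that on MongoDB 5.0 and newer the chunks collection is keyed by collection UUID rather than by ns, so the match stage would need to be adjusted there:
use config
db.chunks.aggregate([
  {$match: {ns: "db.orders"}},
  {$group: {_id: "$shard", chunks: {$sum: 1}}}
])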
Example of a typical data migration log:
{"t":"2023-05-17T03:22:15.234+0000","s":"I", "c":"SHARDING","id":22194,
"ctx":"Balancer","msg":"Migration succeeded","attr":{"from":"shardA","to":"shardC"}}
Scaling In a Sharded Cluster
Migrating Data Out of the Target Shard
- Start draining the target shard. If it is the primary shard for any databases, the response lists them under dbsToMove and they must be relocated with movePrimary (see the sketch after this list) before removal can finish:
use admin
db.runCommand({removeShard: "shardB"})
// The response reports the draining state and the number of chunks (and databases) remaining
- Monitor progress by checking the balancer and re-running removeShard:
mongos> db.adminCommand({balancerStatus: 1})
{
  "mode": "full",
  "inBalancerRound": true,
  "numBalancerRounds": NumberLong(42),
  "ok": 1
}
mongos> db.adminCommand({removeShard: "shardB"})
// While draining, "state" stays "ongoing" and "remaining.chunks" counts down toward zero
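If removeShard reported databases under dbsToMove, relocate each one to another shard before the drain can complete. The database name reports and the destination shardA below are examples:
db.adminCommand({movePrimary: "reports", to: "shardA"})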
Removing an Empty Shard
Once the shard's data volume drops to zero, complete the removal:
// Execute the removal command again
db.adminCommand({removeShard: "shardB"})
// When the response reports "state": "completed", the shard's members can be shut down and decommissioned
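As a final check, confirm that the shard no longer appears in the cluster's shard list:
db.adminCommand({listShards: 1}) // shardB should be absent from the "shards" array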
Handling Special Scenarios
Managing Jumbo Chunks
When chunks exceed the configured chunk size (64 MB by default in older releases):
- Manually split large chunks:
sh.splitAt("db.collection", {_id: ObjectId("5f4d7a9e6c3f2b1a8d9e0f12")})
- Temporarily increase the chunk size threshold (the value is in MB):
use config
db.settings.update(
{_id: "chunksize"},
{$set: {value: 128}},
{upsert: true}
)
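On MongoDB 4.4 and newer, a chunk that has been split or is no longer oversized can have its jumbo flag cleared so the balancer will move it again; the namespace and find document below are examples:
db.adminCommand({
  clearJumboFlag: "db.collection",
  find: {_id: ObjectId("5f4d7a9e6c3f2b1a8d9e0f12")}
})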
Maintenance Window Operations
During low-traffic periods, restrict chunk migrations to a defined balancing window so they do not compete with production traffic:
use config
db.settings.update(
  {_id: "balancer"},
  {$set: {activeWindow: {start: "23:00", stop: "06:00"}}},
  {upsert: true}
)
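To further reduce the impact of migrations, the balancer can be told to wait for replication to secondaries during each chunk move (a documented balancer setting that slows migrations down in exchange for less load):
use config
db.settings.update(
  {_id: "balancer"},
  {$set: {_secondaryThrottle: {w: "majority"}}},
  {upsert: true}
)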
Monitoring and Validation
Key Metrics to Monitor
- Balancer state and activity:
sh.getBalancerState() // true if the balancer is enabled
sh.isBalancerRunning() // true while a balancing round is in progress
- Network traffic:
db.serverStatus().network
- Disk space changes:
db.stats(1024*1024) // Scale results to MB
Data Consistency Checks
After migration completes, execute:
// Compare chunk and document distribution across shards
db.collection.getShardDistribution()
// Cross-check totals: the aggregation-based count filters out orphaned documents,
// while the metadata-based count may include them, so a persistent gap after
// migrations suggests leftover orphans
db.collection.countDocuments({})
db.collection.estimatedDocumentCount()
Automation Best Practices
Using Ansible Templates for Scaling
A task using the community.mongodb collection (the shardD replica set must already be deployed and initiated):
- name: Add new shard
  community.mongodb.mongodb_shard:
    login_host: mongos1
    login_user: admin
    login_password: "{{vault_password}}"
    # Shard connection string: <replicaSetName>/<host:port>[,<host:port>...]
    shard: "shardD/shardD1:27018,shardD2:27018"
    state: present
Dynamic Adjustments via API
MongoDB's administrative commands can be scripted, for example with the Node.js driver, to automate scaling decisions. In the sketch below, desiredShardCount, spinUpInstance, and buildShardURI are placeholders for your own capacity logic and provisioning tooling:
// adminDb is the "admin" database of a MongoClient connected to a mongos
async function resizeCluster(adminDb, desiredShardCount, targetShard) {
  const {shards} = await adminDb.command({listShards: 1});
  if (shards.length > desiredShardCount) {
    // Start (or continue) draining; repeat until the response reports state "completed"
    await adminDb.command({removeShard: targetShard});
  } else if (shards.length < desiredShardCount) {
    const host = await spinUpInstance("shard-template");    // placeholder provisioning step
    await adminDb.command({addShard: buildShardURI(host)}); // placeholder URI builder
  }
}
Performance Optimization Tips
- Pre-splitting Collections: Create empty chunks before inserting data
// Split at every 10,000th order_id so chunks exist before the bulk load
for (let i = 1; i < 100; i++) {
  sh.splitAt("db.orders", {order_id: i*10000})
}
- Tag-aware Sharding: Assign tags based on data center location
sh.addTagRange("db.orders", {_id: MinKey}, {_id: MaxKey}, "EU-West")
sh.addShardTag("shardC", "EU-West")
- Index Synchronization Strategy: Ensure all shards have the same indexes
// Re-issue each collection's index definitions through mongos; the builds propagate
// to every shard and are no-ops where the index already exists
const appDb = db.getSiblingDB("db")
appDb.getCollectionNames().forEach(coll => {
  appDb.getCollection(coll).getIndexes().forEach(idx => {
    if (idx.name !== "_id_") appDb.getCollection(coll).createIndex(idx.key, {name: idx.name})
  })
})