Backup strategy (logical backup, physical backup)
MongoDB's backup strategy mainly consists of two approaches: logical backup and physical backup. Logical backup exports data content, while physical backup directly copies underlying data files. Each method has its own advantages and disadvantages, making them suitable for different scenarios.
Logical Backup
Logical backup refers to exporting data in a logical structure format using database-provided tools, typically stored as JSON, BSON, or CSV. MongoDB provides two main tools for logical backup: mongodump and mongoexport.
mongodump
mongodump is MongoDB's official backup tool. It exports database content in BSON format and saves each collection's index definitions in an accompanying metadata file, so mongorestore can rebuild the indexes on restore. Basic usage:
mongodump --host localhost --port 27017 --db myDatabase --out /backup/mongodb
This command backs up the myDatabase database to the /backup/mongodb directory. mongodump supports various parameters:
- --collection: Specify a particular collection to back up
- --query: Back up only documents matching specific conditions (requires --collection)
- --gzip: Compress output files
- --oplog: Used with replica sets to capture oplog entries written during the backup; it applies to full-instance dumps and cannot be combined with --db or --collection
Example: Back up order data within a specific time range
mongodump --db ecommerce --collection orders \
--query '{"createdAt": {"$gte": {"$date": "2023-01-01T00:00:00Z"}, "$lt": {"$date": "2023-02-01T00:00:00Z"}}}' \
--out /backup/january_orders
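Current mongodump releases parse --query as Extended JSON v2, so dates are written as {"$date": ...} objects and field names and operators are quoted. The following is a minimal sketch of generating such a query programmatically; buildDateRangeDumpArgs is a hypothetical helper, not part of any MongoDB tool.

```javascript
// Hypothetical helper: builds the mongodump argument list for a date-range dump.
function buildDateRangeDumpArgs({ db, collection, field, from, to, out }) {
  // Extended JSON v2 represents a date as {"$date": "<ISO 8601 string>"}.
  const query = {
    [field]: {
      $gte: { $date: from.toISOString() },
      $lt: { $date: to.toISOString() },
    },
  };
  return [
    '--db', db,
    '--collection', collection,
    '--query', JSON.stringify(query),
    '--out', out,
  ];
}

const dumpArgs = buildDateRangeDumpArgs({
  db: 'ecommerce',
  collection: 'orders',
  field: 'createdAt',
  from: new Date('2023-01-01T00:00:00Z'),
  to: new Date('2023-02-01T00:00:00Z'),
  out: '/backup/january_orders',
});
console.log(dumpArgs);
```

Passing the array to child_process.execFileSync('mongodump', dumpArgs) avoids shell quoting problems with the JSON string.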
mongoexport
The mongoexport tool exports data in JSON or CSV format, making it suitable for exchanging data with other systems:
mongoexport --db myDatabase --collection users --out users.json
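Key features:
- Supports the --fields parameter to select specific fields
- Can specify --type=csv to export as CSV format
- Output files are highly readable but don't preserve index information and may lose some BSON type fidelity, so mongoexport is not recommended for full production backups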
Example: Export user email list as CSV
mongoexport --db myApp --collection users \
--fields=email,firstName,lastName \
--type=csv --out user_emails.csv
Pros and Cons of Logical Backup
Advantages:
- High portability - backup files can be migrated between different MongoDB versions
- Selective backup of specific collections or documents
- mongoexport output is human-readable for easy inspection; mongodump's BSON files can be inspected with bsondump
- Storage engine independent, suitable for all MongoDB deployments
Disadvantages:
- Slower backup and restore speeds, especially with large datasets
- Writes that continue during the backup can leave the dump inconsistent unless --oplog is used
- A dump limited to a single database with --db omits user and role information unless --dumpDbUsersAndRoles is added (or the admin database is backed up separately)
Physical Backup
Physical backup involves directly copying MongoDB's data files, including the WiredTiger storage engine files and journal files. This method is typically faster than logical backup and more suitable for large databases.
Filesystem Snapshots
Most modern filesystems support snapshot functionality to create instantaneous copies of data files with minimal performance impact:
# Create snapshot on Linux LVM
lvcreate --size 10G --snapshot --name mongo_snapshot /dev/vg0/mongo_data
Key considerations:
- Flush all pending writes before snapshot:
db.fsyncLock()
- Unlock after snapshot completion:
db.fsyncUnlock()
- Ensure sufficient space for the snapshot
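The lock, snapshot, unlock sequence above is easy to get wrong if the snapshot step fails and the unlock never runs. A minimal sketch of a wrapper that always unlocks is shown here; runAdminCommand is a hypothetical callback that sends a command to the admin database (the fsync and fsyncUnlock commands are the server-side equivalents of the shell helpers), and the demo uses a stand-in instead of a real connection.

```javascript
// Sketch: run a snapshot step while writes are locked, and always unlock afterwards.
// `runAdminCommand` is a hypothetical callback that sends a command to the admin database.
async function withFsyncLock(runAdminCommand, takeSnapshot) {
  await runAdminCommand({ fsync: 1, lock: true });
  try {
    return await takeSnapshot();
  } finally {
    // Runs even when takeSnapshot throws, so the server is never left locked.
    await runAdminCommand({ fsyncUnlock: 1 });
  }
}

// Demo with a stand-in that only records the commands it receives.
const sent = [];
const fakeAdmin = async (cmd) => {
  sent.push(Object.keys(cmd)[0]);
  return { ok: 1 };
};

const demo = withFsyncLock(fakeAdmin, async () => {
  sent.push('snapshot');
  throw new Error('snapshot failed');
}).catch((err) => {
  console.log(err.message, sent);
  return sent;
});
```

In a real script, takeSnapshot would invoke the filesystem snapshot command (for example the lvcreate call shown earlier).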
Replica Set Member Backup
For replica set deployments, data files can be copied directly from secondary members:
- Remove secondary member from replica set:
rs.remove("secondary1:27017")
- Stop the mongod process
- Copy the data directory
- Restart mongod and rejoin the replica set:
rs.add("secondary1:27017")
This method doesn't affect primary node performance but requires careful operation to avoid impacting replica set availability.
Cloud Service Backup
Cloud services like MongoDB Atlas provide automated physical backup functionality, typically implemented through storage volume snapshots:
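// Atlas API example: trigger an on-demand backup snapshot
// Node.js 18+ provides a global fetch; older versions need node-fetch v2.
// GROUP_ID, CLUSTER_NAME and ACCESS_TOKEN are placeholders read from the environment.
const { GROUP_ID, CLUSTER_NAME, ACCESS_TOKEN } = process.env;

async function triggerBackup() {
  const response = await fetch(
    `https://cloud.mongodb.com/api/atlas/v1.0/groups/${GROUP_ID}/clusters/${CLUSTER_NAME}/backup/snapshots`,
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // Assumes an OAuth access token; programmatic API key pairs
        // use HTTP Digest authentication instead of a Bearer header.
        'Authorization': `Bearer ${ACCESS_TOKEN}`
      },
      body: JSON.stringify({
        description: 'Monthly backup',
        retentionInDays: 30
      })
    }
  );
  if (!response.ok) {
    throw new Error(`Snapshot request failed with HTTP ${response.status}`);
  }
  return response.json();
}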
Pros and Cons of Physical Backup
Advantages:
- Faster backup and restore speeds, especially for large databases
- Maintains data consistency, suitable for critical business systems
- Includes all database metadata, including user permissions
- Minimal performance impact during backup
Disadvantages:
- Backup files are typically larger, consuming more storage space
- Dependent on MongoDB version and storage engine
- Usually requires database downtime or locking to ensure consistency
- Cross-platform restoration may present issues
Backup Strategy Selection
Consider the following factors when choosing a backup strategy:
Data Volume
- Small databases (<100GB): Logical backup is usually sufficient
- Large databases: Physical backup is more efficient
Recovery Time Objective (RTO)
- Requires fast recovery: Prioritize physical backup
- Can tolerate longer recovery times: Logical backup is feasible
Storage Limitations
- Limited storage space: Logical backup (especially compressed) saves space
- Ample storage space: Physical backup is more convenient
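The three factors above can be combined into a simple rule of thumb. The sketch below is only an illustration: recommendBackup is a hypothetical function, and giving each factor one equal vote is an assumption rather than something the guidelines prescribe.

```javascript
// Hypothetical rule of thumb: each factor from the list above casts one vote.
function recommendBackup({ sizeGB, needsFastRecovery, storageLimited }) {
  const votes = { logical: 0, physical: 0 };
  votes[sizeGB < 100 ? 'logical' : 'physical'] += 1;       // data volume
  votes[needsFastRecovery ? 'physical' : 'logical'] += 1;  // recovery time objective
  votes[storageLimited ? 'logical' : 'physical'] += 1;     // storage limitations
  // Three votes, so there is never a tie.
  return votes.physical > votes.logical ? 'physical' : 'logical';
}

console.log(recommendBackup({ sizeGB: 20, needsFastRecovery: false, storageLimited: true }));
console.log(recommendBackup({ sizeGB: 500, needsFastRecovery: true, storageLimited: false }));
```

In practice the recovery time objective often outweighs the other two factors, so treat the result as a starting point for discussion.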
Typical Hybrid Strategies
- Daily physical snapshots + hourly incremental logical backups
- Primary database physical backup + secondary logical backup
- Production environment physical backup + development/test environment logical backup
Backup Verification and Recovery Testing
Regardless of the backup strategy, regularly verifying backup validity is crucial:
Logical Backup Verification
# Recovery test
mongorestore --db test_restore --drop /backup/mongodb/myDatabase
# Data validation (mongosh replaces the legacy mongo shell)
mongosh test_restore --quiet --eval "db.users.countDocuments()"
Physical Backup Verification
- Restore data files on a new instance
- Start mongod process
- Run consistency check:
db.runCommand({validate: "orders", full: true})
Automated Verification Script Example
// Run with mongosh, which supports both require() and the Mongo() constructor:
//   mongosh --nodb verify_restore.js   (the file name is illustrative)
const { execSync } = require('child_process');

function testRestore() {
  try {
    // Restore the latest backup, dropping existing collections first
    execSync('mongorestore --drop /backup/latest', { stdio: 'inherit' });
    // Connect to the database for verification
    const conn = new Mongo('localhost:27017');
    const appDb = conn.getDB('myApp');
    const userCount = appDb.users.countDocuments();
    if (userCount === 0) {
      throw new Error('Restoration failed: User count is 0');
    }
    console.log(`Restoration verified successfully, found ${userCount} users`);
    return true;
  } catch (err) {
    console.error('Restoration test failed:', err);
    return false;
  }
}

testRestore();
Backup Security and Storage
Both backup methods require consideration of secure storage:
- Encrypt backups (--archive with no file name streams the dump to stdout):
mongodump --db sensitiveData --archive | openssl enc -aes-256-cbc -salt -pbkdf2 -out backup.enc
- Offsite storage:
  - Copy backups to cloud storage (AWS S3, Azure Blob, etc.)
  - Use rsync to synchronize with remote servers
- Backup retention policy:
  - Keep daily backups for the last 7 days
  - Keep monthly backups for the last 12 months
  - Keep permanent backups for critical points in time
- Access control:
  - Restrict access to backup files
  - Create dedicated database users for backup operations:
db.createUser({ user: "backupAdmin", pwd: "securePassword", roles: [{role: "backup", db: "admin"}] })
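The retention policy listed above (7 daily, 12 monthly, plus permanent backups) can be expressed as a small selection function. This is a sketch under stated assumptions: the backup record shape ({ name, takenAt, pinned }) is hypothetical, and "monthly backup" is interpreted as the newest backup in each of the last 12 calendar months (UTC).

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the set of backup names to keep; everything else may be deleted.
// Hypothetical record shape: { name: string, takenAt: Date, pinned?: boolean }
function selectBackupsToKeep(backups, now) {
  const keep = new Set();
  const newestPerMonth = new Map();

  for (const backup of backups) {
    const ageDays = (now - backup.takenAt) / DAY_MS;
    if (backup.pinned) keep.add(backup.name);                // permanent backups
    if (ageDays >= 0 && ageDays < 7) keep.add(backup.name);  // dailies, last 7 days

    // Track the newest backup in each of the last 12 calendar months.
    const monthsAgo =
      (now.getUTCFullYear() - backup.takenAt.getUTCFullYear()) * 12 +
      (now.getUTCMonth() - backup.takenAt.getUTCMonth());
    if (monthsAgo >= 0 && monthsAgo < 12) {
      const key = `${backup.takenAt.getUTCFullYear()}-${backup.takenAt.getUTCMonth()}`;
      const newest = newestPerMonth.get(key);
      if (!newest || backup.takenAt > newest.takenAt) newestPerMonth.set(key, backup);
    }
  }
  for (const backup of newestPerMonth.values()) keep.add(backup.name);
  return keep;
}

const kept = selectBackupsToKeep(
  [
    { name: 'jun-14', takenAt: new Date('2023-06-14T00:00:00Z') },
    { name: 'jun-01', takenAt: new Date('2023-06-01T00:00:00Z') },
    { name: 'may-31', takenAt: new Date('2023-05-31T00:00:00Z') },
    { name: 'may-10', takenAt: new Date('2023-05-10T00:00:00Z') },
    { name: 'launch', takenAt: new Date('2021-01-01T00:00:00Z'), pinned: true },
    { name: 'old-may', takenAt: new Date('2022-05-31T00:00:00Z') },
  ],
  new Date('2023-06-15T00:00:00Z')
);
console.log([...kept]);
```

A cleanup job would delete only the backups whose names are absent from the returned set.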
Monitoring and Alerts
A robust backup system requires monitoring and alert mechanisms:
- Monitor backup job execution status:
# Count backup files modified within the last 24 hours
find /backup/mongodb -name "*.bson" -type f -mtime -1 | wc -l
- Set up backup failure alerts:
// Monitoring script example; sendAlert is a placeholder for your notification hook
const fs = require('fs');
const lastBackupTime = fs.statSync('/backup/latest').mtimeMs;
const hoursSinceBackup = (Date.now() - lastBackupTime) / (1000 * 60 * 60);
if (hoursSinceBackup > 24) {
  sendAlert('MongoDB backup not executed for over 24 hours');
}
- Capacity monitoring:
  - Monitor backup storage space usage
  - Implement automatic cleanup of old backups
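For the capacity monitoring item, one simple approach is to total the size of the backup directory and compare it with a budget. The sketch below uses only Node.js built-ins; dirSizeBytes and isOverBudget are hypothetical helper names, and the budget value is whatever limit suits your storage.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively sums the sizes of all regular files under `dir`.
function dirSizeBytes(dir) {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      total += dirSizeBytes(fullPath);
    } else if (entry.isFile()) {
      total += fs.statSync(fullPath).size;
    }
  }
  return total;
}

// True when the backup directory exceeds the given budget in bytes.
function isOverBudget(dir, budgetBytes) {
  return dirSizeBytes(dir) > budgetBytes;
}
```

A scheduled job could call isOverBudget('/backup/mongodb', limit) and trigger the cleanup of old backups when it returns true.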
Special Scenario Handling
Certain special scenarios require particular attention to backup strategy:
Sharded Cluster Backup
- Stop balancer:
sh.stopBalancer()
- Back up config servers
- Back up each shard individually
- Record shard metadata
- Restart the balancer once the backup completes:
sh.startBalancer()
Point-in-Time Recovery
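Combine a dump with the oplog to restore to a precise point in time, down to the second. mongodump --oplog records the operations that occur while the dump runs, and mongorestore --oplogReplay applies them; adding --oplogLimit <seconds>[:ordinal] stops the replay before a given timestamp: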
mongodump --oplog --out /backup/with_oplog
mongorestore --oplogReplay /backup/with_oplog
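To stop the replay at a chosen moment, mongorestore accepts --oplogLimit with a value of the form <seconds>:<ordinal>. The sketch below converts a JavaScript Date into that form; buildPointInTimeRestoreArgs is a hypothetical helper, and the cut-off must fall within the oplog entries that were actually captured.

```javascript
// Hypothetical helper: builds mongorestore arguments for a point-in-time restore.
function buildPointInTimeRestoreArgs(dumpDir, cutoff) {
  // Oplog timestamps count whole seconds since the Unix epoch.
  const seconds = Math.floor(cutoff.getTime() / 1000);
  // Entries with a timestamp at or after <seconds>:0 are not applied.
  return ['--oplogReplay', '--oplogLimit', `${seconds}:0`, dumpDir];
}

const restoreArgs = buildPointInTimeRestoreArgs(
  '/backup/with_oplog',
  new Date('2023-01-01T00:00:00Z')
);
console.log(['mongorestore', ...restoreArgs].join(' '));
```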
Incremental Backup Strategy
- After initial full backup, periodically back up oplog
- During recovery, first restore full backup then replay oplog
- Use timestamps to mark backup positions (the oplog lives in the local database):
db.getSiblingDB("local").oplog.rs.find({ts: {$gt: Timestamp(1672531200, 1)}})
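One way to implement the periodic oplog backup is to dump local.oplog.rs with a query on ts, starting from the last timestamp already saved. The sketch below builds the mongodump arguments for that; buildOplogDumpArgs is a hypothetical helper, and the { t, i } pair mirrors the two arguments of Timestamp(seconds, increment). The query uses the Extended JSON v2 form of a timestamp.

```javascript
// Hypothetical helper: dumps only oplog entries newer than the last saved position.
function buildOplogDumpArgs(lastTs, outDir) {
  // Extended JSON v2 writes a BSON timestamp as {"$timestamp": {"t": ..., "i": ...}}.
  const query = { ts: { $gt: { $timestamp: { t: lastTs.t, i: lastTs.i } } } };
  return [
    '--db', 'local',
    '--collection', 'oplog.rs',
    '--query', JSON.stringify(query),
    '--out', outDir,
  ];
}

const oplogArgs = buildOplogDumpArgs({ t: 1672531200, i: 1 }, '/backup/oplog_increment');
console.log(oplogArgs);
```

After each run, the highest ts value in the dumped file becomes the starting position for the next increment; the output directory name here is only an example.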
Performance Optimization Techniques
Performance optimization for large-scale database backups:
- Parallel backup of multiple collections:
mongodump --numParallelCollections 4 --out /backup/parallel
- Exclude unnecessary collections by prefix (requires --db; wildcards are not supported):
mongodump --db myDatabase --excludeCollectionsWithPrefix=system. --out /backup/essential
- Adjust batch size:
mongorestore --batchSize=1000 /backup/data
- Use SSD for temporary backup files:
mongodump --out /ssd/temp_backup
rsync -a /ssd/temp_backup /hdd/permanent_backup
- Network optimization:
  - Perform backups within the same data center
  - Use high-bandwidth network connections
  - Consider compressing data during transfer