Sharding strategies (range sharding, hash sharding, zone sharding)

Author: Chuan Chen | Category: MongoDB

Sharding Strategy Overview

MongoDB's sharding strategy determines how data is distributed across a sharded cluster. Choosing an appropriate sharding strategy is crucial for query performance and cluster scalability. The three main sharding strategies each have distinct characteristics and are suited for different scenarios.

Range-Based Sharding

Range-based sharding allocates data to different shards based on value ranges of the shard key. MongoDB divides the shard key's value space into multiple contiguous ranges, with each range assigned to a specific shard.

How It Works

The config servers store metadata that records which shard key ranges (chunks) each shard owns.
When a document is inserted, mongos routes it to the shard whose chunk range contains the document's shard key value.
For queries, mongos uses the query conditions to send requests only to the shards whose ranges can match.
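The routing steps above can be sketched in plain JavaScript. This is a simplified model rather than MongoDB's actual implementation: the chunk boundaries, the shard names, and the helper names `shardForKey` and `shardsForRange` are assumptions made up for illustration.

```javascript
// Hypothetical chunk table: each chunk covers [min, max) of the shard key.
// Boundaries and shard names are invented for this sketch.
const chunks = [
  { min: -Infinity, max: Date.parse("2023-02-01"), shard: "shardA" },
  { min: Date.parse("2023-02-01"), max: Date.parse("2023-03-01"), shard: "shardB" },
  { min: Date.parse("2023-03-01"), max: Infinity, shard: "shardC" },
];

// Route a single shard key value to the chunk whose range contains it.
function shardForKey(key) {
  return chunks.find((c) => key >= c.min && key < c.max).shard;
}

// Route a range query [from, to) to every shard whose chunks overlap it.
function shardsForRange(from, to) {
  const hits = chunks
    .filter((c) => from < c.max && to > c.min)
    .map((c) => c.shard);
  return [...new Set(hits)];
}

// An order dated 2023-01-01 lands on shardA.
console.log(shardForKey(Date.parse("2023-01-01")));
// A query from mid-January to mid-February touches shardA and shardB only.
console.log(shardsForRange(Date.parse("2023-01-15"), Date.parse("2023-02-15")));
```

Because each range maps to one shard, a range query contacts only the shards whose ranges overlap the requested interval.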

Use Cases

Scenarios with frequent range queries
Data with a natural order in the shard key (e.g., time-series data)
Applications requiring data access in a specific order

Example Code

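// Enable sharding on the database (required before MongoDB 6.0), then shard the collection on a ranged key
sh.enableSharding("orders")
sh.shardCollection("orders.orders", { orderDate: 1 })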

// Documents will be allocated to different shards based on orderDate values
db.orders.insertMany([
  { orderId: 1, orderDate: new Date("2023-01-01"), amount: 100 },
  { orderId: 2, orderDate: new Date("2023-02-01"), amount: 200 },
  { orderId: 3, orderDate: new Date("2023-03-01"), amount: 300 }
])

Pros and Cons

Pros:

Efficient range queries, as only relevant shards need to be accessed
Predictable data distribution, making management easier

Cons:

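May lead to uneven data distribution and hotspots: a monotonically increasing shard key such as a timestamp sends all new writes to the shard that holds the highest range
Adding shards does not fix a skewed key: the balancer migrates chunks to new shards automatically, but it cannot spread writes that all target the same range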

Hash-Based Sharding

Hash-based sharding allocates data by computing hash values of the shard key. Because the hash places even adjacent key values far apart, data is spread pseudo-randomly, and roughly evenly, across shards.

How It Works

MongoDB applies a hash function to the shard key value to produce a hashed value.
Chunks are defined as ranges of hashed values, and each document is assigned to the chunk that contains its hash.
For exact-match queries on the shard key, mongos hashes the queried value to find the single shard that can hold it.
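A minimal sketch of these steps follows. It uses a 32-bit FNV-1a hash purely as a stand-in, since MongoDB's own hashed-index function is different, and it splits the hash space evenly across three made-up shard names. The helper names `fnv1a` and `shardForKey` are assumptions for illustration.

```javascript
// Stand-in hash: 32-bit FNV-1a. MongoDB's real hashed-index function differs.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // unsigned 32-bit result
}

const shards = ["shardA", "shardB", "shardC"]; // hypothetical shard names
const HASH_SPACE = 2 ** 32;

// Each shard owns one contiguous slice of the hash space.
function shardForKey(key) {
  const h = fnv1a(String(key));
  return shards[Math.floor((h / HASH_SPACE) * shards.length)];
}

// Sequential keys are scattered rather than clustered on one shard.
const counts = {};
for (let i = 0; i < 3000; i++) {
  const shard = shardForKey("user" + i);
  counts[shard] = (counts[shard] || 0) + 1;
}
console.log(counts);

// An exact-match lookup recomputes the hash and reaches the same shard.
console.log(shardForKey("user42") === shardForKey("user42")); // true
```

The same property that spreads writes also explains the range-query cost: neighbouring key values hash to unrelated positions, so a range of keys cannot be mapped to a small set of shards.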

Use Cases

Write-heavy workloads requiring even distribution
High-cardinality shard keys whose values change monotonically (e.g., ObjectId values or timestamps)
Exact-match query scenarios without range query requirements

Example Code

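// Enable sharding on the database (required before MongoDB 6.0), then shard the collection on a hashed key
sh.enableSharding("users")
sh.shardCollection("users.profiles", { userId: "hashed" })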

// Documents will be allocated to different shards based on userId hash values
db.profiles.insertMany([
  { userId: "user1", name: "Alice", age: 25 },
  { userId: "user2", name: "Bob", age: 30 },
  { userId: "user3", name: "Charlie", age: 35 }
])

Pros and Cons

Pros:

Even data distribution, avoiding hotspots
Good scalability, with automatic data rebalancing when adding new shards

Cons:

Inefficient range queries, as all shards may need to be queried
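A compound shard key can contain at most one hashed field, and only in MongoDB 4.4 or later; hashed indexes also cannot be built on array fields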

Zone-Based Sharding

Zone-based sharding groups data into zones based on specific rules. A zone is a named group of one or more shards that is associated with one or more ranges of shard key values. It builds on range-based sharding by adding a geographic or logical grouping dimension.

How It Works

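Define zones and the shard key ranges each zone covers.
Assign shards to specific zones; a shard can belong to more than one zone.
MongoDB places data whose shard key falls within a zone's range only on shards in that zone.
The balancer migrates chunks into their zone when they sit on the wrong shard; data outside every zone range can live on any shard.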
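The placement rule can be sketched as a lookup from a shard key to the shards allowed to hold it. This is a simplified model under stated assumptions: the shard key is taken to be `[region, orderId]`, `-Infinity` and `Infinity` stand in for MongoDB's MinKey and MaxKey, and the shard names and helper names are hypothetical.

```javascript
// Hypothetical zone membership; shard0002 belongs to no zone.
const shardZones = { shard0000: ["US"], shard0001: ["EU"], shard0002: [] };

// Zone ranges over [region, orderId]; lower bound inclusive, upper bound exclusive.
// -Infinity / Infinity stand in for MinKey / MaxKey.
const zoneRanges = [
  { min: ["EU", -Infinity], max: ["EU", Infinity], zone: "EU" },
  { min: ["US", -Infinity], max: ["US", Infinity], zone: "US" },
];

// Lexicographic comparison of compound keys.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

function zoneForKey(key) {
  const range = zoneRanges.find(
    (z) => compareKeys(key, z.min) >= 0 && compareKeys(key, z.max) < 0
  );
  return range ? range.zone : null;
}

// Shards that may hold a document with the given shard key.
function eligibleShards(key) {
  const zone = zoneForKey(key);
  const all = Object.keys(shardZones);
  // Keys outside every zone range may live on any shard.
  if (zone === null) return all;
  return all.filter((s) => shardZones[s].includes(zone));
}

console.log(eligibleShards(["US", 1]));   // only shard0000
console.log(eligibleShards(["EU", 2]));   // only shard0001
console.log(eligibleShards(["APAC", 3])); // no zone matches, so all three shards
```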

Use Cases

Scenarios requiring geographic data distribution
Data with clear business groupings
Applications needing to comply with data sovereignty requirements

Example Code

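// Assumes orders.orders is sharded on { region: 1, orderId: 1 };
// zone ranges must be defined on the shard key or a prefix of it

// Assign shards to zones (sh.addShardToZone replaces the older sh.addShardTag)
sh.addShardToZone("shard0000", "US")
sh.addShardToZone("shard0001", "EU")

// Define zone ranges (sh.updateZoneKeyRange replaces sh.addTagRange).
// The lower bound is inclusive and the upper bound exclusive, so MinKey/MaxKey cover every order in a region
sh.updateZoneKeyRange("orders.orders", { region: "US", orderId: MinKey }, { region: "US", orderId: MaxKey }, "US")
sh.updateZoneKeyRange("orders.orders", { region: "EU", orderId: MinKey }, { region: "EU", orderId: MaxKey }, "EU")

// Documents will be allocated to shards in corresponding zones based on region values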
db.orders.insertMany([
  { orderId: 1, region: "US", amount: 100 },
  { orderId: 2, region: "EU", amount: 200 },
  { orderId: 3, region: "US", amount: 300 }
])

Pros and Cons

Pros:

Fine-grained control over data location
Supports tiered layouts, such as keeping recent data on faster hardware and older data on cheaper shards
Keeps data close to the users who query it, which reduces latency

Cons:

Complex configuration requiring upfront planning
May lead to uneven load across zones

Factors to Consider When Choosing a Sharding Strategy

Selecting a sharding strategy requires considering multiple factors:

Query Patterns: Frequent range queries favor range-based sharding, while exact-match queries favor hash-based sharding.
Write Patterns: High write throughput scenarios favor hash-based sharding.
Data Growth Patterns: Time-series data favors range-based sharding.
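Hardware Configuration: When shards have different hardware, zone-based sharding can direct specific data (for example, recent or frequently accessed ranges) to the more capable shards.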
Compliance Requirements: Data sovereignty needs may require zone-based sharding.

Compound Shard Key Strategy

For complex scenarios, compound shard keys can combine the advantages of different strategies:

// Combining range-based and hash-based sharding benefits
// (a hashed field in a compound shard key requires MongoDB 4.4 or later)
sh.shardCollection("logs.entries", { date: 1, userId: "hashed" })

This combination:

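Orders data by date, so queries for a date or a date range reach only the relevant chunks
Hashes userId, so documents that share the same date value spread across several chunks instead of forming one hotspot; this helps most when date is stored at a coarse granularity, such as one value per day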
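To illustrate how the two fields interact, the sketch below models the compound key as a `[date, hash(userId)]` tuple and routes entries through a made-up chunk table. It rests on several assumptions: dates are simplified to ISO day strings, FNV-1a is only a stand-in for MongoDB's hash function, and the chunk boundaries and shard names are hypothetical.

```javascript
// Stand-in 32-bit hash (FNV-1a); MongoDB's real hashed-index function differs.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Lexicographic comparison of [date, hashedUserId] tuples.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

const HALF = 2 ** 31; // midpoint of the 32-bit hash space

// Hypothetical chunks; each covers [min, max) in compound-key order.
const chunks = [
  { min: ["2023-01-01", 0], max: ["2023-01-01", HALF], shard: "shardA" },
  { min: ["2023-01-01", HALF], max: ["2023-01-02", 0], shard: "shardB" },
  { min: ["2023-01-02", 0], max: ["2023-01-03", 0], shard: "shardC" },
];

function shardForEntry(entry) {
  const key = [entry.date, fnv1a(entry.userId)];
  const chunk = chunks.find(
    (c) => compareKeys(key, c.min) >= 0 && compareKeys(key, c.max) < 0
  );
  return chunk ? chunk.shard : null;
}

// Count where 1,000 entries from the same day end up.
const perShard = {};
for (let i = 0; i < 1000; i++) {
  const shard = shardForEntry({ date: "2023-01-01", userId: "user" + i });
  perShard[shard] = (perShard[shard] || 0) + 1;
}
console.log(perShard); // same-day entries are split between shardA and shardB
```

In this model a query for a single date still touches only the chunks for that date, while writes for that date are shared between two shards.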

Monitoring and Adjusting Sharding Strategies

After implementing a sharding strategy, continuous monitoring is essential:

Use db.collection.getShardDistribution() to view data distribution.
Monitor load across shards.
Observe query performance metrics.
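Adjust the shard key if needed (refineCollectionShardKey in MongoDB 4.4 or later, reshardCollection in 5.0 or later), or let the balancer redistribute chunks.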

// View shard distribution
db.orders.getShardDistribution()

// Check whether the balancer is enabled, and re-enable it if it was stopped
sh.getBalancerState()
sh.startBalancer()
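Once per-shard figures have been read from the distribution output, a quick skew check can make imbalances easier to spot. The sketch below is plain JavaScript, not a MongoDB API: the input object, its numbers, and the helper name `distributionReport` are all hypothetical.

```javascript
// Hypothetical per-shard document counts, e.g. copied by hand from
// the output of getShardDistribution().
const docsPerShard = { shard0000: 120000, shard0001: 95000, shard0002: 310000 };

// For each shard, report its share of all documents and how far it is
// from a perfectly even split (ratioToIdeal of 1.0 means perfectly even).
function distributionReport(counts) {
  const total = Object.values(counts).reduce((a, b) => a + b, 0);
  const ideal = total / Object.keys(counts).length;
  return Object.entries(counts).map(([shard, docs]) => ({
    shard,
    sharePct: Number(((docs / total) * 100).toFixed(1)),
    ratioToIdeal: Number((docs / ideal).toFixed(2)),
  }));
}

console.log(distributionReport(docsPerShard));
```

With these sample numbers, shard0002 holds about 59% of the documents, roughly 1.77 times its even share, which would be a reason to look at the shard key or at zone ranges.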

Sharding Strategy and Index Design

Sharding strategy is closely related to index design:

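The shard key must be backed by an index: MongoDB creates it automatically when an empty collection is sharded, but a populated collection needs the index before sh.shardCollection() is run.
Queries that include the shard key can be routed to specific shards; queries that omit it are broadcast to all shards.
The field order of a compound shard key affects query efficiency, because only queries that constrain the leading field(s) can be targeted.
Secondary indexes are local: each shard indexes only its own data, and MongoDB has no global secondary indexes.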

// Create an index on a sharded collection
db.orders.createIndex({ customerId: 1 })

// View index information
db.orders.getIndexes()
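The link between the shard key and query routing can be approximated with a small rule of thumb. The sketch below is a deliberate simplification, not mongos's actual planner: it only looks at the leading shard key field, and the helper names `isEquality` and `routingFor` are assumptions for illustration.

```javascript
// True when a filter condition is a plain value rather than an operator
// object such as { $gte: ... }.
function isEquality(cond) {
  return cond === null || typeof cond !== "object" || cond instanceof Date;
}

// Simplified rule: a query can be targeted when it constrains the leading
// shard key field; a hashed leading field additionally needs an equality match.
function routingFor(shardKey, filter) {
  const [field, kind] = Object.entries(shardKey)[0];
  if (!(field in filter)) return "broadcast";
  if (kind === "hashed" && !isEquality(filter[field])) return "broadcast";
  return "targeted";
}

// Range query on a ranged shard key: only shards with overlapping ranges.
console.log(routingFor({ orderDate: 1 }, { orderDate: { $gte: new Date("2023-02-01") } }));
// Query on a secondary-indexed field only: every shard must be asked.
console.log(routingFor({ orderDate: 1 }, { customerId: 42 }));
// Equality on a hashed shard key: one shard.
console.log(routingFor({ userId: "hashed" }, { userId: "user1" }));
// Range on a hashed shard key: every shard.
console.log(routingFor({ userId: "hashed" }, { userId: { $gt: "user1" } }));
```

In this sketch the customerId query is broadcast even though customerId is indexed, because the index on each shard only speeds up the lookup inside that shard.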
