Document Structure Design Principles

Author: Chuan Chen | Category: MongoDB

MongoDB is a document-oriented database, and its main strength is a flexible document structure. A well-designed structure can significantly improve query performance, simplify application logic, and reduce maintenance costs. Unlike traditional relational databases, MongoDB encourages modeling relationships through nested documents and arrays, although excessive nesting can itself cause performance problems.

Choosing Between Embedding and Referencing

Deciding whether to embed or reference data is the first decision in structure design. Embedding suits subdocuments that are usually read together with their parent and are updated infrequently. For example, a user's addresses are typically embedded in the user document:

{
  _id: "user123",
  name: "Zhang San",
  addresses: [
    {
      type: "home",
      street: "100 Renmin Road",
      city: "Beijing"
    },
    {
      type: "work",
      street: "200 Kejiyuan Road",
      city: "Shenzhen"
    }
  ]
}

Referencing is more appropriate for many-to-many relationships or scenarios requiring frequent independent updates. For instance, comments in a blog system can use references:

// Post document
{
  _id: "post1",
  title: "MongoDB Design Patterns",
  commentIds: ["comment1", "comment2"]
}

// Comment document
{
  _id: "comment1",
  content: "Very helpful",
  userId: "user123"
}
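
With references, reading a post together with its comments takes either a second query or a $lookup stage. The sketch below only builds such an aggregation pipeline as a plain object. The collection name "comments" and the helper name are assumptions for illustration, not part of the example above.

```javascript
// Hypothetical helper: builds an aggregation pipeline that loads one post
// and resolves its commentIds array into full comment documents.
function postWithCommentsPipeline(postId) {
  return [
    { $match: { _id: postId } },
    {
      $lookup: {
        from: "comments",          // assumed collection name
        localField: "commentIds",  // array of references stored on the post
        foreignField: "_id",
        as: "comments"             // resolved documents land in this field
      }
    }
  ];
}

// In mongosh, this array would be passed to db.posts.aggregate(...).
const postPipeline = postWithCommentsPipeline("post1");
```

The extra stage is the cost of referencing: the embedded address example needs no join at all, while this one does.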

Pre-Aggregating Data

In scenarios requiring frequent calculations, pre-storing computed results can greatly enhance query performance. An e-commerce product document might include sales statistics:

{
  _id: "product001",
  name: "Wireless Earbuds",
  price: 299,
  stats: {
    totalSales: 1542,
    monthlySales: 213,
    averageRating: 4.7
  }
}

This design avoids recalculating the statistics on every query, which makes it particularly suitable for dashboard applications. The trade-off is that every write affecting the underlying data must also update the precomputed fields.
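
As a minimal sketch of that maintenance work, the helpers below build the update for a sale and recompute a running average for a new rating. The field ratingCount is an assumption: the example document does not store it, but an average cannot be updated incrementally without knowing how many ratings it covers.

```javascript
// Hypothetical helper: update document for recording a sale of `quantity` units.
function saleUpdate(quantity) {
  return {
    $inc: {
      "stats.totalSales": quantity,
      "stats.monthlySales": quantity
    }
  };
}

// Hypothetical helper: returns new stats after one more rating.
// `ratingCount` is an assumed extra field, not in the original document.
function withNewRating(stats, rating) {
  const count = stats.ratingCount + 1;
  // Incremental mean: shift the old average toward the new rating.
  const average = stats.averageRating + (rating - stats.averageRating) / count;
  return { ...stats, ratingCount: count, averageRating: average };
}

const saleSpec = saleUpdate(2);
const ratedStats = withNewRating({ averageRating: 4, ratingCount: 3 }, 5);
```

A field like monthlySales would also need a periodic reset, which this sketch does not cover.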

Bucket Pattern for Time Series Data

For time-series data such as sensor readings, the bucket pattern keeps the number of documents under control. Instead of one document per minute, each sensor gets one document per hour that holds that hour's readings:

{
  sensorId: "temp001",
  date: ISODate("2023-05-20T00:00:00Z"),
  readings: [
    { time: 0, value: 23.4 },
    { time: 15, value: 23.7 },
    // ...other minute data
  ],
  stats: {
    max: 24.1,
    min: 23.2,
    avg: 23.6
  }
}

For a single sensor, this pattern reduces a potential 1,440 per-minute documents per day to just 24 hourly documents, so a query over a time range reads far fewer documents.
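
Writing into a bucket can be sketched as an upsert keyed on the sensor and the start of the hour. The helper below only builds the filter and update objects. Its name is hypothetical, and stats.sum and stats.count are assumed fields used in place of a stored avg, since an average cannot be maintained with a single update operator.

```javascript
// Hypothetical helper: builds the filter and update for adding one reading
// to its hourly bucket. Intended for updateOne(filter, update, { upsert: true }).
function bucketUpsert(sensorId, timestamp, value) {
  // Truncate the timestamp to the start of its UTC hour.
  const bucketStart = new Date(Date.UTC(
    timestamp.getUTCFullYear(),
    timestamp.getUTCMonth(),
    timestamp.getUTCDate(),
    timestamp.getUTCHours()
  ));
  return {
    filter: { sensorId: sensorId, date: bucketStart },
    update: {
      // `time` is the minute offset within the hour, as in the example.
      $push: { readings: { time: timestamp.getUTCMinutes(), value: value } },
      $max: { "stats.max": value },
      $min: { "stats.min": value },
      // Assumed fields: the average is sum / count, computed at read time.
      $inc: { "stats.sum": value, "stats.count": 1 }
    }
  };
}

const bucketOp = bucketUpsert("temp001", new Date("2023-05-20T00:15:42Z"), 23.7);
```

With upsert enabled, the first reading of an hour creates the bucket and later readings append to it.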

Handling Many-to-Many Relationships

For many-to-many relationships, let the query patterns decide which side stores the references. A student-course system has two basic options:

// Option 1: Courses store student references
{
  _id: "course101",
  name: "Database Principles",
  studentIds: ["stu1", "stu2"]
}

// Option 2: Students store course references
{
  _id: "stu1",
  name: "Li Si",
  courseIds: ["course101", "course102"]
}

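In practice, bidirectional references are often used, frequently with a copy of commonly displayed fields such as names stored next to each reference. The application layer is then responsible for keeping both sides consistent: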

// Course document
{
  _id: "course101",
  name: "Database Principles",
  students: [
    { id: "stu1", name: "Li Si" },
    { id: "stu2", name: "Wang Wu" }
  ]
}

// Student document
{
  _id: "stu1",
  name: "Li Si",
  courses: [
    { id: "course101", name: "Database Principles" }
  ]
}
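
Keeping both sides consistent means every enrollment produces two writes. A minimal sketch of those two updates follows. The collection names "courses" and "students" and the helper name are assumptions.

```javascript
// Hypothetical helper: the two updates needed to enroll a student in a course
// when both documents store a reference plus a copy of the name.
function enrollmentOps(student, course) {
  return [
    {
      collection: "courses",   // assumed collection name
      filter: { _id: course.id },
      // $addToSet skips the write if an identical entry already exists.
      update: { $addToSet: { students: { id: student.id, name: student.name } } }
    },
    {
      collection: "students",  // assumed collection name
      filter: { _id: student.id },
      update: { $addToSet: { courses: { id: course.id, name: course.name } } }
    }
  ];
}

const enrollOps = enrollmentOps(
  { id: "stu1", name: "Li Si" },
  { id: "course101", name: "Database Principles" }
);
```

The two updates touch different documents, so they are not atomic as a pair. If that matters, they would need to run inside a multi-document transaction or be repaired by the application after a failure.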

Schema Versioning

As applications evolve, document structures may need to change. A version field in each document allows a smooth transition:

{
  _id: "user456",
  schemaVersion: 2,
  basicInfo: {
    firstName: "Wang",
    lastName: "Xiaoming"
  },
  // V2-added authentication info
  auth: {
    lastLogin: ISODate("2023-05-20T08:30:00Z"),
    loginCount: 42
  }
}

Application code processes documents based on schemaVersion, allowing old and new data to coexist until migration completes.
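
A version check on read might look like the sketch below. It assumes that version 1 documents have the same shape as the example but lack the auth subdocument and possibly the schemaVersion field. The default values are also assumptions.

```javascript
// Hypothetical upgrade step applied when a document is read.
function upgradeUser(doc) {
  const version = doc.schemaVersion ?? 1; // treat a missing field as version 1
  if (version >= 2) {
    return doc; // already in the current shape
  }
  // Return a new object so the caller decides whether to write it back.
  return {
    ...doc,
    schemaVersion: 2,
    auth: { lastLogin: null, loginCount: 0 } // assumed defaults
  };
}

const oldUser = { _id: "user001", basicInfo: { firstName: "Li", lastName: "Na" } };
const upgradedUser = upgradeUser(oldUser);
```

Writing the upgraded document back on the next update migrates data lazily, while a batch job can convert whatever remains.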

Read-Write Ratio Considerations

Document structures should be optimized for the read-write ratio. Highly denormalized designs, where everything a page needs is embedded in one document, suit high-read, low-update scenarios:

// Product detail page document
{
  _id: "product789",
  name: "Smart Watch",
  fullDescription: "...detailed HTML content...",
  specs: {
    // All specification parameters
  },
  reviews: [
    // Latest 20 reviews
  ]
}

For high-update scenarios, separate volatile data:

// Main document
{
  _id: "article123",
  title: "MongoDB Best Practices",
  content: "...",
  staticData: {...}
}

// Independent counter document
{
  _id: "article123_stats",
  views: 12456,
  shares: 342,
  lastUpdated: ISODate("2023-05-20T10:00:00Z")
}

Index Design Coordination

Document structure and indexing strategy must be designed together. For example, a document intended for location-based queries:

{
  _id: "place001",
  name: "Central Park",
  location: {
    type: "Point",
    coordinates: [ -73.97, 40.78 ]
  },
  // Optimized for category queries
  tags: ["park", "landmark", "tourist"],
  // Optimized for range queries
  visitorStats: {
    lastMonth: 15000,
    lastWeek: 4200
  }
}

Corresponding indexes should include geospatial, tags, and visitor metrics:

db.places.createIndex({ "location": "2dsphere" })
db.places.createIndex({ "tags": 1 })
db.places.createIndex({ "visitorStats.lastWeek": -1 })
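
The query filters these indexes are meant to serve can be sketched as plain objects. The function name and its parameters are illustrative assumptions. The operators themselves ($near, $geometry, $maxDistance, $gte) are standard MongoDB query operators.

```javascript
// Hypothetical helper: builds one filter per index defined above.
function placeQueries(lng, lat, maxMeters, tag, minWeeklyVisitors) {
  return {
    // 2dsphere index: places within maxMeters of a point, nearest first.
    // GeoJSON coordinates are ordered [longitude, latitude].
    nearby: {
      location: {
        $near: {
          $geometry: { type: "Point", coordinates: [lng, lat] },
          $maxDistance: maxMeters
        }
      }
    },
    // Index on tags: matches any place whose tags array contains the value.
    byTag: { tags: tag },
    // Index on visitorStats.lastWeek: range filter on weekly visitors.
    busy: { "visitorStats.lastWeek": { $gte: minWeeklyVisitors } }
  };
}

const placeFilters = placeQueries(-73.97, 40.78, 1000, "park", 1000);
```

Each filter would be passed to db.places.find(...). Sorting the busy query by visitorStats.lastWeek in descending order can use the same index.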

Document Size Limitations

A single MongoDB (BSON) document cannot exceed 16 MB. Larger content has to be split into chunks, which is also how MongoDB's built-in GridFS stores files:

// Document metadata
{
  _id: "doc_abc",
  title: "User Manual",
  chunkSize: 102400,
  totalChunks: 15,
  currentVersion: 3
}

// Content chunks
{
  docId: "doc_abc",
  chunkNum: 1,
  data: BinData(0, "...base64-encoded data...")
}

This pattern is particularly suitable for storing file contents or large text fields.
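
Splitting and reassembling the content can be sketched as follows. The helper names are hypothetical, and a Uint8Array stands in for the BinData value. With a real driver the bytes would be wrapped in its binary type before insertion.

```javascript
// Hypothetical helper: splits a byte array into chunk documents shaped like
// the example above. chunkNum starts at 1 to match it.
function toChunks(docId, bytes, chunkSize) {
  const chunks = [];
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    chunks.push({
      docId: docId,
      chunkNum: chunks.length + 1,
      data: bytes.slice(offset, offset + chunkSize)
    });
  }
  return chunks;
}

// Hypothetical helper: rebuilds the content by concatenating the chunks
// in chunkNum order, regardless of the order they were fetched in.
function fromChunks(chunks) {
  const sorted = [...chunks].sort((a, b) => a.chunkNum - b.chunkNum);
  const total = sorted.reduce((n, c) => n + c.data.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of sorted) {
    out.set(c.data, offset);
    offset += c.data.length;
  }
  return out;
}

const sourceBytes = Uint8Array.from([1, 2, 3, 4, 5, 6, 7]);
const parts = toChunks("doc_abc", sourceBytes, 3);
const restored = fromChunks(parts.slice().reverse());
```

An index on { docId: 1, chunkNum: 1 } would let the chunks of one document be read back in order.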

Supporting Atomic Operations

MongoDB guarantees atomicity for writes to a single document, so fields that must change together should reside in the same document. A shopping cart example:

{
  _id: "cart_user123",
  items: [
    {
      productId: "prod1",
      quantity: 2,
      price: 99,
      addedAt: ISODate("2023-05-20T09:15:00Z")
    }
  ],
  summary: {
    itemCount: 1,
    total: 198
  }
}

A single updateOne call applies the $inc and $push operators atomically:

db.carts.updateOne(
  { _id: "cart_user123" },
  {
    $inc: {
      "summary.itemCount": 1,
      // price * quantity of the pushed item (199 * 1)
      "summary.total": 199
    },
    },
    $push: {
      items: {
        productId: "prod2",
        quantity: 1,
        price: 199
      }
    }
  }
)
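
Because the summary is only correct if the increments match the pushed item, it helps to derive them from the item instead of typing them by hand. The helper below is a hypothetical sketch of that idea. It treats itemCount as the number of line items, as in the example cart.

```javascript
// Hypothetical helper: builds the add-to-cart update so that the summary
// increments are always computed from the item being pushed.
function addItemUpdate(item) {
  return {
    $inc: {
      "summary.itemCount": 1,                      // one more line item
      "summary.total": item.price * item.quantity  // line total
    },
    $push: { items: item }
  };
}

const cartUpdate = addItemUpdate({ productId: "prod2", quantity: 1, price: 199 });
```

The result would be passed as the second argument to db.carts.updateOne(...), so the array and the summary still change in one atomic write.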

Adapting to Application Scenarios

Different business scenarios call for different structures. For example, user relationships in a social media application:

// Basic user document
{
  _id: "user789",
  username: "tech_enthusiast",
  profile: {
    displayName: "Technology Enthusiast",
    avatar: "url/to/avatar.jpg"
  },
  // Pre-aggregated follower/following counts
  counts: {
    followers: 5423,
    following: 123
  }
}

// Relationship document (paginated following list)
{
  _id: "user789_following_1",
  userId: "user789",
  page: 1,
  following: [
    { userId: "user123", since: ISODate("2023-01-15") },
    // ...other followed users
  ]
}

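This hybrid design balances read performance with write efficiency: the pre-aggregated counts keep profile reads cheap, while paging the following list keeps each relationship document small.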
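
How a following list maps onto page documents can be sketched as below. The helper name and the pageSize parameter are assumptions. The _id format follows the relationship document above.

```javascript
// Hypothetical helper: splits a following list into page documents shaped
// like the relationship document above. Pages are numbered from 1.
function toFollowingPages(userId, following, pageSize) {
  const pages = [];
  for (let i = 0; i < following.length; i += pageSize) {
    const page = pages.length + 1;
    pages.push({
      _id: `${userId}_following_${page}`,
      userId: userId,
      page: page,
      following: following.slice(i, i + pageSize)
    });
  }
  return pages;
}

const followed = ["user1", "user2", "user3", "user4", "user5"]
  .map((id) => ({ userId: id, since: new Date("2023-01-15") }));
const followingPages = toFollowingPages("user789", followed, 2);
```

Because the _id is predictable, page n of a user's list can be fetched by _id without an additional index.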
