The data model of MongoDB (documents, collections, databases)

Author: Chuan Chen | Category: MongoDB

As a representative NoSQL database, MongoDB adopts a flexible document data model, with core concepts including documents, collections, and databases. This design enables MongoDB to efficiently handle unstructured or semi-structured data while maintaining powerful query capabilities.

Document

A document is the most basic data unit in MongoDB, stored in BSON (Binary JSON) format. Each document consists of key-value pairs, similar to a JSON object but supporting richer data types.

// A typical MongoDB document example
{
  _id: ObjectId("507f1f77bcf86cd799439011"),
  name: "Zhang San",
  age: 28,
  address: {
    city: "Beijing",
    district: "Haidian District"
  },
  hobbies: ["programming", "swimming", "photography"],
  createdAt: new Date("2023-01-15")
}

Document characteristics:

The _id field uniquely identifies a document within its collection; if omitted on insert, an ObjectId is generated automatically
Supports nested documents (e.g., the address field)
Can store arrays (e.g., the hobbies field)
Field values can be of various data types (string, number, date, etc.)
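
To make the nesting concrete, the sketch below resolves a dot-separated path such as address.city or hobbies.0 against a plain JavaScript object, which mirrors how MongoDB's dot notation addresses embedded fields and array positions. The helper name getPath is hypothetical and runs on the client; it only illustrates the addressing scheme, not how the server evaluates queries.

```javascript
// Hypothetical helper: resolve a dot-notation path against a plain object.
// Returns undefined as soon as any step of the path is missing.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value === null || value === undefined ? undefined : value[key]),
    doc
  );
}

const person = {
  name: "Zhang San",
  address: { city: "Beijing", district: "Haidian District" },
  hobbies: ["programming", "swimming", "photography"]
};

const city = getPath(person, "address.city");        // field of an embedded document
const firstHobby = getPath(person, "hobbies.0");     // array element by position
const missing = getPath(person, "address.zip.code"); // absent path, no exception
```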

A single document is limited to 16 MB, which is enough for most business data. For larger files, MongoDB provides the GridFS specification, which splits a file into chunks stored as separate documents.

Collection

A collection is a container for a group of documents, similar to a table in relational databases but without a fixed schema.

// Example of different document structures in a user collection
[
  {
    _id: 1,
    username: "user1",
    email: "user1@example.com"
  },
  {
    _id: 2,
    username: "user2",
    profile: {
      firstName: "Li",
      lastName: "Si"
    },
    lastLogin: ISODate("2023-05-20T08:30:00Z")
  }
]

Key features of collections:

Dynamic schema: Documents in a collection can have different structures
Naming rules: Names cannot be empty, contain the null character or the $ symbol, or begin with the system. prefix
Index support: Indexes can be created on collections to improve query performance
Capped collections: Fixed-size collections that overwrite their oldest documents once full, suitable for scenarios like logging
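
The behaviour of a capped collection can be pictured as a fixed-capacity buffer that discards its oldest entry when a new one arrives. The class below is a minimal in-memory sketch of that idea; CappedBuffer and its maxDocs parameter are hypothetical names, and a real capped collection is bounded by a size in bytes (optionally also by a document count), not by count alone.

```javascript
// In-memory illustration of capped-collection behaviour (not a MongoDB API).
class CappedBuffer {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    // Once the cap is exceeded, the oldest document is dropped.
    if (this.docs.length > this.maxDocs) this.docs.shift();
  }

  // Documents come back in insertion order.
  find() {
    return [...this.docs];
  }
}

const log = new CappedBuffer(3);
for (let i = 1; i <= 5; i++) {
  log.insert({ seq: i, message: `event ${i}` });
}
const kept = log.find().map((d) => d.seq); // only the three newest remain
```

In mongosh, the corresponding collection could be created with db.createCollection("logs", { capped: true, size: 1048576, max: 1000 }), where the collection name and both limits are placeholders.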

Collection operation examples (using MongoDB Shell):

// Create a collection
db.createCollection("users")

// View all collections
show collections

// Delete a collection
db.users.drop()

Database

A database is the top-level organizational structure in MongoDB, containing multiple collections. A single MongoDB instance can host multiple databases.

Key characteristics of databases:

Namespace isolation: Collections in different databases can have the same name
Independent access control: Different access permissions can be set for each database
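Storage quotas: Current MongoDB releases have no built-in per-database storage limit; the legacy quota options applied only to the MMAPv1 storage engine, which was removed in MongoDB 4.2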
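
Namespace isolation follows from how MongoDB names things: a collection's full name is the database name, a dot, and the collection name. The sketch below splits such a namespace at the first dot, since database names cannot contain dots while collection names can. parseNamespace and the sample names are hypothetical.

```javascript
// Hypothetical helper: split "<database>.<collection>" at the first dot.
function parseNamespace(ns) {
  const dot = ns.indexOf(".");
  if (dot <= 0 || dot === ns.length - 1) {
    throw new Error(`Not a valid namespace: ${ns}`);
  }
  return { db: ns.slice(0, dot), collection: ns.slice(dot + 1) };
}

// The same collection name in two databases yields two distinct namespaces.
const shopUsers = parseNamespace("shop.users");
const blogUsers = parseNamespace("blog.users");

// Dots after the first one belong to the collection name.
const archive = parseNamespace("shop.orders.archive");
```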

Common database commands:

// Switch to a database (it is created lazily on the first write)
use mydb

// View the current database
db

// View all databases
show dbs

// Delete the current database
db.dropDatabase()

Data Model Design Patterns

MongoDB supports several ways to model relationships between data, including:

Embedded document pattern:

// Order with embedded order items
{
  _id: "ORD123",
  customer: "Zhang San",
  items: [
    { product: "phone", quantity: 1, price: 5999 },
    { product: "earphones", quantity: 2, price: 299 }
  ],
  total: 6597
}
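
Because the items are embedded, everything needed to check the order is in one document. As a small sketch, the hypothetical orderTotal helper below recomputes the total from the embedded items, assuming each line contributes price times quantity, and compares it with the stored total field.

```javascript
// Hypothetical helper: sum price * quantity over the embedded items.
function orderTotal(items) {
  return items.reduce((sum, item) => sum + item.price * item.quantity, 0);
}

const order = {
  _id: "ORD123",
  customer: "Zhang San",
  items: [
    { product: "phone", quantity: 1, price: 5999 },
    { product: "earphones", quantity: 2, price: 299 }
  ],
  total: 6597
};

// A single read of the order is enough to validate the stored total.
const storedTotalIsConsistent = orderTotal(order.items) === order.total;
```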

Reference pattern:

// User collection
{
  _id: "USER1",
  name: "Li Si"
}

// Order collection
{
  _id: "ORDER1",
  user: "USER1",  // Reference to user ID
  amount: 1000
}
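
With references, the related document has to be fetched separately, either by a second query or server-side with the $lookup aggregation stage. The sketch below performs the equivalent join in memory on plain arrays to show what that resolution involves. ORDER2 and its missing user are hypothetical, added to show that MongoDB does not enforce referential integrity, so the application must handle dangling references.

```javascript
const users = [{ _id: "USER1", name: "Li Si" }];

const orders = [
  { _id: "ORDER1", user: "USER1", amount: 1000 },
  { _id: "ORDER2", user: "USER9", amount: 250 } // hypothetical dangling reference
];

// Index users by _id so each reference resolves with one lookup.
const usersById = new Map(users.map((u) => [u._id, u]));

// Attach the referenced user document, or null when the target is missing.
const joined = orders.map((o) => ({
  ...o,
  userDoc: usersById.get(o.user) ?? null
}));
```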

Hybrid pattern (embedded + reference):

// Blog post with comments
{
  _id: "POST1",
  title: "MongoDB Guide",
  content: "...",
  recentComments: [  // Embedded recent comments
    { user: "USER1", text: "Great article", date: ISODate(...) },
    { user: "USER2", text: "Very helpful", date: ISODate(...) }
  ],
  commentCount: 15  // Total comment count
}
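
The hybrid pattern only works if the embedded array stays bounded while the counter keeps growing. The sketch below shows that bookkeeping on a plain object; addComment and its keep parameter are hypothetical. On the server, the same effect is usually achieved in one update by combining $push (with $each and a negative $slice) and $inc.

```javascript
// Hypothetical helper: append a comment, keep only the newest `keep`
// entries embedded, and increment the overall counter.
function addComment(post, comment, keep = 2) {
  return {
    ...post,
    recentComments: [...post.recentComments, comment].slice(-keep),
    commentCount: post.commentCount + 1
  };
}

const post = {
  _id: "POST1",
  title: "MongoDB Guide",
  recentComments: [
    { user: "USER1", text: "Great article" },
    { user: "USER2", text: "Very helpful" }
  ],
  commentCount: 15
};

// USER3 and the comment text are made up for this illustration.
const updatedPost = addComment(post, { user: "USER3", text: "Thanks" });
```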

Data Types

MongoDB supports a wide range of data types, including:

Basic types: String, Integer, Boolean, Double
Date types: Date, Timestamp
Object ID: ObjectId
Binary data: BinData
Special types: Null, Regular Expression, JavaScript code
Geospatial data: GeoJSON objects such as Point, LineString, and Polygon, stored as ordinary embedded documents

// Document containing various data types
{
  _id: ObjectId("5f8d8a7b2f4a1e3d6c9b8a7d"),
  name: "Sample Document",
  count: 42,
  active: true,
  price: 19.99,
  tags: ["mongodb", "database", "nosql"],
  createdAt: new Date(),
  location: {
    type: "Point",
    coordinates: [116.404, 39.915]
  },
  metadata: {
    version: "1.0",
    hash: BinData(0, "aGVsbG8gd29ybGQ=")
  }
}
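
The ObjectId type is not opaque: its first 4 bytes hold the creation time in seconds since the Unix epoch, followed by a 5-byte random value and a 3-byte counter. The sketch below decodes that timestamp from the 24-character hex string; objectIdTimestamp is a hypothetical helper that works without a driver. In mongosh, ObjectId(...).getTimestamp() returns the same information.

```javascript
// Hypothetical helper: read the creation time from an ObjectId hex string.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("Expected 24 hexadecimal characters");
  }
  // The first 8 hex characters are the 4-byte big-endian seconds value.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdTimestamp("507f1f77bcf86cd799439011");
const createdIso = created.toISOString();
```

One practical consequence is that sorting by _id roughly orders documents by creation time, so a separate createdAt field is only needed when finer control is required.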

Indexes and Data Model

Indexes significantly impact data model design. MongoDB supports various index types:

// Create a single-field index
db.users.createIndex({ username: 1 })

// Create a compound index
db.orders.createIndex({ customer: 1, date: -1 })

// Create a multikey index (an index becomes multikey automatically when the field holds arrays)
db.products.createIndex({ tags: 1 })

// Create a text index
db.articles.createIndex({ content: "text" })

// Create a geospatial index
db.places.createIndex({ location: "2dsphere" })

Index design should follow query patterns. Every index has to be maintained on each write, so excessive indexing degrades write performance.
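
The write cost of an index is easiest to see with a multikey index, which stores one entry per array element. The sketch below builds such entries in memory for a tags field; multikeyEntries and the sample products are hypothetical, and a real index is a B-tree on disk, not a sorted array.

```javascript
// Hypothetical helper: one index entry per distinct array element per document.
function multikeyEntries(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    const keys = Array.isArray(value) ? value : [value];
    // A value repeated inside one document needs only one entry.
    for (const key of new Set(keys)) {
      entries.push({ key, id: doc._id });
    }
  }
  // Keep entries ordered by key, then by document id.
  return entries.sort((a, b) =>
    a.key < b.key ? -1 : a.key > b.key ? 1 : a.id - b.id
  );
}

const products = [
  { _id: 1, tags: ["mongodb", "database"] },
  { _id: 2, tags: ["database", "nosql"] }
];

const tagIndex = multikeyEntries(products, "tags");

// An equality match on a tag becomes a lookup over the sorted entries.
const idsForDatabase = tagIndex
  .filter((entry) => entry.key === "database")
  .map((entry) => entry.id);
```

Two documents produce four entries here, so a document with a long array adds correspondingly many entries on every insert or update.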

Data Model Best Practices

Design document structure based on query patterns:

// Read-optimized design
{
  _id: "PROD001",
  name: "Smartphone",
  price: 2999,
  inventory: {
    warehouse1: 50,
    warehouse2: 30
  },
  dailySales: [
    { date: ISODate("2023-05-01"), count: 12 },
    { date: ISODate("2023-05-02"), count: 8 }
  ]
}

Handle one-to-many relationships:

// Few child documents: embed
{
  orderId: "ORD1001",
  items: [
    { product: "A", qty: 2 },
    { product: "B", qty: 1 }
  ]
}

// Many child documents: reference (full comments live in their own collection and point back to the post)
{
  blogPostId: "POST100",
  title: "...",
  commentCount: 142,
  recentComments: [...]
}

Consider document growth:

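// Pre-allocating space avoided document moves under the legacy MMAPv1 engine;
// with WiredTiger (the default since MongoDB 3.2) this padding is unnecessary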
{
  _id: "USER100",
  name: "Wang Wu",
  activityLog: new Array(100).fill(null)
}

Data model considerations in sharded clusters:

// Choose an appropriate shard key
sh.shardCollection("mydb.orders", { customerId: 1, orderDate: 1 })

Data Model Evolution

MongoDB's flexible schema lets the data model evolve incrementally:

// Add new fields
db.products.updateMany(
  {},
  { $set: { lastUpdated: new Date() } }
)

// Rename fields
db.users.updateMany(
  {},
  { $rename: { "oldField": "newField" } }
)

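// Migrate data formats: write reshaped documents to a new collection
db.orders.aggregate([
  { $project: {
    customerId: 1,
    items: 1,
    // Line total = price * quantity, summed over all items
    total: { $sum: { $map: {
      input: "$items",
      as: "item",
      in: { $multiply: ["$$item.price", "$$item.quantity"] }
    } } }
  }},
  { $out: "orders_new" }
])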

Performance Considerations

Data model design directly impacts performance:

Working set size: Ensure active data fits in memory
Document size: Avoid overly large documents (close to the 16MB limit)
Index coverage: Design queries that can be fully covered by indexes
Write patterns: Consider batch insert and update efficiency

// Batch insert optimization
db.products.insertMany([
  { name: "Product1", price: 10 },
  { name: "Product2", price: 20 },
  // ...more documents
])

// Batch update optimization
db.orders.updateMany(
  { status: "pending" },
  { $set: { status: "processed" } }
)
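
When the number of documents is large, the application typically splits them into batches before calling insertMany. The sketch below shows one way to do that; chunk is a hypothetical helper and the batch size of 1000 is an arbitrary assumption, not a recommended value.

```javascript
// Hypothetical helper: split an array into consecutive batches.
function chunk(docs, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < docs.length; i += batchSize) {
    batches.push(docs.slice(i, i + batchSize));
  }
  return batches;
}

// 2500 generated sample documents, split into batches of at most 1000.
const sampleProducts = Array.from({ length: 2500 }, (_, i) => ({
  name: `Product${i + 1}`,
  price: 10
}));
const batches = chunk(sampleProducts, 1000);

// In mongosh, each batch could then be written with:
//   for (const batch of batches) db.products.insertMany(batch, { ordered: false })
```

Passing ordered: false lets the server continue past individual failures, which is usually what a bulk load wants; keep the default when insertion order matters.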

Practical Application Example

E-commerce platform data model design:

// Product collection
{
  _id: "P1001",
  sku: "MBP-13-2023",
  name: "MacBook Pro 13-inch 2023",
  category: ["Electronics", "Laptops"],
  price: 12999,
  attributes: {
    cpu: "M2",
    ram: "16GB",
    storage: "512GB SSD"
  },
  inventory: 50,
  ratings: [
    { userId: "U1001", score: 5, comment: "Very satisfied" },
    { userId: "U1002", score: 4 }
  ],
  avgRating: 4.5
}

// Order collection
{
  _id: "O2001",
  customerId: "U1001",
  items: [
    { productId: "P1001", quantity: 1, price: 12999 },
    { productId: "P1002", quantity: 2, price: 199 }
  ],
  shipping: {
    address: "Chaoyang District, Beijing...",
    method: "express"
  },
  payment: {
    method: "credit_card",
    transactionId: "TXN123456"
  },
  status: "completed",
  createdAt: ISODate("2023-05-15T10:30:00Z"),
  updatedAt: ISODate("2023-05-18T14:15:00Z")
}
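
The product document stores avgRating next to the ratings array, so the two have to be updated together. The sketch below shows that maintenance step on a plain object; addRating is a hypothetical helper and the rating from U1003 is made up for the illustration.

```javascript
// Hypothetical helper: append a rating and recompute the stored average.
function addRating(product, rating) {
  const ratings = [...product.ratings, rating];
  const avgRating =
    ratings.reduce((sum, r) => sum + r.score, 0) / ratings.length;
  return { ...product, ratings, avgRating };
}

const product = {
  _id: "P1001",
  ratings: [
    { userId: "U1001", score: 5, comment: "Very satisfied" },
    { userId: "U1002", score: 4 }
  ],
  avgRating: 4.5
};

// Scores 5, 4 and 3 now average to 4.
const ratedProduct = addRating(product, { userId: "U1003", score: 3 });
```

For products with very many ratings, embedding all of them would make the document grow without bound; in that case the ratings would move to their own collection and only the average (and perhaps a count) would stay on the product.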

Advanced Data Model Techniques

Polymorphic pattern:

// Different types of event documents
[
  {
    _id: "EVT001",
    type: "login",
    userId: "U1001",
    timestamp: ISODate(...),
    ipAddress: "192.168.1.1"
  },
  {
    _id: "EVT002",
    type: "purchase",
    userId: "U1002",
    timestamp: ISODate(...),
    orderId: "O2001",
    amount: 12999
  }
]
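
Application code that reads a polymorphic collection usually branches on the discriminator field, here type. The sketch below dispatches to a handler per event type; the handler table and describeEvent are hypothetical, and the timestamp field is left out because its value is elided in the example above.

```javascript
// One handler per value of the discriminator field `type`.
const handlers = new Map([
  ["login", (e) => `${e.userId} logged in from ${e.ipAddress}`],
  ["purchase", (e) => `${e.userId} paid ${e.amount} for order ${e.orderId}`]
]);

// Hypothetical helper: pick the handler that matches the event's type.
function describeEvent(event) {
  const handler = handlers.get(event.type);
  if (!handler) {
    throw new Error(`Unknown event type: ${event.type}`);
  }
  return handler(event);
}

const events = [
  { _id: "EVT001", type: "login", userId: "U1001", ipAddress: "192.168.1.1" },
  { _id: "EVT002", type: "purchase", userId: "U1002", orderId: "O2001", amount: 12999 }
];

const descriptions = events.map(describeEvent);
```

Failing loudly on an unknown type is a design choice: it surfaces new event shapes early instead of silently ignoring them.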

Bucket pattern (for time-series data):

// Sensor data bucket
{
  _id: "SENSOR001_202305",
  sensorId: "SENSOR001",
  month: "202305",
  readings: [
    { timestamp: ISODate("2023-05-01T00:00:00Z"), value: 23.5 },
    { timestamp: ISODate("2023-05-01T00:05:00Z"), value: 23.7 },
    // ...more readings
  ],
  stats: {
    avgValue: 24.1,
    maxValue: 28.5,
    minValue: 22.3
  }
}
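
A bucket document needs two pieces of logic: deriving the bucket key from the sensor and the month, and keeping the stats in step with the readings. The sketch below shows both on plain objects; bucketId and addReading are hypothetical helpers, and the stats are recomputed from scratch for clarity where a production update would more likely adjust them incrementally.

```javascript
// Hypothetical helper: build "<sensorId>_<YYYYMM>" from a UTC timestamp.
function bucketId(sensorId, timestamp) {
  const year = timestamp.getUTCFullYear();
  const month = String(timestamp.getUTCMonth() + 1).padStart(2, "0");
  return `${sensorId}_${year}${month}`;
}

// Hypothetical helper: append a reading and refresh the summary stats.
function addReading(bucket, reading) {
  const readings = [...bucket.readings, reading];
  const values = readings.map((r) => r.value);
  return {
    ...bucket,
    readings,
    stats: {
      avgValue: values.reduce((sum, v) => sum + v, 0) / values.length,
      maxValue: Math.max(...values),
      minValue: Math.min(...values)
    }
  };
}

const firstTimestamp = new Date("2023-05-01T00:00:00Z");
let bucket = {
  _id: bucketId("SENSOR001", firstTimestamp),
  sensorId: "SENSOR001",
  month: "202305",
  readings: [],
  stats: null
};

bucket = addReading(bucket, { timestamp: firstTimestamp, value: 23.5 });
bucket = addReading(bucket, { timestamp: new Date("2023-05-01T00:05:00Z"), value: 23.7 });
```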

Computed pattern (pre-aggregated data):

// Daily sales summary
{
  _id: "SALES_20230515",
  date: ISODate("2023-05-15"),
  totalSales: 125000,
  categorySales: {
    electronics: 80000,
    clothing: 30000,
    groceries: 15000
  },
  paymentMethods: {
    credit: 70000,
    debit: 40000,
    cash: 15000
  }
}
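
A pre-aggregated summary stays correct only if every sale updates all affected counters together. The sketch below applies one sale to the summary on a plain object; applySale and the shape of the sale object are hypothetical. On the server, the same change is typically a single update using $inc on each counter, often with upsert so that the day's document is created on first use.

```javascript
// Hypothetical helper: add one sale to the daily summary.
function applySale(summary, sale) {
  // Increment a counter, starting from 0 if the key is new.
  const bump = (counters, key, amount) => ({
    ...counters,
    [key]: (counters[key] ?? 0) + amount
  });
  return {
    ...summary,
    totalSales: summary.totalSales + sale.amount,
    categorySales: bump(summary.categorySales, sale.category, sale.amount),
    paymentMethods: bump(summary.paymentMethods, sale.paymentMethod, sale.amount)
  };
}

const summary = {
  _id: "SALES_20230515",
  totalSales: 125000,
  categorySales: { electronics: 80000, clothing: 30000, groceries: 15000 },
  paymentMethods: { credit: 70000, debit: 40000, cash: 15000 }
};

// A made-up sale of 500 in clothing, paid in cash.
const updatedSummary = applySale(summary, {
  amount: 500,
  category: "clothing",
  paymentMethod: "cash"
});
```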
