
Git object database

Author: Chuan Chen | Views: 8665 | Category: Development Tools

Introduction to Git Object Database

At its core, Git is a content-addressable file system built on four object types: blob, tree, commit, and tag. These objects are stored in the .git/objects directory and together form Git's object database. Each object is identified by the SHA-1 hash of its header and content (repositories created with the SHA-256 object format use SHA-256 instead).

Object Storage Principles

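Git compresses every object with zlib before writing it to disk. When creating a new object, Git performs the following steps:

  1. Construct the object data (a header of the form "<type> <size>" followed by a NUL byte, then the content)
  2. Calculate the SHA-1 checksum of the uncompressed header and content
  3. Compress the header and content using zlib
  4. Write the compressed data to .git/objects/<first 2 hex digits>/<remaining 38 hex digits>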

Example for viewing object content:

# Find the hash of a commit
git log --oneline

# View object content
git cat-file -p <hash>
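
To make the storage steps concrete, here is a minimal sketch of the reverse direction: decompressing a loose object and splitting it into header and body. The function name parseLooseObject is hypothetical, and the demo runs on an in-memory buffer rather than a real file under .git/objects.

```javascript
const zlib = require('zlib');
const crypto = require('crypto');

// Decode a loose object: zlib-inflate it, then split "<type> <size>\0<body>".
function parseLooseObject(compressed) {
  const raw = zlib.inflateSync(compressed);
  const nul = raw.indexOf(0); // the first NUL byte ends the header
  const [type, sizeText] = raw.toString('utf8', 0, nul).split(' ');
  const size = Number(sizeText);
  const body = raw.subarray(nul + 1);
  if (body.length !== size) throw new Error('size in header does not match body');
  // The object ID is the SHA-1 of the uncompressed header + body.
  const id = crypto.createHash('sha1').update(raw).digest('hex');
  return { type, size, body, id };
}

// Demo on an in-memory object instead of a file read from .git/objects
const sample = zlib.deflateSync(Buffer.from('blob 5\0hello'));
const obj = parseLooseObject(sample);
console.log(obj.type, obj.size, obj.body.toString()); // blob 5 hello
```

For a real object, the compressed bytes would come from reading .git/objects/<2 hex digits>/<38 hex digits>; packed objects use a different layout and are not handled by this sketch.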

Four Basic Object Types

Blob Objects

Blobs (Binary Large Objects) store file content. Each version of a file corresponds to a blob, and files with identical content are stored only once.

Example of creating a blob:

// Simulate Git's blob hashing (the object ID is computed before compression)
const crypto = require('crypto');

function createBlob(content) {
  const body = Buffer.from(content, 'utf8');
  // The header records the size in bytes, not in characters
  const header = Buffer.from(`blob ${body.length}\0`);
  const store = Buffer.concat([header, body]);
  const sha1 = crypto.createHash('sha1').update(store).digest('hex');
  return { sha1, content: store };
}

const blob = createBlob('Hello, Git!');
console.log(blob.sha1);  // Prints the 40-character hex object ID

Tree Objects

Tree objects act like directories, recording references to a set of blobs and other trees, including file mode, type, SHA-1, and filename.

Typical tree object content:

100644 blob a906cb2a4a904a152e80877d4088654daad0c859    README
040000 tree 0f1d6e3a6a6a6a6a6a6a6a6a6a6a6a6a6a6a6a6a    lib
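
The listing above is the human-readable form printed by git cat-file -p. On disk, each entry is stored as "<mode> <name>", a NUL byte, and the 20 raw bytes of the hash. The sketch below illustrates that encoding; encodeTreeEntry and buildTree are hypothetical names, and the entries are assumed to be already in Git's sort order.

```javascript
const crypto = require('crypto');

// Serialize one tree entry: "<mode> <name>\0" followed by the 20 raw hash bytes.
function encodeTreeEntry(mode, name, hexId) {
  return Buffer.concat([Buffer.from(`${mode} ${name}\0`), Buffer.from(hexId, 'hex')]);
}

// Build a tree object from entries that are assumed to be pre-sorted.
function buildTree(entries) {
  const body = Buffer.concat(entries.map((e) => encodeTreeEntry(e.mode, e.name, e.id)));
  const store = Buffer.concat([Buffer.from(`tree ${body.length}\0`), body]);
  const id = crypto.createHash('sha1').update(store).digest('hex');
  return { id, body };
}

const tree = buildTree([
  { mode: '100644', name: 'README', id: 'a906cb2a4a904a152e80877d4088654daad0c859' },
]);
console.log(tree.body.length); // 34 = 14 bytes of "100644 README\0" + 20 hash bytes
console.log(tree.id);
```

Note that directory entries are stored with mode 40000 (no leading zero); the leading zero in 040000 is only added when the tree is printed.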

Commit Objects

Commit objects point to a tree object and contain author, committer, date, commit message, and references to parent commits.

Example commit content:

tree 92b8b6ffb019642e2f9f4c9a6a6a6a6a6a6a6a6a
parent 2a6a6a6a6a6a6a6a6a6a6a6a6a6a6a6a6a6a6a6a
author John Doe <john@example.com> 1529504835 +0800
committer Jane Smith <jane@example.com> 1529504835 +0800

Update README
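
Because a commit is plain text (header lines, a blank line, then the message), it can be parsed with a few string operations. The following is a simplified sketch: parseCommit is a hypothetical helper, the hashes in the sample are placeholders, and multi-line headers such as signatures are ignored.

```javascript
// Parse the text of a commit object into its fields.
function parseCommit(text) {
  const split = text.indexOf('\n\n'); // a blank line separates headers from the message
  const commit = {
    tree: null,
    parents: [],
    author: null,
    committer: null,
    message: text.slice(split + 2),
  };
  for (const line of text.slice(0, split).split('\n')) {
    const space = line.indexOf(' ');
    const key = line.slice(0, space);
    const value = line.slice(space + 1);
    if (key === 'parent') commit.parents.push(value); // merge commits have several
    else if (key in commit) commit[key] = value;
  }
  return commit;
}

const parsed = parseCommit([
  'tree 0123456789abcdef0123456789abcdef01234567',
  'parent 89abcdef0123456789abcdef0123456789abcdef',
  'author John Doe <john@example.com> 1529504835 +0800',
  'committer Jane Smith <jane@example.com> 1529504835 +0800',
  '',
  'Update README',
  '',
].join('\n'));
console.log(parsed.parents.length, parsed.message.trim()); // 1 Update README
```

A root commit simply has no parent line, so parents stays empty.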

Tag Objects

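Annotated tags are stored as tag objects. A tag object points to another object, usually a commit, and is typically used to mark release versions. It contains the target object's hash and type, the tag name, tagger information with a date, and a message. Lightweight tags create no tag object; they are plain references to a commit.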
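
As an illustration, the uncompressed body of an annotated tag can be assembled like this. It is a sketch only: formatTag is a hypothetical helper, and the hash, tag name, and tagger are placeholder values.

```javascript
// Build the text body of an annotated tag object.
function formatTag({ object, type = 'commit', tag, tagger, message }) {
  return [
    `object ${object}`, // hash of the object being tagged
    `type ${type}`,     // type of that object, usually "commit"
    `tag ${tag}`,
    `tagger ${tagger}`,
    '',                 // a blank line separates headers from the message
    message,
    '',
  ].join('\n');
}

const tagBody = formatTag({
  object: '0123456789abcdef0123456789abcdef01234567',
  tag: 'v1.0.0',
  tagger: 'Jane Smith <jane@example.com> 1529504835 +0800',
  message: 'Release 1.0.0',
});
console.log(tagBody);
```

Like any other object, this body would be prefixed with a "tag <size>" header and a NUL byte before hashing and compression.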

Object Reference Mechanism

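Git uses references (refs) to give objects human-readable names. A reference is a file containing an object hash, usually that of a commit, stored under the .git/refs directory; after packing, references may instead be listed in the .git/packed-refs file.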

Common reference types:

  • Branch references: .git/refs/heads/
  • Remote tracking branches: .git/refs/remotes/
  • Tag references: .git/refs/tags/

Example for viewing references:

cat .git/refs/heads/master
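
The lookup can be sketched in code as well. The resolveRef function below is a hypothetical helper, not Git's actual algorithm: it follows symbolic refs such as HEAD, reads loose ref files, and falls back to packed-refs. The demo builds a throwaway directory in the system temp folder instead of touching a real repository.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Resolve a ref name to an object hash, or null if it cannot be found.
function resolveRef(gitDir, name = 'HEAD', depth = 0) {
  if (depth > 5) throw new Error('too many levels of symbolic refs');
  const loose = path.join(gitDir, name);
  if (fs.existsSync(loose)) {
    const value = fs.readFileSync(loose, 'utf8').trim();
    // Symbolic refs look like "ref: refs/heads/master"
    return value.startsWith('ref: ') ? resolveRef(gitDir, value.slice(5), depth + 1) : value;
  }
  const packed = path.join(gitDir, 'packed-refs');
  if (fs.existsSync(packed)) {
    for (const line of fs.readFileSync(packed, 'utf8').split('\n')) {
      const [hash, ref] = line.split(' ');
      if (ref === name) return hash;
    }
  }
  return null;
}

// Demo: HEAD points at a branch that exists only in packed-refs
const demoHash = '0123456789abcdef0123456789abcdef01234567'; // placeholder hash
const gitDir = fs.mkdtempSync(path.join(os.tmpdir(), 'refs-demo-'));
fs.writeFileSync(path.join(gitDir, 'HEAD'), 'ref: refs/heads/master\n');
fs.writeFileSync(path.join(gitDir, 'packed-refs'), `${demoHash} refs/heads/master\n`);
const resolved = resolveRef(gitDir);
console.log(resolved);
```

In everyday use, git rev-parse <ref> performs this resolution and handles the cases this sketch leaves out.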

Object Packing Optimization

As repository history grows, Git automatically packs multiple loose objects into .pack files and generates corresponding .idx index files. This packed storage format significantly saves space.

Manually trigger packing:

git gc

Low-Level Command Operations

Git provides a series of low-level commands for directly manipulating the object database:

# Calculate an object's SHA-1 and write it to the object database
# (omit -w to compute the hash without storing the object)
git hash-object -w <file>

# Create a tree object
git update-index --add <file>
git write-tree

# Create a commit object
echo "message" | git commit-tree <tree-hash>

# Create a commit object with a parent
echo "message" | git commit-tree <tree-hash> -p <parent-hash>

Object Database Recovery

When issues arise, the object database can be recovered through the following methods:

  1. Find dangling objects:
git fsck --lost-found
  2. Recover from reflog:
git reflog
git checkout <hash>

Advanced Application Examples

Custom Object Storage

Directly create Git objects using low-level APIs:

const fs = require('fs');
const crypto = require('crypto');
const zlib = require('zlib');

function storeGitObject(content, type = 'blob') {
  const body = Buffer.from(content);
  // The size in the header is the body length in bytes, not in characters
  const header = `${type} ${body.length}\0`;
  const store = Buffer.concat([Buffer.from(header), body]);
  const sha1 = crypto.createHash('sha1').update(store).digest('hex');
  
  const dir = `.git/objects/${sha1.substring(0,2)}`;
  const file = `${dir}/${sha1.substring(2)}`;
  
  if (!fs.existsSync(dir)) fs.mkdirSync(dir, { recursive: true });
  
  const compressed = zlib.deflateSync(store);
  fs.writeFileSync(file, compressed);
  
  return sha1;
}

const blobHash = storeGitObject('Custom content');
console.log(`Stored blob with hash: ${blobHash}`);

Parsing Pack Files

Example of parsing the header of a pack index (.idx) file, version 2:

const fs = require('fs');

function parsePackIndex(file) {
  const buffer = fs.readFileSync(file);
  // Magic number: 4 bytes, "ff744f63" (\377tOc) in version 2 index files
  const magic = buffer.toString('hex', 0, 4);
  // Version: 4 bytes, big-endian
  const version = buffer.readUInt32BE(4);
  
  // Subsequent parsing of fanout table, SHA-1 list, CRC32, offsets, etc.
  // ...
  
  return { magic, version };
}
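
The .pack file itself begins with a 12-byte header: the signature "PACK", a 4-byte big-endian version number, and a 4-byte big-endian object count. The sketch below reads that header from a buffer; parsePackHeader is a hypothetical name, and the demo uses a hand-built header rather than a real pack file.

```javascript
// Parse the fixed 12-byte header at the start of a .pack file.
function parsePackHeader(buffer) {
  const signature = buffer.toString('ascii', 0, 4);
  if (signature !== 'PACK') throw new Error('not a pack file');
  return {
    signature,
    version: buffer.readUInt32BE(4),     // pack format version
    objectCount: buffer.readUInt32BE(8), // number of objects in the pack
  };
}

// Demo: build a header for a version 2 pack containing 3 objects
const packHeader = Buffer.alloc(12);
packHeader.write('PACK', 0, 'ascii');
packHeader.writeUInt32BE(2, 4);
packHeader.writeUInt32BE(3, 8);
console.log(parsePackHeader(packHeader)); // { signature: 'PACK', version: 2, objectCount: 3 }
```

To inspect a real pack, the first 12 bytes of a file under .git/objects/pack/ could be passed to the same function.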

Performance Optimization Considerations

  1. Object Caching: Git caches frequently used objects to improve performance.
  2. Delta Compression: Similar objects in pack files undergo delta compression.
  3. Bitmap Indexes: Accelerate clone and fetch operations.
  4. Multi-Pack Indexes: Optimize access to repositories with multiple pack files.

View object statistics:

git count-objects -v

Practical Case Analysis

Consider a repository with 10,000 files, 8,000 of which are duplicate test data:

  1. However many files there are, Git stores one blob per distinct file content.
  2. Files with identical content share the same blob.
  3. After packing, similar objects may additionally be stored as deltas against each other.
  4. The final storage space is much smaller than the combined size of the files.
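
The deduplication in points 1 and 2 follows directly from content addressing, which the small simulation below tries to show. The file names and contents are made up for illustration, and a Map stands in for the object database.

```javascript
const crypto = require('crypto');

// Compute the blob ID Git would assign to the given content.
function blobId(content) {
  const body = Buffer.from(content);
  const store = Buffer.concat([Buffer.from(`blob ${body.length}\0`), body]);
  return crypto.createHash('sha1').update(store).digest('hex');
}

// Three files, two of which have identical content
const files = {
  'test/a.json': '{"ok":true}\n',
  'test/b.json': '{"ok":true}\n',
  'src/main.js': 'console.log("hi");\n',
};

// The Map is keyed by object ID, so identical content is stored once.
const objectStore = new Map();
for (const content of Object.values(files)) {
  objectStore.set(blobId(content), content);
}
console.log(`${Object.keys(files).length} files, ${objectStore.size} blobs`); // 3 files, 2 blobs
```

File names do not affect the result because they live in tree objects, not in blobs.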

Verify storage efficiency:

# View object count
git count-objects -v

# View repository size
du -sh .git
