The storage mechanism of Git

Author: Chuan Chen | Views: 33,514 | Category: Development Tools

Git is a distributed version control system whose core lies in efficiently managing file changes. Its storage mechanism achieves data persistence and rapid retrieval through the collaboration of an object database, index, and working directory.

Object Database

At its core, Git is a content-addressable key-value store in which all data is stored as objects. Each object is identified by the SHA-1 hash of its type, size, and content. There are four object types:

Blob Objects

Blobs (Binary Large Objects) store file content only; the filename and mode live in the Tree that references the Blob. For example, creating a file and staging it:

echo "Hello World" > test.txt
git add test.txt

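At this point, Git creates a Blob object. Before zlib compression, its content is:

blob 12\0Hello World\n

Here, 12 is the content length in bytes (echo appends a trailing newline, \n, to the 11 characters of text), and \0 is a NUL byte separating the header from the content. The SHA-1 of this header plus content is the object's identifier, and it can be computed using git hash-object: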

git hash-object test.txt
# Example output: 557db03de997c86a4a028e1ebd3a1ceb225be238
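
The hash above can be reproduced without Git. The following is a minimal Python sketch of the header-plus-content hashing described in this section; the function name git_blob_hash is only an illustrative choice, not part of Git.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 that Git would assign to a blob with this content."""
    # Header: object type, a space, the content length in bytes, then a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# echo "Hello World" writes the text plus a trailing newline (12 bytes in total).
print(git_blob_hash(b"Hello World\n"))
# 557db03de997c86a4a028e1ebd3a1ceb225be238
```

Because the hash depends only on the content, two files with identical bytes share a single Blob regardless of their names.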

Tree Objects

Tree objects act like directories, recording each entry's mode, type, hash, and filename. An entry points either to a Blob (a file) or to another Tree (a subdirectory). For example:

100644 blob 557db03...  test.txt
100755 blob 1a2b3c4...  script.sh

Tree objects are created from the current index using git write-tree:

git write-tree
# Example output: d8329fc1cc938780ffdd9f94e0d364e0ea74f579
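
The listing shown for a Tree is a human-readable rendering; on disk each entry is stored as the mode, a space, the filename, a NUL byte, and the 20 raw bytes of the hash. The sketch below serializes entries that way and hashes the result. The helper names are hypothetical, and the sorting is simplified to plain byte order, which is only adequate for trees without subdirectories.

```python
import hashlib


def tree_payload(entries):
    """Serialize (mode, name, hex_hash) tuples into the body of a tree object."""
    body = b""
    # Simplification: sort by raw name bytes. Git applies a special rule for
    # subdirectory names, which this file-only sketch does not handle.
    for mode, name, hex_hash in sorted(entries, key=lambda e: e[1].encode()):
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_hash)
    return body


def git_object_hash(obj_type: str, body: bytes) -> str:
    """Hash any object body with the '<type> <size>' header followed by NUL."""
    header = obj_type.encode() + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


entries = [("100644", "test.txt", "557db03de997c86a4a028e1ebd3a1ceb225be238")]
print(git_object_hash("tree", tree_payload(entries)))
```

The same header scheme is used for every object type, which is why git_object_hash takes the type as a parameter.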

Commit Objects

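Commit objects point to a Tree object (the snapshot of the project root) and record the author, committer, commit message, and zero or more parent commits: none for the first commit, one for an ordinary commit, and two or more for a merge. The format is as follows: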

tree d8329fc...
parent 1234567...
author John Doe <john@example.com> 1625097600 +0800
committer John Doe <john@example.com> 1625097600 +0800

Update test.txt
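
As a rough illustration of how this text becomes a commit hash, the sketch below assembles a commit body and hashes it with the usual header. The function names are hypothetical, the parent hash is a made-up placeholder, and features such as signatures or encoding headers are left out.

```python
import hashlib


def commit_payload(tree, parents, author, committer, message):
    """Build the text body of a commit object from its fields."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # no parent lines for a root commit
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # A blank line separates the headers from the commit message.
    return ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")


def commit_hash(body: bytes) -> str:
    """Hash the body with the 'commit <size>' header followed by NUL."""
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


who = "John Doe <john@example.com> 1625097600 +0800"
body = commit_payload(
    tree="d8329fc1cc938780ffdd9f94e0d364e0ea74f579",
    parents=["0123456789abcdef0123456789abcdef01234567"],  # placeholder hash
    author=who,
    committer=who,
    message="Update test.txt",
)
print(commit_hash(body))
```

Since the parent hashes are part of the hashed text, changing any ancestor changes the hash of every descendant commit.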

Tag Objects

Tag objects back annotated tags. Each one records the object it points to (usually a commit), that object's type, the tag name, the tagger, a message, and optionally a GPG signature:

object 789abc...
type commit
tag v1.0
tagger John Doe <john@example.com> 1625097600 +0800

Release version 1.0

Reference Mechanism

Git manages branches and tags through references (refs):

Branch References

Branches are stored in the .git/refs/heads/ directory (or, once references have been packed, as lines in .git/packed-refs). For example, the main branch corresponds to the file:

.git/refs/heads/main

The content is the full 40-character hexadecimal hash of the commit at the branch tip, for example:

a1b2c3d4e5f60718293a4b5c6d7e8f9012345678

HEAD Reference

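HEAD is a special reference stored in .git/HEAD. In a detached HEAD state it holds a commit hash directly; normally it is a symbolic reference to the current branch: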

ref: refs/heads/main
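
To make the lookup concrete, here is a small Python sketch that follows HEAD to a commit hash by reading the files described above. The function name resolve_head is hypothetical, and the sketch deliberately ignores .git/packed-refs, so it only finds references stored as individual files.

```python
from pathlib import Path


def resolve_head(git_dir):
    """Return (ref_name, commit_hash) for the HEAD of a repository.

    ref_name is None when HEAD is detached; commit_hash is None when the
    branch file does not exist (an unborn branch, or a ref that lives only
    in packed-refs, which this sketch does not read).
    """
    head = (Path(git_dir) / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref_name = head[len("ref: "):]
        ref_file = Path(git_dir) / ref_name
        if ref_file.exists():
            return ref_name, ref_file.read_text().strip()
        return ref_name, None
    return None, head  # detached HEAD: the file holds a hash directly


if __name__ == "__main__" and Path(".git/HEAD").exists():
    print(resolve_head(".git"))
```

Switching branches only rewrites the one-line HEAD file, which is part of why branch operations in Git are cheap.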

Tag References

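Lightweight tags are plain references whose content is a commit hash:

.git/refs/tags/v1.0

Annotated tags use the same kind of reference file, but the hash it contains identifies a Tag object, which in turn points to the commit.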

Index (Staging Area)

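The index is a binary file, .git/index, with the following structure:

  1. 12-byte header (the signature DIRC, a version number, and the entry count)
  2. Index entries, sorted by path
  3. Optional extensions
  4. SHA-1 checksum of everything that precedes it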

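On disk (index version 2), each entry has the following layout. All integers are big-endian, the low 12 bits of flags hold the name length, and each entry is NUL-padded to a multiple of 8 bytes: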

struct cache_entry {
    uint32_t ctime_sec;
    uint32_t ctime_nsec;
    uint32_t mtime_sec;
    uint32_t mtime_nsec;
    uint32_t dev;
    uint32_t ino;
    uint32_t mode;
    uint32_t uid;
    uint32_t gid;
    uint32_t size;
    unsigned char sha1[20];
    unsigned short flags;
    char name[FLEX_ARRAY]; /* Variable length */
};
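
The layout above maps directly onto Python's struct module. The sketch below reads the 12-byte header and the first entry of an index file, assuming index version 2 and a name shorter than 4,095 bytes; the function names are hypothetical, and extensions and the trailing checksum are ignored.

```python
import struct

# Ten big-endian uint32 fields, a 20-byte SHA-1, and a uint16 flags field.
ENTRY_FIXED = struct.Struct(">10I20sH")  # 62 bytes


def parse_index_header(data: bytes):
    """Return (version, entry_count) from the first 12 bytes of an index."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    return version, count


def parse_first_entry(data: bytes):
    """Decode the first index entry, which starts right after the header."""
    fields = ENTRY_FIXED.unpack_from(data, 12)
    mode, size, sha1, flags = fields[6], fields[9], fields[10], fields[11]
    name_len = flags & 0x0FFF  # low 12 bits of flags
    start = 12 + ENTRY_FIXED.size
    name = data[start:start + name_len].decode("utf-8")
    return {"mode": oct(mode), "size": size, "sha1": sha1.hex(), "name": name}


if __name__ == "__main__":
    import os
    if os.path.exists(".git/index"):
        with open(".git/index", "rb") as f:
            raw = f.read()
        print(parse_index_header(raw))
```

The cached stat fields (timestamps, device, inode, size) are what allow Git to detect an unchanged file without rehashing its content.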

The index content can be viewed using git ls-files --stage:

git ls-files --stage
# Example output:
# 100644 557db03... 0       test.txt

Packfile Mechanism

When loose objects accumulate, Git packs them into packfiles to save space and speed up access:

Packfile Structure

The .git/objects/pack/ directory contains:

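  • .pack: The objects themselves, compressed and often delta-encoded
  • .idx: An index that maps each object hash to its offset in the .pack file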

View pack contents with git verify-pack (pack-123456 is a placeholder for the real file name):

git verify-pack -v .git/objects/pack/pack-123456.idx
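
A .pack file begins with a 12-byte header: the signature PACK, a version number, and the number of objects, all big-endian. The following sketch reads just that header; parse_pack_header is a hypothetical name, and decoding the objects themselves is out of scope here.

```python
import struct


def parse_pack_header(data: bytes):
    """Return (version, object_count) from the first 12 bytes of a .pack file."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a Git packfile")
    return version, count


if __name__ == "__main__":
    import glob
    # Print the header of every packfile in the current repository, if any.
    for path in glob.glob(".git/objects/pack/*.pack"):
        with open(path, "rb") as f:
            print(path, parse_pack_header(f.read(12)))
```

The object count reported here should agree with the number of entries that git verify-pack -v lists for the same pack.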

Delta Compression

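Inside packfiles, Git can store an object as a delta against a similar base object rather than as complete content. A delta is a sequence of instructions that either copy a byte range from the base or insert new bytes. For example: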

base: blob A (100 bytes)
delta: blob B = A + "appended content"
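
The idea can be sketched in a few lines of Python. This is a simplified model that represents instructions as tuples; Git's real delta format encodes the same two kinds of operation (copy from base, insert literal bytes) in a compact binary form that is not reproduced here.

```python
def apply_delta(base: bytes, instructions) -> bytes:
    """Rebuild a target object from a base object and a list of instructions.

    Each instruction is either ("copy", offset, size), which copies a byte
    range from the base, or ("insert", data), which appends literal bytes.
    """
    out = bytearray()
    for op in instructions:
        if op[0] == "copy":
            _, offset, size = op
            out += base[offset:offset + size]
        elif op[0] == "insert":
            out += op[1]
        else:
            raise ValueError(f"unknown instruction: {op[0]!r}")
    return bytes(out)


blob_a = b"x" * 100  # stand-in for the 100-byte base blob A
delta = [("copy", 0, 100), ("insert", b"appended content")]
blob_b = apply_delta(blob_a, delta)
print(len(blob_b))  # 116
```

Storing blob B this way costs roughly the size of the appended text plus a few bytes of instructions, instead of a second full copy.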

Garbage Collection

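Git performs garbage collection via git gc, which certain commands trigger automatically when housekeeping is due:

  1. Packs loose objects
  2. Removes unreachable objects older than the grace period (gc.pruneExpire, two weeks by default)
  3. Consolidates and optimizes packfiles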

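Manual execution:

git gc

With --auto, git gc first checks whether housekeeping is needed and exits without doing anything if it is not.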

Low-Level Command Examples

Directly manipulate Git objects:

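# Create a Blob from standard input, write it to the object database (-w), and print its hash
echo "test" | git hash-object -w --stdin

# Pretty-print an object (123456 stands for a real hash or a unique prefix of one)
git cat-file -p 123456

# Point a reference at a commit, creating the branch if it does not exist
git update-ref refs/heads/new-branch a1b2c3d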
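
What git hash-object -w and git cat-file do for a loose object can be approximated in Python as follows. This is a sketch under simple assumptions: it handles only loose (unpacked) SHA-1 objects, the function names are hypothetical, and the example writes to a temporary directory rather than a real repository.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir, obj_type: str, body: bytes) -> str:
    """Store an object the way a loose object is laid out and return its hash."""
    store = obj_type.encode() + b" " + str(len(body)).encode() + b"\0" + body
    oid = hashlib.sha1(store).hexdigest()
    # The first two hex digits name the directory, the remaining 38 the file.
    path = Path(objects_dir) / oid[:2] / oid[2:]
    if not path.exists():  # identical content is never written twice
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(store))
    return oid


def read_loose_object(objects_dir, oid: str):
    """Return (type, body) for a loose object, verifying the recorded size."""
    raw = zlib.decompress((Path(objects_dir) / oid[:2] / oid[2:]).read_bytes())
    header, _, body = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match content length")
    return obj_type, body


objects_dir = tempfile.mkdtemp()  # stand-in for .git/objects
oid = write_loose_object(objects_dir, "blob", b"test\n")
print(oid, read_loose_object(objects_dir, oid))
```

Pointing objects_dir at a real .git/objects directory would let git cat-file read what this code writes, but experimenting in a scratch directory is safer.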

Reflog (Reference Log)

Git records updates to HEAD and branch tips in the local reflog; entries expire after 90 days by default:

git reflog show HEAD
# Example output:
# a1b2c3d HEAD@{0}: commit: Update README
# 1234567 HEAD@{1}: checkout: moving from dev to main

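Logs are stored in the .git/logs/ directory, one line per update. Each line holds the full old and new hashes, the committer identity, a timestamp with time zone, and, after a tab, the action (hashes are abbreviated here):

Old-hash New-hash Committer Timestamp Action-info
1234567... a1b2c3d... John Doe <john@example.com> 1625097600 +0800	commit: Update README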
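
A reflog line can be split into its fields with a few string operations. The sketch below assumes the on-disk layout in which a tab separates the metadata from the action message; parse_reflog_line is a hypothetical helper, and the hashes in the sample are placeholders.

```python
def parse_reflog_line(line: str) -> dict:
    """Split one line from a file under .git/logs/ into its fields."""
    meta, _, message = line.rstrip("\n").partition("\t")
    old, new, rest = meta.split(" ", 2)
    # The identity may contain spaces, so peel the last two fields off the end.
    identity, timestamp, tz = rest.rsplit(" ", 2)
    return {
        "old": old,
        "new": new,
        "identity": identity,
        "timestamp": int(timestamp),
        "tz": tz,
        "message": message,
    }


sample = (
    "0123456789abcdef0123456789abcdef01234567 "
    "a1b2c3d4e5f60718293a4b5c6d7e8f9012345678 "
    "John Doe <john@example.com> 1625097600 +0800\tcommit: Update README"
)
print(parse_reflog_line(sample)["message"])  # commit: Update README
```

Reading the entries from newest to oldest gives the HEAD@{0}, HEAD@{1}, ... sequence that git reflog displays.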

Interaction Between Working Directory and Git

File states in the working directory are managed through three areas:

  1. Working directory: Actual files
  2. Index: Staging area
  3. HEAD: Last commit

Example state changes:

# Modify a file
echo "new content" > file.txt

# Check status
git status
# Output (abridged):
# Changes not staged for commit:
#   modified:   file.txt

# Add to staging area
git add file.txt

# Check status again
git status
# Output (abridged):
# Changes to be committed:
#   modified:   file.txt
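
The two status sections above come from two comparisons: HEAD against the index, and the index against the working directory. The sketch below models that with plain hash strings. It is a simplification with a hypothetical function name; it does not handle untracked or deleted files, and Git itself consults cached file metadata before rehashing anything.

```python
def classify(head_hash, index_hash, worktree_hash):
    """Classify one tracked file by comparing its hash in the three areas."""
    states = []
    if index_hash != head_hash:
        states.append("staged")    # shown under "Changes to be committed"
    if worktree_hash != index_hash:
        states.append("unstaged")  # shown under "Changes not staged for commit"
    return states or ["clean"]


# After editing file.txt: HEAD and index agree, the working copy differs.
print(classify("aaa", "aaa", "bbb"))  # ['unstaged']
# After git add file.txt: the index matches the working copy, not HEAD.
print(classify("aaa", "bbb", "bbb"))  # ['staged']
```

A file can appear in both sections at once when it is staged and then edited again, which corresponds to all three hashes being different.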

Object Storage Optimization

Git employs multiple strategies to optimize storage:

Loose Objects and Packfiles

  • New objects are initially stored as loose objects, one zlib-compressed file each
  • git gc --auto packs them once the number of loose objects exceeds the gc.auto threshold (default: 6,700)
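
To see how close a repository is to that threshold, the loose objects can simply be counted. The sketch below walks the two-hex-digit subdirectories of an objects directory; count_loose_objects is a hypothetical helper, and Git itself may rely on a cheaper estimate rather than an exact count like this one.

```python
import re
from pathlib import Path


def count_loose_objects(objects_dir) -> int:
    """Count files in the two-hex-digit subdirectories of an objects directory."""
    count = 0
    for sub in Path(objects_dir).iterdir():
        # Skip pack/ and info/; loose objects live in directories named 00..ff.
        if sub.is_dir() and re.fullmatch(r"[0-9a-f]{2}", sub.name):
            count += sum(1 for f in sub.iterdir() if f.is_file())
    return count


if __name__ == "__main__" and Path(".git/objects").is_dir():
    print(count_loose_objects(".git/objects"), "loose objects (gc.auto default: 6700)")
```

Running git count-objects in the same repository gives a figure to compare against.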

Compression Strategies

  • zlib compression for object content
  • Delta compression for similar objects
  • Chooses delta bases heuristically during packing, favoring objects of similar type, name, and size

Shared Objects

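Cloning with --shared makes the new repository borrow objects from the source through .git/objects/info/alternates instead of copying them. Because the clone then depends on the source's objects, pruning the source can break it: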

git clone --shared /path/to/repo

Handling Hash Collisions

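Accidental SHA-1 collisions are extremely unlikely, although a deliberately constructed collision was published in 2017 (SHAttered). Git has safeguards:

  1. Compares full object content when an incoming object's hash matches an existing object
  2. Rejects writing different content with the same hash
  3. Supports SHA-256 repositories (git init --object-format=sha256)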

View an object's type and size (123456 is a placeholder hash):

git cat-file -t 123456
git cat-file -s 123456

Cross-Platform Compatibility

Git handles cross-platform issues:

  • Line ending conversion (core.autocrlf)
  • Executable-bit tracking (core.fileMode)
  • Filename case sensitivity (core.ignoreCase)

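Configuration example (core.fileMode is set per repository, because Git writes it into each repository's own config, which overrides the global value):

git config --global core.autocrlf input
git config core.fileMode false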
