Building on Git's Primitives

How studying git's four object types led me to build an issue tracker with no database, no server, and no abstraction layer.

Emerson Soares · May 16, 2026

I wanted to understand git. Not as a user. I’d been using git for over a decade, and like most developers, I had a working mental model: branches are pointers, commits are snapshots, merges combine histories. Good enough for daily work. But I kept running into moments where the abstraction leaked. Rebase doing something I didn’t expect. A merge conflict that seemed impossible given the changes. A detached HEAD that shouldn’t have been detached. Every time, the answer required understanding what was actually happening underneath.

So I started reading. Not tutorials. The git internals chapter of Pro Git, then the source code, then Linus’s original design emails. I wanted to know what git is, not what it does.

Along the way I found a message Linus posted to the kernel mailing list in April 2007, during a flame war about Bugzilla after the 2.6.21 release shipped with 14 known regressions:

“There must be some better form of bug tracking than bugzilla. Some really distributed way of letting people work together, without having to congregate on some central web-site kind of thing. A ‘git for bugs’, where you can track bugs locally and without a web interface.”

Nineteen years later, nobody had built it. Several tried. Bugs Everywhere stored issues in the working tree and caused merge conflicts. ticgit’s creator, Scott Chacon, went on to build GitHub instead. git-bug used CRDTs, which turned out to be overkill. None produced a format spec that other tools could implement.

That study, and that unsolved problem, led me to build git-native-issue. An issue tracker that uses git’s own object model as its database. No server, no external storage, no abstraction layer. Issues are git objects, stored in git, synced by git. This is the story of how that happened and what it taught me about architecture.

What I found inside git was smaller than I expected.

Four objects

Git’s entire data model is four object types. That’s it. Everything git does, every feature, every workflow, every integration, composes from these four things.

A blob is a chunk of content. No filename, no path, no metadata. Just bytes, identified by a hash of those bytes (SHA-1 by default). Two files with identical content are the same blob. Git doesn’t store files. It stores content.
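A quick way to see this for yourself. This is a minimal sketch, not part of git-native-issue; the file names and the temporary directory are made up for illustration. It hashes two differently named files with the same bytes and compares the IDs.

```shell
#!/bin/sh
# Sketch: identical bytes get the same blob ID, whatever the file is called.
set -e

work="$(mktemp -d)"                          # throwaway directory (illustrative)
printf 'hello\n' > "$work/a.txt"
printf 'hello\n' > "$work/renamed-copy.txt"

# Without -w, hash-object only computes the ID; nothing is stored anywhere.
id_a="$(git hash-object "$work/a.txt")"
id_b="$(git hash-object "$work/renamed-copy.txt")"

echo "a.txt:            $id_a"
echo "renamed-copy.txt: $id_b"

rm -rf "$work"
```

Both lines print the same ID, because the name never enters the hash.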

A tree is a directory listing. It maps names to blobs (files) or other trees (subdirectories). A tree is a snapshot of a directory at a point in time. It doesn’t know what changed. It just knows what exists right now.

A commit points to a tree (the snapshot), to zero or more parent commits (the history), and carries metadata: author, committer, timestamp, message. A commit is an event that says “at this moment, the world looked like this, and here’s why.” The chain of parent pointers is the project’s history. Branching is just having two commits that point to the same parent. Merging is a commit with two parents.

A tag is a named pointer to any object, usually a commit, with optional metadata and a signature. Tags mark moments. Release v1.0.0 is a tag pointing to the commit where that release happened.
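All four types can be inspected in any repository with git cat-file -t. The sketch below builds a throwaway repository to do it; the file name, commit message, tag name, and identity are placeholders I chose for the example.

```shell
#!/bin/sh
# Sketch: create one object of each type in a scratch repo and ask git what it is.
set -e

repo="$(mktemp -d)"
cd "$repo"
git init -q .

# Placeholder identity so the sketch does not depend on local git config.
GIT_AUTHOR_NAME='Example Author'
GIT_AUTHOR_EMAIL='author@example.invalid'
GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME"
GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

printf 'hello\n' > README
git add README
git commit -q -m 'Initial commit'
git tag -a v1.0.0 -m 'Release v1.0.0'       # annotated tag: a real tag object

blob_type="$(git cat-file -t HEAD:README)"   # the file's content
tree_type="$(git cat-file -t 'HEAD^{tree}')" # the directory listing
commit_type="$(git cat-file -t HEAD)"        # the event
tag_type="$(git cat-file -t v1.0.0)"         # the named pointer

echo "$blob_type $tree_type $commit_type $tag_type"
```

Swapping -t for -p on any of these prints the object's contents, which is a good way to see how little is actually in each one.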

Four types. Content, structure, events, names. From these, git builds version control, branching, merging, rebasing, bisecting, cherry-picking, blame, stash, worktrees. The entire surface area of git, every command you’ve ever used, is an operation on some combination of blobs, trees, commits, and tags.

The thing that struck me wasn’t the elegance, though it is elegant. It was the sufficiency. Linus didn’t build features. He built primitives, and the features emerged from composition. When someone needed rebasing, they didn’t need a new data type. Rebase is just creating new commits that point to new parents. When someone needed stash, they didn’t need a new storage mechanism. Stash is just commits on a special ref.
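The stash claim is easy to check. A minimal sketch in a scratch repository, with a placeholder file and identity: after stashing a change, refs/stash resolves to an ordinary commit object.

```shell
#!/bin/sh
# Sketch: a stash entry is a commit reachable from the ref refs/stash.
set -e

repo="$(mktemp -d)"
cd "$repo"
git init -q .

# Placeholder identity so the sketch does not depend on local git config.
GIT_AUTHOR_NAME='Example Author'
GIT_AUTHOR_EMAIL='author@example.invalid'
GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME"
GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

printf 'one\n' > notes.txt
git add notes.txt
git commit -q -m 'Add notes'

printf 'two\n' >> notes.txt                  # an uncommitted change
git stash push --quiet

stash_type="$(git cat-file -t refs/stash)"
# rev-list --parents prints the commit followed by its parents on one line.
stash_parents="$(( $(git rev-list --parents -n 1 refs/stash | wc -w) - 1 ))"

echo "refs/stash is a $stash_type with $stash_parents parents"
```

The two parents are the commit you were on and a commit recording the index. No new storage mechanism, just the existing one used again.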

The problem nobody talks about

Your code travels with git clone. Your issues don’t.

Migrate from GitHub to GitLab? Your code comes with you. Your issues stay behind, locked in a platform’s database. Want to work offline? Code works fine. Issues need a browser and an internet connection. Five years of issue history on a project? Prepare to write export scripts, lose metadata, and rebuild links manually.

Issues are less portable than code because we store them in proprietary databases hosted on platforms. They’re records in APIs, not versioned data.

The insight

With four objects fresh in my mind, I started thinking about what an issue actually is, structurally. Not because I was unhappy with GitHub Issues. I was asking whether git’s primitives could express it.

An issue has a title, a description, some metadata (state, labels, assignee), a history of comments, and a history of state changes. It belongs to a project. It needs to sync across machines. It needs to handle concurrent edits from different people. It needs to preserve history.

Git already solves all of that. A commit carries a message (title + description), metadata (trailers), authorship, and timestamps. A chain of commits gives you history. Refs give you identity and lookup. Push and fetch give you sync. Merge gives you concurrent edit resolution. The entire infrastructure already exists. Building another database on top of git to track issues is building a worse version of something git already provides.

That was the moment. Not “I should build a tool.” The moment was: “issues are git objects.” The mapping isn’t a metaphor. It’s structural:

Issue concept | Git primitive | Why it works
Issue identity | refs/issues/<uuid> | Unique, collision-free across clones
Events (create, comment, close) | Commits in a chain | Append-only, content-addressed, cryptographically verified
Metadata (state, labels, priority) | Git trailers | Parseable with standard git tooling (interpret-trailers)
History | Commit ancestry | git log refs/issues/<id> gives the full timeline
Sync | git fetch/push | Zero custom protocol needed
Conflict resolution | Three-way merge | Divergent updates resolve deterministically
Offline access | Local refs | Full read/write without network

The mapping

Here is how git-native-issue works. Each issue is a chain of commits living under refs/issues/<uuid>. The ref points to the latest commit, the issue’s HEAD. Walking the chain backward gives you the full history.

The root commit, the one with no parents, is the issue creation event. Its subject line is the title. Its body is the description. Its trailers carry metadata:

Fix login crash with special characters

The login page crashes when the user enters special characters
in the password field.

State: open
Labels: bug, auth
Priority: high
Format-Version: 1

That’s a real commit message. Not JSON in a blob. Not a row in SQLite. A commit message, readable with git log, queryable with git for-each-ref, parseable with git interpret-trailers. Every piece of issue data lives where git already knows how to find it.
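To make "parseable with git interpret-trailers" concrete, here is a small sketch that feeds the message above to git interpret-trailers --parse and pulls out one value. The temp file and the sed extraction are my own illustration, not necessarily how the tool does it.

```shell
#!/bin/sh
# Sketch: let git separate the trailers from the rest of the message.
set -e

msg="$(mktemp)"
cat > "$msg" <<'EOF'
Fix login crash with special characters

The login page crashes when the user enters special characters
in the password field.

State: open
Labels: bug, auth
Priority: high
Format-Version: 1
EOF

# --parse prints only the trailer block, one "Key: value" per line.
trailers="$(git interpret-trailers --parse < "$msg")"
state="$(printf '%s\n' "$trailers" | sed -n 's/^State: //p')"

printf '%s\n' "$trailers"
echo "state=$state"

rm -f "$msg"
```

The title and description are dropped, and the four trailers come back in the order they were written.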

Comments are child commits. Each one has the previous issue HEAD as its parent, extending the chain. State changes are commits with a State: trailer. A comment that also closes the issue is a single commit carrying both the comment text and a State: closed trailer. The current state of an issue is computed by walking the chain and taking the most recent State: trailer. No mutable state. No database updates. The state is a function of the history.
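The chain described above can be sketched end to end in a scratch repository. Everything specific here is an assumption for illustration: the fixed UUID, the messages, the identity, and the helper name current_state. I also pass -w to hash-object so the empty tree is written to the object database.

```shell
#!/bin/sh
# Sketch: create an issue, add a plain comment, then a comment that closes it.
set -e

repo="$(mktemp -d)"
cd "$repo"
git init -q .

# Placeholder identity so the sketch does not depend on local git config.
GIT_AUTHOR_NAME='Example Author'
GIT_AUTHOR_EMAIL='author@example.invalid'
GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME"
GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

uuid='00000000-0000-4000-8000-000000000001'  # fixed placeholder, not a real issue
ref="refs/issues/$uuid"
empty_tree="$(git hash-object -t tree -w /dev/null)"

# Hypothetical helper: the newest State trailer on the chain wins.
current_state() {
    git log --format='%(trailers:key=State,valueonly)' "$1" | grep -m1 .
}

# Root commit: the creation event.
root="$(printf 'Fix login crash\n\nCrashes on special characters.\n\nState: open\n' \
    | git commit-tree "$empty_tree")"
git update-ref "$ref" "$root"

# Plain comment: child of the current issue HEAD, no State trailer.
parent="$(git rev-parse "$ref")"
comment="$(printf 'Reproduced on staging with a quote in the password.\n' \
    | git commit-tree "$empty_tree" -p "$parent")"
git update-ref "$ref" "$comment" "$parent"   # third argument: expected old value
state_after_comment="$(current_state "$ref")"

# Comment that also closes the issue: one commit, text plus trailer.
parent="$(git rev-parse "$ref")"
closing="$(printf 'Fixed by escaping the input.\n\nState: closed\n' \
    | git commit-tree "$empty_tree" -p "$parent")"
git update-ref "$ref" "$closing" "$parent"
state_after_close="$(current_state "$ref")"

echo "$state_after_comment -> $state_after_close"
```

Nothing was updated in place. The state changed because a newer commit with a State trailer now sits ahead of the older one in the walk.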

Labels use trailers too. When two clones diverge and both modify labels, the merge uses three-way set merge against the common ancestor. Additions from both sides are kept. Removals from both sides are applied. If one side added a label and the other removed it, the addition wins, biased toward keeping data. Scalar fields like assignee and priority use last-writer-wins by timestamp. These aren’t novel algorithms. They’re the simplest deterministic rules I could find that produce correct results without coordination.
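Here is a rough sketch of a three-way set merge over comma-separated labels, using sort and comm. It is my own illustration under stated assumptions, not the tool's code: merge_labels and to_set are hypothetical names, I read "removals are applied" as "a removal on either side is applied", and the add-versus-remove tie-break is not modeled, since with a single common ancestor the same label cannot be both newly added and removed.

```shell
#!/bin/sh
# Sketch: three-way set merge of label lists against a common ancestor.
set -e

# Hypothetical helper: "bug, auth" -> one trimmed label per line, sorted, unique.
to_set() {
    printf '%s\n' "$1" | tr ',' '\n' | sed 's/^ *//; s/ *$//; /^$/d' | LC_ALL=C sort -u
}

# Hypothetical helper: merge_labels BASE OURS THEIRS
merge_labels() {
    tmp="$(mktemp -d)"
    to_set "$1" > "$tmp/base"
    to_set "$2" > "$tmp/ours"
    to_set "$3" > "$tmp/theirs"
    {
        # Additions: present on a side, absent from the ancestor.
        LC_ALL=C comm -13 "$tmp/base" "$tmp/ours"
        LC_ALL=C comm -13 "$tmp/base" "$tmp/theirs"
        # Survivors: ancestor labels that neither side removed.
        LC_ALL=C comm -12 "$tmp/ours" "$tmp/theirs" | LC_ALL=C comm -12 - "$tmp/base"
    } | LC_ALL=C sort -u | paste -s -d ',' - | sed 's/,/, /g'
    rm -rf "$tmp"
}

# Ancestor had bug and auth. One side added urgent; the other dropped auth and added ui.
merged="$(merge_labels 'bug, auth' 'bug, auth, urgent' 'bug, ui')"
echo "Labels: $merged"
```

The result is the same whichever side runs the merge, which is the property that matters: no coordination, same answer everywhere.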

Identity is a UUID per issue. No sequential counter, no next-id file, no coordination. UUIDs can be generated independently on any clone with negligible collision risk. The first seven characters serve as a human-friendly short ID, the same convention git uses for abbreviated commit SHAs.
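A sketch of what that looks like in shell. The helper names new_uuid and short_id are hypothetical, and the generator is an assumption: uuidgen is not part of POSIX, so the sketch falls back to the Linux kernel's UUID source when it is missing.

```shell
#!/bin/sh
# Sketch: generate an issue UUID locally and derive its seven-character short ID.
set -e

# Hypothetical helper. Assumes uuidgen or, failing that, a Linux /proc filesystem.
new_uuid() {
    if command -v uuidgen >/dev/null 2>&1; then
        uuidgen | tr 'A-F' 'a-f'             # some platforms print uppercase
    else
        cat /proc/sys/kernel/random/uuid
    fi
}

# Hypothetical helper: first seven characters of the UUID.
short_id() {
    printf '%s\n' "$1" | cut -c1-7
}

uuid="$(new_uuid)"
echo "ref:   refs/issues/$uuid"
echo "short: $(short_id "$uuid")"
```

No lock, no counter file, no round trip to a server. Two people on a plane can both create issues and neither has to renumber later.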

Speaking git’s language

The tool is written in POSIX shell. Creating an issue means creating a commit object and a ref. Here’s the core of git issue create, stripped to its essentials:

empty_tree="$(git hash-object -t tree /dev/null)"
commit="$(git commit-tree -- "$empty_tree" < "$tmpfile")"
git update-ref -- "refs/issues/$uuid" "$commit"

Three plumbing commands. That’s the part that surprised me most. I expected issue creation to be the hard part, the part where I’d need to build real infrastructure. Instead it’s three lines. git hash-object computes the empty tree’s SHA (issue commits carry no file content, so the tree is always empty). git commit-tree creates a commit object pointing to that tree, with the issue message from a temp file. git update-ref creates the ref. An issue now exists in the repository.

Reading the state of an issue:

git log --format='%(trailers:key=State,valueonly)' refs/issues/<uuid> \
  | grep -m1 .

One git command, plus a grep. Walk the commit log, extract the State trailer values, take the first non-empty one. No parsing library, no query engine. Git’s own format strings do the work.

Listing all issues:

git for-each-ref \
  --format='%(refname:short) %(contents:subject) %(trailers:key=State,valueonly)' \
  refs/issues/

Again, one command. Git iterates the refs, formats the output. The tool doesn’t maintain an index. Git’s ref storage is the index.

Syncing issues between clones:

git push origin 'refs/issues/*'
git fetch origin 'refs/issues/*:refs/issues/*'

Standard transport. No custom protocol, no API, no webhook. Issues travel the same way code does. If you can push code to a remote, you can push issues to that remote. The tool doesn’t implement sync. Git implements sync. The tool just uses the right refspec.
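The round trip can be tried locally without any hosting platform. A minimal sketch, with made-up names throughout: a bare repository standing in for the remote, two scratch clones called alice and bob, a fixed placeholder UUID, and a placeholder identity.

```shell
#!/bin/sh
# Sketch: an issue created in one clone reaches another through plain push and fetch.
set -e

root="$(mktemp -d)"

# Placeholder identity so the sketch does not depend on local git config.
GIT_AUTHOR_NAME='Example Author'
GIT_AUTHOR_EMAIL='author@example.invalid'
GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME"
GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

git init -q --bare "$root/remote.git"        # stands in for any git remote
git init -q "$root/alice"
git init -q "$root/bob"

# Alice creates an issue and pushes only the issue refs.
cd "$root/alice"
git remote add origin "$root/remote.git"
uuid='00000000-0000-4000-8000-000000000002'  # fixed placeholder
empty_tree="$(git hash-object -t tree -w /dev/null)"
commit="$(printf 'Fix login crash\n\nState: open\n' | git commit-tree "$empty_tree")"
git update-ref "refs/issues/$uuid" "$commit"
git push -q origin 'refs/issues/*'

# Bob fetches the issue refs and reads the title from his local copy.
cd "$root/bob"
git remote add origin "$root/remote.git"
git fetch -q origin 'refs/issues/*:refs/issues/*'
bob_title="$(git log -1 --format=%s "refs/issues/$uuid")"

echo "bob sees: $bob_title"
```

Neither clone has a single code commit or branch here. The issue refs travel on their own, which is why they work on any remote that accepts a push.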

This is what I mean by speaking git’s language. The tool is built mostly on plumbing commands, git’s low-level object manipulation API, rather than porcelain (the user-facing commands that format output for humans). The snippets above do call git log, git push, and git fetch, which git classifies as porcelain, but nothing wraps them. No ORM, no abstraction layer, no wrapper library. The code reads like a conversation with git’s object store, because that’s what it is.

What this cost

Building directly on git’s primitives means accepting what git doesn’t give you.

No web UI. Git doesn’t render HTML. If you want a browser-based interface, you need a separate application that reads the refs and renders them. The tool includes platform bridges (GitHub, GitLab, Gitea) that export issues to those platforms for web access. But the canonical data lives in git, not in any platform’s database. The bridges are convenience, not authority.

No real-time collaboration. Issues propagate when you push or fetch, the same as code. There’s no live updating, no presence indicators, no “someone is typing.” If two people edit the same issue before syncing, the merge rules resolve it deterministically. But there’s no way to prevent the divergence in the first place.

No access control beyond what git itself provides. If you can write to the repository, you can modify any issue. Fine-grained permissions (this person can close issues but not delete them) would require a layer on top that git doesn’t offer.

These aren’t limitations I plan to fix. They’re deliberate non-goals. Each one would require building infrastructure that git’s primitives don’t support cheaply. Adding a web UI means running a server. Adding real-time means building a message bus. Adding access control means building an authorization layer. Each of those is a real system with its own complexity. The whole point was to not build those systems. The cost of staying on the substrate is giving up what the substrate doesn’t provide.

The real deliverable isn’t the tool

Every previous attempt at distributed issue tracking failed to produce a format specification. The “format” was whatever the code happened to produce. You couldn’t build a compatible tool without reverse-engineering the implementation.

The real deliverable of git-native-issue is ISSUE-FORMAT.md: a standalone spec for storing issues in git, independent of this tool. If the git community adopts the format, platforms like GitHub, GitLab, and Forgejo could support refs/issues/* natively, making issue portability as natural as code portability. That’s the long game. The tool is the reference implementation. The spec is the contribution.

What this taught me

The architecture lesson was simpler than I expected: when your substrate has the right primitives, you don’t need to build your own.

Git already had content-addressable storage, immutable history, distributed sync, ref-based identity, merge algorithms, and trailer-based metadata. I didn’t invent any of these. I mapped issue-tracking concepts onto them. The result is a tool where creating an issue is three shell commands, reading state is one format string, syncing is a refspec, and merging is git’s own merge machinery. The total codebase is POSIX shell. No runtime dependencies beyond git itself.

The system looks like the thing it models. When you read the source of git-native-issue, you see git commit-tree, git update-ref, git for-each-ref, git interpret-trailers. You see git’s primitives being composed into issue-tracking operations. The code doesn’t hide what it’s doing behind an abstraction. It can’t. The abstraction IS the substrate. Reading the code teaches you how git works, because the code is git, used directly.

That second property, the system looking like its domain, turned out to matter more than I initially realized. It makes the tool auditable. Anyone who understands git’s object model can predict exactly what git issue create does, because it does what git does. There are no surprises hiding behind a framework. There’s no behavior that isn’t a direct consequence of how blobs, trees, commits, and refs work.

I keep thinking about why this worked so well. Four objects. Everything composes from them. No fifth type was ever needed. The architecture wasn’t a layer on top of the data model. The data model WAS the architecture. And the system ended up looking like the thing it models.

I don’t know yet whether that’s a principle or just a property of git. But I notice the same shape showing up in other places. A small number of composable pieces. Everything built from them. The system legible because the domain is right there in the code, not hidden behind layers that exist for organizational reasons.

Maybe the best architecture is the one you don’t build. Maybe it’s the one that was already there, in the substrate, waiting to be used directly. I’m still thinking about it.

git-native-issue is open source: github.com/remenoscodes/git-native-issue


Authoring note. Drafting assistance by Claude. All arguments, examples, judgments, and claims are the author’s. AI was used for rendering (organizing, rephrasing, tightening), not for originating the thesis or authoring claims.