The architecture of stonerOS—a persistent context system running on Claude Code. Plain files. No database. Two months of iteration.

Most AI memory features are flat lists. Things the model noticed, stored without structure or priority. You spend three hours mapping project architecture. The AI remembers your name. It forgets the tradeoff you already rejected.

I built a system that fixes this for me—stonerOS. It runs on Claude Code.

The Foundation

The core idea: my AI reads markdown files at the start of every session. Those files contain everything about me—history, current work, preferences, how I think. When something changes, I update a file. Next session picks it up.

That’s the whole idea. The rest is file organization and routing logic.
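The load step itself is small. A sketch in Python (the function, the constant, and the load order here are my illustration of the idea, not the literal stonerOS code):

```python
from pathlib import Path

def load_memory(root: str, files: list[str]) -> str:
    """Concatenate memory markdown files into one context blob.

    Missing files are skipped so a fresh setup still boots.
    """
    parts = []
    for name in files:
        path = Path(root) / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text()}")
    return "\n\n".join(parts)

# Hypothetical load order: corrections first, then the baseline.
SESSION_FILES = ["learnings/corrections.md", "memories/baseline_insights.md"]
```

Everything else in the system is deciding which files go in that list, and when.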

The stack: plain markdown files, Claude Code, and git. No database. No vector store. No embeddings. No app wrapper.

The Memory Architecture

I didn’t start with a clean architecture. I started with one file and iterated until it broke. What I ended up with has three layers—not because I designed them, but because the problems I hit demanded them.

Layer 1: Baseline (Read-Only Reference)

memories/
├── baseline_insights.md    # Career history, core identity, personality baseline
├── professional/           # Resume variants, portfolio docs
└── expertise/
    └── technical_skills.md
Rule: Never edit during a session. This is the starting truth. Load it after corrections—never before.

Layer 2: Learnings (Append-Only)

learnings/
├── corrections.md       # Errors the AI made—load FIRST, before anything else
├── patterns.md          # Observed behavioral and work patterns
├── communication.md     # Voice, tone, format preferences
└── expertise.md         # Domain knowledge accumulated over time
Rule: New entries get appended with a date and context. Old entries stay. The corrections file is the highest-ROI single file in the system.
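Append-only is easy to enforce in code. A minimal helper, assuming the dated-heading entry shape used throughout these files (the function name is mine):

```python
from datetime import date

def append_learning(path: str, title: str, context: str, rule: str) -> None:
    """Append a dated entry to a learnings file; old entries are never touched."""
    entry = (
        f"\n## {date.today().isoformat()}: {title}\n"
        f"**Context**: {context}\n"
        f"**Rule**: {rule}\n"
    )
    with open(path, "a") as f:  # "a" mode makes the file append-only by construction
        f.write(entry)
```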

Layer 3: Current State (Mutable)

preferences/
└── current.md           # Active preferences, decision patterns

context/
└── session_YYYYMMDD.md  # Most recent session notes
Rule: Overwritten each session, not appended. These represent right now.

The Session Start Protocol

The naive version: read all your files at the start of every session. I did this for a while. It’s wasteful—most sessions don’t need the full context.

The actual system uses a context-router, a lightweight agent that classifies the first message and returns a loading tier that determines how much context gets read.

The sequence for a deep-work session:

  1. Anchor the clock (run date—ground truth for any scheduling)
  2. Invoke context-router—get the tier
  3. Load corrections.md before baseline_insights (corrections override everything)
  4. Load baseline_insights
  5. Check context/ for the most recent session file
  6. Invoke skill-curator—reads context and suggests which skill modules to activate

Context has a cost. A transactional question doesn’t need 40 files loaded. The routing tier keeps overhead proportional to the actual work.
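The routing logic can be sketched in a few lines. Tier names, keyword heuristics, and the file lists below are illustrative; the real router is an LLM call, not a keyword match:

```python
# Illustrative mapping from routing tier to the files worth loading.
TIER_FILES = {
    "transactional": [],                       # quick question: load nothing
    "standard": ["learnings/corrections.md"],
    "deep": ["learnings/corrections.md",
             "memories/baseline_insights.md",
             "preferences/current.md"],
}

def route(first_message: str) -> str:
    """Stand-in for the context-router agent (keyword heuristic only)."""
    msg = first_message.lower()
    if any(w in msg for w in ("architecture", "refactor", "plan")):
        return "deep"
    if len(msg.split()) <= 8:
        return "transactional"
    return "standard"
```

The point is the shape: classify first, then load only the tier's files, so overhead scales with the session instead of being flat.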

The Corrections Pattern

This is the single most valuable thing I built.

AI models make confident errors about things you’ve told them. They’ll synthesize two facts into a claim that sounds right but isn’t. They’ll attribute something to you that you never said.

When this happens:

## 2026-03-10: Project shipped Q3, not Q4
**Context**: AI wrote "launched in Q4 2025" in a portfolio write-up
**Correction**: Project shipped end of September—Q3, not Q4
**Rule**: Always say "Q3 2025 launch" when referencing this project.

Load this file first—before the baseline, before anything else. In practice, after a few weeks, this starts to reduce the AI’s most common mistakes about you specifically—not eliminate them, but it builds a working record of where it’s failed before. This is the closest I’ve gotten to actually teaching an LLM.

The reason for two files instead of just editing the baseline: the baseline is a stable reference you’re supposed to trust. Editing it on the first instance of being wrong—or mid-session—risks corrupting your source of truth. The corrections file lets you capture fast, without restructuring anything. It also creates an audit trail: you can see exactly what the AI got wrong and when. Once a correction has proven itself over multiple sessions, it earns its way into the baseline.

That merge step matters. Corrections are a staging layer, not a permanent override stack. Periodically—once the file is dense enough—do a re-baselining pass: merge confirmed corrections back into baseline_insights.md, then retire the correction entry. If you skip this, the two files drift further apart over time and the load-order dependency becomes load-bearing in ways that are hard to debug.
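The re-baselining pass lends itself to a small script. This sketch assumes entries carry a `**Status**: confirmed` line once proven; that marker is a convention I'm inventing for illustration, not stonerOS's actual format:

```python
def split_confirmed(corrections_md: str) -> tuple[list[str], list[str]]:
    """Split a corrections file into (promote, keep) entry lists.

    Entries are '## '-headed blocks; a '**Status**: confirmed' line marks
    one as proven across enough sessions to merge into the baseline.
    """
    entries = ["## " + e for e in corrections_md.split("## ") if e.strip()]
    promote = [e for e in entries if "**Status**: confirmed" in e]
    keep = [e for e in entries if e not in promote]
    return promote, keep
```

The `promote` list gets merged into baseline_insights.md by hand (or by the AI, reviewed); `keep` is written back as the new corrections file.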

Write Safety

When I started adding background agents that could write to memory files, I hit a race condition: two agents writing to the same file simultaneously, each with incomplete context. The fix was a hard architectural rule: one designated writer, everything else read-only.

In stonerOS: session-writer is the only agent allowed to write to learnings/, preferences/, or context/. Every other agent—even heavy ones doing hours of research—writes to temp/ and signals the orchestrator to invoke session-writer. The session-writer runs a dedup-check before every append.

This prevents silent memory corruption that would otherwise be invisible until the corrupted context affected downstream output.
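A sketch of the dedup-check on the write path. The content-hash approach is my illustration; stonerOS's actual check may differ:

```python
import hashlib

def dedup_append(existing: str, new_entry: str) -> str:
    """Append new_entry unless an identical entry is already present.

    Hash comparison catches an agent re-submitting the same finding twice.
    In the single-writer pattern, only session-writer ever calls this.
    """
    seen = {hashlib.sha256(e.strip().encode()).hexdigest()
            for e in existing.split("\n## ")}
    if hashlib.sha256(new_entry.strip().encode()).hexdigest() in seen:
        return existing                       # silent no-op: already recorded
    return existing + "\n## " + new_entry.strip()
```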

The Hooks System

Claude Code lets you attach shell scripts to moments in a session—before a message is processed, before a tool runs, when a background agent starts or stops. I have 21 of these attached. Those scripts run automatically.

The practical effect: things that would otherwise require manual setup happen on their own. A hook that runs before every message can check the current time and inject it into the session—so the AI always has an accurate clock without me asking. Another hook checks my job application pipeline and surfaces anything due today. I never set context manually; the hooks set it for me.

The other category is safety. Before any tool executes, a second set of hooks inspects the operation. In autonomous mode, these block destructive commands—deleting files, force-pushing to git, writing outside the designated directory—and send a desktop notification if something gets blocked. The session can’t do damage quietly.
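A safety hook is just a script that reads the pending tool call from stdin and exits nonzero to block it. A minimal sketch, assuming Claude Code's documented PreToolUse convention (JSON payload on stdin, exit code 2 blocks the call); the blocked-command patterns are examples only:

```python
"""PreToolUse hook sketch: block destructive shell commands in autonomous mode."""
import json
import re
import sys

BLOCKED = [
    r"\brm\s+-rf\b",              # recursive delete
    r"git\s+push\s+.*--force",    # force-push
    r">\s*/etc/",                 # writing outside the designated directory
]

def is_blocked(command: str) -> bool:
    """True if the command matches any destructive pattern."""
    return any(re.search(p, command) for p in BLOCKED)

def main() -> int:
    """Entry point when Claude Code invokes the hook."""
    event = json.load(sys.stdin)                      # hook payload from Claude Code
    cmd = event.get("tool_input", {}).get("command", "")
    if is_blocked(cmd):
        print(f"blocked: {cmd}", file=sys.stderr)
        return 2                                      # exit code 2 = block the tool call
    return 0
```

Claude Code wires a script like this to the PreToolUse event in its settings; the desktop notification would hang off the blocked branch.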

A third pair tracks background agents: when one starts and when it stops, both get logged automatically. If an agent fails silently, the log shows the gap.

Without hooks, the session start protocol is just documentation—things you’re supposed to remember to do. With hooks, they happen whether you remember or not.

The Agent System

I split responsibilities across 23 subagents—three tiers by capability cost.

Haiku (fast, cheap): context-router, session-writer, dedup-validator, skill-curator, voice-qa. These run on every session—cost matters.

Sonnet (capable): research-analyst, code-architect, archivist, talent, docs-sync, security-auditor, collectibles-expert. Heavy lifting at reasonable cost.

Opus (reserved): /debate, /challenge, architecture decisions, adversarial analysis. Explicitly gated—runs only when the task actually needs Opus-tier reasoning.

My rule: once the main thread hits 3+ sequential tool calls, a subagent should own it instead. The main thread orchestrates—subagents execute.

The Skill and Command System

On top of the memory architecture sits a command layer: 39 slash commands, 11 active skill modules.

Skills are loaded markdown files that give the AI specialized context and procedures for a domain. The job_search skill knows the pipeline schema, cover letter patterns, and application tracking format. The portfolio skill knows the content architecture and publishing workflow. Skills can be activated mid-session and deactivated when not needed.

Slash commands invoke skill pipelines: /job-sprint [url] runs the full application pipeline. /morning-brief loads calendar, pipeline, and overnight captures into a brief. /weekly-review synthesizes the past week’s sessions into patterns.

Automation

Three scripts handle the scheduled and triggered work.

Longer-running scheduled tasks run as Claude Code remote triggers—this keeps scheduled work inside the same session architecture rather than managing a separate cron system alongside it.

Git

git init
git add -A
git commit -m "Initial memory system"

Every commit pushes automatically—treat anything committed as immediately public. A pre-commit check scans for accidentally included passwords or API keys and blocks the commit if it finds any. The bigger benefit is history: every preference file, every learnings entry, every correction has a timestamp and a diff. You can see exactly when something changed and why, which matters more than you’d expect once the system has been running for a few months.
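The secret scan is a pre-commit hook in the ordinary git sense: scan the staged diff, exit nonzero on a hit. A sketch of the check itself; the patterns are illustrative and far from exhaustive, and dedicated scanners like gitleaks do this better:

```python
import re

SECRET_PATTERNS = [
    r"AKIA[0-9A-Z]{16}",                  # AWS access key id
    r"sk-[A-Za-z0-9]{20,}",               # generic "sk-" style API key
    r"(?i)password\s*=\s*['\"][^'\"]+",   # hardcoded password assignment
]

def find_secrets(diff: str) -> list[str]:
    """Return added lines of a staged diff that look like leaked credentials."""
    hits = []
    for line in diff.splitlines():
        if line.startswith("+") and any(re.search(p, line) for p in SECRET_PATTERNS):
            hits.append(line)
    return hits
```

Wired up, `.git/hooks/pre-commit` pipes `git diff --cached` into this and blocks the commit if the list is non-empty.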

stonerOS is a personal AI memory system built on Claude Code, running since February 2026. Full architecture at josh-stoner.github.io.