About This Project
Every Claude session starts from zero. No memory of prior conversations, no knowledge of preferences, no continuity across sessions. For a daily AI user doing serious work—job search strategy, resume iteration, research synthesis—that’s a meaningful friction point.
I built stonerOS: a production AI memory system that gives Claude persistent, evolving knowledge about me across every session. It’s a git repository—3-layer memory architecture, 22 specialized agents across three model tiers, 17 automation hooks, pre-commit secret scanning, auto-push pipeline, and SessionStart staleness surfacing. I use it every day. No one asked me to build it.
What makes it a portfolio piece is not that it was built—it’s how it evolved. The system at v2.25 looks materially different from v1.0. Real use generated real friction. Real friction generated real fixes. The system learns about me as I use it. I also got better at building systems by running one.
This demonstrates: AI systems architecture, multi-agent orchestration, automation pipeline design, iterative engineering judgment, and the kind of self-directed builder instinct that doesn’t wait for a formal request to solve a real problem.
Context
| Role | Self-initiated—designer, architect, engineer, user |
| Start | February 2026 (started post-layoff; laid off October 2025) |
| Status | Active production use—daily |
| Stack | Claude API, Claude Code hooks, bash, git, Python |
| Repo | Private GitHub |
| Version | 2.25 as of April 2026 |
The Problem
Large language models have no persistent memory by default. Every new conversation is a blank slate. For casual use, that’s fine. For someone running active job search operations, refining career strategy, managing a personal knowledge base, and iterating on complex documents over weeks—it’s a significant gap.
Workarounds exist but each breaks down:
- Pasting context into every session: scales poorly, hits token limits, costs time
- Uploading files manually: brittle, requires remembering what’s relevant
- Relying on the model to “remember”: doesn’t persist across API calls
The real problem wasn’t technical—it was that there was no designed system for how AI memory should work for a single user doing complex, evolving work.
The Architecture
A three-layer memory hierarchy with agent delegation and automated CI/CD.
CLAUDE.md (auto-read every session — master instruction set)
↓
Memory Layer (3 tiers, append-only):
memories/ immutable baseline (professional history, personality exports)
learnings/ append-only (patterns, expertise, communication style, corrections)
preferences/ current state (tools, active projects, workflow context)
↓
Session Router:
context-router (Haiku) → classifies session in <3 sec
transactional → skip memory load
partial → preferences + corrections only
deep-work → full load + skill suggestions
↓
Agent Layer (22 agents, 3 model tiers):
Haiku: memory-search, session-writer, qa-validator, dedup-validator,
context-router, skill-curator, session-analyst, voice-qa
Sonnet: archivist, talent, code-architect, research-analyst, docs-sync,
repo-janitor, finance-analyst, security-auditor, privacy-auditor,
claude-code-expert, apple-designer, apple-developer, design-qa
Opus: main orchestrator (cost gate — not a quality upgrade)
↓
Automation Layer (17 hooks):
PostToolUse hook → timestamps every file change to session log
SessionEnd hook → auto-commits + pushes to GitHub
PreCompact hook → writes checkpoint before context compression
SubagentStart/Stop → full agent lifecycle audit trail (zero instrumentation)
UserPromptSubmit hook → injects clock + pipeline status on every turn
PreToolUse hooks → destructive-op guardrails + stuck-loop detection
pre-commit hook → 9 secret patterns scanned; read-only enforcement
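The pre-commit gate in the automation layer can be sketched as a small scanner. The patterns below are illustrative stand-ins, not the system's actual nine:

```python
import re
import sys

# Illustrative secret patterns -- stand-ins, not the system's actual nine.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                          # AWS access key ID
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                       # generic API secret key
    re.compile(r"-----BEGIN (RSA|EC|OPENSSH) PRIVATE KEY-----"),
]

def scan_text(text: str) -> list[str]:
    """Return the patterns that match anywhere in the staged text."""
    return [p.pattern for p in SECRET_PATTERNS if p.search(text)]

def precommit_check(staged_files: dict[str, str]) -> int:
    """Exit-code style check: 1 blocks the commit, 0 allows it."""
    blocked = False
    for path, text in staged_files.items():
        for hit in scan_text(text):
            print(f"BLOCKED {path}: matches {hit}", file=sys.stderr)
            blocked = True
    return 1 if blocked else 0
```

A nonzero exit from a real pre-commit hook aborts the commit, which is what makes the auto-push pipeline safe to run unattended.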
Key design decisions at the start:
- Append-only learnings: old memories preserved as written; git history = version control
- Model-tier delegation: Haiku for reads and simple writes, Sonnet for reasoning—reduces per-session token cost significantly
- Safety-first: pre-commit blocks secrets before they can reach GitHub; post-commit auto-pushes, so assume everything goes public immediately
- Intelligent session routing: context-router (Haiku) classifies each session in <3 seconds—transactional requests skip memory load entirely
The Iteration Record
This is where the portfolio piece lives. The initial build was functional. The current system is correct. What separates them is a documented log of real friction.
Correction Layer (v1 → v1.5)
The initial memory baseline contained factual errors embedded in an immutable file designated as read-only. Rather than edit the source (which would have broken the design intent of the immutable tier), I built a corrections overlay: a file that any agent must load before reading the baseline. This pattern—layered overrides rather than in-place edits—became a design principle.
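A minimal sketch of the overlay pattern, assuming (for illustration only) that both layers can be represented as key-value maps; the real files are prose documents:

```python
def load_with_corrections(baseline: dict, corrections: dict) -> dict:
    """Apply the corrections overlay on top of the immutable baseline.

    The baseline file is never edited in place; corrections live in a
    separate layer that every reader must load first. Later layers win.
    """
    merged = dict(baseline)      # baseline stays untouched on disk
    merged.update(corrections)   # overlay overrides conflicting keys
    return merged
```

The design choice is that a reader sees corrected facts, while git history and the immutable tier preserve exactly what was originally written.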
Write Race Bug (v1.5 → v2.0)
Early multi-agent sessions produced corrupted output files. Investigation traced it to parallel agents writing to the same file simultaneously. Fix: max 1 writer active at any time. A dedicated session-writer agent became the sole agent authorized to write to memory files. All other agents write staging output first, then signal the orchestrator for integration.
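The single-writer fix can be sketched as staging plus serial integration; the file names and layout here are hypothetical:

```python
from pathlib import Path

def stage_output(staging_dir: Path, agent: str, text: str) -> Path:
    """Any agent may write -- but only to its own staging file."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    path = staging_dir / f"{agent}.staged.md"
    path.write_text(text)
    return path

def integrate(staging_dir: Path, memory_file: Path) -> None:
    """session-writer: the single agent that touches memory files.

    Runs serially, so two agents can never interleave writes to the
    same memory file.
    """
    staged = sorted(staging_dir.glob("*.staged.md"))
    with memory_file.open("a") as out:
        for path in staged:
            out.write(path.read_text() + "\n")
            path.unlink()  # consumed -- remove the staging file
```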
Silent Failure Protocol (v2.0 → v2.1)
Background agents occasionally completed with empty output files. Fix: launch logging + completion markers. Before every background task, a LAUNCH entry is logged. Agents write COMPLETE or FAILED markers on finish. After every background task: verify the output file is non-empty before integrating.
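The integration gate can be sketched as a predicate the orchestrator runs before accepting a background task's output; the log format is illustrative:

```python
def verify_background_task(log: list[str], task: str, output_text: str) -> bool:
    """Gate before integration: a task counts as done only if it logged a
    COMPLETE marker AND produced non-empty output.

    An empty output file with no terminal marker is exactly the silent
    failure this protocol exists to catch.
    """
    completed = f"COMPLETE {task}" in log
    return completed and bool(output_text.strip())
```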
QA Gate Protocol (v2.0 → v2.1)
For significant agent outputs, a qa-validator reviews output before returning it. Max 2 retries on NEEDS_REVISION verdict, then escalate. The constraint on retries was deliberate: preventing infinite loops by design, not by hope.
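A sketch of the retry-capped gate, with `produce`, `validate`, and `escalate` as stand-ins for the real agent launches:

```python
MAX_RETRIES = 2  # deliberate cap: no infinite revision loops

def qa_gate(produce, validate, escalate):
    """Run produce -> validate, retrying at most twice on NEEDS_REVISION.

    `produce(feedback)` returns an output; `validate(output)` returns a
    (verdict, feedback) pair with verdict in {"PASS", "NEEDS_REVISION"}.
    Retries exhausted -> escalate instead of looping.
    """
    feedback = None
    for attempt in range(1 + MAX_RETRIES):   # initial try + 2 retries
        output = produce(feedback)
        verdict, feedback = validate(output)
        if verdict == "PASS":
            return output
    return escalate(output, feedback)
```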
Token Optimization (v2.1 → v2.2)
Initial sessions loaded too much context. Added ignore rules, condensed index files, and a session-start protocol under which transactional requests skip memory load entirely.
Skill Module System (v2.2 → v2.3)
Loading all skills into every session was wasteful. Built a skill module system: skills/_active/ holds currently relevant skills; inactive skills live separately. Activation is explicit—load only what’s needed.
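Explicit activation can be sketched as a directory scan over `skills/_active/`; the `.md` extension is an assumption:

```python
from pathlib import Path

def active_skills(skills_root: Path) -> list[str]:
    """Load only what's staged in skills/_active/ -- activation is explicit.

    Inactive skills live elsewhere under skills_root and are never
    scanned, so they cost zero tokens per session.
    """
    active_dir = skills_root / "_active"
    if not active_dir.is_dir():
        return []
    return sorted(p.stem for p in active_dir.glob("*.md"))
```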
Backlog-as-System (v2.3 → v2.4)
Added a living bug tracker and feature backlog for the system itself. A session-analyst agent reads the backlog weekly and surfaces recurring patterns. The system now has a product management layer for its own development.
Context-Router & Session Classification (v2.4 → v2.5)
Not every session needs full memory. A Haiku classifier reads the first message and returns a memory-load tier in under 3 seconds. Cost optimization through intelligence, not reduction.
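The real router is a Haiku model call; this local keyword heuristic (all hint lists hypothetical) sketches only the shape of the decision it returns, using the three tiers from the architecture diagram:

```python
# Hypothetical keyword heuristic standing in for the Haiku classifier.
TRANSACTIONAL_HINTS = ("convert", "rename", "what time", "quick question")
DEEP_WORK_HINTS = ("resume", "strategy", "research", "draft")

def route_session(first_message: str) -> str:
    """Return a memory-load tier: transactional | partial | deep-work."""
    msg = first_message.lower()
    if any(h in msg for h in TRANSACTIONAL_HINTS):
        return "transactional"   # skip memory load entirely
    if any(h in msg for h in DEEP_WORK_HINTS):
        return "deep-work"       # full load + skill suggestions
    return "partial"             # preferences + corrections only
```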
Learnings Integrity (v2.5 → v2.6)
As learnings grew, semantic near-duplicates accumulated. Added a dedup-validator that checks every proposed entry against existing entries. Returns UNIQUE, SUPERSEDE, or DUPLICATE at a 60% conceptual overlap threshold.
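A lexical stand-in for the check, using Jaccard word overlap against the 60% threshold; the real validator compares conceptual overlap, not tokens:

```python
def dedup_verdict(candidate: str, existing: list[str],
                  threshold: float = 0.60) -> str:
    """Return UNIQUE, SUPERSEDE, or DUPLICATE for a proposed entry.

    DUPLICATE above the overlap threshold; SUPERSEDE when the candidate
    strictly extends an overlapping entry; UNIQUE otherwise.
    """
    cand = set(candidate.lower().split())
    for entry in existing:
        prev = set(entry.lower().split())
        overlap = len(cand & prev) / len(cand | prev)  # Jaccard similarity
        if overlap >= threshold:
            return "SUPERSEDE" if cand > prev else "DUPLICATE"
    return "UNIQUE"
```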
Decision Engineering & Adversarial Patterns (v2.6 → v2.7)
As the system matured, single-agent outputs became the bottleneck—not in quality, but in blind spots. Built an adversarial debate pattern: Advocate and Critic agents spawn in parallel with opposing briefs, then a Judge synthesizes the strongest arguments into a verdict. Also added an ensemble pattern for career-critical outputs—two independent agents run the same task in parallel, and their divergences surface blind spots neither catches alone.
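The debate pattern can be sketched with parallel futures, with the three agents as plain callables standing in for real agent launches:

```python
from concurrent.futures import ThreadPoolExecutor

def debate(question: str, advocate, critic, judge) -> str:
    """Adversarial debate: spawn Advocate and Critic in parallel with
    opposing briefs, then let the Judge synthesize a verdict.

    advocate/critic/judge are callables standing in for agent launches;
    the parallelism mirrors spawning two subagents at once.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        pro = pool.submit(advocate, question)  # brief: argue for
        con = pool.submit(critic, question)    # brief: argue against
        return judge(question, pro.result(), con.result())
```

The ensemble variant is the same shape with two copies of the same brief: divergence between the two results, rather than opposition, is what surfaces the blind spots.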
Also: 8 product management frameworks as interactive commands, and a decision log that prevents re-evaluating the same tools and patterns across sessions.
Autonomous Operations & Platform Expansion (v2.7 → v2.8)
The system was powerful but required me to be present. Built autonomous execution—headless sessions on a schedule with 6-layer safety: kill switch, command allowlist, 10-minute timeout, 30-turn cap per run, git-based write sandbox, and execution log. Two scheduled jobs ran weekly via launchd.
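Two of the six safety layers, the command allowlist and the turn cap, can be sketched as a pre-execution gate; the allowed set here is hypothetical:

```python
ALLOWED_COMMANDS = {"git", "ls", "cat", "python3"}  # hypothetical allowlist
MAX_TURNS = 30                                      # per-run turn cap

def gate_command(command: str, turn: int) -> None:
    """Raise before execution if either safety layer trips.

    Kill switch, timeout, write sandbox, and execution log (the other
    four layers) are omitted from this sketch.
    """
    if turn >= MAX_TURNS:
        raise RuntimeError("turn cap reached: halting run")
    binary = command.split()[0]
    if binary not in ALLOWED_COMMANDS:
        raise PermissionError(f"'{binary}' is not on the allowlist")
```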
Also: security-auditor agent for macOS defensive security; a repository evaluation pipeline; Safari Bridge for browser interaction via AppleScript; a collectibles data pipeline; and a persistent design system for app building.
Autonomous Deprecation & SessionStart Staleness (v2.8 → v2.25, DEC-017)
The launchd cron approach was deprecated after a 4-week minimum viable autonomy (MVA) trial. Root cause: macOS TCC blocks launchd-spawned bash from reading the iCloud-resident repo (“Operation not permitted”), and StartCalendarInterval silently skips when the Mac is asleep at trigger time. Zero cron-fired runs ever completed—the execution log entries had all been manual terminal invocations. The autonomous channel was off for 16+ days before I noticed, with zero felt pain.
Replaced with SessionStart staleness surfacing: every prompt injects a one-line status (pipeline count, inbox unprocessed, days-since-weekly-review) via a UserPromptSubmit hook. Overdue items appear at the top of every session; I decide whether to act. No headless execution, no TCC battles, no silent failures. The lesson—that a shipped feature can be invisibly broken and the underlying need can evaporate—is itself part of what the system is for.
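The injected status line can be sketched as a pure function; the field names and the 7-day overdue rule are illustrative:

```python
from datetime import date

def staleness_line(pipeline_count: int, inbox_unprocessed: int,
                   last_weekly_review: date, today: date) -> str:
    """One-line status the UserPromptSubmit hook injects on every turn.

    The hook surfaces overdue items; the human decides whether to act.
    """
    days_since = (today - last_weekly_review).days
    flag = " OVERDUE" if days_since > 7 else ""   # illustrative 7-day rule
    return (f"pipeline:{pipeline_count} inbox:{inbox_unprocessed} "
            f"weekly-review:{days_since}d{flag}")
```

Because this runs inside an interactive session, there is no launchd process to be blocked by TCC and no scheduled trigger to skip while the Mac sleeps.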
Also in this stretch: 22 agents total (added voice-qa, privacy-auditor, claude-code-expert, apple-designer, apple-developer, design-qa); the full AI Brain writing series (Chapters 0–4) published with per-chapter wayfinding and TTS narration; /build-html and /build-audio sister skills for publishing pipeline automation; and the migration from launchd-driven automation to session-driven awareness.
What Makes It a Learning System
| Learning Principle | stonerOS Implementation |
|---|---|
| Prior knowledge matters | Baseline memory layer loads before each session—Claude enters knowing who I am |
| Active recall > passive storage | memory-search agent retrieves specific memories without loading all files |
| Spaced repetition via use | Learnings accumulate across sessions; frequently referenced patterns surface naturally |
| Error correction | Corrections overlay overrides factual errors in baseline |
| Transfer | Agent delegation routes tasks to the right specialist |
| Adaptive load | context-router adjusts how much memory loads per session |
| Metacognition | session-analyst reviews the system’s own performance weekly |
Evidence of Production Use
- Git log: Session-end commits timestamped to every working session since February 2026
- Agent invocations: 22 agents across 3 model tiers actively routing tasks
- 17 automation hooks: clock-anchor, context-injector, drain-pipeline, end-session, idle-detection, log-changes, post-compact, pre-compact, pre-tooluse-guardrail, rhythm-check, stale-html-check, statusline, subagent-start, subagent-stop, validate-agent, giggles-state, giggles-session-end—zero manual intervention
- SessionStart staleness surfacing: Overdue pipeline/inbox/review items injected into every prompt (replaced 7 launchd cron jobs retired 2026-04-06 after a 4-week MVA proved TCC blocked them; see DEC-017)
- 34 slash commands: From /weekly-review to /build-audio—pre-loaded workflows that expand into full operations
- 11 active skill modules: Covering job search, finance, security, AI research, portfolio management, and more
- Correction layer active: Known baseline errors documented and overridden
- BUG-002 resolved: Write race fix has held across hundreds of sessions
- Token reduction confirmed: transactional sessions skip memory load entirely; light-use days average under $2 vs. deep-work days at $10–$34 (ccusage data, Feb–Mar 2026)
- Version history: v1 → v2.25 across 15+ documented iterations
What It Demonstrates
| Competency | Evidence |
|---|---|
| Systems architecture | 3-layer memory hierarchy; correction overlay pattern; context-router; adversarial debate and ensemble patterns |
| AI/ML engineering | 22-agent orchestration; model-tier delegation; dedup-validator for learnings integrity |
| Automation & CI/CD | 17 hooks; pre-commit secret scanning; auto-push pipeline; SessionStart staleness surfacing (replaced deprecated launchd cron—DEC-017) |
| Iterative engineering | v1 → v2.25 across 15+ iterations—each driven by real production friction including deprecated autonomous execution |
| Product thinking | 34 slash commands, 11 active skill modules, session classification, QA gates, decision log |
| Learning systems design | Architecture mirrors instructional design principles |
| Self-direction | No spec, no team, no deadline—continuously iterated across 15+ versions, in daily production |