stonerOS — Personal AI Memory System
Production AI System | Self-Initiated | February 2026–Present

AI systems architecture · Multi-agent orchestration · Automation & CI/CD · Iterative design · Self-directed build


About This Project

Every Claude session starts from zero. No memory of prior conversations, no knowledge of preferences, no continuity across sessions. For a daily AI user doing serious work—job search strategy, resume iteration, research synthesis—that’s a meaningful friction point.

I built stonerOS: a production AI memory system that gives Claude persistent, evolving knowledge about me across every session. It’s a git repository—3-layer memory architecture, 22 specialized agents across three model tiers, 17 automation hooks, pre-commit secret scanning, auto-push pipeline, and SessionStart staleness surfacing. I use it every day. No one asked me to build it.

What makes it a portfolio piece is not that it was built—it’s how it evolved. The system at version 2.25 looks materially different from v1.0. Real use generated real friction. Real friction generated real fixes. The system learns about me as I use it. I also got better at building systems by running one.

This demonstrates: AI systems architecture, multi-agent orchestration, automation pipeline design, iterative engineering judgment, and the kind of self-directed builder instinct that doesn’t wait for a formal request to solve a real problem.

Context

Role: Self-initiated—designer, architect, engineer, user
Start: February 2026 (built post-layoff, October 2025)
Status: Active production use—daily
Stack: Claude API, Claude Code hooks, bash, git, Python
Repo: Private GitHub
Version: 2.25 as of April 2026

The Problem

Large language models have no persistent memory by default. Every new conversation is a blank slate. For casual use, that’s fine. For someone running active job search operations, refining career strategy, managing a personal knowledge base, and iterating on complex documents over weeks—it’s a significant gap.

Workarounds exist, but each breaks down in practice.

The real problem wasn’t technical—it was that there was no designed system for how AI memory should work for a single user doing complex, evolving work.

The Architecture

A three-layer memory hierarchy with agent delegation and automated CI/CD.

CLAUDE.md (auto-read every session — master instruction set)
  ↓
Memory Layer (3 tiers, append-only):
  memories/       immutable baseline (professional history, personality exports)
  learnings/      append-only (patterns, expertise, communication style, corrections)
  preferences/    current state (tools, active projects, workflow context)
  ↓
Session Router:
  context-router (Haiku) → classifies session in <3 sec
    transactional → skip memory load
    partial       → preferences + corrections only
    deep-work     → full load + skill suggestions
  ↓
Agent Layer (22 agents, 3 model tiers):
  Haiku:   memory-search, session-writer, qa-validator, dedup-validator,
           context-router, skill-curator, session-analyst, voice-qa
  Sonnet:  archivist, talent, code-architect, research-analyst, docs-sync,
           repo-janitor, finance-analyst, security-auditor, privacy-auditor,
           claude-code-expert, apple-designer, apple-developer, design-qa
  Opus:    main orchestrator (cost gate — not a quality upgrade)
  ↓
Automation Layer (17 hooks):
  PostToolUse hook      → timestamps every file change to session log
  SessionEnd hook       → auto-commits + pushes to GitHub
  PreCompact hook       → writes checkpoint before context compression
  SubagentStart/Stop    → full agent lifecycle audit trail (zero instrumentation)
  UserPromptSubmit hook → injects clock + pipeline status on every turn
  PreToolUse hooks      → destructive-op guardrails + stuck-loop detection
  pre-commit hook       → 9 secret patterns scanned; read-only enforcement
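The pre-commit scan can be sketched as a pattern pass over staged content. A minimal sketch in Python, assuming regex-based detection; the four patterns below are illustrative stand-ins, not the actual nine the hook checks:

```python
import re

# Illustrative secret patterns -- the real hook scans 9; these are assumptions.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                      # AWS access key ID
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                   # generic API secret prefix
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),  # PEM private key
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                   # GitHub personal access token
]

def scan_for_secrets(text: str) -> list[str]:
    """Return the patterns that match; a non-empty list blocks the commit."""
    return [p.pattern for p in SECRET_PATTERNS if p.search(text)]
```

A non-empty return maps to a non-zero exit status in the actual hook, which is what makes git abort the commit.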

Several key design decisions were locked in at the start; the iteration record below shows how they held up under daily use.

The Iteration Record

This is where the portfolio piece lives. The initial build was functional. The current system is correct. What separates them is a documented log of real friction.

Correction Layer (v1 → v1.5)

The initial memory baseline contained factual errors embedded in an immutable file designated as read-only. Rather than edit the source (which would have broken the design intent of the immutable tier), I built a corrections overlay: a file that any agent must load before reading the baseline. This pattern—layered overrides rather than in-place edits—became a design principle.
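The layered-override pattern reduces to a simple merge where the overlay shadows the baseline without ever mutating it. A minimal sketch, with hypothetical field names:

```python
def load_memory(baseline: dict, corrections: dict) -> dict:
    """Layered override: corrections shadow baseline keys without editing
    the immutable file. The later layer wins on collision."""
    merged = dict(baseline)     # baseline copy -- the read-only tier stays untouched
    merged.update(corrections)  # overlay supersedes conflicting keys
    return merged
```

The key property is that the immutable tier is never written; fixing a fact means adding a line to the overlay, not editing history.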

Write Race Bug (v1.5 → v2.0)

Early multi-agent sessions produced corrupted output files. Investigation traced it to parallel agents writing to the same file simultaneously. Fix: max 1 writer active at any time. A dedicated session-writer agent became the sole agent authorized to write to memory files. All other agents write staging output first, then signal the orchestrator for integration.
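The single-writer discipline can be sketched in two functions: every agent stages to its own file, and only session-writer touches the memory file. File names and helper names here are illustrative assumptions:

```python
from pathlib import Path

def stage_output(agent: str, content: str, staging_dir: Path) -> Path:
    """Each agent writes to its own staging file, so no two agents
    ever hold the same destination open at once."""
    path = staging_dir / f"{agent}.staged.md"
    path.write_text(content)
    return path

def integrate(staged: list[Path], memory_file: Path) -> None:
    """session-writer is the sole authorized writer: it appends the
    staged outputs to the memory file in a single serial pass."""
    with memory_file.open("a") as dest:
        for src in staged:
            dest.write(src.read_text() + "\n")
```

Serializing the final write is what eliminates the race; parallelism is preserved everywhere except the last step.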

Silent Failure Protocol (v2.0 → v2.1)

Background agents occasionally completed with empty output files. Fix: launch logging + completion markers. Before every background task, a LAUNCH entry is logged. Agents write COMPLETE or FAILED markers on finish. After every background task: verify the output file is non-empty before integrating.
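The protocol's three checks can be sketched as one predicate, assuming a simple in-memory log; an empty "success" is treated as a failure:

```python
from pathlib import Path

def verify_background_task(log: list[str], output: Path, task: str) -> bool:
    """Integrate a background task's output only if it was launched,
    marked COMPLETE, and produced a non-empty file."""
    launched = f"LAUNCH {task}" in log
    completed = f"COMPLETE {task}" in log
    non_empty = output.exists() and output.stat().st_size > 0
    return launched and completed and non_empty
```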

QA Gate Protocol (v2.0 → v2.1)

For significant agent outputs, a qa-validator reviews output before returning it. Max 2 retries on NEEDS_REVISION verdict, then escalate. The constraint on retries was deliberate: preventing infinite loops by design, not by hope.
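The bounded retry loop is the whole trick. A minimal sketch, with `produce` and `validate` standing in for the producing agent and qa-validator:

```python
def qa_gate(produce, validate, max_retries: int = 2):
    """Run produce(), have the validator judge the result; retry at most
    twice on NEEDS_REVISION, then escalate instead of looping forever."""
    for attempt in range(max_retries + 1):
        output = produce()
        if validate(output) == "APPROVED":
            return output
    raise RuntimeError("ESCALATE: QA gate exhausted retries")
```

Raising instead of retrying indefinitely is the "by design, not by hope" part: the failure mode is loud and bounded.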

Token Optimization (v2.1 → v2.2)

Initial sessions loaded too much context. Added ignore rules, condensed the index files, and a session-start protocol for classifying requests; transactional sessions now skip memory load entirely.

Skill Module System (v2.2 → v2.3)

Loading all skills into every session was wasteful. Built a skill module system: skills/_active/ holds currently relevant skills; inactive skills live separately. Activation is explicit—load only what’s needed.
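Explicit activation means a session's skill set is just a directory listing. A minimal sketch, assuming skills are markdown modules:

```python
from pathlib import Path

def active_skills(skills_root: Path) -> list[str]:
    """Only modules under skills/_active/ load into a session;
    everything else stays on disk, costing zero context."""
    active_dir = skills_root / "_active"
    if not active_dir.is_dir():
        return []
    return sorted(p.stem for p in active_dir.glob("*.md"))
```

Activating a skill is a file move, and git records every activation as part of the session's history.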

Backlog-as-System (v2.3 → v2.4)

Added a living bug tracker and feature backlog for the system itself. A session-analyst agent reads the backlog weekly and surfaces recurring patterns. The system now has a product management layer for its own development.

Context-Router & Session Classification (v2.4 → v2.5)

Not every session needs full memory. A Haiku classifier reads the first message and returns a memory-load tier in under 3 seconds. Cost optimization through intelligence, not reduction.
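The router's contract can be sketched with a keyword heuristic standing in for the Haiku call; the keywords below are illustrative assumptions, not the real classifier:

```python
def classify_session(first_message: str) -> str:
    """Stand-in for the Haiku classifier: map the opening message to a
    memory-load tier. Keyword rules here are illustrative only."""
    msg = first_message.lower()
    if any(k in msg for k in ("resume", "strategy", "research", "draft")):
        return "deep-work"       # full memory load + skill suggestions
    if any(k in msg for k in ("prefer", "remind", "fix that")):
        return "partial"         # preferences + corrections only
    return "transactional"       # skip memory load entirely
```

What matters is the interface: one cheap call, one tier label, and the expensive loading decision is made before any memory file is read.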

Learnings Integrity (v2.5 → v2.6)

As learnings grew, semantic near-duplicates accumulated. Added a dedup-validator that checks every proposed entry against existing entries. Returns UNIQUE, SUPERSEDE, or DUPLICATE at a 60% conceptual overlap threshold.
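As a rough proxy for the conceptual-overlap check, word-level Jaccard similarity against each existing entry gives the shape of the gate. A sketch under that assumption; the real agent's SUPERSEDE verdict needs semantic judgment a similarity score cannot supply:

```python
def dedup_verdict(proposed: str, existing: list[str], threshold: float = 0.6) -> str:
    """Jaccard word overlap as a stand-in for conceptual overlap:
    DUPLICATE above the threshold, otherwise UNIQUE."""
    new_words = set(proposed.lower().split())
    for entry in existing:
        old_words = set(entry.lower().split())
        overlap = len(new_words & old_words) / len(new_words | old_words)
        if overlap >= threshold:
            return "DUPLICATE"
    return "UNIQUE"
```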

Decision Engineering & Adversarial Patterns (v2.6 → v2.7)

As the system matured, single-agent outputs became the bottleneck—not in quality, but in blind spots. Built an adversarial debate pattern: Advocate and Critic agents spawn in parallel with opposing briefs, then a Judge synthesizes the strongest arguments into a verdict. Also added an ensemble pattern for career-critical outputs—two independent agents run the same task in parallel, and their divergences surface blind spots neither catches alone.
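The debate topology is small enough to sketch directly, with callables standing in for spawned agents and a thread pool standing in for parallel agent launch:

```python
from concurrent.futures import ThreadPoolExecutor

def debate(task, advocate, critic, judge):
    """Advocate and Critic run in parallel with opposing briefs on the
    same task; the Judge synthesizes both cases into one verdict."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        case_for = pool.submit(advocate, task)
        case_against = pool.submit(critic, task)
        return judge(case_for.result(), case_against.result())
```

The ensemble pattern is the same shape with two copies of the same brief instead of opposing ones; divergence between the results is the signal.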

Also: 8 product management frameworks as interactive commands, and a decision log that prevents re-evaluating the same tools and patterns across sessions.

Autonomous Operations & Platform Expansion (v2.7 → v2.8)

The system was powerful but required me to be present. Built autonomous execution—headless sessions on a schedule with 6-layer safety: kill switch, command allowlist, 10-minute timeout, 30-turn cap per run, git-based write sandbox, and execution log. Two scheduled jobs ran weekly via launchd.
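Four of the six safety layers are pre-flight checks that can be sketched as a single gate run before each command; the write sandbox and execution log sit outside it. Parameter names here are illustrative:

```python
import time

def autonomy_gate(command: str, *, kill_switch: bool, allowlist: set[str],
                  started_at: float, turn: int,
                  timeout_s: int = 600, max_turns: int = 30) -> bool:
    """Pre-flight check a headless run must pass before each command:
    kill switch off, command allowlisted, under the 10-minute timeout,
    under the 30-turn cap."""
    if kill_switch:
        return False
    if command.split()[0] not in allowlist:
        return False
    if time.monotonic() - started_at > timeout_s:
        return False
    return turn < max_turns
```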

Also: security-auditor agent for macOS defensive security; a repository evaluation pipeline; Safari Bridge for browser interaction via AppleScript; a collectibles data pipeline; and a persistent design system for app building.

Autonomous Deprecation & SessionStart Staleness (v2.8 → v2.25, DEC-017)

The launchd cron approach was deprecated after a 4-week minimum viable autonomy (MVA) trial. Root cause: macOS TCC blocks launchd-spawned bash from reading the iCloud-resident repo (“Operation not permitted”), and StartCalendarInterval silently skips when the Mac is asleep at trigger time. Zero cron-fired runs ever completed—the execution log entries had all been manual terminal invocations. The autonomous channel was off for 16+ days before I noticed, with zero felt pain.

Replaced with SessionStart staleness surfacing: every prompt injects a one-line status (pipeline count, inbox unprocessed, days-since-weekly-review) via a UserPromptSubmit hook. Overdue items appear at the top of every session; I decide whether to act. No headless execution, no TCC battles, no silent failures. The lesson—that a shipped feature can be invisibly broken and the underlying need can evaporate—is itself part of what the system is for.
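The injected status line reduces to string assembly over three counters. A minimal sketch; the overdue thresholds are assumptions, not the hook's actual values:

```python
def staleness_status(pipeline: int, inbox: int, days_since_review: int) -> str:
    """One-line status injected on every prompt; overdue items are
    flagged up front, and the human decides whether to act."""
    parts = [f"pipeline:{pipeline}", f"inbox:{inbox}",
             f"weekly-review:{days_since_review}d"]
    flag = "OVERDUE " if days_since_review > 7 or inbox > 0 else ""
    return flag + " | ".join(parts)
```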

Also in this stretch: 22 agents total (added voice-qa, privacy-auditor, claude-code-expert, apple-designer, apple-developer, design-qa); the full AI Brain writing series (Chapters 0–4) published with per-chapter wayfinding and TTS narration; /build-html and /build-audio sister skills for publishing pipeline automation; and the migration from launchd-driven automation to session-driven awareness.

What Makes It a Learning System

Learning Principle: stonerOS Implementation
Prior knowledge matters: Baseline memory layer loads before each session—Claude enters knowing who I am
Active recall > passive storage: memory-search agent retrieves specific memories without loading all files
Spaced repetition via use: Learnings accumulate across sessions; frequently referenced patterns surface naturally
Error correction: Corrections overlay overrides factual errors in baseline
Transfer: Agent delegation routes tasks to the right specialist
Adaptive load: context-router adjusts how much memory loads per session
Metacognition: session-analyst reviews the system's own performance weekly

Evidence of Production Use


What It Demonstrates

Competency: Evidence
Systems architecture: 3-layer memory hierarchy; correction overlay pattern; context-router; adversarial debate and ensemble patterns
AI/ML engineering: 22-agent orchestration; model-tier delegation; dedup-validator for learnings integrity
Automation & CI/CD: 17 hooks; pre-commit secret scanning; auto-push pipeline; SessionStart staleness surfacing (replaced deprecated launchd cron—DEC-017)
Iterative engineering: v1 → v2.25 across 15+ iterations—each driven by real production friction, including deprecated autonomous execution
Product thinking: 34 slash commands, 11 active skill modules, session classification, QA gates, decision log
Learning systems design: Architecture mirrors instructional design principles
Self-direction: No spec, no team, no deadline—continuously iterated across 15+ versions, in daily production