stonerOS — Personal AI Memory System
Production AI System | Self-Initiated | February 2026–Present

AI systems architecture · Multi-agent orchestration · Automation & CI/CD · Iterative design · Self-directed build


About This Project

Every Claude session starts from zero. No memory of prior conversations, no knowledge of preferences, no continuity across sessions. For a daily AI user doing serious work—job search strategy, resume iteration, research synthesis—that’s a meaningful friction point.

I built stonerOS: a production AI memory system that gives Claude persistent, evolving knowledge about me across every session. It’s a git repository—3-layer memory architecture, 22 specialized agents across three model tiers, 17 automation hooks, pre-commit secret scanning, auto-push pipeline, and SessionStart staleness surfacing. I use it every day. No one asked me to build it.

What makes it a portfolio piece is not that it was built—it’s how it evolved. The system at version 2.25 looks materially different from v1.0. Real use generated real friction. Real friction generated real fixes. The system learns about me as I use it. I also got better at building systems by running one.

This demonstrates: AI systems architecture, multi-agent orchestration, automation pipeline design, iterative engineering judgment, and the kind of self-directed builder instinct that doesn’t wait for a formal request to solve a real problem.

Context

Role: Self-initiated—designer, architect, engineer, user
Start: February 2026 (built post-layoff, October 2025)
Status: Active production use—daily
Stack: Claude API, Claude Code hooks, bash, git, Python
Repo: Private GitHub
Version: 2.25 as of April 2026

The Problem

Large language models have no persistent memory by default. Every new conversation is a blank slate. For casual use, that’s fine. For someone running active job search operations, refining career strategy, managing a personal knowledge base, and iterating on complex documents over weeks—it’s a significant gap.

Workarounds exist, but each breaks down in practice.

The real problem wasn’t technical—it was that there was no designed system for how AI memory should work for a single user doing complex, evolving work.

The Architecture

A three-layer memory hierarchy with agent delegation and automated CI/CD.

CLAUDE.md (auto-read every session — master instruction set)
  ↓
Memory Layer (3 tiers, append-only):
  memories/       immutable baseline (professional history, personality exports)
  learnings/      append-only (patterns, expertise, communication style, corrections)
  preferences/    current state (tools, active projects, workflow context)
  ↓
Session Router:
  context-router (Haiku) → classifies session in <3 sec
    transactional → skip memory load
    partial       → preferences + corrections only
    deep-work     → full load + skill suggestions
  ↓
Agent Layer (22 agents, 3 model tiers):
  Haiku:   memory-search, session-writer, qa-validator, dedup-validator,
           context-router, skill-curator, session-analyst, voice-qa
  Sonnet:  archivist, talent, code-architect, research-analyst, docs-sync,
           repo-janitor, finance-analyst, security-auditor, privacy-auditor,
           claude-code-expert, apple-designer, apple-developer, design-qa
  Opus:    main orchestrator (cost gate — not a quality upgrade)
  ↓
Automation Layer (17 hooks):
  PostToolUse hook      → timestamps every file change to session log
  SessionEnd hook       → auto-commits + pushes to GitHub
  PreCompact hook       → writes checkpoint before context compression
  SubagentStart/Stop    → full agent lifecycle audit trail (zero instrumentation)
  UserPromptSubmit hook → injects clock + pipeline status on every turn
  PreToolUse hooks      → destructive-op guardrails + stuck-loop detection
  pre-commit hook       → 9 secret patterns scanned; read-only enforcement
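The pre-commit scan can be sketched as a pattern pass over staged content. A minimal sketch in Python, assuming regex-based detection; the four patterns below are illustrative stand-ins, not the actual nine the hook checks:

```python
import re

# Illustrative secret patterns -- the real hook scans 9; these are assumptions.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                      # AWS access key ID
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                   # generic API secret prefix
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),  # PEM private key
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                   # GitHub personal access token
]

def scan_for_secrets(text: str) -> list[str]:
    """Return the patterns that match; a non-empty list blocks the commit."""
    return [p.pattern for p in SECRET_PATTERNS if p.search(text)]
```

A non-empty return maps to a non-zero exit status in the actual hook, which is what makes git abort the commit.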

Several key design decisions were locked in at the start; the iteration record below shows how they held up under daily use.

The Iteration Record

This is where the portfolio piece lives. The initial build was functional. The current system is correct. What separates them is a documented log of real friction.

Correction Layer (v1 → v1.5)

The initial memory baseline contained factual errors embedded in an immutable file designated as read-only. Rather than edit the source (which would have broken the design intent of the immutable tier), I built a corrections overlay: a file that any agent must load before reading the baseline. This pattern—layered overrides rather than in-place edits—became a design principle.
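The layered-override pattern reduces to a simple merge where the overlay shadows the baseline without ever mutating it. A minimal sketch, with hypothetical field names:

```python
def load_memory(baseline: dict, corrections: dict) -> dict:
    """Layered override: corrections shadow baseline keys without editing
    the immutable file. The later layer wins on collision."""
    merged = dict(baseline)     # baseline copy -- the read-only tier stays untouched
    merged.update(corrections)  # overlay supersedes conflicting keys
    return merged
```

The key property is that the immutable tier is never written; fixing a fact means adding a line to the overlay, not editing history.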

Write Race Bug (v1.5 → v2.0)

Early multi-agent sessions produced corrupted output files. Investigation traced it to parallel agents writing to the same file simultaneously. Fix: max 1 writer active at any time. A dedicated session-writer agent became the sole agent authorized to write to memory files. All other agents write staging output first, then signal the orchestrator for integration.
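The single-writer discipline can be sketched in two functions: every agent stages to its own file, and only session-writer touches the memory file. File names and helper names here are illustrative assumptions:

```python
from pathlib import Path

def stage_output(agent: str, content: str, staging_dir: Path) -> Path:
    """Each agent writes to its own staging file, so no two agents
    ever hold the same destination open at once."""
    path = staging_dir / f"{agent}.staged.md"
    path.write_text(content)
    return path

def integrate(staged: list[Path], memory_file: Path) -> None:
    """session-writer is the sole authorized writer: it appends the
    staged outputs to the memory file in a single serial pass."""
    with memory_file.open("a") as dest:
        for src in staged:
            dest.write(src.read_text() + "\n")
```

Serializing the final write is what eliminates the race; parallelism is preserved everywhere except the last step.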

Silent Failure Protocol (v2.0 → v2.1)

Background agents occasionally completed with empty output files. Fix: launch logging + completion markers. Before every background task, a LAUNCH entry is logged. Agents write COMPLETE or FAILED markers on finish. After every background task: verify the output file is non-empty before integrating.
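The protocol's three checks can be sketched as one predicate, assuming a simple in-memory log; an empty "success" is treated as a failure:

```python
from pathlib import Path

def verify_background_task(log: list[str], output: Path, task: str) -> bool:
    """Integrate a background task's output only if it was launched,
    marked COMPLETE, and produced a non-empty file."""
    launched = f"LAUNCH {task}" in log
    completed = f"COMPLETE {task}" in log
    non_empty = output.exists() and output.stat().st_size > 0
    return launched and completed and non_empty
```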

QA Gate Protocol (v2.0 → v2.1)

For significant agent outputs, a qa-validator reviews output before returning it. Max 2 retries on NEEDS_REVISION verdict, then escalate. The constraint on retries was deliberate: preventing infinite loops by design, not by hope.
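The bounded retry loop is the whole trick. A minimal sketch, with `produce` and `validate` standing in for the producing agent and qa-validator:

```python
def qa_gate(produce, validate, max_retries: int = 2):
    """Run produce(), have the validator judge the result; retry at most
    twice on NEEDS_REVISION, then escalate instead of looping forever."""
    for attempt in range(max_retries + 1):
        output = produce()
        if validate(output) == "APPROVED":
            return output
    raise RuntimeError("ESCALATE: QA gate exhausted retries")
```

Raising instead of retrying indefinitely is the "by design, not by hope" part: the failure mode is loud and bounded.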

Token Optimization (v2.1 → v2.2)

Initial sessions loaded too much context. Added ignore rules, condensed the index files, and a session-start protocol for classifying requests; transactional sessions now skip memory load entirely.

Skill Module System (v2.2 → v2.3)

Loading all skills into every session was wasteful. Built a skill module system: skills/_active/ holds currently relevant skills; inactive skills live separately. Activation is explicit—load only what’s needed.
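Explicit activation means a session's skill set is just a directory listing. A minimal sketch, assuming skills are markdown modules:

```python
from pathlib import Path

def active_skills(skills_root: Path) -> list[str]:
    """Only modules under skills/_active/ load into a session;
    everything else stays on disk, costing zero context."""
    active_dir = skills_root / "_active"
    if not active_dir.is_dir():
        return []
    return sorted(p.stem for p in active_dir.glob("*.md"))
```

Activating a skill is a file move, and git records every activation as part of the session's history.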

Backlog-as-System (v2.3 → v2.4)

Added a living bug tracker and feature backlog for the system itself. A session-analyst agent reads the backlog weekly and surfaces recurring patterns. The system now has a product management layer for its own development.

Context-Router & Session Classification (v2.4 → v2.5)

Not every session needs full memory. A Haiku classifier reads the first message and returns a memory-load tier in under 3 seconds. Cost optimization through intelligence, not reduction.
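The router's contract can be sketched with a keyword heuristic standing in for the Haiku call; the keywords below are illustrative assumptions, not the real classifier:

```python
def classify_session(first_message: str) -> str:
    """Stand-in for the Haiku classifier: map the opening message to a
    memory-load tier. Keyword rules here are illustrative only."""
    msg = first_message.lower()
    if any(k in msg for k in ("resume", "strategy", "research", "draft")):
        return "deep-work"       # full memory load + skill suggestions
    if any(k in msg for k in ("prefer", "remind", "fix that")):
        return "partial"         # preferences + corrections only
    return "transactional"       # skip memory load entirely
```

What matters is the interface: one cheap call, one tier label, and the expensive loading decision is made before any memory file is read.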

Learnings Integrity (v2.5 → v2.6)

As learnings grew, semantic near-duplicates accumulated. Added a dedup-validator that checks every proposed entry against existing entries. Returns UNIQUE, SUPERSEDE, or DUPLICATE at a 60% conceptual overlap threshold.
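As a rough proxy for the conceptual-overlap check, word-level Jaccard similarity against each existing entry gives the shape of the gate. A sketch under that assumption; the real agent's SUPERSEDE verdict needs semantic judgment a similarity score cannot supply:

```python
def dedup_verdict(proposed: str, existing: list[str], threshold: float = 0.6) -> str:
    """Jaccard word overlap as a stand-in for conceptual overlap:
    DUPLICATE above the threshold, otherwise UNIQUE."""
    new_words = set(proposed.lower().split())
    for entry in existing:
        old_words = set(entry.lower().split())
        overlap = len(new_words & old_words) / len(new_words | old_words)
        if overlap >= threshold:
            return "DUPLICATE"
    return "UNIQUE"
```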

Decision Engineering & Adversarial Patterns (v2.6 → v2.7)

As the system matured, single-agent outputs became the bottleneck—not in quality, but in blind spots. Built an adversarial debate pattern: Advocate and Critic agents spawn in parallel with opposing briefs, then a Judge synthesizes the strongest arguments into a verdict. Also added an ensemble pattern for career-critical outputs—two independent agents run the same task in parallel, and their divergences surface blind spots neither catches alone.
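The debate topology is small enough to sketch directly, with callables standing in for spawned agents and a thread pool standing in for parallel agent launch:

```python
from concurrent.futures import ThreadPoolExecutor

def debate(task, advocate, critic, judge):
    """Advocate and Critic run in parallel with opposing briefs on the
    same task; the Judge synthesizes both cases into one verdict."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        case_for = pool.submit(advocate, task)
        case_against = pool.submit(critic, task)
        return judge(case_for.result(), case_against.result())
```

The ensemble pattern is the same shape with two copies of the same brief instead of opposing ones; divergence between the results is the signal.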

Also: 8 product management frameworks as interactive commands, and a decision log that prevents re-evaluating the same tools and patterns across sessions.

Autonomous Operations & Platform Expansion (v2.7 → v2.8)

The system was powerful but required me to be present. Built autonomous execution—headless sessions on a schedule with 6-layer safety: kill switch, command allowlist, 10-minute timeout, 30-turn cap per run, git-based write sandbox, and execution log. Two scheduled jobs ran weekly via launchd.
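Four of the six safety layers are pre-flight checks that can be sketched as a single gate run before each command; the write sandbox and execution log sit outside it. Parameter names here are illustrative:

```python
import time

def autonomy_gate(command: str, *, kill_switch: bool, allowlist: set[str],
                  started_at: float, turn: int,
                  timeout_s: int = 600, max_turns: int = 30) -> bool:
    """Pre-flight check a headless run must pass before each command:
    kill switch off, command allowlisted, under the 10-minute timeout,
    under the 30-turn cap."""
    if kill_switch:
        return False
    if command.split()[0] not in allowlist:
        return False
    if time.monotonic() - started_at > timeout_s:
        return False
    return turn < max_turns
```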

Also: security-auditor agent for macOS defensive security; a repository evaluation pipeline; Safari Bridge for browser interaction via AppleScript; a collectibles data pipeline; and a persistent design system for app building.

Autonomous Deprecation & SessionStart Staleness (v2.8 → v2.25, DEC-017)

The launchd cron approach was deprecated after a 4-week minimum viable autonomy (MVA) trial. Root cause: macOS TCC blocks launchd-spawned bash from reading the iCloud-resident repo (“Operation not permitted”), and StartCalendarInterval silently skips when the Mac is asleep at trigger time. Zero cron-fired runs ever completed—the execution log entries had all been manual terminal invocations. The autonomous channel was off for 16+ days before I noticed, with zero felt pain.

Replaced with SessionStart staleness surfacing: every prompt injects a one-line status (pipeline count, inbox unprocessed, days-since-weekly-review) via a UserPromptSubmit hook. Overdue items appear at the top of every session; I decide whether to act. No headless execution, no TCC battles, no silent failures. The lesson—that a shipped feature can be invisibly broken and the underlying need can evaporate—is itself part of what the system is for.
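The injected status line reduces to string assembly over three counters. A minimal sketch; the overdue thresholds are assumptions, not the hook's actual values:

```python
def staleness_status(pipeline: int, inbox: int, days_since_review: int) -> str:
    """One-line status injected on every prompt; overdue items are
    flagged up front, and the human decides whether to act."""
    parts = [f"pipeline:{pipeline}", f"inbox:{inbox}",
             f"weekly-review:{days_since_review}d"]
    flag = "OVERDUE " if days_since_review > 7 or inbox > 0 else ""
    return flag + " | ".join(parts)
```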

Also in this stretch: 22 agents total (added voice-qa, privacy-auditor, claude-code-expert, apple-designer, apple-developer, design-qa); the full AI Brain writing series (Chapters 0–4) published with per-chapter wayfinding and TTS narration; /build-html and /build-audio sister skills for publishing pipeline automation; and the migration from launchd-driven automation to session-driven awareness.

What Makes It a Learning System

Learning Principle: stonerOS Implementation
Prior knowledge matters: Baseline memory layer loads before each session—Claude enters knowing who I am
Active recall > passive storage: memory-search agent retrieves specific memories without loading all files
Spaced repetition via use: Learnings accumulate across sessions; frequently referenced patterns surface naturally
Error correction: Corrections overlay overrides factual errors in baseline
Transfer: Agent delegation routes tasks to the right specialist
Adaptive load: context-router adjusts how much memory loads per session
Metacognition: session-analyst reviews the system's own performance weekly

Evidence of Production Use


What It Demonstrates

Competency: Evidence
Systems architecture: 3-layer memory hierarchy; correction overlay pattern; context-router; adversarial debate and ensemble patterns
AI/ML engineering: 22-agent orchestration; model-tier delegation; dedup-validator for learnings integrity
Automation & CI/CD: 17 hooks; pre-commit secret scanning; auto-push pipeline; SessionStart staleness surfacing (replaced deprecated launchd cron—DEC-017)
Iterative engineering: v1 → v2.25 across 15+ iterations—each driven by real production friction, including deprecated autonomous execution
Product thinking: 34 slash commands, 11 active skill modules, session classification, QA gates, decision log
Learning systems design: Architecture mirrors instructional design principles
Self-direction: No spec, no team, no deadline—continuously iterated across 15+ versions, in daily production