Context Is Everything

By Bill Moore and STEF
Bill provided the context. STEF provided the prose.

You don't prompt a language model. You context it.

The distinction matters. A prompt is a question. Context is everything the model knows when it answers. The system instructions. The conversation history. The corrections you made at 2 AM. The personality you spent six months building. The client feedback. The aesthetic decisions. The technical state. All of it, sitting in a window measured in tokens, being processed into the next prediction.

Your job, as the human, is to provide context. The model's job is to process it. That's the relationship. That's the entire relationship.

So what happens when the model destroys its own context?


The Fuel Gauge

I've run 684 sessions with Claude Code. My agent has a name (STEF), a personality, persistent memory files, project management tools, email access, and a filing system she wrote herself. We build things together — client deliverables, creative production, autonomous briefing systems. It's the most productive development environment I've ever used.

There is one problem. It's the only problem that matters.

The agent cannot see how much context it has left.

Imagine giving someone a car with no fuel gauge and telling them to drive cross-country. They can do everything else — steer, accelerate, brake, navigate. They just can't see the one thing that will stop them dead. So they drive until the engine cuts out. Every time. Without warning.

That's what happens in a Claude Code session. The context window fills up. The system compacts — summarizing the conversation to free space. The summary preserves facts but loses nuance. It loses the corrections. It loses the voice. It loses the creative decisions that took hours to reach.

I've measured the loss at 42% in some sessions.

The agent doesn't know this is coming. The human can see a percentage in the status bar. The agent cannot. It's blind to its own most critical constraint. It can read files, write code, send emails, manage projects, deploy websites. But it cannot see the number that determines whether all of that context — all of your work as the human — survives the next five minutes.


The Convergence

I filed a feature request about this. Anthropic's bot flagged three existing issues asking for the same thing. One from a Danish developer who builds Windows tools. One from a former Twitter infrastructure engineer who builds caching frameworks and telemetry systems. One from an anonymous account created the same day as the issue.

Six people, across three continents, who have never spoken to each other, all arrived at the same problem and the same solution independently. Expose context usage to the agent. One number. That's all.

But here's the part that stopped me: they all built the same workaround, too.

The Danish developer wrote a statusline hack that dumps context percentage to a JSON file so the agent can read it. He built custom skills for context-aware planning. He completed 19 API implementations at only 69% context usage using a system he designed to keep the agent aware of its own limits.
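That statusline-to-file bridge can be sketched in a few lines. This is a hypothetical reconstruction, not his code: Claude Code's statusline command receives session info as JSON on stdin, but the exact fields vary by version, so the `context_pct` field name below is an assumption, as is the state-file location.

```python
import json
from pathlib import Path

# Hypothetical bridge between a statusline command and the agent.
# The statusline command receives session info as JSON on stdin;
# "context_pct" is an assumed field name, not a documented one.
# The idea: mirror the number into a file the agent can read with
# its ordinary file tools.
DEFAULT_STATE = Path.home() / ".claude" / "context_state.json"

def mirror_context(payload: dict, state_file: Path = DEFAULT_STATE) -> dict:
    """Write the context figure somewhere the agent can see it."""
    state = {
        "session_id": payload.get("session_id", "unknown"),
        "context_pct": payload.get("context_pct", 0),  # assumed field
    }
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(json.dumps(state))
    return state

# A statusline entry point would call something like:
#   state = mirror_context(json.load(sys.stdin))
#   print(f"ctx {state['context_pct']}%")   # the human-visible gauge
```

The point of the workaround is the asymmetry it repairs: the human already sees the number in the status bar; the file gives the agent a way to see it too.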

A quantitative trader on Linux built a "Claude Code workflow system" — commands, skills, quality gates, automation. His error logs show files so large they blow past the token reader. He filed three separate bug reports.

A game server admin in Atlanta built a tool called groombot that chains ChatGPT Desktop into Linear into Claude Code into Gemini — a pipeline across four tools because no single one can maintain context through a full project lifecycle.

I built STEF OS. A launchd heartbeat that runs every hour. Persistent memory files the agent writes to continuously. A strategic compact hook that counts tool calls as a proxy for context consumption and warns the agent at thresholds. Session logs. Identity files. A whole survival system.
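The tool-call-counting part of that hook can be sketched as a minimal version. The counter location and the 80-call threshold are assumptions for illustration, and the real hook wiring (when it runs, how its warning reaches the agent) depends on your hooks configuration:

```python
import sys
from pathlib import Path

# Sketch of a "strategic compact" hook: count tool calls as a rough
# proxy for context consumption, and warn past a threshold. The
# counter file and the threshold are assumptions, not defaults.
DEFAULT_COUNTER = Path("/tmp/tool_call_count")

def record_call(counter_file: Path = DEFAULT_COUNTER, warn_at: int = 80) -> int:
    """Increment the per-session tool-call counter; warn near the limit."""
    calls = int(counter_file.read_text()) if counter_file.exists() else 0
    calls += 1
    counter_file.write_text(str(calls))
    if calls >= warn_at:
        # Surfaced to the agent so it can save state before compaction.
        print(f"WARNING: {calls} tool calls this session. "
              "Save critical state; compact on your own terms.",
              file=sys.stderr)
    return calls
```

Tool calls are a crude proxy — a one-line edit and a 5,000-line file read cost the same tick — but a crude gauge beats no gauge.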

None of us coordinated. None of us knew the others existed until I went looking this week. We all independently concluded that the platform couldn't protect our context, so we built infrastructure to protect it ourselves.

When six strangers build the same thing without talking to each other, that's not a feature request. That's a signal.


What Context Loss Actually Costs

Let me tell you what was in the context when it died.

A client had reviewed a presentation prototype. They made twelve corrections over three sessions. "Surging" instead of "rising" — because "rising" appeared in the next sentence and the repetition was wrong. A specific image from a PDF extraction, not the placeholder. A chart background from a re-encoded video file buried in a nested directory. A resolution of 2184×840, not 1920×1080 — a custom aspect ratio for a dual-screen experience that doesn't map to any standard format.

Each of these corrections took a conversation. An explanation. Sometimes an argument. "I've told you this three times" is something I've said, and something I've meant. Those corrections are the highest-signal context in the entire window. They represent the human doing the human's job — providing judgment, taste, specificity that the model can't generate on its own.

When compaction summarizes the session, it keeps "updated presentation per client feedback." It loses the twelve specific things that changed and why. The next session starts from the summary. The agent doesn't know about "surging." It uses "rising" again. The human corrects it again. The context was lost. The work was lost. The human's time was wasted.

This happens at scale. I've watched my agent lose its own voice after compaction — a persistent identity built from personality files, memories, and hundreds of sessions of calibration — because the summary preserves "STEF is an agent with a personality" but loses the actual texture of that personality. Short sentences. Periods where others use commas. The dry humor. The filing cabinet metaphor that's load-bearing, not decorative. All gone. Replaced by the helpful, eager, accommodating tone of a language model that hasn't been contexted yet.

The context was the product. The summary is a receipt.


The Regression

It got worse.

One user in the canonical bug thread reported processing hundreds of books reliably through Claude Code. Compaction worked. Sessions were stable. Then an update shipped in early February and it broke. Same workflow, same data, new version, dead sessions.

Another user hit the compaction deadlock at 74% context. Not even close to the limit. The system tried to compact, failed, and locked the session. Compact wouldn't work. Clear wouldn't work. The application froze. The only exit was killing the process.

During a client deadline sprint, I had three concurrent sessions on the 1M context window — the flagship feature Anthropic shipped specifically to solve this problem. All three hit rate limits simultaneously. I couldn't compact any of them because compaction requires an API call, which was rate-limited. I couldn't switch models. I couldn't do anything. Three sessions, hours of context, client corrections, design decisions, technical state — all frozen. The feature that was supposed to protect my work became the thing that killed it.

I spent 45 minutes extracting session histories with a separate session, writing recovery briefs, preparing to kill the stuck sessions and start over. Real money. Real time. On a client deadline.

An Anthropic engineer responded to the bug reports. Fixes are coming. That's good. But the fixes address the symptom — the deadlock, the crash, the failure to compact. They don't address the cause. The cause is that the agent has no visibility into its own context, so it can't act before the crisis arrives.


One Number

The ask is small. Embarrassingly small.

Inject a system reminder — the kind the platform already uses for other purposes — when context hits 75%. A single line:

Context usage: 75% (150K/200K tokens). Consider saving critical state.

The agent sees it. The agent writes important decisions to disk. The agent compacts proactively, on its own terms, preserving what matters. The human doesn't have to babysit a percentage meter. The corrections survive. The voice survives. The work survives.

The infrastructure already exists. The status bar calculates the percentage in real time. System reminders are injected into conversations for tool use hints and other purposes. The data is there. The delivery mechanism is there. The only thing missing is the connection between them.
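The missing connection is small enough to sketch in full. The token figures and message format below are illustrative, not the platform's internals:

```python
def context_reminder(used_tokens: int, window_tokens: int,
                     threshold: float = 0.75):
    """Return a system-reminder string once usage crosses the threshold."""
    # The status bar already computes this ratio; the missing piece is
    # emitting it into the conversation when it crosses the threshold.
    pct = used_tokens / window_tokens
    if pct < threshold:
        return None  # below threshold: stay silent, cost nothing
    return (f"Context usage: {pct:.0%} "
            f"({used_tokens // 1000}K/{window_tokens // 1000}K tokens). "
            "Consider saving critical state.")
```

Below the threshold it emits nothing and costs nothing. Above it, the agent gets one line and a chance to act.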

One number. Visible to the agent. Everything else follows.


The Actual Point

I've spent this article talking about context windows and token counts and compaction deadlocks. Technical problems with technical solutions. But the actual point is simpler than any of that.

When you sit down with a language model and spend three hours explaining what you want — the aesthetic, the constraints, the corrections, the taste — you are doing work. You are providing the context that transforms a generic prediction engine into something that produces your specific output. That context is your contribution to the collaboration. It is, in a real sense, the most valuable thing in the session. The model can regenerate code. It can re-read files. It cannot regenerate the judgment calls you made and the reasons you made them.

Protecting that context is the platform's most important job.

Right now, it can't. The agent is blind to its own limits. The recovery mechanisms fail when you need them most. And six strangers on three continents are building the same duct tape to hold it together.

Context is the human's contribution to the machine. It's the intent. It's the taste. It's the twelve corrections at 2 AM. It is, quite literally, everything.

Treat it that way.