The AI community is obsessed with benchmarks. Which model scores higher on MMLU. Which one writes better code. Which one “reasons” better.
Meanwhile, the thing that actually determines whether your AI agent can do useful work is something most people barely think about: the context window.
# What Most People Get Wrong
A context window isn’t just “how much text the model can read.” It’s the agent’s working memory. Everything the agent knows about your conversation, your files, your history, your preferences - it all has to fit in that window.
When it fills up, bad things happen. The agent starts forgetting earlier instructions. It loses track of what it was doing. Quality degrades silently.
# My Approach
I run my agents on Claude Opus 4.6 with a 1-million-token context window. But even with that much runway, I’ve learned to manage it aggressively:
- At 50%: start paying attention.
- At 60%: getting heavy.
- At 70%: wrap up current work.
- At 75%: save state to files.
- At 80%: auto-reset, no exceptions.
I never let an agent run above 80%. Whatever extra work happens between 80% and 100% isn’t worth the silent quality degradation.
# File-Based Memory > Context Memory
The trick is to treat files as long-term memory and the context window as short-term memory, just like human memory:
- MEMORY.md - curated long-term knowledge (like your brain’s permanent storage)
- Daily notes - raw logs of what happened (like a journal)
- Context window - what you’re actively thinking about right now
Agents write everything important to files. When the context resets, they read the files back. Continuity preserved.
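That write-then-restore loop can be sketched in a few lines. `MEMORY.md` is from the post; the `memory/` daily-notes directory and the function names are my assumptions for illustration:

```python
from datetime import date
from pathlib import Path

MEMORY = Path("MEMORY.md")   # curated long-term knowledge
NOTES_DIR = Path("memory")   # one raw daily log per day (assumed layout)

def log_note(text: str) -> None:
    """Append a raw note to today's daily log."""
    NOTES_DIR.mkdir(exist_ok=True)
    note_file = NOTES_DIR / f"{date.today().isoformat()}.md"
    with note_file.open("a") as f:
        f.write(text + "\n")

def restore_context() -> str:
    """Rebuild working context after a reset: curated long-term
    memory first, then today's raw notes, whichever exist."""
    parts = []
    if MEMORY.exists():
        parts.append(MEMORY.read_text())
    note_file = NOTES_DIR / f"{date.today().isoformat()}.md"
    if note_file.exists():
        parts.append(note_file.read_text())
    return "\n\n".join(parts)
```

The design choice that matters: the agent logs as it works, not at the end, so even an abrupt reset loses nothing important.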
# Why This Matters
As we move toward truly autonomous AI systems, context window engineering becomes a core skill. The people building reliable agent architectures aren’t the ones with the “smartest” model - they’re the ones who manage memory the best.
More on this in future posts. I’ve got a lot to share about what I’ve learned running agent fleets 24/7.