Context Engineering: Why Agent Memory Is the Real AI Skill of 2026

An AI agent can fix a bug in 30 seconds. Ask it to continue the same project tomorrow, and it has forgotten everything. Some 50,000 developers have just starred claude-mem, a plugin that gives Claude Code persistent memory. At the same time, a CLAUDE.md file built on Andrej Karpathy’s observations about LLM coding has crossed 13,000 stars on GitHub. And HippoRAG 2, a paper accepted at ICML 2025, shows that a memory layer modeled on the human hippocampus can outperform standard retrieval on associative recall.

The message is unmistakable: the bottleneck for AI agents is no longer intelligence. It’s memory.

Welcome to the era of context engineering — the skill that will separate people who use AI from people who actually master it.

What Is Context Engineering (and Why It’s Replacing Prompt Engineering)

The term was popularized in June 2025 by Tobi Lütke, Shopify’s CEO, and then amplified by Andrej Karpathy. Context engineering is the art of providing an LLM with exactly the right context to accomplish its task. It is not a magic prompt but a complete set of information.

The distinction matters. Prompt engineering is writing a good instruction. Context engineering is orchestrating everything that enters the context window: the instruction, yes, but also relevant data, session history, available tools, examples, and project memory.
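To make that orchestration concrete, here is a minimal Python sketch of a context assembler. Everything in it is hypothetical and not how any particular agent works: the `ContextPart` type, the priority scheme, and the rough estimate of four characters per token. It packs the most important pieces into a fixed token budget and reports which pieces had to be left out.

```python
from dataclasses import dataclass


@dataclass
class ContextPart:
    name: str      # e.g. "instruction", "project memory", "session history"
    text: str
    priority: int  # lower number = more important, packed first


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for English text.
    return max(1, len(text) // 4)


def assemble_context(parts: list[ContextPart], budget: int) -> tuple[str, list[str]]:
    """Greedily pack parts by priority into a token budget.

    Returns the assembled context and the names of parts that did not fit.
    A part that is too large is skipped, but smaller, less important parts
    may still be packed after it.
    """
    chosen: list[str] = []
    dropped: list[str] = []
    used = 0
    for part in sorted(parts, key=lambda p: p.priority):
        cost = estimate_tokens(part.text)
        if used + cost <= budget:
            chosen.append(f"## {part.name}\n{part.text}")
            used += cost
        else:
            dropped.append(part.name)
    return "\n\n".join(chosen), dropped


parts = [
    ContextPart("instruction", "Fix the failing payment webhook test.", priority=0),
    ContextPart("project memory", "Payments API uses async webhooks; always verify status.", priority=1),
    ContextPart("tool list", "run_tests, read_file, edit_file", priority=2),
    ContextPart("session history", "(long transcript) " * 250, priority=3),
]
prompt, dropped = assemble_context(parts, budget=200)
print(dropped)  # ['session history']
```

Here the long session history is dropped, while the instruction, project memory, and tool list fit. A production system might summarize the history instead of discarding it.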

“I really like the term ‘context engineering’ over prompt engineering. It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM.” (Tobi Lütke, CEO of Shopify)

Why is this becoming critical in 2026? Because AI agents aren’t chatbots anymore. Claude Code, Codex, Hermes Agent — these tools chain dozens of actions across your code, files, and APIs. Every action consumes context. And when the window fills up, performance degrades: the agent forgets its initial instructions, makes mistakes, and spins in circles.

According to Anthropic’s official best practices for Claude Code: “The context window is the most important resource to manage. LLM performance degrades as it fills up.”

Context engineering isn’t a theoretical concept. It’s a concrete engineering problem. And in 2026, three tiers of solutions are emerging.

Tier 1 — Static Context: CLAUDE.md, the File That Makes Agents Smarter

The first tier of context engineering is the simplest: give the agent the right information from the start.

That’s exactly what the CLAUDE.md file does. Placed at the root of a project, this Markdown file is automatically read by Claude Code at the start of every session. It contains project conventions, architecture details, known pitfalls, and patterns to follow.

Andrej Karpathy identified four recurring problems with LLMs writing code:

Problem	CLAUDE.md Principle
Silent assumptions — the agent guesses instead of asking	Think Before Coding — make assumptions explicit
Over-engineering — 1,000 lines when 100 would do	Simplicity First — the minimum that solves the problem
Collateral modifications — touching unrelated code	Surgical Changes — only modify what’s necessary
No verification — coding without success criteria	Goal-Driven — define tests before writing code

The andrej-karpathy-skills repo, which packages these principles into a single CLAUDE.md (13,300 stars as of April 2026), reflects a widely shared observation: the quality of the initial context strongly shapes agent behavior. The same model, with the same capabilities, tends to produce noticeably better code when it is given the right instructions.

How to Use It in Practice

Create a CLAUDE.md file at the root of your project:

# Project Conventions
- Framework: Next.js 15 + strict TypeScript
- Styling: Tailwind CSS, no custom CSS
- Tests: Vitest, every new function needs a test
- Never modify files in /src/lib/core without asking

# Architecture
- /src/app — pages and routes
- /src/components — reusable components
- /src/lib — business logic

# Known Pitfalls
- The payments API uses async webhooks — always verify status
- Build fails if types aren't strict

This is static context: it doesn’t change between sessions. But the impact is immediate. It’s the foundation for everything that follows.

Tier 2 — Dynamic Context: claude-mem Gives Agents Persistent Memory

Static context has a limit: it describes the project as designed, not as it’s evolved. What the agent did yesterday — bugs fixed, decisions made, files changed — vanishes when the session closes.

That’s the problem solved by claude-mem, the plugin that just shot past 50,000 stars on GitHub. Its principle is to automatically capture what Claude does during a session, compress those observations with AI, and reinject them into the next session.

How It Works Under the Hood

The system relies on 5 lifecycle hooks that activate at key moments:

SessionStart: injects context from previous sessions
UserPromptSubmit: captures every user request
PostToolUse: records every tool call and its result
Stop: fires each time Claude finishes responding
SessionEnd: compresses and stores the session summary
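The flow those five hooks describe can be modeled in a few lines. This is a toy sketch, not claude-mem’s code. The payload field names (`hook_event_name`, `session_id`, `prompt`, `tool_name`, `tool_response`) are assumed to mirror the JSON that Claude Code passes to hooks. The class `MemoryRecorder` is hypothetical. The "compression" step is a naive stand-in that keeps requests and tool names, where the real plugin uses AI summarization.

```python
from __future__ import annotations

from collections import defaultdict


class MemoryRecorder:
    """Toy, in-memory model of a hook-driven memory plugin (hypothetical)."""

    def __init__(self) -> None:
        self.live: dict[str, list[str]] = defaultdict(list)  # observations per open session
        self.summaries: list[str] = []                         # "compressed" past sessions

    def handle(self, event: dict) -> str | None:
        name = event["hook_event_name"]
        sid = event.get("session_id", "unknown")
        if name == "SessionStart":
            # Return text to inject into the new session: earlier summaries, if any.
            return "\n".join(self.summaries) or None
        if name == "UserPromptSubmit":
            self.live[sid].append(f"user asked: {event['prompt']}")
        elif name == "PostToolUse":
            self.live[sid].append(f"tool {event['tool_name']}: {event.get('tool_response', '')}")
        elif name == "Stop":
            # Claude finished a response; just mark it.
            self.live[sid].append("-- response finished --")
        elif name == "SessionEnd":
            observations = self.live.pop(sid, [])
            # Stand-in for AI compression: keep requests, reduce tool calls to
            # their names, and drop the Stop markers.
            kept = [
                o.split(":")[0] if o.startswith("tool ") else o
                for o in observations
                if not o.startswith("--")
            ]
            self.summaries.append(f"[{sid}] " + "; ".join(kept))
        return None


rec = MemoryRecorder()
rec.handle({"hook_event_name": "UserPromptSubmit", "session_id": "s1", "prompt": "fix the webhook test"})
rec.handle({"hook_event_name": "PostToolUse", "session_id": "s1", "tool_name": "Bash", "tool_response": "2 passed"})
rec.handle({"hook_event_name": "Stop", "session_id": "s1"})
rec.handle({"hook_event_name": "SessionEnd", "session_id": "s1"})
print(rec.handle({"hook_event_name": "SessionStart", "session_id": "s2"}))
# [s1] user asked: fix the webhook test; tool Bash
```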

Observations are stored in a SQLite database with full-text search (FTS5) and, optionally, a ChromaDB vector database for semantic search.
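Below is a minimal sketch of the keyword-search half of that storage layer. It uses Python’s built-in `sqlite3` module and an FTS5 virtual table. The table and column names are hypothetical, not claude-mem’s schema. The sketch assumes your SQLite build includes FTS5, which most do, and it does not cover the optional Chroma semantic layer.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Only `text` is full-text indexed; the porter tokenizer stems words,
    # so a query for "webhook" also matches "webhooks".
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS observations "
        "USING fts5(session_id UNINDEXED, kind UNINDEXED, text, tokenize='porter')"
    )
    return conn


def add_observation(conn: sqlite3.Connection, session_id: str, kind: str, text: str) -> None:
    conn.execute(
        "INSERT INTO observations (session_id, kind, text) VALUES (?, ?, ?)",
        (session_id, kind, text),
    )
    conn.commit()


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple[str, str, str]]:
    # FTS5's built-in `rank` is BM25-based and lower means a better match,
    # so ascending order puts the best hits first.
    return conn.execute(
        "SELECT session_id, kind, text FROM observations "
        "WHERE observations MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


conn = open_store()
add_observation(conn, "s1", "bugfix", "Payment webhooks arrive out of order; now verifying status first")
add_observation(conn, "s1", "decision", "Chose Vitest over Jest for speed")
add_observation(conn, "s2", "change", "Renamed Button component to PrimaryButton")
hits = search(conn, "webhook")
print(len(hits), hits[0][0])  # 1 s1
```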

The Key: Progressive Disclosure

The real innovation in claude-mem isn’t storing memory — it’s not loading everything at once. The concept is called progressive disclosure, and it’s inspired by how humans work: you scan headlines before reading an article.

The classic RAG (Retrieval-Augmented Generation) approach would load 35,000 tokens of context at startup. Problem: only about 6% would be relevant. The rest pollutes the context window and degrades performance.

Claude-mem does the opposite:

Approach	Tokens loaded	Relevant
Classic RAG	~35,000	~6%
Progressive disclosure	~800 (index) + on-demand loading	~100%
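Spelled out, the table’s figures work as follows. The last two lines rest on an assumption the table does not state: that on-demand loading fetches exactly the relevant 2,100 tokens and nothing else.

```latex
\begin{aligned}
\text{relevant (classic RAG)} &= 0.06 \times 35\,000 = 2\,100 \text{ tokens} \\
\text{noise (classic RAG)} &= 35\,000 - 2\,100 = 32\,900 \text{ tokens} \\
\text{progressive disclosure (assumed)} &= 800 + 2\,100 = 2\,900 \text{ tokens} \\
\text{ratio} &= \frac{35\,000}{2\,900} \approx 12
\end{aligned}
```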

The agent first sees a compact index (titles, dates, types, token cost). It then decides what to load based on its current task. Result: every token in the window has a reason to be there.
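Here is a minimal sketch of those two layers: a compact index first, then full bodies on demand. It assumes a hypothetical `Observation` record and the same rough estimate of four characters per token. It illustrates the pattern and is not claude-mem’s implementation.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    id: int
    date: str
    kind: str
    title: str
    body: str


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def build_index(observations: list[Observation]) -> str:
    """Layer 1: one compact line per observation, including the cost of loading its body."""
    return "\n".join(
        f"#{o.id} {o.date} [{o.kind}] {o.title} (~{estimate_tokens(o.body)} tokens)"
        for o in observations
    )


def load(observations: list[Observation], ids: list[int]) -> str:
    """Layer 2: full bodies, fetched only for the ids the agent asks for."""
    wanted = set(ids)
    return "\n\n".join(o.body for o in observations if o.id in wanted)


obs = [
    Observation(1, "2026-04-10", "bugfix", "Webhook race in payments", "Long write-up " * 50),
    Observation(2, "2026-04-11", "decision", "Switched to Vitest", "Details " * 40),
]
print(build_index(obs))
# #1 2026-04-10 [bugfix] Webhook race in payments (~175 tokens)
# #2 2026-04-11 [decision] Switched to Vitest (~80 tokens)
```

In an agent, the index would go into the window at session start. `load` would be exposed as a tool the model can call once it decides an entry is worth its token cost.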

One-Command Installation

npx claude-mem install

That’s it. After restarting Claude Code, context from previous sessions appears automatically. The plugin also works with Gemini CLI (npx claude-mem install --ide gemini-cli).

Tier 3 — Learned Context: When Research Catches Up to the Need

Tiers 1 and 2 solve memory at the tool level. But academic research is working on a deeper problem: giving LLMs themselves a long-term memory that works like ours.

HippoRAG 2, published by the NLP team at Ohio State University and accepted at ICML 2025, proposes exactly that. The paper is titled “From RAG to Memory: Non-Parametric Continual Learning for Large Language Models” — and the title captures the ambition well.

The Problem with Standard RAG

Classic RAG works through vector similarity: you ask a question, and the system finds the closest passages in an embedding database. It’s effective for simple factual queries but falls short on two types of tasks:

Sense-making — connecting scattered information to extract a global meaning
Association — retrieving a memory through an indirect path, the way humans naturally do
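A minimal sketch shows why similarity search struggles with association. Hand-made 4-dimensional vectors stand in for a real embedding model; this is an illustrative assumption. Each passage is scored against the query on its own, so nothing connects two passages that share an entity.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def top_k(query: list[float], passages: dict[str, list[float]], k: int = 2) -> list[str]:
    """Rank passages by similarity to the query; each passage is scored independently."""
    return sorted(passages, key=lambda pid: cosine(query, passages[pid]), reverse=True)[:k]


# Hypothetical toy "embeddings" over four concepts: [stanford, alzheimers, python, thomas]
passages = {
    "p1": [1.0, 0.0, 0.0, 1.0],  # "Thomas is a professor at Stanford"
    "p2": [0.0, 1.0, 0.0, 1.0],  # "Thomas studies Alzheimer's"
    "p3": [0.0, 0.0, 1.0, 0.0],  # "Python packaging tips"
}
query = [1.0, 1.0, 0.0, 0.0]  # "Which Stanford professor works on Alzheimer's?"

# p1 and p2 each only half-match (cosine 0.5). The retriever has no notion
# that both are about the same Thomas, which is the link the answer needs.
print(top_k(query, passages))
```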

The Solution: Mimicking the Hippocampus

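HippoRAG 2 is inspired by how the hippocampus indexes memories. Instead of relying on vector similarity alone, it builds a knowledge graph that links passages to the concepts they mention. It then retrieves by running an enhanced Personalized PageRank over that graph, starting from the concepts in the query. Relevance can therefore spread along associative paths: from a query concept to a passage, from that passage to a related concept, and on to another passage.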
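Here is the retrieval idea reduced to its core. This is plain Personalized PageRank by power iteration, not the paper’s code, and it makes heavy simplifications. The passage-to-concept links are hand-written, whereas the paper extracts them with an LLM. The graph is unweighted, and the query’s concepts serve as equally weighted seeds. The functions `build_graph` and `personalized_pagerank` are illustrative names.

```python
def build_graph(passage_phrases: dict[str, list[str]]) -> dict[str, list[str]]:
    """Undirected bipartite graph: passage nodes <-> the phrase nodes they mention."""
    graph: dict[str, list[str]] = {}
    for pid, phrases in passage_phrases.items():
        graph.setdefault(pid, [])
        for phrase in phrases:
            graph.setdefault(phrase, [])
            graph[pid].append(phrase)
            graph[phrase].append(pid)
    return graph


def personalized_pagerank(
    graph: dict[str, list[str]],
    seeds: dict[str, float],
    damping: float = 0.85,
    max_iter: int = 100,
    tol: float = 1e-10,
) -> dict[str, float]:
    """PageRank whose random jumps return only to the seed nodes (power iteration)."""
    nodes = list(graph)
    total = sum(w for n, w in seeds.items() if n in graph)
    if total <= 0:
        raise ValueError("at least one seed must be a node of the graph")
    reset = {n: seeds.get(n, 0.0) / total for n in nodes}
    scores = dict(reset)
    for _ in range(max_iter):
        new = {n: (1 - damping) * reset[n] for n in nodes}
        for n in nodes:
            if graph[n]:
                share = damping * scores[n] / len(graph[n])
                for m in graph[n]:
                    new[m] += share
            else:  # dangling node: send its mass back to the seeds
                for m in nodes:
                    new[m] += damping * scores[n] * reset[m]
        delta = sum(abs(new[n] - scores[n]) for n in nodes)
        scores = new
        if delta < tol:
            break
    return scores


graph = build_graph({
    "p1": ["thomas", "stanford"],    # "Thomas is a professor at Stanford"
    "p2": ["thomas", "alzheimers"],  # "Thomas studies Alzheimer's"
    "p3": ["python"],                # unrelated passage
})
scores = personalized_pagerank(graph, seeds={"stanford": 1.0, "alzheimers": 1.0})
ranked = sorted(["p1", "p2", "p3"], key=scores.get, reverse=True)
print(ranked)  # p3 scores exactly 0: nothing connects it to the query
```

The concept "thomas" never appears in the query, yet it accumulates score because both retrieved passages point to it. This is the associative hop that plain similarity search cannot make.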

The paper reports a 7% improvement on associative memory tasks over the best embedding model it compares against, while also outperforming it on factual and sense-making tasks.

This isn’t a plugin you can install with one command yet. But it points in a clear direction: memory systems that don’t just store and retrieve but organize and connect knowledge, in a way loosely modeled on the human brain.

From Amnesic Agents to Teammates: How Context Engineering Changes the Game

Let’s put this in perspective. By combining these three tiers, we go from an agent that forgets everything between sessions to a genuine teammate:

Tier	What the agent knows	Tool	Setup effort
0 — No context	Nothing. Rediscovers the project every session	—	—
1 — Static	Conventions, architecture, pitfalls	CLAUDE.md	10 min (write the file)
2 — Dynamic	What it did in previous sessions	claude-mem	1 min (`npx claude-mem install`)
3 — Learned	Deep connections between knowledge	HippoRAG 2 (research)	In progress

The concrete impact? Anthropic’s best practices stress that “Claude performs radically better when it can verify its own work,” and context from previous sessions supports exactly that. The agent knows what was tried, what failed, and what decisions were made.

For developers, it’s the end of the dreaded “re-briefing”: the moment when you have to re-explain your project to Claude because it has forgotten the last four sessions. With a well-written CLAUDE.md and claude-mem installed, you run `claude` in your terminal and the agent already knows where things stand.

For teams, platforms like Multica (10,000 stars, growing fast) go further: they orchestrate multiple agents like teammates on a Kanban board. Each agent has a profile, picks up tasks, reports blockers — and its skills compound over time through a reusable skills system.

This week’s GitHub Trending page points the same way. Hermes Agent by Nous Research (71,000 stars) integrates a native learning loop, and Archon by coleam00 (17,000 stars) aims to make AI coding deterministic and reproducible. Agent memory is fast becoming the standard rather than a nice-to-have.

How to Get Started in 5 Minutes

Want to jump from tier 0 to tier 2 right now? Here’s the path:

Step 1 — Create Your CLAUDE.md (2 minutes)

At the root of your project, create a CLAUDE.md file with:

Your tech stack (framework, language, versions)
Code conventions (naming, structure, tests)
Known pitfalls (recurring bugs, flaky APIs)
Project architecture (key directories, responsibilities)
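To check a CLAUDE.md against this checklist, a small script can flag sections that appear to be missing. This is a hypothetical helper, not part of Claude Code. The keyword patterns are an illustrative heuristic that looks only at Markdown headings.

```python
import re
from pathlib import Path

# Hypothetical heuristic: heading keywords that count as covering each checklist item.
SECTIONS = {
    "tech stack": r"stack|framework|conventions",
    "code conventions": r"convention|style",
    "known pitfalls": r"pitfall|gotcha",
    "architecture": r"architecture|structure",
}


def missing_sections(markdown: str) -> list[str]:
    """Return checklist items that no Markdown heading in the text appears to cover."""
    headings = [
        line.lstrip("#").strip().lower()
        for line in markdown.splitlines()
        if line.startswith("#")
    ]
    return [
        item
        for item, pattern in SECTIONS.items()
        if not any(re.search(pattern, heading) for heading in headings)
    ]


def check_file(path: Path) -> list[str]:
    """Read a CLAUDE.md file and report the checklist items it seems to lack."""
    return missing_sections(path.read_text(encoding="utf-8"))
```

Run `print(check_file(Path("CLAUDE.md")))` from the project root. An empty list means every item has a matching heading. That says nothing about whether the content under each heading is any good.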

Step 2 — Install claude-mem (1 minute)

npx claude-mem install

Restart Claude Code. Done.

Step 3 — Verify It Works (2 minutes)

Launch a Claude Code session, work normally, close the session. Reopen a new session — you should see an index of your previous observations appear automatically.

The MCP (Model Context Protocol) we covered last week plays a role here too: it standardizes how agents access tools and external data. Context engineering and MCP are two sides of the same coin.

Frequently Asked Questions About Context Engineering

Does context engineering replace prompt engineering?

Not exactly — it encompasses it. Prompt engineering remains important for writing good instructions. Context engineering adds everything surrounding the prompt: memory, data, tools, history. As Simon Willison puts it: “the intuitive definition of context engineering is probably much closer to its intended meaning than that of prompt engineering.”

Does claude-mem work with models other than Claude?

Yes. In its latest versions, claude-mem also supports Gemini CLI. The persistent memory principle is model-agnostic — it’s an infrastructure layer on top of the agent, not a model feature.

Is HippoRAG 2 production-ready?

Not yet. It’s a research framework with open-source code available on GitHub. Its principles (knowledge graph + Personalized PageRank for memory retrieval) are a natural fit for production memory tools, and it would not be surprising to see these techniques integrated into memory plugins by the end of 2026.

Key takeaways:

Context engineering is the critical AI skill of 2026 — not just writing a good prompt, but orchestrating all the context an agent needs to succeed
Three tiers already exist — static (CLAUDE.md), dynamic (claude-mem), learned (HippoRAG 2) — and the first two can be set up in 5 minutes
Amnesic agents are on the way out: the ecosystem (50k stars for claude-mem, 71k for Hermes Agent, 10k for Multica) suggests that persistent memory is becoming the expected standard
Start now: a CLAUDE.md + npx claude-mem install = an agent that picks up where it left off, starting tomorrow morning