The Memory Problem
An AI agent without persistent memory is like a person with severe amnesia: capable in the moment but unable to learn from experience, maintain context across sessions, or build up the accumulated knowledge that expertise requires. Memory architecture is one of the most important design decisions in building capable agents.
Memory in AI agents maps loosely onto categories from the psychology of memory: short-term (working) memory, long-term semantic memory (facts and knowledge), and episodic (autobiographical) memory. Each plays a different role in agent cognition, and effective agents typically use all three.
Short-Term Memory: The Context Window
Short-term memory is simply the LLM's context window. Everything currently in context is available to the model verbatim, with no retrieval step: the conversation history, retrieved documents, current task state, and tool outputs all live there.
Short-term memory has two major limitations. It is bounded: every window has a hard token limit, and cost and latency grow with the amount of text you put in it. It is also transient: it disappears when the session ends. For single-session tasks that fit comfortably in the window, context-as-memory is sufficient. For long-running agents or multi-session applications, you need more.
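To make the "bounded" limitation concrete, here is a minimal sketch of one way to keep a conversation inside a fixed budget: drop the oldest messages first. The `Message` type, the `trim_to_budget` helper, and the four-characters-per-token estimate are all illustrative assumptions, not part of any particular SDK; a real system would count tokens with the model's own tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str


def estimate_tokens(text: str) -> int:
    """Crude heuristic (assumption): roughly 4 characters per token."""
    return max(1, len(text) // 4)


def trim_to_budget(messages: list[Message], budget: int) -> list[Message]:
    """Keep the most recent messages whose estimated size fits the budget."""
    kept: list[Message] = []
    used = 0
    # Walk backwards so the newest messages are kept first.
    for msg in reversed(messages):
        cost = estimate_tokens(msg.content)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Anything this function drops is gone for good unless it was first written to one of the external stores described in the next section.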
Long-Term Memory: External Storage
Long-term memory is information stored outside the context window that can be retrieved on demand. The primary implementations:
- Vector stores: Embed and store text snippets; retrieve by semantic similarity. Good for unstructured information: notes, facts, conversation summaries. Popular stores: Pinecone, Weaviate, Chroma, pgvector.
- Structured databases: SQL or NoSQL databases for structured information. Better for facts with known schema: user preferences, task status, configuration.
- Key-value stores: Redis or similar for fast retrieval of specific items. Good for cache-like patterns: recent user actions, frequently accessed facts.
The retrieval mechanism is as important as the storage mechanism. A common pattern: embed each item when it is stored, retrieve the top-K items whose embeddings are most similar to the current query, then inject them into context.
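The retrieve-then-inject pattern can be sketched in a few lines. This is a toy under stated assumptions: `embed` is a bag-of-words stand-in for a real embedding model, and `MemoryStore` and `build_prompt` are hypothetical names, not the API of Pinecone, Weaviate, Chroma, or pgvector. Only the shape of the flow (embed, score by cosine similarity, take the top K, inject) carries over to a real vector store.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    """Hypothetical in-memory store; a real one would persist its vectors."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self._items.append((text, embed(text)))

    def top_k(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self._items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(store: MemoryStore, query: str, k: int = 2) -> str:
    """Inject the top-K memories ahead of the user's query."""
    memories = "\n".join(f"- {m}" for m in store.top_k(query, k))
    return f"Relevant memories:\n{memories}\n\nUser: {query}"
```

Note that top-K always returns K items, whether or not they are actually relevant, which is why a similarity threshold or a later filtering step is often layered on top.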
Episodic Memory: Learning from Experience
Episodic memory stores records of past experiences — what happened, when, and with what outcome. In agents, this typically means recording past task attempts, successes, and failures, along with what approaches were tried.
Well-implemented episodic memory allows agents to improve over time: "Last time I tried to book a flight using search_flights with a vague destination, it failed. This time I'll use a more specific query." This is a primitive form of learning from experience that doesn't require model fine-tuning.
The MemGPT paper demonstrated a related architecture for very long conversations: borrowing from operating-system virtual memory, it pages information between the context window and external storage so the agent can recall earlier exchanges that no longer fit in context.
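A minimal sketch of such an episode log follows. The `Episode` and `EpisodeLog` names and their fields are assumptions chosen for illustration; the point is only that each record captures the task, the approach tried, and the outcome, and that past attempts can be summarized as text for injection into a later prompt.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Episode:
    task: str
    approach: str
    succeeded: bool
    note: str = ""
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class EpisodeLog:
    """Hypothetical append-only log of past task attempts."""

    def __init__(self) -> None:
        self._episodes: list[Episode] = []

    def record(self, task: str, approach: str, succeeded: bool, note: str = "") -> None:
        self._episodes.append(Episode(task, approach, succeeded, note))

    def recall(self, task: str) -> list[Episode]:
        """Return earlier attempts at the same task, oldest first."""
        return [e for e in self._episodes if e.task == task]

    def summarize(self, task: str) -> str:
        """Render past attempts as text suitable for injecting into a prompt."""
        lines = []
        for e in self.recall(task):
            status = "succeeded" if e.succeeded else "failed"
            lines.append(f"- {e.approach}: {status}. {e.note}".rstrip())
        return "\n".join(lines) or "No prior attempts recorded."
```

Matching on an exact task name keeps the sketch short; in practice, episodes would more likely be retrieved by similarity to the current task.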
Memory Management: What to Store, What to Forget
A common mistake is storing everything and retrieving nothing useful. Effective memory management requires:
- Selective storage: Not every piece of information deserves to be stored. Store facts, preferences, outcomes, and important context; don't store intermediate computation or temporary state.
- Memory consolidation: Summarize and compress long conversations before storing. A 100-message conversation can often be compressed to 500 words without losing important context.
- Memory pruning: Old information becomes stale. Implement TTL (time-to-live) for time-sensitive information, and rank the remaining history (for example, by importance or recency) so that relevant items are kept ahead of irrelevant ones.
- Relevance filtering: During retrieval, a second LLM call can filter retrieved memories for actual relevance to the current query — expensive but improves precision.
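Of the four practices above, pruning is the easiest to show without a model call. The sketch below assumes a hypothetical `MemoryItem` with an optional TTL and an importance score assigned at storage time; expired items are dropped, and the survivors are capped by importance, then recency. Consolidation and relevance filtering both depend on an LLM call and are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryItem:
    text: str
    created_at: float                    # seconds since the epoch
    ttl_seconds: Optional[float] = None  # None means the item never expires
    importance: float = 0.5              # assumed score in [0, 1], set at storage time


def is_expired(item: MemoryItem, now: float) -> bool:
    """An item expires once its age exceeds its TTL, if it has one."""
    return item.ttl_seconds is not None and now - item.created_at > item.ttl_seconds


def prune(items: list[MemoryItem], now: float, max_items: int) -> list[MemoryItem]:
    """Drop expired items, then keep the most important (newest on ties)."""
    live = [i for i in items if not is_expired(i, now)]
    live.sort(key=lambda i: (i.importance, i.created_at), reverse=True)
    return live[:max_items]
```

Passing `now` explicitly, rather than reading the clock inside `prune`, keeps the function deterministic and easy to test.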
Memory in the Claude Agent SDK
Claude's native context management handles short-term memory automatically. For long-term memory, the Claude Agent SDK can be integrated with an external storage system of your choice. One workable pattern is a memory module that wraps each turn: it retrieves relevant memories before the model is called and stores relevant information from the exchange afterwards. The AGT-101 course at Meridian AI covers this pattern in detail.
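The per-turn pattern can be sketched independently of any SDK. Nothing below is the Claude Agent SDK's API: `MemoryBackend`, `ListBackend`, and `run_turn` are hypothetical names, and `model` is just any callable that maps a prompt string to a reply string. The sketch shows only the ordering: retrieve before the model call, store after it.

```python
from typing import Callable, Protocol


class MemoryBackend(Protocol):
    """Assumed interface; any vector, SQL, or key-value store could sit behind it."""

    def search(self, query: str, k: int) -> list[str]: ...
    def save(self, text: str) -> None: ...


class ListBackend:
    """In-memory stand-in that matches on shared words instead of embeddings."""

    def __init__(self) -> None:
        self.items: list[str] = []

    def search(self, query: str, k: int) -> list[str]:
        words = set(query.lower().split())
        hits = [t for t in self.items if words & set(t.lower().split())]
        return hits[-k:]  # the k most recently stored matches

    def save(self, text: str) -> None:
        self.items.append(text)


def run_turn(
    user_input: str,
    model: Callable[[str], str],
    memory: MemoryBackend,
    k: int = 3,
) -> str:
    """Retrieve memories, call the model, then store what was learned."""
    recalled = memory.search(user_input, k)
    if recalled:
        context = "\n".join(f"- {m}" for m in recalled)
        prompt = f"Known from earlier turns:\n{context}\n\nUser: {user_input}"
    else:
        prompt = f"User: {user_input}"
    reply = model(prompt)
    # This sketch stores every user message; a real module would store selectively.
    memory.save(f"User said: {user_input}")
    return reply
```

Keeping the backend behind a small interface means the storage choice (vector store, database, key-value cache) can change without touching the turn loop.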