architecture · concepts

Why Your AI Agent Forgets Everything (And How to Fix It)

Isaac Gutiérrez Brugada · 5 min read

The Stateless Problem

Your agent just finished a 50-step research workflow. It searched the web, read 12 documents, synthesized findings, and delivered a polished report. Impressive.

Now ask it to refine paragraph three. It has no idea what you're talking about.

This is the stateless problem. Most AI agents are functions: input goes in, output comes out, nothing persists. The LLM doesn't have a hard drive. When the process ends, everything the agent learned, decided, and produced during that session vanishes.

Developers work around this by stuffing conversation history into the context window. That works until it doesn't.

Why Context Windows Are Not Memory

It's tempting to treat the context window as memory. Just keep appending messages, and the model "remembers" everything, right?

Three problems:

Context is temporary. When the session ends or the process restarts, the context window is gone. There's no persistence. An agent that ran yesterday has zero access to what it learned.

Context grows linearly. Every message, tool result, and system prompt competes for the same finite window. A 128K-token window sounds large until your agent processes a few long documents. Once you hit the limit, you start dropping older messages — which means your agent forgets the beginning of the conversation.

Context has no retrieval. The window is a flat sequence with no index over it. The model can't look up a specific fact from 50 turns ago on demand — it has to attend over everything, every time. There's no index, no query, no selective recall.
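The linear-growth problem can be made concrete with a toy sketch. Nothing here is a real SDK — it's a few lines showing what "dropping older messages" means once a fixed token budget fills up (word count stands in for a real tokenizer):

```python
# Toy illustration: a fixed token budget forces dropping the oldest
# messages once the conversation outgrows it.

def fit_to_window(messages, budget):
    """Keep the most recent messages whose combined length fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):      # walk newest-first
        cost = len(msg.split())         # crude "token" count: words
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order

history = [f"turn {i}: some content here" for i in range(100)]
window = fit_to_window(history, budget=40)
print(window[0])  # → "turn 92: some content here" — turns 0-91 are gone
```

The agent hasn't "forgotten" in any graceful sense; the early turns were simply evicted, which is exactly why context is not memory.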

Real memory is different. It's persistent, searchable, and selective. You don't remember everything that ever happened to you — you remember what was important, and you can retrieve specific facts on demand.

Agents need the same thing.

The 4 Types of Agent Memory

Cognitive science identifies different memory systems in the human brain. The same taxonomy applies to AI agents, and building the right type of memory for each use case is the key to agents that actually improve over time.

Working Memory: Your Agent's Desk

Working memory is the agent's scratchpad — the current task, intermediate results, and session-specific variables. It's what the agent is actively thinking about.

Real-world analogy: Your physical desk. It holds the documents and notes for whatever you're working on right now. When you switch tasks, you clear the desk and bring out different materials.

Agent use cases:

  • Current step in a multi-step workflow
  • Intermediate calculation results
  • Active session state (user preferences for this conversation)
  • Tool execution progress

Technical requirements: Sub-10ms reads and writes, key-value access pattern, optimistic locking for concurrent updates.

In Mnemora, working memory is backed by DynamoDB on-demand. Each agent has a state object keyed by agent ID and session ID, with automatic version tracking for safe concurrent access.
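The optimistic-locking pattern described above can be sketched with a plain Python dict standing in for DynamoDB. The class and method names here are illustrative, not the Mnemora API:

```python
class WorkingMemory:
    """Key-value session state with version-based optimistic locking."""

    def __init__(self):
        self._store = {}  # (agent_id, session_id) -> (version, state)

    def load(self, agent_id, session_id):
        """Return (version, state); a fresh session starts at version 0."""
        return self._store.get((agent_id, session_id), (0, {}))

    def save(self, agent_id, session_id, state, expected_version):
        """Write succeeds only if nobody updated the state since we read it."""
        key = (agent_id, session_id)
        current_version, _ = self._store.get(key, (0, {}))
        if current_version != expected_version:
            raise RuntimeError("conflict: state was modified concurrently")
        self._store[key] = (current_version + 1, state)
        return current_version + 1

wm = WorkingMemory()
version, state = wm.load("agent-1", "sess-42")
state = {**state, "current_step": 3}
wm.save("agent-1", "sess-42", state, expected_version=version)
```

If two concurrent writers read version 0 and both try to save, the second write fails loudly instead of silently clobbering the first — the same guarantee DynamoDB provides with conditional writes on a version attribute.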

Semantic Memory: Your Agent's Textbook

Semantic memory stores facts, knowledge, and learned information. It's not tied to a specific event or time — it's general knowledge the agent can draw on whenever relevant.

Real-world analogy: A textbook or reference manual. It contains facts and concepts, organized for retrieval by topic rather than by when you learned them.

Agent use cases:

  • User preferences ("prefers bullet points over paragraphs")
  • Domain knowledge ("the SEC EDGAR API rate limit is 10 requests per second")
  • Learned patterns ("this customer usually asks about pricing on Mondays")
  • Project context ("the codebase uses TypeScript with strict mode")

Technical requirements: Vector embedding for semantic search, deduplication, metadata filtering, namespace isolation.

In Mnemora, semantic memory is backed by Aurora Serverless v2 with pgvector. Text is automatically embedded via AWS Bedrock Titan (1024 dimensions) on write, and similarity search uses cosine distance with HNSW indexing.
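The core retrieval idea — rank stored facts by vector similarity to a query — fits in a few lines of plain Python. The tiny hand-made vectors below stand in for real 1024-dimensional model embeddings, and the function names are illustrative, not the Mnemora API:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(query_vec, memory, top_k=2):
    """Return the top_k stored facts most similar to the query embedding."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in memory]
    return [text for _, text in sorted(scored, reverse=True)[:top_k]]

# Tiny hand-made "embeddings" — real systems embed text with a model
memory = [
    ("prefers bullet points", [0.9, 0.1, 0.0]),
    ("EDGAR rate limit is 10 req/s", [0.1, 0.9, 0.2]),
    ("codebase uses TypeScript", [0.0, 0.2, 0.9]),
]
print(search([0.1, 0.8, 0.3], memory, top_k=1))
# → ['EDGAR rate limit is 10 req/s']
```

This brute-force scan is O(n) per query; an HNSW index, as used by pgvector, gives approximate nearest-neighbor search in roughly logarithmic time, which is what makes semantic memory practical at scale.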

Episodic Memory: Your Agent's Diary

Episodic memory records events — what happened, when it happened, and in what context. It's your agent's activity log, stored in chronological order.

Real-world analogy: A diary or journal. Each entry records a specific event at a specific time, preserving the sequence and context of what occurred.

Agent use cases:

  • Conversation history across sessions
  • Tool call logs with latency and success/failure
  • Decision audit trails
  • Session replay for debugging

Technical requirements: Time-series storage, range queries by timestamp, session grouping, cost-effective tiering for historical data.

In Mnemora, hot episodes live in DynamoDB for fast access to recent events. Older episodes are automatically tiered to S3 for long-term storage at a fraction of the cost.
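The hot/cold tiering pattern can be sketched with two in-memory lists — one standing in for DynamoDB, one for S3. Names and the tiny `hot_limit` are illustrative, not the Mnemora API:

```python
class EpisodicMemory:
    """Append-only event log with timestamp range queries and cold tiering."""

    def __init__(self, hot_limit=2):
        self.hot = []   # recent events, fast tier (stands in for DynamoDB)
        self.cold = []  # older events, cheap tier (stands in for S3)
        self.hot_limit = hot_limit

    def record(self, timestamp, event):
        self.hot.append((timestamp, event))
        self.hot.sort()
        while len(self.hot) > self.hot_limit:
            self.cold.append(self.hot.pop(0))  # tier the oldest event out

    def range(self, start, end):
        """All events with start <= timestamp <= end, across both tiers."""
        hits = [(t, e) for t, e in self.hot + self.cold if start <= t <= end]
        return [e for _, e in sorted(hits)]

em = EpisodicMemory(hot_limit=2)
em.record(1, "searched web")
em.record(2, "read doc")
em.record(3, "wrote report")
print(em.range(1, 2))  # → ['searched web', 'read doc']
```

Recent events stay in the fast tier; the range query transparently reaches into the cold tier when asked about older history, so callers never need to know where an episode physically lives.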

Procedural Memory: Your Agent's Muscle Memory

Procedural memory stores how to do things — tool definitions, schemas, prompt templates, and business rules. It's the agent's learned procedures and capabilities.

Real-world analogy: Muscle memory. You don't consciously think about how to ride a bike — the procedure is stored and executed automatically. Similarly, an agent's tool definitions and workflow rules should be stored, versioned, and retrieved without manual configuration.

Agent use cases:

  • Tool definitions with input/output schemas
  • Prompt templates for specific tasks
  • Business rules ("always check compliance before submitting")
  • Workflow step definitions

Technical requirements: Relational storage with versioning, schema validation, active/inactive toggling.

In Mnemora, procedural memory is backed by PostgreSQL (via Aurora) with a dedicated table that supports versioned tool definitions, type-checked schemas, and active/inactive lifecycle management.
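A minimal sketch of versioning with active/inactive lifecycle — a dict stands in for the PostgreSQL table, and all names are illustrative, not the Mnemora API:

```python
class ProceduralMemory:
    """Versioned tool definitions with active/inactive lifecycle."""

    def __init__(self):
        self._tools = {}  # name -> list of version records, oldest first

    def register(self, name, schema):
        """Add a new version of a tool; versions are never overwritten."""
        versions = self._tools.setdefault(name, [])
        versions.append({"version": len(versions) + 1,
                         "schema": schema,
                         "active": True})

    def deactivate(self, name, version):
        for record in self._tools[name]:
            if record["version"] == version:
                record["active"] = False

    def latest_active(self, name):
        """Newest version that is still enabled, or None."""
        for record in reversed(self._tools.get(name, [])):
            if record["active"]:
                return record
        return None

pm = ProceduralMemory()
pm.register("web_search", {"input": {"query": "string"}})
pm.register("web_search", {"input": {"query": "string", "max_results": "int"}})
pm.deactivate("web_search", version=2)
print(pm.latest_active("web_search")["version"])  # → 1 (falls back past v2)
```

Because old versions are kept rather than overwritten, disabling a bad tool definition instantly rolls the agent back to the last known-good one — no redeploy required.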

How It All Fits Together

The four memory types aren't independent — they work together. When an agent starts a task:

  1. Working memory loads the current state — where it left off, what step it's on.
  2. Semantic memory retrieves relevant knowledge — what it knows about the topic, user preferences, domain facts.
  3. Episodic memory provides history — what happened in previous sessions, what worked and what failed.
  4. Procedural memory supplies the tools and rules — which tools to use, what templates to follow.

This mirrors how human cognition works, and it's the architecture that produces agents capable of genuine improvement over time.
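The four-step retrieval flow can be sketched with stub stores. Everything here is hypothetical — the stubs and method names illustrate the orchestration shape, not a real SDK surface:

```python
class Stub:
    """Tiny fake memory store: each method returns a canned response."""
    def __init__(self, **responses):
        self.responses = responses
    def __getattr__(self, name):
        return lambda *args, **kwargs: self.responses[name]

# Canned data standing in for the four real memory backends
working    = Stub(load={"current_step": 3})
semantic   = Stub(search=["prefers bullet points"])
episodic   = Stub(recent=["session 1: drafted report"])
procedural = Stub(active_tools=["web_search"])

def build_context(task):
    """Assemble the agent's full context from all four memory types."""
    return {
        "state": working.load("agent-1", "sess-42"),      # 1. where we left off
        "knowledge": semantic.search(task, top_k=5),      # 2. relevant facts
        "history": episodic.recent("sess-42", limit=20),  # 3. prior sessions
        "tools": procedural.active_tools("agent-1"),      # 4. capabilities
        "task": task,
    }

ctx = build_context("refine paragraph three")
print(ctx["state"])  # → {'current_step': 3}
```

The point of the shape: the agent's prompt is assembled fresh each task from four independently queryable stores, rather than replayed from one ever-growing transcript.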

Getting Started

If your agent forgets everything between sessions, the fix isn't a bigger context window. It's persistent memory, designed for the specific access patterns agents need.

Mnemora gives you all four memory types through a single API. Install the SDK, get an API key, and your agent starts remembering:

pip install mnemora

Read the 5-minute tutorial to add persistent memory to your existing Python agent.