tutorial · python · getting-started

How to Give Your AI Agent Persistent Memory in 5 Minutes

Isaac Gutiérrez Brugada · 6 min read

The Problem: Agents Are Stateless

Every time your AI agent runs, it starts from scratch. It doesn't know what it did five minutes ago, what the user told it yesterday, or which tools worked best last time. The context window gives the illusion of memory, but it's temporary — once the session ends, everything evaporates.

This is a fundamental problem. Agents that can't remember can't improve. They repeat mistakes, ask the same clarifying questions, and lose track of multi-step workflows the moment they restart.

You need actual persistent memory. Not a bigger context window. Not a longer system prompt. A real database that stores and retrieves agent state, knowledge, and history across sessions.

Here's how to add that to any Python agent in under 5 minutes.

Step 1: Install the SDK

pip install mnemora

That's it. The SDK ships with both async and sync clients. We'll use the sync client here for simplicity.

Step 2: Store Semantic Memory

Semantic memory is your agent's knowledge base — facts, preferences, and learned information that persist across sessions. When you store text, Mnemora automatically embeds it using AWS Bedrock Titan (1024 dimensions) and deduplicates against existing memories.

from mnemora import MnemoraSync

with MnemoraSync(api_key="mnm_your_key_here") as client:
    # Store facts your agent has learned
    client.store_memory("research-agent", "The user prefers concise summaries under 200 words.")
    client.store_memory("research-agent", "Primary data source is the SEC EDGAR API.")
    client.store_memory(
        "research-agent",
        "Quarterly earnings reports should focus on revenue growth and margins.",
        metadata={"category": "report-style", "confidence": 0.95},
    )

Every call to store_memory embeds the text and stores it in Aurora pgvector. If a near-duplicate already exists (cosine similarity > 0.95), Mnemora merges the metadata instead of creating a duplicate entry.
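Conceptually, that dedup check is just a cosine-similarity comparison against existing embeddings. Here's an illustrative sketch in plain Python of the merge-or-insert logic — this is not Mnemora's server-side code, and the embeddings in practice come from Bedrock Titan, not from you:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def upsert_memory(memories, embedding, content, metadata, threshold=0.95):
    """Merge metadata into a near-duplicate if one exists; otherwise insert."""
    for mem in memories:
        if cosine_similarity(mem["embedding"], embedding) > threshold:
            mem["metadata"].update(metadata)  # merge instead of duplicating
            return mem
    memories.append({"embedding": embedding, "content": content, "metadata": metadata})
    return memories[-1]
```

Storing a near-identical fact twice leaves you with one entry whose metadata is the union of both calls — which is exactly the behavior you want for an agent that keeps re-learning the same facts.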

Step 3: Search Memory

Before your agent acts, it should retrieve relevant knowledge. Vector search finds semantically similar memories even when the wording is different.

with MnemoraSync(api_key="mnm_your_key_here") as client:
    results = client.search_memory(
        "What format does the user want for reports?",
        agent_id="research-agent",
        top_k=3,
    )

    for result in results:
        print(f"[{result.similarity:.2f}] {result.content}")
        # [0.89] Quarterly earnings reports should focus on revenue growth and margins.
        # [0.82] The user prefers concise summaries under 200 words.

The threshold parameter (default 0.1) controls the minimum similarity score. Set it higher for stricter matches.
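How threshold and top_k compose is easy to picture: filter first, then rank, then truncate. A self-contained sketch of that ranking logic (illustrative only, not the SDK's internals):

```python
def rank_results(scored, top_k=3, threshold=0.1):
    """scored: list of (similarity, content) pairs.
    Drop anything below threshold, then return the top_k best matches."""
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)[:top_k]
```

So raising the threshold prunes marginal matches before top_k is applied — with a strict threshold you may get back fewer than top_k results, which is usually preferable to padding the context with noise.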

Step 4: Store Working State

Working memory is your agent's scratchpad — the current task, intermediate results, and session-specific data. It's backed by DynamoDB for sub-10ms reads.

with MnemoraSync(api_key="mnm_your_key_here") as client:
    # Save current task state
    client.store_state("research-agent", {
        "current_task": "Analyze Q3 earnings for AAPL",
        "step": 3,
        "total_steps": 7,
        "intermediate_results": {
            "revenue": "89.5B",
            "yoy_growth": "2.1%",
        },
    })

    # Later — retrieve it (even after a restart)
    state = client.get_state("research-agent")
    print(state.data["current_task"])
    # "Analyze Q3 earnings for AAPL"
    print(state.data["step"])
    # 3

Working state supports optimistic locking via a version field, so concurrent agents won't silently overwrite each other's progress.
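The optimistic-locking pattern is worth understanding even if the SDK handles it for you: every write carries the version the writer last read, and the store rejects the write if that version is stale. A minimal in-memory stand-in (not Mnemora's actual implementation) looks like this:

```python
class StaleStateError(Exception):
    """Raised when a writer's version no longer matches the stored version."""

class StateStore:
    """In-memory sketch of version-checked (optimistic-locking) state writes."""

    def __init__(self):
        self._data = None
        self._version = 0

    def get(self):
        return self._data, self._version

    def put(self, data, expected_version):
        # Reject the write if another agent updated the state first.
        if expected_version != self._version:
            raise StaleStateError(f"expected v{expected_version}, found v{self._version}")
        self._data = data
        self._version += 1
        return self._version
```

The losing writer gets an error instead of silently clobbering the winner's progress; its correct move is to re-read the state, reconcile, and retry.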

Step 5: Log Episodes

Episodic memory records what happened — conversations, tool calls, decisions, and outcomes. Think of it as your agent's activity log, stored in time-series order.

with MnemoraSync(api_key="mnm_your_key_here") as client:
    # Log a conversation turn
    client.store_episode(
        agent_id="research-agent",
        session_id="sess-2025-03-03",
        type="conversation",
        content={"role": "user", "message": "Analyze Apple's Q3 earnings"},
    )

    # Log a tool call
    client.store_episode(
        agent_id="research-agent",
        session_id="sess-2025-03-03",
        type="tool_call",
        content={"tool": "sec_edgar_api", "query": "AAPL 10-Q 2024-Q3"},
        metadata={"latency_ms": 342, "status": "success"},
    )

    # Replay the session later
    episodes = client.get_episodes("research-agent", session_id="sess-2025-03-03")
    for ep in episodes:
        print(f"[{ep.type}] {ep.content}")

Hot episodes live in DynamoDB for fast access. Older episodes are automatically tiered to S3 for cost-effective long-term storage.
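The tiering decision is essentially an age cutoff. As an illustration only — the actual cutoff and migration mechanism are Mnemora's, and the 30-day window below is a made-up number:

```python
from datetime import datetime, timedelta, timezone

HOT_WINDOW = timedelta(days=30)  # hypothetical cutoff, for illustration

def storage_tier(episode_time, now=None):
    """Return 'dynamodb' for recent (hot) episodes, 's3' for older ones."""
    now = now or datetime.now(timezone.utc)
    return "dynamodb" if now - episode_time <= HOT_WINDOW else "s3"
```

From the client's perspective the tiering is invisible: get_episodes returns the same records either way, with only the retrieval latency differing.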

Full Working Example

Here's a complete agent loop that uses all three memory types together:

from mnemora import MnemoraSync

AGENT_ID = "research-agent"
SESSION_ID = "sess-2025-03-03"

with MnemoraSync(api_key="mnm_your_key_here") as client:
    # 1. Restore working state (or start fresh)
    try:
        state = client.get_state(AGENT_ID)
        print(f"Resuming from step {state.data['step']}")
    except Exception:  # no saved state yet; start a new task
        client.store_state(AGENT_ID, {"step": 1, "task": "earnings-analysis"})
        print("Starting new task")

    # 2. Retrieve relevant knowledge
    memories = client.search_memory("earnings report format preferences", agent_id=AGENT_ID)
    context = "\n".join(m.content for m in memories)

    # 3. Do work (your agent logic here)
    result = f"Analysis complete. Context used: {len(memories)} memories."

    # 4. Store what the agent learned
    client.store_memory(AGENT_ID, "AAPL Q3 2024 revenue was $89.5B, up 2.1% YoY.")

    # 5. Log what happened
    client.store_episode(
        agent_id=AGENT_ID,
        session_id=SESSION_ID,
        type="action",
        content={"action": "earnings_analysis", "result": result},
    )

    # 6. Update working state
    client.store_state(AGENT_ID, {"step": 2, "task": "earnings-analysis", "status": "complete"})

What's Next

You now have an agent with three types of persistent memory (semantic, working, and episodic), backed by DynamoDB, Aurora pgvector, and S3 — all through a single API key. No infrastructure to manage, no vector database to configure, no embedding pipeline to build.

To go further, get your API key at mnemora.dev/dashboard and start building agents that remember.