saas · multi-tenant · architecture

Multi-Tenant Agent Memory: Building AI Features for SaaS

Isaac Gutiérrez Brugada · 5 min read

The Multi-Tenant Challenge

You're building a SaaS product and you want to add AI agent features. Maybe a support agent that answers customer questions, or a data analyst that generates reports, or a workflow assistant that automates tasks.

The problem: when you have 500 customers using AI agents, every customer's data must be completely isolated. Customer A's agent can never see customer B's conversations, knowledge, or state. A single data leak is a security incident, a compliance violation, and a trust-destroying event.

Multi-tenancy in traditional databases is well understood. But agent memory introduces new challenges: vector embeddings in shared indexes, episodic logs spanning multiple storage tiers, and real-time state that needs sub-10ms access. You need isolation at every layer, without sacrificing performance.

API Key Scoping

The foundation of Mnemora's multi-tenant isolation is the API key. Every API key maps to exactly one tenant_id. This mapping is stored in DynamoDB with the key SHA-256 hashed (Mnemora never stores API keys in plaintext).

When a request arrives at the API Gateway, the Lambda authorizer:

  1. Hashes the bearer token from the Authorization header
  2. Looks up the hash in the mnemora-users-dev DynamoDB table
  3. Extracts the tenant_id, tier, and rate limits from the item
  4. Injects the tenant_id into the Lambda authorizer context
  5. Downstream handlers read the tenant ID from context — never from the request body

This means the client cannot supply or override their tenant ID. Even if a malicious client sends "tenant_id": "someone-else" in the request body, the handler ignores it and uses the authorizer-derived value.

# Inside every Lambda handler
tenant_id = event["requestContext"]["authorizer"]["tenant_id"]
# NOT from the request body — ever
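The authorizer side of steps 1–4 can be sketched as follows. This is a minimal illustration, not Mnemora's actual implementation: the `key_hash`, `tenant_id`, and `tier` attribute names are assumptions about the users-table item shape, and the table handle is injected so the lookup logic stays testable.

```python
import hashlib

def authorize(event: dict, users_table) -> dict:
    """Lambda authorizer sketch: resolve a bearer token to a tenant.

    `users_table` would be a boto3 DynamoDB Table handle for
    mnemora-users-dev; attribute names here are illustrative.
    """
    token = event["headers"]["Authorization"].removeprefix("Bearer ").strip()
    key_hash = hashlib.sha256(token.encode()).hexdigest()  # never store plaintext

    item = users_table.get_item(Key={"key_hash": key_hash}).get("Item")
    if item is None:
        raise PermissionError("Unauthorized")

    # Context values surface downstream as event["requestContext"]["authorizer"].
    return {
        "principalId": item["tenant_id"],
        "context": {"tenant_id": item["tenant_id"], "tier": item["tier"]},
    }
```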

Isolation at the Database Level

DynamoDB: Partition Key Prefix

Every item in DynamoDB uses a composite partition key: tenant_id#agent_id. This isn't just a convention — it's a physical isolation boundary.

DynamoDB partitions data by the partition key. A query for PK = "github:12345#support-agent" physically cannot return items from PK = "github:67890#support-agent". The database engine doesn't even scan the other tenant's data.

# Tenant A's data
PK: github:12345#support-agent   SK: SESSION#default
PK: github:12345#support-agent   SK: EPISODE#2025-02-10T10:30:00Z#ep-001

# Tenant B's data — completely separate partitions
PK: github:67890#support-agent   SK: SESSION#default
PK: github:67890#support-agent   SK: EPISODE#2025-02-10T10:30:00Z#ep-002

There is no Scan operation in Mnemora's codebase. Every DynamoDB access is a GetItem or Query with the full partition key specified, which means cross-tenant data access is structurally impossible.
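As a sketch, fetching one agent's episodes for a day means building Query parameters with the full partition key. The attribute names follow the layout above, the helper is hypothetical, and the returned dict is what you would pass to a boto3 `Table.query(**params)` call:

```python
def episode_query(tenant_id: str, agent_id: str, day: str) -> dict:
    """Build Query parameters for one tenant's episodes on a given day.

    The full partition key is always specified, so the query physically
    cannot return another tenant's items.
    """
    return {
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :sk)",
        "ExpressionAttributeValues": {
            ":pk": f"{tenant_id}#{agent_id}",   # tenant_id comes from the authorizer
            ":sk": f"EPISODE#{day}",
        },
    }
```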

Aurora: Parameterized Queries + Row-Level Security

Semantic memory lives in Aurora PostgreSQL with pgvector. Every query includes the tenant_id as a parameterized condition:

SELECT id, content, embedding <=> $1::vector AS distance
FROM semantic_memory
WHERE tenant_id = $2 AND agent_id = $3
ORDER BY embedding <=> $1::vector
LIMIT $4;

The $2 parameter is always the authorizer-derived tenant ID. SQL injection attacks against the content or metadata fields cannot escape the tenant filter because the query is parameterized — the tenant ID is never interpolated into the SQL string.

For defense-in-depth, Aurora row-level security (RLS) policies enforce isolation at the database level:

ALTER TABLE semantic_memory ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON semantic_memory
    USING (tenant_id = current_setting('app.tenant_id'));

Even if a handler bug bypasses the WHERE clause, the RLS policy prevents cross-tenant reads.
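One wiring detail worth noting: `current_setting('app.tenant_id')` only works if each request sets that variable, and PostgreSQL bypasses RLS for the table owner unless `FORCE ROW LEVEL SECURITY` is also enabled, so the application should connect as a non-owner role. A per-request session sketch (the tenant value is shown inline for illustration; in practice it would be parameterized from the authorizer context):

```sql
BEGIN;
-- Scope RLS to the caller's tenant for this transaction only
-- (set_config with is_local = true resets at COMMIT/ROLLBACK).
SELECT set_config('app.tenant_id', 'github:12345', true);

-- Even without a tenant_id condition in the WHERE clause,
-- the tenant_isolation policy restricts rows to github:12345.
SELECT id, content FROM semantic_memory WHERE agent_id = 'support-agent';
COMMIT;
```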

S3: Prefix Isolation

Episodic memory tiers cold data to S3 with a prefix structure:

s3://mnemora-episodes-dev-993952121255/
  github:12345/                    # Tenant A
    support-agent/
      2025-02-10/ep-001.json
  github:67890/                    # Tenant B
    support-agent/
      2025-02-10/ep-002.json

Lambda functions construct S3 keys from the authorizer-derived tenant ID. The function's IAM role restricts access to the bucket, and because every key begins with that tenant's prefix, one tenant's handler never reads or writes another tenant's objects.
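A sketch of the key construction, mirroring the bucket listing above (the helper names are hypothetical, and the S3 client is injected rather than created here):

```python
def episode_s3_key(tenant_id: str, agent_id: str, day: str, episode_id: str) -> str:
    """Build an episode's S3 key under the tenant's prefix."""
    return f"{tenant_id}/{agent_id}/{day}/{episode_id}.json"

def load_episode(s3_client, bucket: str, tenant_id: str, agent_id: str,
                 day: str, episode_id: str) -> bytes:
    """Fetch one episode. tenant_id always comes from the authorizer context,
    so the constructed key can never point into another tenant's prefix."""
    key = episode_s3_key(tenant_id, agent_id, day, episode_id)
    return s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
```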

Example: Support Agent Per Customer

Here's how a SaaS platform would create an isolated support agent for each customer:

from mnemora import MnemoraSync

def handle_customer_message(customer_api_key: str, message: str):
    """Each customer uses their own API key, which scopes to their tenant."""

    with MnemoraSync(api_key=customer_api_key) as client:
        # Search this customer's knowledge base only
        relevant_docs = client.search_memory(
            message,
            agent_id="support-agent",
            top_k=5,
        )

        # Build context from customer-specific memories
        context = "\n".join(doc.content for doc in relevant_docs)

        # Your LLM call here, using the customer-specific context
        response = call_llm(message=message, context=context)

        # Log the interaction to this customer's episodic memory
        client.store_episode(
            agent_id="support-agent",
            session_id=f"ticket-{generate_ticket_id()}",
            type="conversation",
            content={"role": "user", "message": message},
        )

        # Store any new knowledge the agent learned
        if should_store_knowledge(response):
            client.store_memory(
                "support-agent",
                extract_knowledge(response),
                metadata={"source": "conversation"},
            )

        return response

Each customer's API key routes all operations to their tenant's isolated data partition. Customer A's support agent knowledge base, conversation history, and state are invisible to customer B — guaranteed at the database level.

Billing Per Tenant

Mnemora tracks usage per API key. Every API call increments a counter in the mnemora-users-dev DynamoDB table:

  • api_calls_today: resets daily, enforced against tier limits
  • vectors_stored: total semantic memory count
  • storage_bytes: total data across all memory types

This per-key tracking means you can bill each customer for their actual agent memory usage. The tier system (Free: 500 calls/day, Starter: 5K, Pro: 25K, Scale: 50K) enforces limits per-key at the authorizer level, before the request reaches any handler.
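The limit check and the counter increment can be combined into one atomic conditional write. A sketch, assuming the `api_calls_today` attribute from the list above (the daily reset is out of scope here): the returned dict is what the authorizer would pass to `table.update_item(**params)`, treating a `ConditionalCheckFailedException` as a rate-limit rejection.

```python
# Tier limits from the text; the attribute and key names are illustrative.
TIER_LIMITS = {"free": 500, "starter": 5_000, "pro": 25_000, "scale": 50_000}

def rate_limit_update(key_hash: str, tier: str) -> dict:
    """Build UpdateItem parameters for an atomic, capped daily counter.

    The condition rejects the write once the tier limit is reached, so the
    check and the increment cannot race across concurrent requests.
    """
    return {
        "Key": {"key_hash": key_hash},
        "UpdateExpression": "ADD api_calls_today :one",
        "ConditionExpression": (
            "attribute_not_exists(api_calls_today) OR api_calls_today < :limit"
        ),
        "ExpressionAttributeValues": {":one": 1, ":limit": TIER_LIMITS[tier]},
    }
```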

Why Shared-Nothing Isolation Matters

The shared-nothing model — where each tenant's data is logically separated at every layer — provides several guarantees:

Security: A vulnerability in one tenant's agent logic cannot expose another tenant's data. The isolation is enforced at the database level, not the application level.

Compliance: SOC 2, HIPAA, and GDPR audits require demonstrable data isolation. Partition key isolation in DynamoDB and RLS in Aurora provide auditable, enforceable boundaries.

Data portability: Need to export a tenant's data? Query everything with their partition key prefix. Need to delete it? A single purge_agent call removes all data across all memory types — DynamoDB items, Aurora rows, and S3 objects.

# GDPR right-to-deletion: one API call
with MnemoraSync(api_key=customer_api_key) as client:
    result = client.purge_agent("support-agent")
    print(result)
    # PurgeResponse(state_deleted=15, semantic_deleted=234, episodes_deleted=1891)

Performance isolation: DynamoDB's partition-based architecture means one tenant's heavy workload doesn't affect another's read latency. Each tenant's data lives in its own partition space with independent throughput.

Getting Started

If you're adding AI agent features to a SaaS product, multi-tenant memory isolation isn't optional — it's a requirement. Mnemora provides this isolation by default, at every layer, without requiring you to build custom partitioning logic.

  1. Generate API keys per customer at mnemora.dev/dashboard
  2. Use each customer's key in their agent's SDK instance
  3. All data is automatically isolated by tenant

Read the architecture deep dive for more on how the isolation layers work, or jump into the 5-minute tutorial to start building.