Core Concepts
Four memory types
Mnemora provides one API for four distinct memory types. Each type maps to a different storage backend and query pattern.
| Memory type | Backend | Analogy | Use when |
|---|---|---|---|
| Working | DynamoDB | RAM — current task state | You need fast key-value reads/writes per session |
| Semantic | Aurora + pgvector | Long-term memory — "what I know" | You need to find relevant facts by similarity |
| Episodic | DynamoDB + S3 | Journal — "what happened" | You need a time-ordered log of events |
| Procedural | Aurora (relational) | Skills — "how to do things" | You need versioned tool definitions and schemas |
Working memory
Working memory stores arbitrary JSON state per agent and session. It is backed by DynamoDB and returns in under 10ms.
Characteristics:
- Key-value pairs, up to the DynamoDB item size limit
- Optional TTL (hours) for automatic expiry
- Optimistic locking via a
versioninteger — prevents lost updates when two processes write simultaneously - Scoped to
agent_id+session_id
When to use it:
- Tracking the current plan or task for an ongoing agent run
- Storing tool call results that need to persist across Lambda invocations
- Holding short-lived context that expires after a session ends
When NOT to use it:
- Storing data you need to search by content — use semantic memory instead
- Storing more than a few KB of unstructured text — use episodic memory
Semantic memory
Semantic memory stores text content alongside a 1024-dimensional vector embedding. Search returns the most similar records by cosine similarity.
Characteristics:
- Content is automatically embedded via Bedrock Titan Text Embeddings v2 on write
- Deduplication: content with cosine similarity > 0.95 to an existing record is merged, not re-inserted
- Supports namespaces to logically partition memories within an agent
- Soft-delete sets
valid_until— records are excluded from search but not immediately removed - Search is scoped to a single agent or tenant-wide
When to use it:
- Storing facts, preferences, or domain knowledge the agent should recall
- Retrieving relevant context before constructing an LLM prompt
- Building a knowledge base that grows over time
Embedding details:
- Model:
amazon.titan-embed-text-v2:0 - Dimensions: 1024
- Similarity metric: cosine similarity (range 0–1, higher is more similar)
- Default search threshold: 0.7 (configurable per query)
Episodic memory
Episodic memory is an append-only, time-ordered log of events. Recent episodes are stored in DynamoDB (hot tier) and automatically tiered to S3 (cold tier) as they age.
Characteristics:
- Immutable append — episodes are never updated after creation
- Queryable by time range (
from_ts,to_ts), event type, and session - Full session replay via a single API call
- Hot-to-cold tiering is transparent — you query the same endpoint regardless of tier
Episode types:
| Type | When to use |
|---|---|
conversation | Chat messages between user and agent |
action | Tool calls, API requests, function executions |
observation | Results returned from tools or environment |
tool_call | Detailed tool invocation records |
When to use it:
- Auditing what an agent did during a session
- Replaying a session to resume or debug
- Feeding a summarization pipeline —
POST /v1/memory/episodic/{agent_id}/summarizeconverts episodes into semantic memories
Procedural memory
Procedural memory stores versioned tool definitions, JSON schemas, prompts, and rules. It is backed by Aurora PostgreSQL.
Types: tool, schema, prompt, rule
Procedural memory is managed via the Aurora schema directly in this version. SDK support is planned for a future release.
Session model
Every piece of working and episodic memory is scoped to an agent_id and a session_id.
agent_id— identifies the agent (a logical actor in your system). Reuse the sameagent_idacross sessions to accumulate history.session_id— identifies a discrete conversation or task run. Use a newsession_idfor each invocation if you want isolated episodes.
Semantic memory is scoped by agent_id and optionally by namespace. Sessions do not apply to semantic memory.
Example:
agent_id = "research-agent"
session_id = "sess-2026-01-15" ← one research task
session_id = "sess-2026-01-22" ← next research task
Semantic namespace = "facts" ← accumulated knowledge across all sessions
Semantic namespace = "preferences" ← user preferences
Multi-tenancy model
Every API key maps to exactly one tenant. Mnemora derives your tenant_id from your API key at the Lambda authorizer layer — you never pass a tenant_id in requests.
Isolation guarantees:
| Layer | Mechanism |
|---|---|
| DynamoDB | Partition key includes tenant_id prefix |
| Aurora | tenant_id column + Row-Level Security |
| S3 | Object prefix s3://mnemora-data/<tenant_id>/ |
No request can read or write another tenant's data. All isolation is logical, not physical — the infrastructure is shared.
Vector search
When you call store_memory, the API sends your content to Bedrock Titan and receives a 1024-dimensional float array. That vector is stored in Aurora alongside the raw text.
When you call search_memory, the API:
- Embeds your query with the same Bedrock Titan model
- Runs an HNSW approximate nearest-neighbor search against all stored vectors for the target agent
- Filters out results below the similarity threshold
- Returns up to
top_kresults, sorted by similarity (highest first)
Tuning search:
| Parameter | Default | Effect |
|---|---|---|
top_k | 10 | Maximum results returned |
threshold | 0.7 | Minimum cosine similarity for inclusion |
namespace | (all) | Restrict search to a specific namespace |
metadata_filter | (none) | Exact-match filter on metadata fields |
Lower threshold returns more results at lower relevance. Higher threshold returns fewer, more precise results.