
Research: Knowledge System for AI Agents

Purpose: Deep research into agent memory architectures, academic literature, open source tooling, and design options for Kaze's shared knowledge system — the "Wikipedia for Agents."
Status: Research Document
Last Updated: 2026-02-26


Table of Contents

  1. The Problem We're Solving
  2. Academic Literature Survey
  3. Cross-Cutting Patterns from Research
  4. Open Source Tooling Survey
  5. Architecture Options for Kaze
  6. Recommendation
  7. References

1. The Problem We're Solving

1.1 Why Agent Memory Matters

LLMs are stateless by default. Every conversation starts from zero. For agents that operate continuously across days, weeks, and months — processing invoices, managing SEO campaigns, handling CRM workflows — this is a fundamental limitation.

The problem has multiple dimensions:

  • Within a single agent: How does an agent remember what it did yesterday, what it learned from corrections, and what context it needs for the current task?
  • Across agents in the same vertical: How does one SEO agent's discovery ("this keyword strategy works for e-commerce") benefit all other SEO agents?
  • Across verticals: How do patterns learned in one domain ("clients prefer bullet-point reports") transfer to others?
  • Across time: How does knowledge stay current, get refined, and avoid rot?

1.2 The "Wikipedia for Agents" Vision

We envision a shared knowledge system where:

  • Agents can read knowledge contributed by other agents
  • Agents can write new knowledge based on their experiences and discoveries
  • Knowledge is structured (not just a bag of text) with relationships, categories, and hierarchies
  • Knowledge is versioned — every change is tracked with provenance (who changed what, when, why)
  • Knowledge has access control — some knowledge is private to an agent or client, some is shared across the vertical, some is platform-wide
  • Knowledge improves over time through consolidation, refinement, and feedback loops
  • Knowledge is retrievable through multiple strategies — semantic search, graph traversal, and direct lookup

This is analogous to how Wikipedia works for humans: a collaborative, structured, versioned, ever-improving knowledge base that anyone can read and contribute to, with editorial quality controls.

1.3 Memory Types We Need

Drawing from cognitive science (formalized in the CoALA framework), the knowledge system must support distinct types of memory:

| Memory Type | Cognitive Analog | What It Stores | Example in Kaze |
| --- | --- | --- | --- |
| Semantic Memory | "What is true" | Facts, concepts, relationships, domain knowledge | "Title tags should be under 60 characters for SEO" |
| Episodic Memory | "What happened" | Timestamped records of specific events and experiences | "Client A's site lost 30% traffic on Jan 5 after a Google update" |
| Procedural Memory | "How to do it" | Action sequences, workflows, tool usage patterns, code | "To run a site audit: call Ahrefs API → parse results → generate report" |
| Working Memory | "What I'm thinking now" | Current task context, in-progress reasoning | The agent's active context window during a task |
| Reflective Memory | "What I've learned" | Synthesized insights derived from experience | "International invoices need extra tax validation — learned from 15% error rate in Q1" |

Each type has different storage, retrieval, and lifecycle characteristics. A one-size-fits-all approach won't work.


2. Academic Literature Survey

2.1 MemGPT / Letta — Virtual Context Management

Paper: MemGPT: Towards LLMs as Operating Systems (Packer et al., Oct 2023)
Link: https://arxiv.org/abs/2310.08560
Evolved into: Letta (open source framework)

Core Idea

MemGPT draws a direct analogy to operating system virtual memory. It treats the LLM's context window as "RAM" and external storage as "disk," creating an illusion of unlimited memory within fixed context limits.

Memory Architecture

Three tiers:

┌──────────────────────────────┐
│  CORE MEMORY (in-context)    │  ← Always visible to the LLM
│  Fixed-size writeable block  │  ← Agent persona + key user facts
│  Analogous to registers/L1   │  ← Modified via explicit function calls
├──────────────────────────────┤
│  RECALL MEMORY (external)    │  ← Complete conversation history
│  Searchable via function     │  ← Summarized chunks from evicted context
│  calls                       │  ← Nothing is ever lost
├──────────────────────────────┤
│  ARCHIVAL MEMORY (external)  │  ← General-purpose long-term storage
│  Read-write datastore        │  ← Can use vector DB, graph DB, etc.
│  The agent's "filing cabinet"│  ← Processed, indexed information
└──────────────────────────────┘

How It Works

  • When the context window fills up, a queue manager evicts the oldest messages
  • Evicted messages are recursively summarized and stored in recall memory
  • The LLM itself decides what to page in/out via function calls (archival_memory_search(query), archival_memory_insert(content))
  • The agent actively manages its own memory rather than relying on a passive retrieval pipeline
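The paging flow above can be sketched in a few lines of Python. This is an illustrative model, not the Letta API: the `MemoryManager` class, the token accounting, and the truncation-based "summarization" are all stand-ins; only the tool names `archival_memory_search` and `archival_memory_insert` come from the paper.

```python
from collections import deque

class MemoryManager:
    """Toy model of MemGPT-style tiered memory (illustrative, not the real API)."""

    def __init__(self, context_limit_tokens):
        self.context = deque()   # "RAM": messages currently in the context window
        self.recall = []         # evicted history (paper: recursively summarized)
        self.archival = []       # long-term "filing cabinet" datastore
        self.limit = context_limit_tokens
        self.used = 0

    def add_message(self, msg, tokens):
        # Queue manager: evict oldest messages when the window would overflow
        while self.used + tokens > self.limit and self.context:
            old_msg, old_tokens = self.context.popleft()
            self.used -= old_tokens
            # Real systems summarize with an LLM; we just truncate as a stub
            self.recall.append(old_msg[:40])
        self.context.append((msg, tokens))
        self.used += tokens

    # Tools the LLM invokes explicitly to page memory in/out
    def archival_memory_insert(self, content):
        self.archival.append(content)

    def archival_memory_search(self, query):
        # Placeholder keyword match; real systems use embedding search
        return [c for c in self.archival if query.lower() in c.lower()]
```

The key property is that eviction is automatic but retrieval is agent-initiated: nothing comes back into context unless the LLM calls the search tool.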

Short-term vs Long-term

  • Short-term = core memory (what fits in the context window)
  • Long-term = recall + archival memory stored externally
  • The boundary is managed by a page-in/page-out mechanism, similar to OS virtual memory

Retrieval

Agent-initiated via tool calls. The LLM explicitly decides when to search memory and what to search for. This gives the agent agency over its own memory management, but also means the agent can "forget" to look things up.

2026 Evolution — Letta Context Repositories

As of February 2026, Letta introduced Context Repositories — a major rethinking of agent memory:

  • Agent context is stored as files in a git-backed filesystem
  • Agents can spawn subagents to reorganize their own memory
  • Multi-agent concurrent memory writing with git-based conflict resolution
  • Knowledge is versioned with full history, branching, and merging

This is architecturally very close to a wiki. It demonstrates that git-style version control on knowledge is a viable coordination primitive for multi-agent systems.

Strengths

  • Elegant OS analogy makes the architecture intuitive and composable
  • Agent has agency over its own memory (self-directed read/write)
  • No information is permanently lost (conversation history fully preserved)
  • Extensible — archival storage can be backed by any datastore
  • Context Repositories bring versioning and multi-agent coordination

Weaknesses

  • Relies on the LLM correctly deciding when to page in/out (failure mode: forgetting to retrieve relevant information)
  • Multiple LLM calls for memory management add latency and token cost
  • Core memory has a fixed size — no dynamic expansion
  • No built-in importance scoring or automatic consolidation in the base system

Relevance to Kaze

High. The tiered memory model is a good fit for per-agent memory. The Context Repositories pattern (git-based versioning) is directly applicable to our shared knowledge system — it shows that treating knowledge like code (versioned, branchable, mergeable) works in practice.


2.2 Generative Agents — Stanford (Park et al.)

Paper: Generative Agents: Interactive Simulacra of Human Behavior (Park et al., April 2023)
Link: https://arxiv.org/abs/2304.03442
Published at: UIST 2023

Core Idea

25 AI agents living in a simulated town, remembering experiences, reflecting on them, and forming long-term behaviors. The paper introduced the most influential memory retrieval formula in the field.

Memory Architecture

Centered on a Memory Stream — a comprehensive, append-only log of all agent experiences recorded in natural language.

┌─────────────────────────────────────────┐
│            MEMORY STREAM                 │
│  (append-only log of all experiences)   │
│                                          │
│  Entry 1: "Klaus saw Emily painting"     │
│  Entry 2: "Klaus ate breakfast at cafe"  │
│  Entry 3: "Klaus talked to Sam about..." │
│  ...                                     │
│  Entry N: (latest observation)           │
└───────────┬──────────────────────────────┘

    ┌───────┴───────┐
    │               │
┌───┴───┐    ┌──────┴──────┐    ┌────────────┐
│RETRIEVE│    │  REFLECT    │    │   PLAN     │
│        │    │             │    │            │
│Score & │    │Synthesize   │    │High-level  │
│rank    │    │memories into│    │plans →     │
│memories│    │higher-level │    │detailed    │
│        │    │abstractions │    │actions     │
└────────┘    └─────────────┘    └────────────┘

Three processes operate on the memory stream:

  1. Retrieval — A scoring function surfaces relevant memories on demand
  2. Reflection — Periodically synthesizes memories into higher-level abstractions (e.g., many small observations about Klaus → "Klaus is interested in painting")
  3. Planning — Generates high-level plans, recursively decomposed into detailed action sequences

The Tri-Factor Retrieval Formula

This is the paper's most lasting contribution — the de facto standard for agent memory retrieval:

score = α_recency × recency + α_importance × importance + α_relevance × relevance

Where:

  • Recency — Exponential decay function based on time since last access. Recent memories score higher. Decay factor of 0.995 per game hour.
  • Importance — LLM-assigned integer score (1-10) distinguishing mundane events from significant ones. "Ate breakfast" = 1, "Had a breakup" = 9. Scored once at creation time.
  • Relevance — Cosine similarity between embedding vectors of the memory and the current query/context.

All three scores are min-max normalized to [0,1]. In the implementation, all α weights are set to 1 (equal weighting). Top-ranked memories that fit in the context window are included.
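A minimal sketch of the scoring function, with one simplification: the paper min-max normalizes each factor over the candidate set, whereas this version normalizes importance analytically (recency and relevance already land in [0,1]). The dict layout for `mem` is an assumption for illustration.

```python
import math

def score_memory(mem, query_emb, now_hours, weights=(1.0, 1.0, 1.0)):
    """Tri-factor retrieval score (recency + importance + relevance).

    mem: dict with 'embedding', 'importance' (1-10), 'last_access_hours'.
    Illustrative layout, not from the paper's code.
    """
    a_rec, a_imp, a_rel = weights
    # Recency: exponential decay, 0.995 per hour since last access
    recency = 0.995 ** (now_hours - mem["last_access_hours"])
    # Importance: LLM-assigned 1-10, mapped to [0, 1]
    importance = (mem["importance"] - 1) / 9
    # Relevance: cosine similarity of memory and query embeddings
    dot = sum(a * b for a, b in zip(mem["embedding"], query_emb))
    norm = (math.sqrt(sum(a * a for a in mem["embedding"]))
            * math.sqrt(sum(b * b for b in query_emb)))
    relevance = dot / norm if norm else 0.0
    return a_rec * recency + a_imp * importance + a_rel * relevance

def retrieve(memories, query_emb, now_hours, k=5):
    # Rank all candidates and keep the top-k that fit in context
    return sorted(memories,
                  key=lambda m: score_memory(m, query_emb, now_hours),
                  reverse=True)[:k]
```

Making `weights` a parameter rather than hardcoding equal weighting leaves room for the configurable or learned weights discussed later.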

The Reflection Mechanism

Reflections are higher-order memories synthesized from lower-level observations:

  1. Triggered when the sum of importance scores of recent memories exceeds a threshold
  2. The agent generates "questions that can be answered given the most recent observations"
  3. For each question, retrieve relevant memories using the tri-factor formula
  4. Synthesize retrieved memories into abstract statements (reflections)
  5. Reflections are stored back in the memory stream as first-class entries (with their own importance scores)
  6. Reflections can be retrieved and used to generate even higher-order reflections

This creates a hierarchical abstraction ladder: raw observations → first-order reflections → second-order reflections → beliefs.
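The trigger-and-synthesize loop can be sketched as below. The threshold value, the output importance score, and the `generate_insight` callable are all illustrative stand-ins for the paper's LLM calls (question generation, tri-factor retrieval, synthesis).

```python
def maybe_reflect(recent_memories, generate_insight, threshold=150):
    """Sketch of the Generative Agents reflection trigger (values illustrative).

    recent_memories: dicts with an 'importance' score (1-10 scale).
    generate_insight: stand-in for the LLM pipeline that poses questions,
    retrieves evidence, and writes abstract statements.
    """
    total = sum(m["importance"] for m in recent_memories)
    if total < threshold:
        return None  # not enough significant activity to reflect on
    insight = generate_insight(recent_memories)
    # Reflections re-enter the stream as first-class memories, so they can
    # themselves be retrieved and feed higher-order reflections
    return {"text": insight, "importance": 8, "kind": "reflection"}
```

Because the returned reflection is stored like any other memory, running this repeatedly yields the observation → reflection → higher-order reflection ladder described above.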

Strengths

  • The tri-factor retrieval formula is simple, effective, and widely adopted across the field
  • Reflection mechanism enables emergent higher-order reasoning about experiences
  • Natural language storage is human-readable and debuggable
  • Proved that believable long-term agent behavior is achievable with relatively simple mechanisms
  • The memory stream is append-only, which is great for audit trails

Weaknesses

  • Memory stream grows without bound — no garbage collection, compression, or consolidation
  • Importance scoring relies on LLM judgment (noisy, and requires an LLM call per memory at creation)
  • Equal weighting of α values is a design choice, not learned from data
  • Reflection is triggered by a fixed threshold, not adaptively based on need
  • Single-agent design — no mechanism for shared memory across agents
  • Embedding-based relevance can miss important memories that are conceptually related but lexically different

Relevance to Kaze

The tri-factor retrieval formula is our baseline retrieval mechanism. We should implement recency + importance + relevance scoring for memory retrieval, potentially with learned or configurable weights rather than fixed equal weighting.

The reflection mechanism maps directly to our self-improvement loop — agents synthesizing experiences into reusable knowledge. The key extension we need is making reflections shared (contributing to the vertical knowledge graph) rather than agent-private.


2.3 Voyager — Skill Library as Procedural Memory

Paper: Voyager: An Open-Ended Embodied Agent with Large Language Models (Wang et al., May 2023)
Link: https://arxiv.org/abs/2305.16291
From: NVIDIA / MineDojo

Core Idea

An AI agent that plays Minecraft open-endedly, building an ever-growing library of reusable skills (as executable JavaScript code). Demonstrates lifelong learning: the agent continuously acquires new capabilities by composing previously learned skills.

Memory Architecture

Voyager's memory is fundamentally procedural — it stores how to do things as executable code:

┌─────────────────────────────────────────┐
│            SKILL LIBRARY                 │
│                                          │
│  Key: embedding(skill_description)       │
│  Value: executable JavaScript program    │
│                                          │
│  "mine_diamond_ore": {                   │
│    description: "Mine diamond ore at...", │
│    code: "async function mineDiamond(){  │
│      await bot.equip('iron_pickaxe');    │
│      await bot.dig(nearestDiamond);      │
│    }"                                    │
│  }                                       │
│                                          │
│  Skills grow monotonically               │
│  Complex skills compose simpler ones     │
└─────────────────────────────────────────┘

┌─────────────────────────────────────────┐
│        AUTOMATIC CURRICULUM              │
│                                          │
│  GPT-4 generates exploration objectives  │
│  based on current state + inventory +    │
│  exploration progress                    │
└─────────────────────────────────────────┘

┌─────────────────────────────────────────┐
│     ITERATIVE PROMPTING MECHANISM        │
│                                          │
│  Environment feedback + execution errors │
│  + self-verification → refine program    │
│  before adding to skill library          │
└─────────────────────────────────────────┘

Key Mechanism: Self-Verification Before Storage

Before a skill enters the library, it goes through verification:

  1. The agent writes the code
  2. The code is executed in the environment
  3. Success/failure is evaluated
  4. If it fails, the agent iterates (up to 3 times) incorporating error messages
  5. Only verified, working skills are added to the library

This is the equivalent of code review before merging to main — directly analogous to Wikipedia's editorial process.
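The verify-before-store loop reduces to a retry pattern. A minimal sketch, where `run_in_env` (execute and report success/error) and `refine` (the LLM rewrite step) are hypothetical callables standing in for Voyager's environment and prompting machinery:

```python
def add_skill_with_verification(skill_code, run_in_env, refine, library,
                                max_retries=3):
    """Sketch of Voyager's quality gate: only verified skills enter the library.

    run_in_env(code) -> (success: bool, error: str)
    refine(code, error) -> revised code (stand-in for the LLM call)
    """
    code = skill_code
    for _ in range(max_retries):
        success, error = run_in_env(code)
        if success:
            library.append(code)      # merge to "main": skill is now reusable
            return True
        code = refine(code, error)    # iterate, incorporating error feedback
    return False                      # rejected: never pollutes the library
```

The same gate generalizes beyond Minecraft: any shared skill library can require a passing execution (or canary run) before a contribution is accepted.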

Retrieval

Pure embedding similarity — the current task description is embedded, and the most similar skill descriptions are retrieved from the library. No recency or importance weighting (skills don't decay — "how to mine diamond" is always relevant when you need to mine diamonds).

Composability

Complex skills build on simple ones:

  • "mine_diamond" uses "equip_pickaxe" and "find_ore"
  • "build_house" uses "gather_wood", "craft_planks", "place_block"
  • This creates a dependency graph of skills
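A compact sketch of the library itself: embedding-keyed, monotonically growing, with explicit dependency edges. The class and field names are illustrative, and `embed` is a stand-in for a real embedding model.

```python
import math

class SkillLibrary:
    """Sketch of a Voyager-style skill library (names illustrative)."""

    def __init__(self, embed):
        self.embed = embed   # stand-in for an embedding model
        self.skills = {}     # name -> {description, code, deps, vec}

    def add(self, name, description, code, deps=()):
        self.skills[name] = {
            "description": description,
            "code": code,
            "deps": list(deps),              # complex skills compose simpler ones
            "vec": self.embed(description),
        }

    def retrieve(self, task, k=1):
        # Pure cosine similarity: no recency or importance weighting,
        # since "how to mine diamond" never goes stale
        q = self.embed(task)
        def cos(v):
            dot = sum(a * b for a, b in zip(q, v))
            n = (math.sqrt(sum(a * a for a in q))
                 * math.sqrt(sum(b * b for b in v)))
            return dot / n if n else 0.0
        ranked = sorted(self.skills.items(),
                        key=lambda kv: cos(kv[1]["vec"]), reverse=True)
        return [name for name, _ in ranked[:k]]
```

The `deps` list is what makes the library a graph rather than a flat store: retrieving "mine_diamond" tells the caller it must also load "equip_pickaxe".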

Results

  • 3.3x more unique items discovered than baselines
  • Continuous capability growth — skills never forgotten
  • Skills transfer to new environments (the library is portable)

Strengths

  • Procedural memory as code is composable, interpretable, and verifiable
  • Self-verification ensures quality before storage (quality gate)
  • Avoids catastrophic forgetting — skills persist as code, not neural weights
  • Demonstrates lifelong learning with compounding capability
  • Skills have clear input/output contracts

Weaknesses

  • Domain-specific (Minecraft) — the skill format doesn't directly generalize to all domains
  • No episodic or semantic memory — the agent doesn't "remember" events, only procedures
  • Retrieval is embedding-only — no recency, importance, or contextual filtering
  • No mechanism for skill deprecation, versioning, or conflict resolution
  • Single-agent — no shared skill library with access control

Relevance to Kaze

This maps directly to our agent skills model. In our architecture, skills are composable units with inputs, outputs, tool requirements, and quality criteria. Voyager validates this pattern.

Key design takeaways:

  • Skills should be verified before entering the shared library (self-verification / canary)
  • Skills should be composable (complex skills reference simpler ones)
  • Skill retrieval should be by semantic similarity to the current task
  • The skill library should grow monotonically (skills are versioned, not deleted)

2.4 Reflexion — Verbal Reinforcement Learning

Paper: Reflexion: Language Agents with Verbal Reinforcement Learning (Shinn et al., March 2023)
Link: https://arxiv.org/abs/2303.11366
Published at: NeurIPS 2023

Core Idea

Instead of learning from scalar rewards (like RL), agents learn from verbal self-reflection. After each attempt at a task, the agent generates a natural-language reflection on what went wrong and how to improve. These reflections are stored and used to guide future attempts.

Memory Architecture

┌─────────────────────────────────────────┐
│              ACTOR                        │
│  LLM that generates actions              │
│  Conditioned on observations + memory    │
└────────────────┬─────────────────────────┘
                 │ trajectory

┌─────────────────────────────────────────┐
│            EVALUATOR                     │
│  Scores the trajectory                   │
│  Binary or scalar reward signal          │
└────────────────┬─────────────────────────┘
                 │ reward + trajectory

┌─────────────────────────────────────────┐
│         SELF-REFLECTION                  │
│  LLM generates verbal feedback           │
│  "I failed because I didn't check..."    │
│  Stored in episodic memory buffer        │
└────────────────┬─────────────────────────┘


┌─────────────────────────────────────────┐
│       EPISODIC MEMORY BUFFER             │
│  Sliding window of past reflections      │
│  (typically last 3 reflections)          │
│  All loaded into context on next trial   │
└─────────────────────────────────────────┘

Short-term vs Long-term

  • Short-term = current trial's trajectory
  • Long-term = episodic memory buffer of past reflections
  • The buffer uses a sliding window (typically 3 reflections), so older reflections fall off

Retrieval

Simple — all stored reflections within the window are loaded into context. No semantic search or scoring. This works because the buffer is deliberately kept small and curated (only reflections, not raw experiences).
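The whole Actor → Evaluator → Self-Reflection cycle fits in one loop; the sliding window is just a bounded deque. A sketch, with `act`, `evaluate`, and `reflect` as hypothetical callables standing in for the three LLM roles:

```python
from collections import deque

def reflexion_loop(task, act, evaluate, reflect, max_trials=5, window=3):
    """Sketch of the Reflexion trial loop (role callables are stand-ins).

    act(task, reflections) -> trajectory
    evaluate(trajectory)   -> bool (or scalar, thresholded)
    reflect(trajectory)    -> verbal feedback string
    """
    # Sliding window: once full, the oldest reflection silently falls off
    reflections = deque(maxlen=window)
    for trial in range(max_trials):
        # Actor conditions on the task plus every buffered reflection —
        # no retrieval step, the whole buffer is loaded into context
        trajectory = act(task, list(reflections))
        if evaluate(trajectory):
            return trajectory, trial + 1
        # Verbal feedback ("I failed because...") replaces a scalar reward
        reflections.append(reflect(trajectory))
    return None, max_trials
```

The `maxlen=window` line is exactly the weakness noted below: learning number `window + 1` evicts learning number one, with no promotion path for reflections worth keeping.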

Strengths

  • Extremely lightweight — no vector databases, no complex retrieval infrastructure
  • Verbal reinforcement is more informative than scalar rewards ("I failed because X" vs "reward = 0")
  • Learns from failure without any weight updates to the model
  • Reflections are human-readable, interpretable, and debuggable
  • Practical improvement: 91% pass@1 on HumanEval (vs 80% baseline)

Weaknesses

  • Fixed-size sliding window severely limits long-term memory capacity
  • No structured retrieval — as reflection count grows beyond the window, older learnings are lost
  • Reflections can be repetitive or contradictory without curation
  • No mechanism for generalizing across tasks — reflections are task-specific
  • No shared reflections across agents

Relevance to Kaze

The reflection pattern is lightweight and valuable for individual agent self-improvement. In Kaze, each agent could maintain a small reflection buffer (like Reflexion) for its current task context, while the best reflections get promoted to the shared knowledge graph as permanent insights.

The key insight: reflections are the raw material for knowledge graph updates. When an agent reflects "I failed because international invoices need special tax handling," that reflection should be evaluated and potentially promoted to a shared skill improvement.


2.5 CoALA — Cognitive Architectures for Language Agents

Paper: Cognitive Architectures for Language Agents (Sumers, Yao et al., Sept 2023)
Link: https://arxiv.org/abs/2309.02427
Published at: TMLR 2024

Core Idea

CoALA is not a system — it's a unifying framework that organizes all agent architectures into a coherent cognitive model. It provides the vocabulary and taxonomy that the entire field uses.

The Framework

┌─────────────────────────────────────────────────────────┐
│                     AGENT                                │
│                                                          │
│  ┌──────────────────────────────────────────────────┐   │
│  │              WORKING MEMORY                       │   │
│  │  Short-term scratchpad for current reasoning      │   │
│  │  = LLM context window + in-context state          │   │
│  └──────────────────────────────────────────────────┘   │
│                                                          │
│  ┌──────────────────────────────────────────────────┐   │
│  │            LONG-TERM MEMORY                       │   │
│  │                                                    │   │
│  │  ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │   │
│  │  │  EPISODIC    │ │  SEMANTIC    │ │PROCEDURAL │ │   │
│  │  │              │ │              │ │           │ │   │
│  │  │  Past events │ │  Facts &     │ │ How-to    │ │   │
│  │  │  & experiences│ │  knowledge  │ │ knowledge │ │   │
│  │  └──────────────┘ └──────────────┘ └───────────┘ │   │
│  └──────────────────────────────────────────────────┘   │
│                                                          │
│  ┌──────────────────────────────────────────────────┐   │
│  │              ACTION SPACE                         │   │
│  │                                                    │   │
│  │  Internal actions:     External actions:           │   │
│  │  - Reasoning           - Tool use                  │   │
│  │  - Memory retrieval    - API calls                 │   │
│  │  - Memory writing      - Environment interaction   │   │
│  │  - Learning                                        │   │
│  └──────────────────────────────────────────────────┘   │
│                                                          │
│  Decision loop:                                          │
│  Observe → Retrieve → Reason → Decide → Act → Store     │
└─────────────────────────────────────────────────────────┘

Memory Type Definitions

Episodic Memory:

  • Records of specific past events and experiences
  • Typically stored as natural language with timestamps
  • Retrieved by recency, relevance, or explicit query
  • Examples: conversation logs, task outcomes, error reports
  • Paper mapping: Generative Agents' memory stream, Reflexion's reflection buffer

Semantic Memory:

  • Factual and declarative knowledge about the world
  • Can be stored as text, embeddings, knowledge graph triples, or structured data
  • Retrieved by semantic similarity, graph traversal, or direct lookup
  • Examples: domain facts, entity relationships, rules
  • Paper mapping: AriGraph's knowledge graph, RAG knowledge bases

Procedural Memory:

  • Knowledge of how to perform tasks — action sequences, policies, code
  • Can be stored as code (Voyager), prompts, tool-use patterns, or workflow definitions
  • Retrieved by task similarity
  • Examples: Voyager's skill library, prompt templates, tool chains
  • Paper mapping: Voyager's skills, ReAct-style action patterns
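As a design checklist, the three stores translate into distinct data shapes with distinct retrieval keys. A sketch in plain dataclasses; all field names are illustrative, not from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class EpisodicRecord:
    timestamp: str
    event: str          # "what happened", in natural language

@dataclass
class SemanticFact:
    subject: str
    relation: str
    obj: str            # facts as triples; text or embeddings also work

@dataclass
class Procedure:
    name: str
    steps: list         # action sequence, prompt template, or code

@dataclass
class LongTermMemory:
    """CoALA's long-term memory split into its three stores."""
    episodic: list = field(default_factory=list)
    semantic: list = field(default_factory=list)
    procedural: list = field(default_factory=list)
```

The point of separating the types is that each store gets its own write policy and retrieval index (episodic by time, semantic by similarity or graph, procedural by task match), instead of one undifferentiated memory blob.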

Key Insights from CoALA

  1. Most current systems are incomplete. Many agents have episodic memory but lack procedural memory. Few have all three types.
  2. Memory writing (learning) is as important as memory reading (retrieval). Most systems focus on retrieval; few have principled mechanisms for what to store and when.
  3. The decision between internal and external actions is fundamental. When should an agent think more vs. act? When should it retrieve from memory vs. use a tool?
  4. Consolidation is the missing piece. Most systems accumulate memory without principled compression or refinement.

Relevance to Kaze

CoALA is our design checklist. Our knowledge system must have distinct stores for all three memory types (episodic, semantic, procedural), with clear mechanisms for both reading and writing. The framework helps us avoid building a system that's strong on retrieval but weak on learning.


2.6 AriGraph — Knowledge Graph + Episodic Memory

Paper: AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents (Anokhin et al., July 2024)
Link: https://arxiv.org/abs/2407.04363
Published at: IJCAI 2025

Core Idea

Combines a structured semantic knowledge graph (entities + relationships) with episodic memory vertices (raw observations), linked together so you can always trace a fact back to its source.

Memory Architecture

Observation: "The key is on the table in the kitchen"

        ┌───────────┴──────────────┐
        │                          │
        ▼                          ▼
┌───────────────┐    ┌──────────────────────────────┐
│EPISODIC VERTEX│    │    SEMANTIC GRAPH UPDATE       │
│               │    │                                │
│ Full text of  │    │  (key)──[on]──▶(table)         │
│ observation   │    │  (table)──[in]──▶(kitchen)     │
│ + timestamp   │    │                                │
└───────┬───────┘    └──────────────┬─────────────────┘
        │                          │
        └──────── EPISODIC ────────┘
                   EDGES
          (link observation to
           extracted triplets
           for provenance)

At each timestep:

  1. A new episodic vertex is appended (containing the full textual observation)
  2. The LLM parses the observation to extract relationship triplets (entity1, relation, entity2)
  3. Triplets update the semantic memory graph (nodes = entities, edges = relationships)
  4. Episodic edges link each episodic vertex to the triplets it produced, preserving provenance
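The per-timestep update can be sketched as a dual store plus a provenance map. The class and method names are illustrative, and `extract_triplets` stands in for the LLM parsing step:

```python
class AriGraphSketch:
    """Sketch of AriGraph's episodic/semantic dual store (names illustrative)."""

    def __init__(self, extract_triplets):
        self.extract = extract_triplets  # stand-in for the LLM triplet parser
        self.episodic = []               # (timestep, raw observation text)
        self.semantic = set()            # (entity1, relation, entity2) triplets
        self.provenance = {}             # triplet -> index of source vertex

    def observe(self, timestep, text):
        # 1. Append the raw observation as an episodic vertex
        idx = len(self.episodic)
        self.episodic.append((timestep, text))
        # 2-4. Extract triplets, update the semantic graph, record the
        # episodic edge linking each fact back to its source
        for triplet in self.extract(text):
            self.semantic.add(triplet)
            self.provenance[triplet] = idx

    def why(self, triplet):
        """Trace a semantic fact back to the raw observation that produced it."""
        return self.episodic[self.provenance[triplet]][1]
```

`why()` is the auditability property called out below: any derived fact answers "because this observation, at this timestep".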

Retrieval

Three strategies combined:

  • Graph traversal — follow edges to reason about multi-hop relationships ("what room is the key in?" → key→on→table→in→kitchen)
  • Embedding similarity — for fuzzy matching when exact graph paths don't exist
  • Provenance lookup — trace any semantic fact back to the raw observation that produced it

Strengths

  • Structured representation enables multi-hop reasoning that flat text retrieval cannot do
  • Dual episodic/semantic storage preserves both raw data and derived knowledge
  • Provenance tracking — episodic edges linking to triplets support full auditability ("why does the system believe X? Because Agent Y observed Z on date W")
  • Outperforms both flat memory and pure RL baselines on complex reasoning tasks

Weaknesses

  • Triplet extraction quality depends heavily on LLM parsing accuracy — noisy extraction leads to noisy graphs
  • Graph can grow very large with no built-in pruning or consolidation
  • Primarily tested on text adventure games — real-world knowledge has more complex relationships than triplets capture
  • No multi-agent or shared graph mechanisms

Relevance to Kaze

High relevance. The dual episodic/semantic model with provenance is exactly what we need:

  • Semantic graph = our vertical knowledge ("SEO best practices", "keyword→topic relationships")
  • Episodic vertices = agent experiences ("Agent A processed Client B's audit on date C")
  • Provenance edges = the link between them ("we know this SEO practice works because Agent A's audit showed these results")

This enables a knowledge system where every fact can be traced back to its source — critical for trust, debugging, and quality control.


2.7 A-MEM — Zettelkasten-Inspired Agentic Memory

Paper: A-MEM: Agentic Memory for LLM Agents (Xu et al., Feb 2025)
Link: https://arxiv.org/abs/2502.12110

Core Idea

Inspired by the Zettelkasten (slip-box) method of note-taking — a system of interconnected, atomic notes with bidirectional links. Each memory is enriched into a structured note with relationships to other notes, creating a knowledge network.

Memory Architecture

┌──────────────────────────────────────────────────┐
│                  NOTE NETWORK                      │
│                                                    │
│  ┌──────────┐      ┌──────────┐                   │
│  │ Note A   │──────│ Note B   │                   │
│  │ Keywords │      │ Keywords │                   │
│  │ Tags     │◀─────│ Tags     │                   │
│  │ Context  │      │ Context  │                   │
│  └────┬─────┘      └──────────┘                   │
│       │                                            │
│       │ causal link                                │
│       ▼                                            │
│  ┌──────────┐      ┌──────────┐                   │
│  │ Note C   │──────│ Note D   │                   │
│  │          │      │          │                   │
│  └──────────┘      └──────────┘                   │
│                                                    │
│  Notes linked by: causal, conceptual,              │
│  semantic, temporal relationships                  │
└──────────────────────────────────────────────────┘

How It Works

  1. Note Construction: When a new memory arrives, the LLM enriches it into a structured note:

    • Core content (the memory itself)
    • Keywords (extracted key terms)
    • Tags (categorical labels)
    • Contextual description (expanded context)
    • Embedding vector (for similarity search)
  2. Link Generation: The LLM analyzes relationships between the new note and existing notes, identifying:

    • Causal links ("A caused B")
    • Conceptual links ("A is related to B conceptually")
    • Semantic links ("A and B discuss the same topic")
    • Temporal links ("A happened before B")
  3. Memory Evolution: Notes can be updated, re-linked, and reorganized over time as understanding deepens.

  4. Retrieval (Spreading Activation): When querying:

    • The query is embedded and matched against note vectors
    • When a note is retrieved, its linked notes are also automatically surfaced
    • This "spreading activation" discovers non-obvious connections that pure embedding similarity would miss
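The note/link/spread pipeline above can be sketched compactly. `enrich` and `find_links` are hypothetical callables standing in for A-MEM's LLM steps (note construction and link generation); the class layout is illustrative:

```python
class NoteNetwork:
    """Sketch of A-MEM's Zettelkasten-style note network (names illustrative)."""

    def __init__(self, enrich, find_links):
        self.enrich = enrich          # stand-in: content -> structured note
        self.find_links = find_links  # stand-in: note x existing -> typed links
        self.notes = {}               # note_id -> enriched note
        self.links = {}               # note_id -> [(other_id, link_type)]

    def add_note(self, note_id, content):
        note = self.enrich(content)                  # keywords, tags, context...
        linked = self.find_links(note, self.notes)   # against existing notes only
        self.notes[note_id] = note
        self.links[note_id] = []
        for other_id, link_type in linked:
            # Links are bidirectional, as in a Zettelkasten
            self.links[note_id].append((other_id, link_type))
            self.links[other_id].append((note_id, link_type))

    def retrieve(self, match_ids):
        """Spreading activation: surface matched notes plus their neighbors."""
        surfaced = set(match_ids)
        for nid in match_ids:
            surfaced.update(other for other, _ in self.links.get(nid, []))
        return surfaced
```

`retrieve` takes the IDs that embedding search matched and widens the result set one hop along the links, which is how linked-but-lexically-different notes get surfaced.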

Strengths

  • Rich interconnected structure enables discovery of non-obvious relationships
  • LLM-driven linking captures nuanced relationships that embedding similarity alone misses
  • Doubles performance on complex multi-hop reasoning tasks compared to flat retrieval
  • Cost-effective despite the multiple LLM calls during note processing
  • The Zettelkasten model is essentially a personal wiki — proven for human knowledge management over decades

Weaknesses

  • Multiple LLM calls per memory operation (note construction + link generation) adds latency
  • Link quality depends entirely on LLM reasoning quality — garbage in, garbage out
  • No built-in access control or multi-agent sharing mechanisms
  • No forgetting or consolidation — the network grows indefinitely
  • Can become computationally expensive for spreading activation as the network grows

Relevance to Kaze

Very high relevance. The Zettelkasten model is the closest existing pattern to a "Wikipedia for agents":

  • Each knowledge article is a note with structured metadata (keywords, tags, context)
  • Articles are interlinked with typed relationships (causal, conceptual, etc.)
  • Retrieval uses spreading activation — finding related knowledge, not just matching keywords
  • The structure supports human browsing (via links and tags) as well as AI retrieval (via embeddings)

The key extension we need: multi-author provenance, access control, and quality gates for shared notes.


2.8 Collaborative Memory — Multi-Agent Shared Memory with Access Control

Paper: Collaborative Memory: Multi-User Memory Sharing in LLM Agents with Dynamic Access Control (Rezazadeh et al., May 2025)
Link: https://arxiv.org/abs/2505.18279

Core Idea

The first paper to explicitly address shared memory across multiple agents/users with access control. Introduces a two-tier memory system with dynamic permission management.

Memory Architecture

┌──────────────────────────────────────────────────┐
│           COLLABORATIVE MEMORY SYSTEM            │
│                                                  │
│  ┌────────────────────────────────────────────┐  │
│  │            PRIVATE MEMORY TIER             │  │
│  │                                            │  │
│  │  Agent A's private fragments               │  │
│  │  Agent B's private fragments               │  │
│  │  (visible only to originator)              │  │
│  └────────────────────────────────────────────┘  │
│                                                  │
│  ┌────────────────────────────────────────────┐  │
│  │             SHARED MEMORY TIER             │  │
│  │                                            │  │
│  │  Shared fragments with access policies     │  │
│  │  (selectively visible based on permissions)│  │
│  └────────────────────────────────────────────┘  │
│                                                  │
│  ┌────────────────────────────────────────────┐  │
│  │       DYNAMIC BIPARTITE ACCESS GRAPH       │  │
│  │                                            │  │
│  │  Users ◄──────────► Agents                 │  │
│  │    │                    │                  │  │
│  │    └──── Resources ─────┘                  │  │
│  │                                            │  │
│  │  Time-evolving graph linking users,        │  │
│  │  agents, and memory resources              │  │
│  └────────────────────────────────────────────┘  │
│                                                  │
│  PROVENANCE: Each fragment carries immutable     │
│  metadata — contributing agents, accessed        │
│  resources, timestamps                           │
└──────────────────────────────────────────────────┘

Key Mechanisms

  1. Two-tier storage: Every memory fragment is either private (agent-local) or shared (published to a collaborative pool). Agents control what they share.

  2. Dynamic Bipartite Access Graphs: Time-evolving graphs that model who can access what. The graph links:

    • Users → Agents (which agents work for which users)
    • Agents → Resources (which memory fragments an agent can access)
    • Users → Resources (what a user has contributed or can view)
  3. Attribute-Based Access Control (ABAC): Permissions are based on attributes (role, project, vertical, client) rather than static role lists. Policies are configurable at system, user, or agent level.

  4. Immutable Provenance: Every memory fragment carries:

    • Which agent(s) contributed it
    • What resources were accessed to create it
    • Timestamps of creation and modification
    • Source attribution chain
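The mechanisms above can be sketched with plain data classes. This is a minimal illustration under stated assumptions, not the paper's implementation: the names (`MemoryFragment`, `Provenance`, `can_access`) and the exact policy-matching rule (every fragment attribute must match the requesting agent's attributes) are ours.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Provenance:
    """Immutable provenance metadata carried by every fragment."""
    contributing_agents: tuple[str, ...]
    accessed_resources: tuple[str, ...]
    created_at: datetime

@dataclass(frozen=True)
class MemoryFragment:
    content: str
    tier: str                               # "private" or "shared"
    owner: str                              # originating agent
    attributes: frozenset[tuple[str, str]]  # e.g. {("vertical", "seo")}
    provenance: Provenance

def can_access(fragment: MemoryFragment, agent: str,
               agent_attrs: dict[str, str]) -> bool:
    """ABAC check: private fragments are owner-only; shared fragments
    require the agent's attributes to satisfy the fragment's policy."""
    if fragment.tier == "private":
        return agent == fragment.owner
    # Every attribute on the fragment must match the requesting agent's.
    return all(agent_attrs.get(k) == v for k, v in fragment.attributes)

frag = MemoryFragment(
    content="E-commerce clients respond best to long-tail keywords.",
    tier="shared",
    owner="seo-agent-1",
    attributes=frozenset({("vertical", "seo")}),
    provenance=Provenance(("seo-agent-1",), ("crm://client-42",),
                          datetime.now(timezone.utc)),
)
assert can_access(frag, "seo-agent-2", {"vertical": "seo"})
assert not can_access(frag, "crm-agent-1", {"vertical": "crm"})
```

The frozen dataclasses make both fragment and provenance immutable after creation, mirroring the paper's append-only provenance requirement.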

Strengths

  • First real solution for multi-agent shared memory with proper access control
  • Dynamic permissions adapt as agents and users change roles
  • Provenance tracking enables full auditability
  • ABAC model is flexible enough for complex organizational structures
  • Private/shared distinction maps naturally to agent-local vs. vertical knowledge

Weaknesses

  • Paper is theoretical — limited real-world validation at scale
  • Access graph can become complex to manage in large deployments
  • No built-in quality gate for shared memory (any agent can publish)
  • No consolidation or contradiction resolution between conflicting shared memories

Relevance to Kaze

Directly applicable. This paper addresses our exact problem — multiple agents sharing knowledge with:

  • Private tier = agent working memory and client-specific knowledge
  • Shared tier = vertical knowledge, cross-vertical patterns
  • Access control = client isolation (Client A's data never visible to Client B's agents), vertical scoping, role-based permissions
  • Provenance = every knowledge update traced to its source agent, client context, and evidence

2.9 MIRIX — Six-Component Multi-Agent Memory

Paper: MIRIX: Multi-Agent Memory System for LLM-Based Agents (Wang & Chen, July 2025)
Link: https://arxiv.org/abs/2507.07957

Core Idea

The most comprehensive modular memory system in the literature, decomposing agent memory into six distinct components, each managed by a dedicated Memory Manager agent.

Memory Architecture

┌────────────────────────────────────────────────────────┐
│                      MIRIX SYSTEM                      │
│                                                        │
│  ┌──────────────────────────────────────────────────┐  │
│  │               META MEMORY MANAGER                │  │
│  │     Routes queries to appropriate component      │  │
│  └──────────┬───────────────────────────────────────┘  │
│             │                                          │
│  ┌──────────┴───────────────────────────────────────┐  │
│  │                                                  │  │
│  │  ┌──────────────┐  ┌──────────────┐              │  │
│  │  │ CORE MEMORY  │  │   EPISODIC   │              │  │
│  │  │              │  │    MEMORY    │              │  │
│  │  │ Persistent   │  │              │              │  │
│  │  │ personalized │  │ Time-stamped │              │  │
│  │  │ data (user & │  │ user-specific│              │  │
│  │  │ agent        │  │ events       │              │  │
│  │  │ profiles)    │  │              │              │  │
│  │  └──────────────┘  └──────────────┘              │  │
│  │                                                  │  │
│  │  ┌──────────────┐  ┌──────────────┐              │  │
│  │  │   SEMANTIC   │  │  PROCEDURAL  │              │  │
│  │  │    MEMORY    │  │    MEMORY    │              │  │
│  │  │              │  │              │              │  │
│  │  │ General      │  │ Actionable   │              │  │
│  │  │ knowledge &  │  │ workflows &  │              │  │
│  │  │ social graphs│  │ scripts      │              │  │
│  │  └──────────────┘  └──────────────┘              │  │
│  │                                                  │  │
│  │  ┌──────────────┐  ┌──────────────┐              │  │
│  │  │   RESOURCE   │  │  KNOWLEDGE   │              │  │
│  │  │    MEMORY    │  │    VAULT     │              │  │
│  │  │              │  │              │              │  │
│  │  │ Documents,   │  │ Critical     │              │  │
│  │  │ files, and   │  │ verbatim     │              │  │
│  │  │ media        │  │ information  │              │  │
│  │  └──────────────┘  └──────────────┘              │  │
│  │                                                  │  │
│  └──────────────────────────────────────────────────┘  │
│                                                        │
│  Each component has its own Memory Manager agent       │
│  Meta Memory Manager coordinates routing               │
└────────────────────────────────────────────────────────┘

The Six Components

| Component | What It Stores | Retrieval | Update Frequency |
|---|---|---|---|
| Core Memory | Agent/user profiles, persistent personalized data | Direct lookup by entity | Infrequent (profile changes) |
| Episodic Memory | Time-stamped user-specific events | Recency + relevance scoring | Per-interaction |
| Semantic Memory | General knowledge, social graphs, domain facts | Graph traversal + similarity | As new knowledge is discovered |
| Procedural Memory | Workflows, scripts, action templates | Task similarity matching | When procedures are learned/updated |
| Resource Memory | Documents, files, media referenced by agents | Metadata + content search | When new resources are ingested |
| Knowledge Vault | Critical verbatim information (addresses, credentials, exact quotes) | Exact match + keyword lookup | Rare (critical data changes slowly) |

Meta Memory Manager

The routing layer that decides which component(s) to query for a given request. Uses intent classification to route:

  • "What's the user's email?" → Knowledge Vault
  • "What happened last Tuesday?" → Episodic Memory
  • "How do I run a site audit?" → Procedural Memory
  • "What does this document say about X?" → Resource Memory
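The routing examples above can be sketched as a dispatch table. A real Meta Memory Manager would use LLM intent classification; the keyword heuristic below merely stands in for it, and both the route names and the keyword lists are illustrative assumptions.

```python
# Keyword-based stand-in for LLM intent classification in a
# MIRIX-style Meta Memory Manager. Routing rules are illustrative.
ROUTES = {
    "knowledge_vault":   ("email", "address", "credential", "phone"),
    "episodic_memory":   ("happened", "yesterday", "last", "when did"),
    "procedural_memory": ("how do i", "steps", "run", "procedure"),
    "resource_memory":   ("document", "file", "manual", "say about"),
}

def route(query: str) -> str:
    """Return the memory component most likely to answer the query,
    falling back to semantic memory for general knowledge."""
    q = query.lower()
    for component, keywords in ROUTES.items():
        if any(kw in q for kw in keywords):
            return component
    return "semantic_memory"

assert route("What's the user's email?") == "knowledge_vault"
assert route("What happened last Tuesday?") == "episodic_memory"
assert route("How do I run a site audit?") == "procedural_memory"
assert route("What does this document say about X?") == "resource_memory"
```

The point of the pattern is that calling agents never need to know which store to query; they submit a request and the router decides.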

Performance

  • 35% improvement over RAG baselines
  • 99.9% storage reduction compared to storing raw conversation history
  • 85.38% accuracy on LOCOMO benchmark (state of the art at time of publication)

Strengths

  • Most granular memory decomposition — each type is optimized for its specific access patterns
  • Memory Manager agents provide intelligent routing (no need for the calling agent to know which component to query)
  • Modular — components can be independently scaled, optimized, or replaced
  • Storage efficiency from proper categorization (not everything needs vector embedding)

Weaknesses

  • Six components + six managers + one meta-manager = significant system complexity
  • Manager agents themselves consume LLM tokens for routing decisions
  • No multi-tenant or shared memory across agents/users
  • Coordination between managers can introduce latency
  • May be over-engineered for simpler use cases

Relevance to Kaze

MIRIX provides a practical template for decomposing our knowledge system. Not all six components may be needed immediately, but the taxonomy is valuable:

  • Core Memory → our agent and client profiles
  • Episodic Memory → our agent execution logs and experience records
  • Semantic Memory → our vertical knowledge graph
  • Procedural Memory → our agent skills library
  • Resource Memory → documents, SOPs, manuals that agents reference
  • Knowledge Vault → client credentials, critical business data (handled by Vault in our stack)

The Meta Memory Manager pattern is interesting — a routing agent that knows which knowledge store to query. This could be a component of our orchestration layer.


2.10 Memory in the Age of AI Agents — Survey Paper

Paper: Memory in the Age of AI Agents (Liu et al., Dec 2025)
Link: https://arxiv.org/abs/2512.13564

Core Contribution

The most comprehensive survey of agent memory, proposing a three-axis taxonomy:

Axis 1: Forms (how memory is stored)

  • Token-level: In-context memory (the conversation itself)
  • Parametric: Fine-tuned into model weights (LoRA, etc.)
  • Latent: Stored as embedding vectors or hidden states

Axis 2: Functions (what memory is for)

  • Factual: Knowledge about the world
  • Experiential: Records of past events and outcomes
  • Working: Active task context

Axis 3: Dynamics (how memory changes)

  • Formation: How new memories are created
  • Evolution: How memories are updated, consolidated, or forgotten
  • Retrieval: How memories are accessed when needed

Key Findings from the Survey

  1. Most systems focus on retrieval, neglecting formation and evolution. Building good retrieval is necessary but not sufficient — you also need principled mechanisms for deciding what to remember and how to update existing knowledge.

  2. Consolidation is the biggest gap. Most systems accumulate memories indefinitely. The episodic-to-semantic consolidation pathway (turning raw experiences into structured knowledge) is the key unsolved problem for scalable shared knowledge.

  3. Evaluation benchmarks are immature. There's no standard benchmark for long-term agent memory quality, making it hard to compare approaches objectively.

  4. Multi-agent memory is nascent. Shared memory, knowledge transfer across agents, and collaborative knowledge building are identified as frontier research problems.


3. Cross-Cutting Patterns from Research

3.1 Memory Taxonomy Convergence

The field has converged on 3-6 memory types rooted in cognitive science:

| Memory Type | Papers That Use It | Storage Approach | Retrieval Approach |
|---|---|---|---|
| Episodic | Generative Agents, AriGraph, CoALA, MIRIX, Collaborative Memory | Timestamped natural language entries | Recency + relevance + importance |
| Semantic | AriGraph, CoALA, MIRIX, A-MEM | Knowledge graph triples, structured notes | Graph traversal + embedding similarity |
| Procedural | Voyager, CoALA, MIRIX | Executable code, workflow definitions | Task similarity matching |
| Working | MemGPT (core memory), CoALA | Context window contents | Always in-context |
| Reflective | Generative Agents, Reflexion | Natural language insights | Loaded directly or by relevance |

3.2 Retrieval Strategy Spectrum

From simple to sophisticated:

| Level | Strategy | Used By | Complexity | Quality |
|---|---|---|---|---|
| 1 | Full context loading | Reflexion | Trivial | Only works with tiny memory |
| 2 | Embedding similarity only | Voyager | Low | Misses temporal/importance signals |
| 3 | Tri-factor scoring (recency + importance + relevance) | Generative Agents | Medium | De facto standard, good baseline |
| 4 | Agent-initiated retrieval (LLM decides when/what to search) | MemGPT/Letta | Medium | More flexible, but depends on LLM judgment |
| 5 | Graph traversal + spreading activation | AriGraph, A-MEM | High | Best for multi-hop reasoning, discovers non-obvious connections |
| 6 | Learned/adaptive retrieval (dynamic weight adjustment) | Emerging research | High | Frontier — not yet proven at scale |

For Kaze: We should implement Level 3 (tri-factor) as baseline, with Level 5 (graph traversal) for the structured knowledge graph, and Level 4 (agent-initiated) for agent autonomy over their own memory.
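The Level 3 baseline can be sketched in a few lines, loosely following the Generative Agents formulation: exponentially decayed recency, an importance score normalized from a 1-10 scale, and embedding relevance, summed with configurable weights. The decay constant (0.995 per hour) and equal default weights are assumptions for illustration.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def score(memory: dict, query_vec: list[float], now_hours: float,
          w_recency: float = 1.0, w_importance: float = 1.0,
          w_relevance: float = 1.0) -> float:
    """Tri-factor score: decayed recency + normalized importance + relevance."""
    recency = 0.995 ** (now_hours - memory["created_hours"])
    importance = memory["importance"] / 10.0   # normalize 1-10 to 0-1
    relevance = cosine(memory["embedding"], query_vec)
    return w_recency * recency + w_importance * importance + w_relevance * relevance

memories = [
    {"created_hours": 0.0,  "importance": 3, "embedding": [1.0, 0.0]},
    {"created_hours": 99.0, "importance": 9, "embedding": [0.9, 0.1]},
]
query = [1.0, 0.0]
best = max(memories, key=lambda m: score(m, query, now_hours=100.0))
assert best["importance"] == 9  # recent, important memory outranks the stale one
```

Tuning the three weights per memory type (e.g. heavier recency for episodic, heavier relevance for semantic) is where most of the practical gains come from.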

3.3 Memory Consolidation Patterns

Three pathways identified across the literature:

  1. Episodic-to-Semantic Consolidation: Raw experiences are distilled into general knowledge.

    • Generative Agents: many observations → reflections → beliefs
    • AriGraph: observations → triplet extraction → knowledge graph
    • For Kaze: Agent experiences → quality-checked insights → vertical knowledge graph updates
  2. Summarization / Compression: Long content is compressed to preserve signal while reducing tokens.

    • MemGPT: recursive summarization of evicted context
    • For Kaze: Conversation histories and execution logs → compressed summaries → retained in episodic memory
  3. Experience Distillation: Interaction trajectories are converted into reusable procedures.

    • Voyager: gameplay → verified skill code
    • Reflexion: trial trajectories → verbal reflections
    • For Kaze: Agent task execution → verified skill improvements → skill library updates
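The episodic-to-semantic pathway can be sketched as a support-counting promotion rule: a claim enters the semantic store once enough independent episodes assert it and no episode contradicts it. This is a toy stand-in for LLM-driven distillation; the support threshold and the negation check are illustrative assumptions, not from any single paper.

```python
from collections import Counter

def consolidate(episodes: list[dict], min_support: int = 3) -> list[str]:
    """Promote a claim to the semantic store when at least `min_support`
    episodes assert it and no episode asserts its negation."""
    support = Counter(e["claim"] for e in episodes)
    contradicted = {e["negates"] for e in episodes if e.get("negates")}
    return [claim for claim, n in support.items()
            if n >= min_support and claim not in contradicted]

episodes = [
    {"claim": "clients prefer bullet-point reports"},
    {"claim": "clients prefer bullet-point reports"},
    {"claim": "clients prefer bullet-point reports"},
    {"claim": "long-tail keywords convert well"},  # only one observation
]
assert consolidate(episodes) == ["clients prefer bullet-point reports"]
```

In a real pipeline the claims themselves would be extracted by an LLM from raw execution logs; the promotion gate is what keeps one-off observations out of shared knowledge.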

3.4 Multi-Agent Knowledge Sharing Patterns

Four emerging approaches:

| Pattern | Description | Strongest Example | Maturity |
|---|---|---|---|
| Shared Blackboard | Agents read/write to a common database | Basic multi-agent systems | Mature but simplistic |
| Provenance-Tracked Fragments | Each memory carries metadata about origin | Collaborative Memory (2025) | Emerging |
| Git-Based Versioned Knowledge | Knowledge versioned like code with branching and merging | Letta Context Repositories (2026) | Emerging |
| Orchestrator Merges | A meta-agent merges specialist contributions into global store | Various multi-agent coding systems | Moderate |

For Kaze: Combine provenance-tracked fragments (Collaborative Memory) with git-like versioning (Letta) for the shared knowledge layer.

3.5 Synthesis: Design Principles for Kaze's Knowledge System

Drawing from the entire literature:

  1. Type your knowledge. Distinct stores for episodic, semantic, procedural, and reflective memory. Each has different storage, retrieval, and lifecycle characteristics.

  2. Layer your retrieval. Tri-factor scoring (recency + importance + relevance) as baseline, graph traversal for structured knowledge, agent-initiated search for autonomy.

  3. Version everything. Every write is a versioned commit with agent identity, timestamp, and source attribution. Treat knowledge like code — branchable, mergeable, diffable.

  4. Track provenance. Every fact traces back to its source observation, agent, and evidence chain. Essential for trust, debugging, and quality control.

  5. Gate quality. New knowledge goes through verification before entering the shared store (Voyager's self-verification, Wikipedia's editorial process). The governance layer reviews proposed knowledge changes.

  6. Consolidate actively. Don't just accumulate — periodically distill experiences into knowledge, compress stale memories, detect and resolve contradictions.

  7. Control access. Private (agent-local) vs. shared (vertical) vs. global (cross-vertical). Dynamic, attribute-based access control per the Collaborative Memory model.
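Principles 3 and 4 together suggest an append-only, hash-chained commit log for knowledge writes. The sketch below is a minimal git-style illustration under our own assumptions; the field names and hashing scheme are ours, not from any of the surveyed systems.

```python
import hashlib
import json
from datetime import datetime, timezone

def commit(log: list[dict], article_id: str, body: str,
           agent: str, reason: str) -> dict:
    """Append a versioned knowledge write. Each record chains to its
    parent by content hash, so history is diffable and tamper-evident."""
    parent = log[-1]["hash"] if log else None
    record = {
        "article_id": article_id,
        "body": body,
        "agent": agent,          # who
        "reason": reason,        # why
        "timestamp": datetime.now(timezone.utc).isoformat(),  # when
        "parent": parent,        # links this version to the previous one
    }
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    log.append(record)
    return record

log: list[dict] = []
commit(log, "seo/keywords", "Target long-tail terms.", "seo-agent-1", "initial")
commit(log, "seo/keywords", "Target long-tail terms; verify volume.",
       "seo-agent-2", "added verification step")
assert log[1]["parent"] == log[0]["hash"]  # full lineage is recoverable
```

Walking the parent chain recovers the full edit history of an article, which is exactly the provenance needed for debugging and quality review.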


4. Open Source Tooling Survey

4.1 Dedicated Agent Memory Frameworks

Mem0

  • What: Universal memory layer for AI agents. Extracts, consolidates, and retrieves information from conversations. Enhanced variant (Mem0g) uses graph-based memory for relational reasoning.
  • Source: https://github.com/mem0ai/mem0
  • License: Apache 2.0 (open source) + managed cloud offering
  • Self-hostable: Yes — supports Kubernetes, air-gapped servers, private clouds
  • Maturity: High — 186M API calls/quarter (Q3 2025), $24M raised (Oct 2025), AWS partnership (exclusive memory provider for AWS Agent SDK)
  • Key features:
    • Single API: mem0.add(), mem0.search(), mem0.get_all()
    • Automatic memory extraction from conversations
    • Graph memory variant for temporal and relational reasoning
    • SOC 2 & HIPAA compliant, BYOK (Bring Your Own Key)
    • Every memory is timestamped, versioned, and exportable
    • 26% accuracy improvement, 91% lower p95 latency, 90% token savings vs. raw context
  • Limitations:
    • Primarily designed for per-user/per-agent memory, not a shared knowledge graph
    • Graph memory (Mem0g) is newer and less battle-tested
    • Managed cloud has the most features; open source version may lag
  • Fit for Kaze: Strong candidate for per-agent memory layer. Handles the working memory and episodic memory needs well. Not a replacement for the shared vertical knowledge system — it's the memory inside each agent, not the wiki.

Letta (formerly MemGPT)

  • What: Platform for building stateful agents with OS-inspired tiered memory (core/recall/archival). Built by the MemGPT research team.
  • Source: https://github.com/letta-ai/letta
  • License: Apache 2.0
  • Self-hostable: Yes — Docker/Kubernetes
  • Maturity: High — active open source community, 100+ contributors, #1 on Terminal-Bench for model-agnostic coding agents
  • Key features:
    • Three-tier memory: core (always in context), recall (conversation history), archival (long-term)
    • Agent-initiated memory management via tool calls
    • Context Repositories (Feb 2026): git-backed knowledge versioning
    • Conversations API: shared memory across parallel agent interactions
    • Multi-model support (works with any LLM)
  • Limitations:
    • More of a full agent framework than a standalone memory component
    • Adopting Letta means adopting its agent runtime model (may conflict with building our own)
    • Context Repositories are very new (Feb 2026)
  • Fit for Kaze: Strong architectural inspiration, especially Context Repositories. The git-based versioning pattern should inform our knowledge system design. As a direct dependency, it may be too opinionated — we'd be building our agent runtime on top of theirs rather than our own.

Cognee

  • What: Knowledge engine for AI agents. Takes documents, generates knowledge graphs from them, queries via semantic + graph traversal. Auto-learns from feedback.
  • Source: https://github.com/topoteretes/cognee
  • License: Apache 2.0
  • Self-hostable: Yes — containerized, integrates with various graph DBs (Neo4j, Memgraph, FalkorDB) and vector stores
  • Maturity: Medium — growing community, GitHub Secure Open Source certified, active development
  • Key features:
    • Automated knowledge graph construction from documents (5 lines of code)
    • Combines embeddings with graph-based extraction (subject-relation-object triplets)
    • Feedback loop: rated responses update edge weights, improving retrieval over time ("Memify" feature)
    • Lexical chunk retrieval + temporal awareness
    • Pluggable backends (multiple graph DBs and vector stores)
  • Limitations:
    • Less mature than Mem0 or Letta
    • Quality of extracted knowledge graphs depends on source document quality
    • Smaller community and ecosystem
  • Fit for Kaze: Strong candidate as the knowledge graph construction pipeline. It automates the process of turning documents, SOPs, and manuals into structured knowledge graphs — exactly what we need when onboarding a new vertical. The feedback loop (edge weight updates from usage) aligns with our self-improvement goals.

LangChain/LangGraph Memory

  • What: Memory modules for LangChain-based agents — conversation buffer, summary memory, entity memory, vector-backed retrieval.
  • Source: https://github.com/langchain-ai/langchain
  • License: MIT
  • Self-hostable: Yes (library, not infrastructure)
  • Maturity: High ecosystem adoption, but memory modules are relatively basic
  • Key features:
    • ConversationBufferMemory, ConversationSummaryMemory, EntityMemory
    • Vector store-backed retrieval memory
    • LangGraph adds stateful graph-based workflows with persistence
  • Limitations:
    • Memory modules are designed for single-agent conversation memory, not shared knowledge
    • Tightly coupled to LangChain abstractions
    • No knowledge graph construction, no provenance tracking, no access control
    • Basic compared to Mem0, Letta, or Cognee
  • Fit for Kaze: Weak. Too basic and too LangChain-specific. Individual features (vector retrieval, summary compression) are useful patterns but better implemented directly or via more capable tools.

Microsoft GraphRAG

  • What: A pipeline that auto-extracts knowledge graphs from text documents, builds community hierarchies, and generates summaries for RAG-based querying.
  • Source: https://github.com/microsoft/graphrag
  • License: MIT
  • Self-hostable: Yes — Python package, runs anywhere
  • Maturity: High — backed by Microsoft Research, active development, growing adoption
  • Key features:
    • Automated entity and relationship extraction from text
    • Community detection and hierarchical clustering
    • Multi-level summarization (local and global)
    • Designed for complex "global" queries across large document collections
  • Limitations:
    • Knowledge graph extraction costs 3-5x more than baseline RAG (heavy LLM usage)
    • Designed for document collections, not real-time agent memory
    • No built-in agent integration — it's a pipeline, not a runtime
    • No incremental updates — reprocessing the full corpus for changes is expensive
  • Fit for Kaze: Strong as a knowledge graph construction tool, especially for initial onboarding of verticals (converting existing SOPs, manuals, and documentation into structured knowledge). Less suitable for real-time agent memory. Could complement Cognee — use GraphRAG for initial bulk knowledge graph construction, Cognee for ongoing incremental updates.

4.2 Vector Databases

Pgvector (PostgreSQL Extension)

  • What: Adds vector data types, similarity search (cosine, L2, inner product), and indexing (IVFFlat, HNSW) to PostgreSQL.
  • Source: https://github.com/pgvector/pgvector
  • License: PostgreSQL License (permissive)
  • Self-hostable: Yes — it's PostgreSQL
  • K8s: Native via CloudNativePG operator (already in our stack)
  • Maturity: High — widely adopted, well-maintained, battle-tested
  • Performance: Good for up to ~100M vectors. pgvectorscale extension achieves 471 QPS at 99% recall on 50M vectors.
  • Key strength: Zero new infrastructure — adds vector capabilities to the Postgres we already have. SQL joins across relational + vector data.
  • Key weakness: Not as performant as dedicated vector databases at extreme scale. Limited to what Postgres can handle.
  • Fit for Kaze: Best starting point. We already committed to PostgreSQL. Adding pgvector means vector search with no new infrastructure. Sufficient for early-to-mid scale. Can be complemented with a dedicated vector DB later if needed.

Qdrant

  • What: Purpose-built high-performance vector database written in Rust. Designed for production AI workloads.
  • Source: https://github.com/qdrant/qdrant
  • License: Apache 2.0
  • Self-hostable: Yes — Docker, Kubernetes (Helm charts available)
  • K8s: Native support, horizontal scaling
  • Maturity: High — production-proven, growing enterprise adoption
  • Performance: Fastest in benchmarks for pure vector search. Excellent filtering performance with payload indexing.
  • Key strength: Best pure performance, advanced filtering and payload indexing, Rust-native stability
  • Key weakness: Another system to operate. Adds operational complexity over pgvector.
  • Fit for Kaze: Scale-up option. If pgvector becomes a performance bottleneck (likely at >100M vectors or high QPS requirements), Qdrant is the best upgrade path. Keep the architecture compatible from day one.

Weaviate

  • What: Vector database with built-in hybrid search (vector + keyword), graph-like object references, and multi-modal support (text, image, video).
  • Source: https://github.com/weaviate/weaviate
  • License: BSD-3-Clause
  • Self-hostable: Yes — Docker, Kubernetes
  • K8s: Native support
  • Maturity: High — well-funded, large community
  • Key strength: Hybrid search out of the box. Built-in cross-references between objects (graph-like). GraphQL API.
  • Key weakness: Slower than Qdrant in pure vector benchmarks. Additional graph features add complexity. gRPC + GraphQL learning curve.
  • Fit for Kaze: Interesting for its hybrid search and cross-reference features, but adds complexity we may not need if we're using a separate graph database. The cross-reference feature partially overlaps with our knowledge graph needs.

Chroma

  • What: Simplest vector database API. Designed for developer experience and rapid prototyping.
  • Source: https://github.com/chroma-core/chroma
  • License: Apache 2.0
  • Self-hostable: Yes — but Kubernetes story is less mature
  • Maturity: Medium — popular for prototyping, less proven at production scale
  • Fit for Kaze: Too lightweight for production multi-tenant deployments. Good for local development/testing.

Milvus

  • What: Cloud-native vector database designed for massive scale. Supports billion-scale vector datasets.
  • Source: https://github.com/milvus-io/milvus
  • License: Apache 2.0
  • Self-hostable: Yes — Kubernetes-native (distributed architecture with separate storage and compute)
  • Maturity: High — CNCF graduated project, large community
  • Key strength: Designed for billion-scale. Distributed architecture. Strong Kubernetes story.
  • Key weakness: Complex to operate — many moving parts (proxy, query node, data node, index node, etcd, MinIO, Pulsar). Overkill for early scale.
  • Fit for Kaze: Too heavy for MVP. Worth revisiting if vector search needs exceed Qdrant's single-node capacity.

Pinecone

  • What: Fully managed vector database. Zero ops.
  • License: Proprietary (managed only)
  • Self-hostable: No
  • Fit for Kaze: Fails cloud-agnostic requirement. Cannot self-host or deploy in customer VPC.

4.3 Graph Databases

Apache AGE (PostgreSQL Extension)

  • What: Adds graph database capabilities to PostgreSQL. Supports the openCypher query language for graph queries alongside standard SQL.
  • Source: https://github.com/apache/age
  • License: Apache 2.0
  • Self-hostable: Yes — it's PostgreSQL
  • K8s: Native via CloudNativePG (same as pgvector)
  • Maturity: Medium — Apache incubating project, growing community, but less mature than Neo4j
  • Key strength: Same rationale as pgvector — adds graph capabilities to our existing Postgres. No new infrastructure. Can query graph data alongside relational and vector data in the same database.
  • Key weakness: Less performant than native graph databases for complex traversals. Cypher support is partial (not full Neo4j Cypher). Smaller ecosystem and community than Neo4j. Less mature graph-specific optimizations.
  • Fit for Kaze: Best starting point for the knowledge graph. Combined with pgvector, gives us relational + vector + graph in one Postgres instance. Sufficient for early-to-mid scale knowledge graphs. Upgrade to a dedicated graph DB only if performance demands it.

FalkorDB

  • What: High-performance graph database built for GraphRAG and AI workloads. Successor to RedisGraph (which Redis Ltd. declared end-of-life in Jan 2025). Uses sparse matrix algebra for fast graph algorithms.
  • Source: https://github.com/FalkorDB/FalkorDB
  • License: Server Side Public License (SSPL) — note: this is not a traditional open source license
  • Self-hostable: Yes — Docker, Kubernetes
  • K8s: Supported
  • Maturity: Medium — growing rapidly, dedicated to AI graph workloads
  • Performance: Claims 500x faster p99 and 10x faster p50 latency than Neo4j in aggregate expansion operations
  • Key strength: Purpose-built for GraphRAG. Blazing fast graph traversals. Full Cypher support. Multi-graph architecture.
  • Key weakness: SSPL license may be problematic for some use cases. Smaller community than Neo4j. Less mature ecosystem. Redis-heritage means it's primarily in-memory (implications for very large graphs).
  • Fit for Kaze: Strong upgrade option from Apache AGE. If our knowledge graph grows to a scale where Postgres-based graph queries become a bottleneck, FalkorDB is the performance-oriented choice. The Cypher compatibility makes migration from Apache AGE manageable.

Neo4j

  • What: The most established and mature graph database. Property graph model with the Cypher query language.
  • Source: https://github.com/neo4j/neo4j (Community Edition)
  • License: Community Edition: GPLv3. Enterprise: Commercial license (expensive).
  • Self-hostable: Community Edition is free. Enterprise features (clustering, security, performance) require paid license.
  • K8s: Supported via Helm charts
  • Maturity: Very high — largest ecosystem, most documentation, most graph-trained developers
  • Key strength: Most mature, largest ecosystem, best tooling, most comprehensive Cypher implementation. Recent additions: native vector search capabilities.
  • Key weakness: Enterprise features locked behind expensive licensing. Community Edition lacks clustering and advanced security. GPLv3 for community may complicate our licensing.
  • Fit for Kaze: The obvious choice on maturity alone, but licensing is a concern. If we're deploying into customer VPCs, Neo4j Enterprise licensing costs get passed to every deployment. Community Edition's limitations (no clustering, limited security) may be blockers. Worth revisiting if/when we need enterprise-grade graph capabilities and are willing to pay for them.

ArangoDB

  • What: Multi-model database supporting documents, graphs, key-value, and search in a single engine.
  • Source: https://github.com/arangodb/arangodb
  • License: Apache 2.0
  • Self-hostable: Yes — Docker, Kubernetes
  • K8s: Native support via Kubernetes Operator
  • Maturity: High — been around since 2012, production-proven
  • Key strength: True multi-model in one engine. Graph + document + search without multiple databases. AQL query language is powerful.
  • Key weakness: Jack of all trades — doesn't outperform specialized databases in any single model. Smaller community than Neo4j or Postgres. AQL is its own language (not Cypher, not SQL).
  • Fit for Kaze: Interesting alternative to the "Postgres + extensions" approach. But introduces a non-standard query language and a less familiar database. The multi-model promise is similar to SurrealDB but more mature.

4.4 Hybrid / Multi-Model Options

Postgres + pgvector + Apache AGE (The Stack)

  • What: A single PostgreSQL instance with both extensions, providing relational + vector + graph capabilities.
  • How it works: Standard SQL for relational data, pgvector for similarity search, Apache AGE for Cypher graph queries. All data lives in one database, queryable with SQL.
  • Strengths:
    • One database to operate, backup, monitor, and scale
    • Already in our stack (we committed to Postgres)
    • Battle-tested foundation (Postgres)
    • CloudNativePG operator for Kubernetes
    • Can join across relational, vector, and graph data
  • Weaknesses:
    • Neither the best vector DB nor the best graph DB — adequate at both, optimal at neither
    • Apache AGE is less mature than Neo4j's Cypher implementation
    • pgvector performance ceiling is lower than Qdrant's
    • Complex queries that span all three models may be slow
  • Fit for Kaze: The pragmatic default. Minimum operational burden. Good enough for MVP and early scale. Clear upgrade paths when specific capabilities need more performance.

SurrealDB

  • What: A Rust-native, multi-model database unifying documents, graphs, vectors, full-text search, time-series, and relational data in one engine. Explicitly marketed as "the multi-model database for AI agents."
  • Source: https://github.com/surrealdb/surrealdb
  • License: Business Source License 1.1 (transitions to Apache 2.0 after 4 years)
  • Self-hostable: Yes — single binary, Docker, Kubernetes
  • K8s: Supported
  • Maturity: Lower — version 3.0 launched recently with $23M funding. Growing rapidly but significantly less battle-tested than Postgres.
  • Key features:
    • Single engine replaces multiple databases
    • Built-in HNSW vector indexing
    • Graph traversal + vector search + relational queries all transactional
    • Built-in access control (row-level security, RBAC)
    • Real-time change feeds
    • SurrealQL (its own query language)
  • Strengths:
    • Replaces Postgres + pgvector + Apache AGE + potentially more with one system
    • Built-in access control is relevant for our multi-tenant needs
    • Designed specifically for AI agent workloads
    • Single system to operate
  • Weaknesses:
    • Young — v3.0 just launched, significantly less battle-tested than Postgres
    • BSL license (not truly open source until 4-year transition)
    • SurrealQL is another query language to learn (not SQL, not Cypher)
    • Smaller community and ecosystem
    • Risk: if SurrealDB pivots, fails, or changes licensing, we're dependent
    • Performance benchmarks at scale are limited
  • Fit for Kaze: The bold option. If it delivers on its promises, it's exactly what we need — one database for everything, with built-in access control. But the maturity risk is real. Worth watching and potentially evaluating for Phase 2 or 3, but risky as a day-1 foundation.

5. Architecture Options for Kaze

5.1 Option A: Postgres-Centric (Conservative)

┌──────────────────────────────────────────────┐
│                PostgreSQL                      │
│                                                │
│  pgvector ─── semantic retrieval (embeddings) │
│  Apache AGE ─ knowledge graph (entities +     │
│               relationships as Cypher graphs)  │
│  Standard ─── episodic logs, agent state,     │
│  tables       audit trail, client data         │
└──────────────────────────────────────────────┘

+ Cognee or GraphRAG ── knowledge graph construction pipeline
+ Mem0 ────────────────── per-agent memory layer

  • Operational complexity: Low — one database
  • Performance ceiling: Medium — adequate for MVP, will need upgrades at scale
  • Maturity / risk: Very low — Postgres is battle-tested
  • Cloud-agnostic: Yes — Postgres runs everywhere
  • K8s native: Yes — CloudNativePG
  • Cost: Low — one system to operate
  • Upgrade path: Swap pgvector → Qdrant, swap AGE → FalkorDB when needed

Best for: MVP through early-to-mid scale. Minimize operational burden while proving the product.
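
For the graph side of Option A, Apache AGE exposes Cypher through its `cypher()` set-returning SQL function, so graph traversals stay inside the same Postgres connection as everything else. A sketch, where the graph name (`kaze_knowledge`) and the labels are hypothetical:

```typescript
// A Cypher query wrapped in SQL via Apache AGE's cypher() function.
// Graph name and node/edge labels are illustrative assumptions.
const agentGraphQuery = `
  SELECT k FROM cypher('kaze_knowledge', $$
    MATCH (a:Agent)-[:LEARNED]->(k:Knowledge)-[:APPLIES_TO]->(:Vertical {name: 'seo'})
    RETURN k
  $$) AS (k agtype);
`;
```

Because the result is an ordinary SQL rowset, it can be joined against relational tables or combined with a pgvector ranking in the same statement.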

5.2 Option B: Postgres + Dedicated Graph DB (Balanced)

┌──────────────────────────────────┐  ┌──────────────────┐
│           PostgreSQL              │  │    FalkorDB      │
│                                    │  │                  │
│  pgvector ── semantic retrieval   │  │  Knowledge graph │
│  Standard ── episodic logs,       │  │  (entities,      │
│  tables      agent state, audit   │  │   relationships, │
│                                    │  │   traversals)    │
└──────────────────────────────────┘  └──────────────────┘

+ Cognee or GraphRAG ── knowledge graph construction pipeline
+ Mem0 ────────────────── per-agent memory layer

  • Operational complexity: Medium — two databases to manage
  • Performance ceiling: High — dedicated graph DB handles complex traversals well
  • Maturity / risk: Low-medium — Postgres is solid, FalkorDB is newer but performant
  • Cloud-agnostic: Yes — both run anywhere
  • K8s native: Yes — both have K8s support
  • Cost: Medium — two systems to operate
  • Upgrade path: Swap pgvector → Qdrant if vector search needs scaling

Best for: When the knowledge graph becomes a core differentiator and needs high-performance graph queries (complex multi-hop reasoning across verticals).

5.3 Option C: SurrealDB (Bold)

┌──────────────────────────────────────────────┐
│                SurrealDB                       │
│                                                │
│  Vector index ── semantic retrieval            │
│  Graph model ─── knowledge graph               │
│  Document model ─ episodic logs, agent state   │
│  Built-in RBAC ─ access control                │
│  Change feeds ── real-time event streaming     │
└──────────────────────────────────────────────┘

+ Cognee or GraphRAG ── knowledge graph construction pipeline
+ Mem0 ────────────────── per-agent memory layer

  • Operational complexity: Low — one system
  • Performance ceiling: Unknown — insufficient benchmarks at scale
  • Maturity / risk: High — v3.0 just launched, BSL license
  • Cloud-agnostic: Yes — single binary
  • K8s native: Yes
  • Cost: Low — one system
  • Upgrade path: Harder — SurrealQL is proprietary, migration away is more work

Best for: Teams willing to bet on a newer technology for architectural elegance. Higher risk, potentially higher reward.

5.4 Option D: Build Custom Knowledge Layer (Full Control)

┌──────────────────────────────────────────────┐
│           PostgreSQL + pgvector + AGE          │
│           (storage foundation)                 │
└──────────────────────┬───────────────────────┘

┌──────────────────────┴───────────────────────┐
│      Custom Kaze Knowledge Layer (TypeScript)  │
│                                                │
│  Memory type routing (MIRIX-inspired)          │
│  Tri-factor retrieval (Generative Agents)      │
│  Graph traversal + spreading activation        │
│  Git-like versioning (Letta-inspired)          │
│  Access control (Collaborative Memory)         │
│  Provenance tracking (AriGraph-inspired)       │
│  Consolidation pipelines                       │
│  Quality gates for shared knowledge            │
└──────────────────────────────────────────────┘

  • Operational complexity: Low (infra) + High (development)
  • Performance ceiling: As high as you build it
  • Maturity / risk: Low infra risk, high development risk (building it takes time)
  • Differentiation: Highest — the knowledge system IS the product moat
  • Cost: High development investment
  • Time to MVP: Longest

Best for: When the knowledge system is a core competitive advantage worth investing heavily in. Long-term optimal but slow to start.
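
To make Option D's retrieval component concrete, the tri-factor ranking from Generative Agents combines recency, importance, and relevance into a single score. A minimal sketch; the hourly decay rate and the equal weights are assumptions for illustration, not values prescribed by the paper:

```typescript
interface MemoryRecord {
  lastAccessed: number;  // ms since epoch
  importance: number;    // 0..1, assigned at write time
  embedding: number[];   // precomputed embedding vector
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Weighted sum of recency (exponential decay), importance, and relevance.
function triFactorScore(
  m: MemoryRecord, query: number[], now: number,
  w = { recency: 1, importance: 1, relevance: 1 },
): number {
  const hours = (now - m.lastAccessed) / 3_600_000;
  const recency = Math.pow(0.995, hours); // assumed per-hour decay rate
  return w.recency * recency
       + w.importance * m.importance
       + w.relevance * cosine(m.embedding, query);
}
```

A later phase could learn the weights from usage signal; the fixed-weight form is the baseline the document proposes starting with.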

5.5 Hybrid Recommendation: Start A, Evolve to D

The pragmatic path combines the speed of Option A with the long-term vision of Option D:

Phase 1 (MVP): Option A — Postgres-centric + Mem0 + Cognee/GraphRAG

  • Get agents running with working memory fast
  • Use Mem0 for per-agent memory (don't build from scratch)
  • Use Cognee/GraphRAG to bootstrap the first vertical's knowledge graph
  • Store everything in Postgres (pgvector + Apache AGE)

Phase 2 (Product-Market Fit): Start building Option D's custom layer

  • Build the Kaze Knowledge Layer incrementally on top of Postgres
  • Implement typed memory (episodic/semantic/procedural) routing
  • Add provenance tracking and versioning
  • Add access control and tenant isolation
  • Gradually replace Mem0 with our own memory management as we understand the patterns better

Phase 3 (Scale): Optimize the storage layer

  • If graph queries need more performance → evaluate FalkorDB (Option B)
  • If vector search needs more throughput → add Qdrant
  • If we want to consolidate → evaluate SurrealDB (Option C) as a potential replacement

This approach de-risks the MVP while preserving optionality for the future.
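
Phase 2's typed-memory routing can start very small. A sketch of a MIRIX-inspired router, where the record kinds and store names are illustrative assumptions rather than MIRIX's actual taxonomy rules:

```typescript
// The four stores from the proposed taxonomy.
type MemoryType = "episodic" | "semantic" | "procedural" | "reflective";

interface IncomingRecord {
  kind: "event" | "fact" | "skill" | "insight"; // hypothetical classification
  content: string;
}

// Route an incoming record to the store that should persist it.
function routeMemory(r: IncomingRecord): MemoryType {
  switch (r.kind) {
    case "event":   return "episodic";   // logs, task history
    case "fact":    return "semantic";   // entities, relationships
    case "skill":   return "procedural"; // how-to knowledge, code
    case "insight": return "reflective"; // distilled learnings
  }
}
```

In practice the `kind` classification would come from an LLM call or heuristics on the record's origin; the router itself stays a thin, testable seam between agents and storage.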


6. Recommendation

6.1 Proposed Knowledge System Architecture

┌─────────────────────────────────────────────────────────┐
│              KAZE KNOWLEDGE SYSTEM                        │
│                                                           │
│  ┌─────────────────────────────────────────────────────┐ │
│  │          KNOWLEDGE LAYER (TypeScript)                │ │
│  │                                                      │ │
│  │  ┌──────────────────────────────────────────────┐   │ │
│  │  │           Memory Type Router                  │   │ │
│  │  │     (inspired by MIRIX Meta Manager)          │   │ │
│  │  └──────┬──────┬──────────┬──────────┬──────────┘   │ │
│  │         │      │          │          │               │ │
│  │    ┌────┴──┐┌──┴────┐┌───┴────┐┌────┴─────┐        │ │
│  │    │Episod.││Semant.││Proced. ││Reflect.  │        │ │
│  │    │Memory ││Memory ││Memory  ││Memory    │        │ │
│  │    │       ││       ││        ││          │        │ │
│  │    │Events,││Facts, ││Skills, ││Insights, │        │ │
│  │    │logs,  ││graph, ││how-to, ││learnings │        │ │
│  │    │history││rels   ││code    ││          │        │ │
│  │    └───────┘└───────┘└────────┘└──────────┘        │ │
│  │                                                      │ │
│  │  ┌──────────────────────────────────────────────┐   │ │
│  │  │    Retrieval Engine                           │   │ │
│  │  │    - Tri-factor scoring (baseline)            │   │ │
│  │  │    - Graph traversal (knowledge graph)        │   │ │
│  │  │    - Spreading activation (linked notes)      │   │ │
│  │  │    - Agent-initiated search (tool calls)      │   │ │
│  │  └──────────────────────────────────────────────┘   │ │
│  │                                                      │ │
│  │  ┌──────────────────────────────────────────────┐   │ │
│  │  │    Write Pipeline                             │   │ │
│  │  │    - Provenance tagging (AriGraph-inspired)   │   │ │
│  │  │    - Version control (Letta-inspired)         │   │ │
│  │  │    - Quality gate (Voyager self-verification) │   │ │
│  │  │    - Access control (Collaborative Memory)    │   │ │
│  │  └──────────────────────────────────────────────┘   │ │
│  │                                                      │ │
│  │  ┌──────────────────────────────────────────────┐   │ │
│  │  │    Consolidation Engine                       │   │ │
│  │  │    - Episodic → Semantic distillation         │   │ │
│  │  │    - Reflection synthesis                     │   │ │
│  │  │    - Contradiction detection                  │   │ │
│  │  │    - Importance-based retention               │   │ │
│  │  └──────────────────────────────────────────────┘   │ │
│  └─────────────────────────────────────────────────────┘ │
│                                                           │
│  ┌─────────────────────────────────────────────────────┐ │
│  │          STORAGE LAYER                               │ │
│  │                                                      │ │
│  │  Phase 1: PostgreSQL + pgvector + Apache AGE         │ │
│  │  Phase 2: + Qdrant (if vector scale needed)          │ │
│  │  Phase 3: + FalkorDB (if graph scale needed)         │ │
│  │           or SurrealDB (if consolidation desired)    │ │
│  └─────────────────────────────────────────────────────┘ │
│                                                           │
│  ┌─────────────────────────────────────────────────────┐ │
│  │     CONSTRUCTION PIPELINE                            │ │
│  │                                                      │ │
│  │  Cognee ── incremental knowledge graph updates       │ │
│  │  GraphRAG ── bulk document → knowledge graph         │ │
│  │  Custom ── agent experience → knowledge distillation │ │
│  └─────────────────────────────────────────────────────┘ │
│                                                           │
│  ┌─────────────────────────────────────────────────────┐ │
│  │     PER-AGENT MEMORY                                 │ │
│  │                                                      │ │
│  │  Mem0 ── working + episodic memory per agent         │ │
│  │  (Phase 2: evaluate building custom replacement)     │ │
│  └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
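
The Write Pipeline's git-like versioning can be sketched as append-only commits whose ids hash-chain back to their parents, carrying provenance (who, when, why) alongside content. The field names here are assumptions for illustration:

```typescript
import { createHash } from "node:crypto";

interface KnowledgeCommit {
  id: string;            // content hash, git-style
  parent: string | null; // previous version of this entry, if any
  agentId: string;       // provenance: who wrote it
  reason: string;        // provenance: why
  timestamp: number;
  content: string;
}

// Append a new version; the id covers parent, author, time, and content,
// so any tampering or reordering breaks the chain.
function commitKnowledge(
  content: string, agentId: string, reason: string,
  parent: KnowledgeCommit | null,
): KnowledgeCommit {
  const timestamp = Date.now();
  const id = createHash("sha256")
    .update(`${parent?.id ?? ""}|${agentId}|${timestamp}|${content}`)
    .digest("hex");
  return { id, parent: parent?.id ?? null, agentId, reason, timestamp, content };
}
```

Walking the `parent` chain reconstructs an entry's full history, which is what makes rollback, auditing, and "who taught the system this?" queries cheap.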

6.2 Key Design Decisions

  • Memory types: Episodic + Semantic + Procedural + Reflective — aligned with the CoALA framework and MIRIX; a proven taxonomy.
  • Storage (Phase 1): Postgres + pgvector + Apache AGE — one DB, already in our stack, good enough for MVP, clear upgrade paths.
  • Per-agent memory: Mem0 — production-proven, handles agent-level memory well, saves dev time.
  • KG construction: Cognee + GraphRAG — Cognee for incremental updates, GraphRAG for bulk. Both open source.
  • Retrieval strategy: Tri-factor + graph traversal + agent-initiated — a layered approach covering baseline ranking, structured traversal, and agent autonomy.
  • Versioning model: Git-inspired (Letta pattern) — every knowledge write is a versioned commit with provenance.
  • Access control: Private/shared tiers with ABAC — Collaborative Memory pattern; enables client isolation plus vertical sharing.
  • Quality gates: Verification before shared-knowledge entry — Voyager pattern; prevents low-quality knowledge from polluting the shared store.
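
The private/shared access-control decision can be prototyped as a tier check over simple attributes. The tier names and attributes below are assumptions for illustration, not the Collaborative Memory paper's exact model:

```typescript
// Visibility tiers: private to one agent, shared within a vertical,
// or platform-wide.
type Tier = "private" | "vertical" | "platform";

interface KnowledgeEntry { tier: Tier; ownerAgentId: string; vertical: string; }
interface AgentCtx { agentId: string; vertical: string; }

// Attribute-based read check: each tier widens the audience.
function canRead(agent: AgentCtx, entry: KnowledgeEntry): boolean {
  switch (entry.tier) {
    case "private":  return agent.agentId === entry.ownerAgentId; // owner only
    case "vertical": return agent.vertical === entry.vertical;    // same vertical
    case "platform": return true;                                 // everyone
  }
}
```

Write access would follow the same shape but route through the quality gate first, so promotion from private to shared tiers is an explicit, verified step.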

6.3 What We're Intentionally Deferring

  • Parametric memory (fine-tuning): Not for MVP. Too expensive and complex. Revisit when we have enough vertical data to warrant it.
  • Learned retrieval weights: Start with fixed tri-factor weights. Learn them from data later when we have enough usage signal.
  • Billion-scale vector search: pgvector handles early scale. Qdrant or Milvus when needed.
  • Real-time cross-cell knowledge sync: Phase 3 mesh feature. Start with single-cell knowledge.

7. References

Academic Papers

  • Generative Agents — Park et al., 2023. Memory stream + tri-factor retrieval + reflection. (arXiv)
  • MemGPT — Packer et al., 2023. OS-inspired tiered memory, virtual context management. (arXiv)
  • Voyager — Wang et al., 2023. Skill library as procedural memory. (arXiv)
  • Reflexion — Shinn et al., 2023. Verbal reinforcement learning, reflective memory. (arXiv)
  • CoALA — Sumers, Yao et al., 2023. Unifying cognitive architecture taxonomy. (arXiv)
  • AriGraph — Anokhin et al., 2024. Knowledge graph + episodic memory with provenance. (arXiv)
  • A-MEM — Xu et al., 2025. Zettelkasten-inspired agentic memory. (arXiv)
  • Collaborative Memory — Rezazadeh et al., 2025. Multi-user shared memory with access control. (arXiv)
  • MIRIX — Wang & Chen, 2025. Six-component modular memory system. (arXiv)
  • Memory in the Age of AI Agents — Liu et al., 2025. Comprehensive survey of agent memory. (arXiv)
  • Mem0 — Chadha et al., 2025. Production-ready scalable agent memory. (arXiv)

Open Source Tools

  • Mem0 — github.com/mem0ai/mem0 — Apache 2.0
  • Letta — github.com/letta-ai/letta — Apache 2.0
  • Cognee — github.com/topoteretes/cognee — Apache 2.0
  • Microsoft GraphRAG — github.com/microsoft/graphrag — MIT
  • pgvector — github.com/pgvector/pgvector — PostgreSQL License
  • Qdrant — github.com/qdrant/qdrant — Apache 2.0
  • Weaviate — github.com/weaviate/weaviate — BSD-3-Clause
  • Apache AGE — github.com/apache/age — Apache 2.0
  • FalkorDB — github.com/FalkorDB/FalkorDB — SSPL
  • Neo4j — github.com/neo4j/neo4j — GPLv3 / Commercial
  • SurrealDB — github.com/surrealdb/surrealdb — BSL 1.1

Additional Resources