System Architecture
Part of Project Kaze Architecture
1. Architectural Pattern
Kaze follows an Agent-Oriented Architecture — a hybrid pattern shaped by the fact that its primary compute units are intelligent, autonomous agents, not passive services.
| Pattern | What Kaze borrows | Applied where |
|---|---|---|
| Actor Model | Autonomous entities with private state, message-passing | Agent runtime — each agent is an actor |
| Microservices | Independent deployment, own-your-data, clean API boundaries | Platform services (Gateway, Knowledge, Runtime) |
| Event-Driven | Loose coupling via async events | Inter-agent communication (direct calls now, NATS later) |
| Cell-Based | Self-contained isolated deployment units | Each tenant/deployment is a cell |
New to Kaze (no traditional equivalent):
- Components that learn from execution history
- A governance hierarchy where AI agents can supervise other AI agents
- Shared knowledge across agents while maintaining runtime isolation
- A supervision ramp as a trust model (supervised → sampling → autonomous)
2. Layer Model
┌──────────────────────────────────────────────────────────┐
│ KAZE PLATFORM │
│ │
│ Layer 3: GOVERNANCE & SELF-IMPROVEMENT [not yet built] │
│ ┌────────────┐ ┌──────────────┐ ┌─────────────────────┐ │
│ │ Supervisor │ │ Quality │ │ Improvement Agent │ │
│ │ Agents │ │ Monitor │ │ │ │
│ └────────────┘ └──────────────┘ └─────────────────────┘ │
│ │
│ Layer 2: ORCHESTRATION & KNOWLEDGE │
│ ┌──────────────┐ ┌──────────────────────────────────┐ │
│ │ Orchestrator │ │ Knowledge System (Mem0+pgvector) │ │
│ │ Agents │ │ Per-agent episodic memory ✓ │ │
│ │ [not built] │ │ Shared knowledge tiers [plan] │ │
│ └──────────────┘ └──────────────────────────────────┘ │
│ │
│ Layer 1: EXECUTION ✓ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Agent Skills (composable, reusable per vertical) │ │
│ │ ┌────────┐ ┌────────┐ ┌───────┐ ┌──────────────┐ │ │
│ │ │ github │ │ digest │ │ seed │ │meeting-notes │ │ │
│ │ └────────┘ └────────┘ └───────┘ └──────────────┘ │ │
│ └────────────────────────────────────────────────────┘ │
│ │
│ Layer 0.5: INTERACTION ✓ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ OpenClaw (Conversation Manager + Channel Adapters) │ │
│ │ ┌───────┐ ┌──────────┐ ┌────────┐ ┌───────┐ │ │
│ │ │ Slack │ │ WhatsApp │ │Telegram│ │ CLI │ │ │
│ │ └───────┘ └──────────┘ └────────┘ └───────┘ │ │
│ └────────────────────────────────────────────────────┘ │
│ │
│ Layer 0: PLATFORM INFRASTRUCTURE ✓ │
│ ┌──────────┐ ┌─────────┐ ┌──────┐ ┌───────────────┐ │
│ │ K8s │ │Postgres │ │Vault │ │ Langfuse │ │
│ │ │ │+pgvector│ │ │ │ (observability)│ │
│ └──────────┘ └─────────┘ └──────┘ └───────────────┘ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ LLM Gateway (multi-provider, model hints, tools) │ │
│ └────────────────────────────────────────────────────┘ │
│ │
│ ALL CONTAINERIZED · ALL IaC · ANY CLOUD │
└──────────────────────────────────────────────────────────┘

3. Component Architecture (As Implemented)
┌──────────────────────────┐
│ OpenClaw (Layer 0.5) │
│ kaze_dispatch_task │
│ kaze_list_verticals │
│ kaze_agent_status │
└────────────┬───────────────┘
│ HTTP POST /dispatch
│ or /dispatch/async
▼
┌─ kaze-runtime (port 4100) ────────────────────────────────────┐
│ │
│ POST /dispatch → sync task execution │
│ POST /dispatch/async → async with callbackUrl │
│ GET /verticals → list loaded verticals + skills │
│ GET /agents → list active agents + status │
│ POST /knowledge/search → proxy to kaze-knowledge │
│ GET /langfuse/* → proxy to Langfuse API │
│ GET /mcp/* → proxy to Langfuse MCP sidecar │
│ │
│ VerticalAgent (long-lived, one per vertical) │
│ └─ SubAgent (ephemeral, one per task) │
│ ├─ handler mode → TypeScript function │
│ ├─ prompt mode → single-shot LLM call │
│ └─ agentic mode → multi-turn tool-use loop │
│ │
│ Supervision ramp: 20 runs → sampling, 50 → autonomous │
│ 3 consecutive failures → demotion │
│ Template engine: double-brace substitution + conditionals │
│ Zero secrets — calls gateway + knowledge via HTTP │
└──────┬───────────────────────────────┬────────────────────────┘
│ HTTP │ HTTP
▼ ▼
┌─ kaze-gateway (port 4200) ──┐ ┌─ kaze-knowledge (port 4300) ──┐
│ │ │ │
│ POST /llm/generate │ │ POST /memory/search │
│ → Vercel AI SDK │ │ → vector similarity query │
│ → Anthropic / Google │ │ │
│ → model hints: │ │ POST /memory/add │
│ fast → Haiku 4.5 │ │ → Mem0 LLM fact extraction │
│ balanced → Sonnet 4.5 │ │ → embed → store │
│ best → Opus 4.6 │ │ │
│ │ │ POST /memory/add-raw │
│ POST /tools/execute │ │ → direct embed → store │
│ → credential injection │ │ → skip LLM extraction │
│ → 9 builtin tools │ │ │
│ │ │ POST /memory/add-raw-batch │
│ GET /tools/catalog │ │ → bulk embed (100/batch) │
│ │ │ → bulk insert (200/batch) │
│ GET /langfuse/* │ │ → MD5 dedup │
│ → proxy to Langfuse API │ │ │
│ │ │ Own LLM key (Gemini) for │
│ Secrets: LLM keys, GitHub │ │ fact extraction + embeddings │
│ Observability: Langfuse │ │ Storage: PostgreSQL + pgvector │
│ via OpenTelemetry │ │ Vectors: 768-dim │
└──────────────────────────────┘ └────────────────────────────────┘

3.1 Agent Runtime (kaze-runtime)
The core execution engine that manages agent lifecycles and task dispatch.
Two-layer agent model:
- VerticalAgent — Long-lived, one per vertical. Loaded from vertical.yaml. Manages supervision state, spawns SubAgents for tasks.
- SubAgent — Ephemeral, created per-task. Executes a single skill and terminates. Gets the vertical's capabilities, tools, and knowledge access.
Three execution modes:
| Mode | When | How |
|---|---|---|
| handler | Skill has a TypeScript handler function | Direct function call — no LLM involved |
| prompt (single-shot) | Skill has a prompt template, agentic: false | One LLM call: render prompt → call gateway → return result |
| agentic (multi-turn) | Skill has agentic: true | Loop: LLM call → tool use → LLM call → ... until done or maxSteps reached |
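The precedence implied by the table is: a handler wins, then the agentic flag, else single-shot prompt. As a small illustrative sketch (the names SkillDef and resolveMode are invented here, not runtime APIs):

```typescript
// Sketch of execution-mode resolution per the table above.
// `SkillDef` and `resolveMode` are illustrative names, not the runtime's real API.
interface SkillDef {
  handler?: (input: Record<string, unknown>) => Promise<unknown>; // TypeScript handler, if any
  agentic?: boolean; // from the skill YAML
}

type Mode = "handler" | "prompt" | "agentic";

function resolveMode(skill: SkillDef): Mode {
  if (skill.handler) return "handler"; // direct function call, no LLM
  if (skill.agentic) return "agentic"; // multi-turn tool-use loop
  return "prompt";                     // single-shot LLM call
}
```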
Template engine: Skill prompts use {{variable}} substitution and {{#if variable}}...{{/if}} conditionals. Variables come from the task input.
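For a sense of scale, a double-brace renderer of this kind fits in a few lines of TypeScript. This is a minimal sketch, not the runtime's actual engine, and renderTemplate is a hypothetical name:

```typescript
// Minimal sketch of a {{variable}} + {{#if variable}}...{{/if}} renderer.
// Hypothetical helper for illustration; the runtime's real engine may differ.
type Vars = Record<string, unknown>;

function renderTemplate(template: string, vars: Vars): string {
  // Resolve {{#if x}}...{{/if}} blocks first: keep the body only
  // when the variable is truthy, drop it otherwise.
  const withConditionals = template.replace(
    /\{\{#if\s+(\w+)\}\}([\s\S]*?)\{\{\/if\}\}/g,
    (_m, name, body) => (vars[name] ? body : ""),
  );
  // Then substitute plain {{x}} placeholders from the task input.
  return withConditionals.replace(
    /\{\{(\w+)\}\}/g,
    (_m, name) => String(vars[name] ?? ""),
  );
}
```

For example, renderTemplate("Hi {{name}}{{#if vip}} (VIP){{/if}}", { name: "Ada", vip: true }) yields "Hi Ada (VIP)".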
Memory integration:
- Before LLM reasoning, the SubAgent searches knowledge for relevant context (via kaze-knowledge)
- After task completion, key findings are stored back to knowledge
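The context-injection step can be sketched as a pure message-assembly helper. This is illustrative only: the actual context format, and names like buildMessages, are assumptions, not the runtime's code:

```typescript
// Sketch: fold retrieved memories into the message list sent to the gateway.
// The exact context format used by the runtime is an assumption here.
interface Memory { memory: string; score: number }
interface ChatMessage { role: "system" | "user"; content: string }

function buildMessages(system: string, memories: Memory[], task: string): ChatMessage[] {
  const context = memories.length
    ? "Relevant memories:\n" + memories.map((m) => `- ${m.memory}`).join("\n")
    : "";
  return [
    { role: "system", content: context ? `${system}\n\n${context}` : system },
    { role: "user", content: task },
  ];
}
```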
Supervision ramp (as implemented):
- Start at supervised — all outputs queued for review
- After 20 successful tasks → promote to sampling
- After 50 successful tasks → promote to autonomous
- 3 consecutive failures → demote one level
- Per-skill tracking, stored in-memory (resets on restart)
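The ramp above amounts to a small per-skill state machine. A sketch under stated assumptions (SupervisionTracker is an invented name; whether success counts reset on demotion is not specified in this document, so this sketch resets them):

```typescript
// Illustrative in-memory supervision tracker for one skill.
// Thresholds mirror the ramp above: 20 → sampling, 50 → autonomous,
// 3 consecutive failures → demote one level. In-memory only, as documented.
type Level = "supervised" | "sampling" | "autonomous";
const LEVELS: Level[] = ["supervised", "sampling", "autonomous"];

class SupervisionTracker {
  private successes = 0;
  private consecutiveFailures = 0;
  level: Level = "supervised";

  recordSuccess(): void {
    this.successes++;
    this.consecutiveFailures = 0;
    if (this.successes >= 50) this.level = "autonomous";
    else if (this.successes >= 20) this.level = "sampling";
  }

  recordFailure(): void {
    this.consecutiveFailures++;
    if (this.consecutiveFailures >= 3) {
      const idx = LEVELS.indexOf(this.level);
      if (idx > 0) this.level = LEVELS[idx - 1];
      // Assumption: counters restart after demotion (the doc does not specify).
      this.successes = 0;
      this.consecutiveFailures = 0;
    }
  }
}
```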
Async dispatch: POST /dispatch/async accepts a callbackUrl. The runtime executes the task in the background and POSTs the result to the callback URL when done.
3.2 LLM Gateway (kaze-gateway)
Abstraction layer between all agents and LLM providers. No agent ever holds an API key.
LLM generation:
- Uses Vercel AI SDK (generateText) for unified provider interface
- Model hints resolve to concrete models: fast → Haiku 4.5, balanced → Sonnet 4.5, best → Opus 4.6
- JSON Schema tool definitions are converted to Zod schemas at runtime (via jsonSchemaToZod)
- Langfuse tracing via OpenTelemetry span processor — every LLM call traced automatically
Tool execution:
- Tools are registered in a typed registry with JSON Schema input definitions
- Credentials are injected via closures at registration time — the tool function never sees raw secrets
- 9 builtin tools:
| Tool | Description |
|---|---|
| github_api | GitHub REST API — issues, PRs, commits, search, labels, releases |
| kaze_knowledge_search | Search agent memory via knowledge service |
| kaze_knowledge_add | Store facts to agent memory |
| kaze_knowledge_batch_add | Bulk-store facts to agent memory |
| workspace_list | List cloned repos in workspace (git clone/pull on demand) |
| workspace_read | Read file from workspace repo |
| file_glob | Glob pattern matching on workspace files |
| file_read | Read file contents from workspace |
| docling_convert | Convert documents (PDF, DOCX, etc.) to Markdown via Docling |
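The closure-based credential injection described above can be sketched as follows. This is a simplified illustration; registerTool, executeTool, and the handler signature are assumptions, not the gateway's real API:

```typescript
// Sketch: bake a credential into a closure at registration time.
// The stored function's public shape exposes only the task input;
// the secret is neither in the input schema nor read from env at call time.
type ToolFn = (input: Record<string, unknown>) => Promise<unknown>;
const registry = new Map<string, ToolFn>();

function registerTool(
  name: string,
  credential: string,
  impl: (input: Record<string, unknown>, credential: string) => Promise<unknown>,
): void {
  // `credential` is captured here and never leaves the closure.
  registry.set(name, (input) => impl(input, credential));
}

async function executeTool(name: string, input: Record<string, unknown>): Promise<unknown> {
  const fn = registry.get(name);
  if (!fn) throw new Error(`unknown tool: ${name}`);
  return fn(input); // caller supplies only name + input, never the secret
}
```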
3.3 Knowledge Service (kaze-knowledge)
Persistent memory layer for all agents, built on Mem0 OSS v2.2.3.
Three ingestion paths:
| Endpoint | LLM extraction | Use case |
|---|---|---|
| POST /memory/add | Yes — Mem0 extracts facts from content | Conversational context, meeting notes, rich text |
| POST /memory/add-raw | No — direct embed + store | Pre-processed facts, structured data |
| POST /memory/add-raw-batch | No — bulk embed + store | Document ingestion, seed data, migrations |
Storage:
- PostgreSQL + pgvector via LangChain's PGVectorStore adapter
- Google embeddings always: gemini-embedding-001, 768-dimensional vectors
- Batch embedding: 100 texts per batch
- Batch insert: 200 records per batch
- MD5 dedup on content hash in batch operations
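The chunking and hash-based dedup described above can be sketched like this (illustrative helpers only, with the embed/insert calls stubbed out):

```typescript
import { createHash } from "node:crypto";

// Sketch of the batch path: split inputs into batches and drop duplicates
// by MD5 content hash, reporting added vs skipped. Batch sizes follow the doc.
const EMBED_BATCH = 100;  // texts per embedding request
const INSERT_BATCH = 200; // records per bulk insert

function md5(content: string): string {
  return createHash("md5").update(content).digest("hex");
}

function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

function dedupe(contents: string[], seen: Set<string>): { kept: string[]; skipped: number } {
  const kept: string[] = [];
  let skipped = 0;
  for (const c of contents) {
    const h = md5(c);
    if (seen.has(h)) {
      skipped++; // duplicate content hash: skip the record
    } else {
      seen.add(h);
      kept.push(c);
    }
  }
  return { kept, skipped };
}
```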
Search:
- Vector similarity search with configurable top-k
- Agent-scoped queries (each agent has its own memory space)
3.4 Agent Definitions (kaze-agent-ops)
The Internal Ops vertical (V0) is defined entirely in YAML — no TypeScript handlers.
Vertical definition (vertical.yaml):

```yaml
id: internal-ops
name: Internal Ops
model: fast
supervision: supervised
capabilities:
  - github_api
  - file_glob
  - file_read
  - kaze_knowledge_add
  - kaze_knowledge_search
skills:
  - github
  - seed
  - digest
  - triage
  - docs-sync
  - meeting-notes
```

Skills:
| Skill | Mode | Description |
|---|---|---|
| github | prompt (single-shot) | GitHub operations — issues, PRs, CI, labels, releases |
| seed | agentic (multi-turn) | Knowledge ingestion — bulk-ingest documents via Docling |
| digest | agentic (multi-turn) | Daily/weekly digest of activity across repos |
| triage | prompt (single-shot) | Auto-triage issues — label, assign, dedup, stale PR nudge |
| docs-sync | prompt (single-shot) | Log decisions/insights to knowledge, detect contradictions |
| meeting-notes | agentic (multi-turn) | Transcript → structured minutes, GitHub issues for actions |
Skill definition pattern:

```yaml
id: digest
name: Daily Digest
description: Summarize recent activity across GitHub repos
inputSchema:
  type: object
  properties:
    repos:
      type: array
      items: { type: string }
    period:
      type: string
  required: [repos]
prompt: |
  Generate a concise digest of recent activity for these repos: $REPOS
  Period: $PERIOD
  Use github_api to fetch recent issues, PRs, and commits.
tools:
  - github_api
agentic: true
maxSteps: 6
```

4. Data Flows
4.1 Synchronous Task Dispatch
User message (Slack/WhatsApp/Telegram)
│
▼
OpenClaw → kaze_dispatch_task tool
│
▼
POST /dispatch {vertical, skill, input}
│
▼
Runtime: VerticalAgent.dispatch()
│
├─ 1. Check supervision level for this skill
├─ 2. Spawn SubAgent
├─ 3. SubAgent searches knowledge for context
├─ 4. SubAgent renders prompt template with input + context
├─ 5. SubAgent calls gateway (LLM generate + tool execute loop)
├─ 6. SubAgent stores learnings to knowledge
├─ 7. SubAgent returns result
└─ 8. VerticalAgent updates supervision stats
│
▼
Response → OpenClaw → User channel

4.2 Agentic Multi-Turn Loop
When a skill has agentic: true, the SubAgent enters a tool-use loop:
1. Render prompt with input + knowledge context
2. Call gateway POST /llm/generate with tools
3. If response has tool_calls:
a. For each tool_call: POST /tools/execute
b. Append tool results to conversation
c. Call gateway again (step 2)
4. If response has no tool_calls (or maxSteps reached):
   → Return final text as result

The gateway handles tool execution within generateText — Vercel AI SDK's maxSteps parameter controls the loop on the gateway side. The runtime's agentic loop wraps this for context overflow recovery and completion detection.
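A stripped-down version of the runtime-side loop might look like the following. The message and tool-call shapes are assumptions made for this sketch, not the runtime's actual types:

```typescript
// Simplified agentic loop: call the LLM, execute any requested tools,
// feed results back, and stop when no tools are requested or maxSteps is hit.
interface ToolCall { id: string; tool: string; input: Record<string, unknown> }
interface LlmResponse { text: string; toolCalls: ToolCall[] }
interface LoopMessage { role: "system" | "user" | "assistant" | "tool"; content: string }

async function agenticLoop(
  messages: LoopMessage[],
  maxSteps: number,
  generate: (msgs: LoopMessage[]) => Promise<LlmResponse>, // stands in for POST /llm/generate
  execute: (call: ToolCall) => Promise<unknown>,           // stands in for POST /tools/execute
): Promise<string> {
  let lastText = "";
  for (let step = 0; step < maxSteps; step++) {
    const res = await generate(messages);
    lastText = res.text;
    if (res.toolCalls.length === 0) return lastText; // no tool calls: done
    messages.push({ role: "assistant", content: res.text });
    for (const call of res.toolCalls) {
      const result = await execute(call); // run the tool, append its result
      messages.push({ role: "tool", content: JSON.stringify(result) });
    }
  }
  return lastText; // maxSteps reached: return whatever text we have
}
```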
4.3 Async Dispatch with Callback
POST /dispatch/async {vertical, skill, input, callbackUrl}
│
▼
Runtime: returns {taskId, status: "accepted"} immediately
│
▼ (background)
Runtime executes task (same flow as sync)
│
▼
POST callbackUrl {taskId, status, result}

4.4 Knowledge Search → LLM → Knowledge Store
SubAgent executes:
1. POST /knowledge/search {query: task_input, agentId}
→ receives relevant memories
2. Inject memories into prompt context
3. POST /llm/generate {messages: [system + memories + user], tools}
→ LLM reasons with context
4. POST /memory/add {content: result_summary, agentId}
   → key findings stored for future tasks

5. API Contracts
Runtime API (port 4100)
| Method | Path | Request | Response |
|---|---|---|---|
| POST | /dispatch | {vertical, skill, input} | {output, metrics} |
| POST | /dispatch/async | {vertical, skill, input, callbackUrl} | {taskId, status} |
| GET | /verticals | — | [{id, name, skills: [...]}] |
| GET | /agents | — | [{vertical, status, supervision}] |
| POST | /knowledge/search | {query, agentId} | {results: [...]} (proxied) |
Gateway API (port 4200)
| Method | Path | Request | Response |
|---|---|---|---|
| POST | /llm/generate | {messages, model, tools?, maxSteps?} | {text, toolCalls?, usage} |
| POST | /tools/execute | {tool, input} | {result} |
| GET | /tools/catalog | — | [{name, description, inputSchema}] |
Knowledge API (port 4300)
| Method | Path | Request | Response |
|---|---|---|---|
| POST | /memory/search | {query, user_id, limit?} | {results: [{memory, score}]} |
| POST | /memory/add | {content, user_id, metadata?} | {id} |
| POST | /memory/add-raw | {content, user_id, metadata?} | {id} |
| POST | /memory/add-raw-batch | {memories: [{content, user_id, metadata}]} | {added, skipped} |
6. Design Decisions & Tradeoffs
Direct HTTP calls vs. message bus
Decision: Direct HTTP calls between components (no NATS).
Rationale: All components run in one cluster. HTTP is simple, debuggable, and sufficient for the current scale (1 cell, ~15 agents). NATS adds operational overhead with no benefit at this stage. The API interfaces are designed so swapping to NATS later changes only the transport, not the message shapes.
Vercel AI SDK vs. direct provider SDKs
Decision: Vercel AI SDK for all LLM calls.
Rationale: Unified interface across Anthropic and Google. Built-in tool execution loop (maxSteps), streaming, and structured output. Avoids maintaining provider-specific code.
Mem0 vs. custom knowledge store
Decision: Mem0 OSS for episodic memory. Custom shared knowledge tier deferred.
Rationale: Mem0 provides fact extraction, embedding, and vector storage out of the box. Good enough for per-agent memory. Shared knowledge (quality gates, ABAC, cross-agent visibility) requires custom work on top.
YAML-only skill definitions
Decision: All V0 skills are YAML-only (prompt-based). No TypeScript handlers used.
Rationale: Prompts are the simplest skill definition — just a template and a list of tools. The runtime's agentic loop handles multi-turn execution. TypeScript handlers exist for cases that need procedural logic, but haven't been needed yet.
Zero-secret runtime
Decision: The runtime holds no API keys. All secrets live in the gateway.
Rationale: Reduces blast radius. If the runtime is compromised, the attacker cannot call LLM providers or external APIs directly. Credential injection happens via closures in the gateway — the tool function receives credentials as arguments, not from environment variables.
7. What's Not Yet Built
| Component | Status | Notes |
|---|---|---|
| Task Scheduler | Not built | Cron + event triggers. Using manual dispatch only. |
| Observation Logger | Using Langfuse | No standalone logger. Langfuse provides tracing, not structured event storage. |
| Shared knowledge tier | Not built | Quality gates, ABAC, cross-agent knowledge sharing. |
| Orchestrator agents | Not built | Dynamic task decomposition, multi-agent coordination. |
| Supervisor agents | Not built | Layer 3 governance, fleet health monitoring. |
| NATS messaging | Not built | Using direct HTTP calls. |
| Budget tracking | Not built | Token usage logged in Langfuse but not enforced. |
| Additional tools | Not built | Calendar, SEMrush, Google Search Console, Toddle DB. |
| V1 SEO vertical | Not built | Agent definitions and tool integrations pending. |
| V2 Toddle vertical | Not built | Agent definitions and tool integrations pending. |