System Architecture

Part of Project Kaze Architecture


1. Architectural Pattern

Kaze follows an Agent-Oriented Architecture — a hybrid pattern shaped by the fact that its primary compute units are intelligent, autonomous agents, not passive services.

| Pattern | What Kaze borrows | Applied where |
| --- | --- | --- |
| Actor Model | Autonomous entities with private state, message-passing | Agent runtime — each agent is an actor |
| Microservices | Independent deployment, own-your-data, clean API boundaries | Platform services (Gateway, Knowledge, Runtime) |
| Event-Driven | Loose coupling via async events | Inter-agent communication (direct calls now, NATS later) |
| Cell-Based | Self-contained isolated deployment units | Each tenant/deployment is a cell |

New to Kaze (no traditional equivalent):

  • Components that learn from execution history
  • A governance hierarchy where AI agents can supervise other AI agents
  • Shared knowledge across agents while maintaining runtime isolation
  • A supervision ramp as a trust model (supervised → sampling → autonomous)

2. Layer Model

```
┌──────────────────────────────────────────────────────────┐
│                      KAZE PLATFORM                        │
│                                                           │
│  Layer 3: GOVERNANCE & SELF-IMPROVEMENT  [not yet built]  │
│  ┌────────────┐ ┌──────────────┐ ┌─────────────────────┐ │
│  │ Supervisor │ │ Quality      │ │ Improvement Agent   │ │
│  │ Agents     │ │ Monitor      │ │                     │ │
│  └────────────┘ └──────────────┘ └─────────────────────┘ │
│                                                           │
│  Layer 2: ORCHESTRATION & KNOWLEDGE                       │
│  ┌──────────────┐ ┌──────────────────────────────────┐   │
│  │ Orchestrator │ │ Knowledge System (Mem0+pgvector) │   │
│  │ Agents       │ │ Per-agent episodic memory     ✓  │   │
│  │ [not built]  │ │ Shared knowledge tiers   [plan]  │   │
│  └──────────────┘ └──────────────────────────────────┘   │
│                                                           │
│  Layer 1: EXECUTION                                    ✓  │
│  ┌────────────────────────────────────────────────────┐   │
│  │ Agent Skills (composable, reusable per vertical)   │   │
│  │ ┌────────┐ ┌────────┐ ┌───────┐ ┌──────────────┐  │   │
│  │ │ github │ │ digest │ │ seed  │ │meeting-notes │  │   │
│  │ └────────┘ └────────┘ └───────┘ └──────────────┘  │   │
│  └────────────────────────────────────────────────────┘   │
│                                                           │
│  Layer 0.5: INTERACTION                                ✓  │
│  ┌────────────────────────────────────────────────────┐   │
│  │ OpenClaw (Conversation Manager + Channel Adapters) │   │
│  │ ┌───────┐ ┌──────────┐ ┌────────┐ ┌───────┐       │   │
│  │ │ Slack │ │ WhatsApp │ │Telegram│ │ CLI   │       │   │
│  │ └───────┘ └──────────┘ └────────┘ └───────┘       │   │
│  └────────────────────────────────────────────────────┘   │
│                                                           │
│  Layer 0: PLATFORM INFRASTRUCTURE                      ✓  │
│  ┌──────────┐ ┌─────────┐ ┌──────┐ ┌───────────────┐    │
│  │ K8s      │ │Postgres │ │Vault │ │ Langfuse      │    │
│  │          │ │+pgvector│ │      │ │ (observability)│    │
│  └──────────┘ └─────────┘ └──────┘ └───────────────┘    │
│  ┌────────────────────────────────────────────────────┐   │
│  │ LLM Gateway (multi-provider, model hints, tools)   │   │
│  └────────────────────────────────────────────────────┘   │
│                                                           │
│  ALL CONTAINERIZED · ALL IaC · ANY CLOUD                  │
└──────────────────────────────────────────────────────────┘
```

3. Component Architecture (As Implemented)

```
                  ┌────────────────────────────┐
                  │  OpenClaw (Layer 0.5)      │
                  │  kaze_dispatch_task        │
                  │  kaze_list_verticals       │
                  │  kaze_agent_status         │
                  └────────────┬───────────────┘
                               │ HTTP POST /dispatch
                               │       or /dispatch/async
                               ▼
┌─ kaze-runtime (port 4100) ────────────────────────────────────┐
│                                                                │
│  POST /dispatch          → sync task execution                 │
│  POST /dispatch/async    → async with callbackUrl              │
│  GET  /verticals         → list loaded verticals + skills      │
│  GET  /agents            → list active agents + status         │
│  POST /knowledge/search  → proxy to kaze-knowledge             │
│  GET  /langfuse/*        → proxy to Langfuse API               │
│  GET  /mcp/*             → proxy to Langfuse MCP sidecar       │
│                                                                │
│  VerticalAgent (long-lived, one per vertical)                  │
│    └─ SubAgent (ephemeral, one per task)                       │
│         ├─ handler mode  → TypeScript function                 │
│         ├─ prompt mode   → single-shot LLM call                │
│         └─ agentic mode  → multi-turn tool-use loop            │
│                                                                │
│  Supervision ramp: 20 runs → sampling, 50 → autonomous         │
│  3 consecutive failures → demotion                             │
│  Template engine: double-brace substitution + conditionals     │
│  Zero secrets — calls gateway + knowledge via HTTP             │
└──────┬───────────────────────────────┬────────────────────────┘
       │ HTTP                          │ HTTP
       ▼                               ▼
┌─ kaze-gateway (port 4200) ──┐  ┌─ kaze-knowledge (port 4300) ──┐
│                              │  │                                │
│ POST /llm/generate           │  │ POST /memory/search            │
│   → Vercel AI SDK            │  │   → vector similarity query    │
│   → Anthropic / Google       │  │                                │
│   → model hints:             │  │ POST /memory/add               │
│     fast → Haiku 4.5         │  │   → Mem0 LLM fact extraction   │
│     balanced → Sonnet 4.5    │  │   → embed → store              │
│     best → Opus 4.6          │  │                                │
│                              │  │ POST /memory/add-raw           │
│ POST /tools/execute          │  │   → direct embed → store       │
│   → credential injection     │  │   → skip LLM extraction        │
│   → 9 builtin tools          │  │                                │
│                              │  │ POST /memory/add-raw-batch     │
│ GET  /tools/catalog          │  │   → bulk embed (100/batch)     │
│                              │  │   → bulk insert (200/batch)    │
│ GET  /langfuse/*             │  │   → MD5 dedup                  │
│   → proxy to Langfuse API    │  │                                │
│                              │  │ Own LLM key (Gemini) for       │
│ Secrets: LLM keys, GitHub    │  │ fact extraction + embeddings   │
│ Observability: Langfuse      │  │ Storage: PostgreSQL + pgvector │
│ via OpenTelemetry            │  │ Vectors: 768-dim               │
└──────────────────────────────┘  └────────────────────────────────┘
```

3.1 Agent Runtime (kaze-runtime)

The core execution engine that manages agent lifecycles and task dispatch.

Two-layer agent model:

  • VerticalAgent — Long-lived, one per vertical. Loaded from vertical.yaml. Manages supervision state, spawns SubAgents for tasks.
  • SubAgent — Ephemeral, created per-task. Executes a single skill and terminates. Gets the vertical's capabilities, tools, and knowledge access.

Three execution modes:

| Mode | When | How |
| --- | --- | --- |
| handler | Skill has a TypeScript handler function | Direct function call — no LLM involved |
| prompt (single-shot) | Skill has a prompt template, agentic: false | One LLM call: render prompt → call gateway → return result |
| agentic (multi-turn) | Skill has agentic: true | Loop: LLM call → tool use → LLM call → ... until done or maxSteps reached |

Template engine: Skill prompts use {{variable}} substitution and {{#if variable}}...{{/if}} conditionals. Variables come from the task input.
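The engine described above fits in a few lines. This is an illustrative sketch, not the runtime's actual implementation; `renderTemplate` and its regexes are assumptions:

```typescript
type Vars = Record<string, string | undefined>;

// Minimal double-brace template engine: resolve {{#if x}}...{{/if}} blocks first
// (keep the body only when x is truthy), then substitute {{x}} from the task input.
function renderTemplate(template: string, vars: Vars): string {
  const withConditionals = template.replace(
    /\{\{#if (\w+)\}\}([\s\S]*?)\{\{\/if\}\}/g,
    (_, name, body) => (vars[name] ? body : ""),
  );
  // Missing variables render as empty strings in this sketch.
  return withConditionals.replace(/\{\{(\w+)\}\}/g, (_, name) => vars[name] ?? "");
}
```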

Memory integration:

  • Before LLM reasoning, the SubAgent searches knowledge for relevant context (via kaze-knowledge)
  • After task completion, key findings are stored back to knowledge

Supervision ramp (as implemented):

  • Start at supervised — all outputs queued for review
  • After 20 successful tasks → promote to sampling
  • After 50 successful tasks → promote to autonomous
  • 3 consecutive failures → demote one level
  • Per-skill tracking, stored in-memory (resets on restart)
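The ramp above can be sketched as a pure state transition. Names like `recordResult` are illustrative; the text does not say whether demotion resets the success counter, so this sketch resets only the failure streak (an assumption):

```typescript
type Level = "supervised" | "sampling" | "autonomous";
const LEVELS: Level[] = ["supervised", "sampling", "autonomous"];

interface SkillStats {
  level: Level;
  successes: number;
  consecutiveFailures: number;
}

function recordResult(s: SkillStats, ok: boolean): SkillStats {
  if (ok) {
    const successes = s.successes + 1;
    let level = s.level;
    // Promote exactly at the thresholds from the text: 20 → sampling, 50 → autonomous.
    if (successes === 20 && level === "supervised") level = "sampling";
    if (successes === 50 && level === "sampling") level = "autonomous";
    return { level, successes, consecutiveFailures: 0 };
  }
  const consecutiveFailures = s.consecutiveFailures + 1;
  if (consecutiveFailures >= 3) {
    // Demote one level and reset the failure streak (assumed behavior).
    const level = LEVELS[Math.max(0, LEVELS.indexOf(s.level) - 1)];
    return { level, successes: s.successes, consecutiveFailures: 0 };
  }
  return { ...s, consecutiveFailures };
}
```

Because the stats live in memory, a restart returns every skill to its initial level, as noted above.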

Async dispatch: POST /dispatch/async accepts a callbackUrl. The runtime executes the task in the background and POSTs the result to the callback URL when done.

3.2 LLM Gateway (kaze-gateway)

Abstraction layer between all agents and LLM providers. No agent ever holds an API key.

LLM generation:

  • Uses Vercel AI SDK (generateText) for unified provider interface
  • Model hints resolve to concrete models: fast → Haiku 4.5, balanced → Sonnet 4.5, best → Opus 4.6
  • JSON Schema tool definitions are converted to Zod schemas at runtime (via jsonSchemaToZod)
  • Langfuse tracing via OpenTelemetry span processor — every LLM call traced automatically
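Hint resolution amounts to a small lookup. A sketch, assuming a fallback to `balanced` for unknown hints; the concrete id strings are illustrative placeholders, not confirmed provider model identifiers:

```typescript
type ModelHint = "fast" | "balanced" | "best";

// Hint → model mapping as described in the text; id strings are placeholders.
const MODEL_FOR_HINT: Record<ModelHint, string> = {
  fast: "claude-haiku-4-5",      // Haiku 4.5
  balanced: "claude-sonnet-4-5", // Sonnet 4.5
  best: "claude-opus-4-6",       // Opus 4.6
};

function resolveModel(hint: string | undefined, fallback: ModelHint = "balanced"): string {
  return MODEL_FOR_HINT[hint as ModelHint] ?? MODEL_FOR_HINT[fallback];
}
```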

Tool execution:

  • Tools are registered in a typed registry with JSON Schema input definitions
  • Credentials are injected via closures at registration time — the tool function never sees raw secrets
  • 9 builtin tools:
| Tool | Description |
| --- | --- |
| github_api | GitHub REST API — issues, PRs, commits, search, labels, releases |
| kaze_knowledge_search | Search agent memory via knowledge service |
| kaze_knowledge_add | Store facts to agent memory |
| kaze_knowledge_batch_add | Bulk-store facts to agent memory |
| workspace_list | List cloned repos in workspace (git clone/pull on demand) |
| workspace_read | Read file from workspace repo |
| file_glob | Glob pattern matching on workspace files |
| file_read | Read file contents from workspace |
| docling_convert | Convert documents (PDF, DOCX, etc.) to Markdown via Docling |
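Closure-based injection can be sketched as follows. `registerWithCredential` and the registry shape are illustrative, not the gateway's actual code; the point is that the secret is captured at registration time and never exposed to callers:

```typescript
type ToolFn = (input: unknown) => Promise<unknown>;

const registry = new Map<string, ToolFn>();

// Bind a secret at registration time. The closure holds the credential; callers of
// the registered tool never see it, and the tool does not read it from the
// environment at call time.
function registerWithCredential(
  name: string,
  secret: string,
  impl: (input: unknown, credential: string) => Promise<unknown>,
): void {
  registry.set(name, (input) => impl(input, secret));
}

// Example: bind a hypothetical GitHub tool to a token loaded at startup.
registerWithCredential("github_api", "ghp_placeholder", async (input, token) => {
  // ...call the GitHub REST API using `token` here...
  return { ok: true };
});
```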

3.3 Knowledge Service (kaze-knowledge)

Persistent memory layer for all agents, built on Mem0 OSS v2.2.3.

Three ingestion paths:

| Endpoint | LLM extraction | Use case |
| --- | --- | --- |
| POST /memory/add | Yes — Mem0 extracts facts from content | Conversational context, meeting notes, rich text |
| POST /memory/add-raw | No — direct embed + store | Pre-processed facts, structured data |
| POST /memory/add-raw-batch | No — bulk embed + store | Document ingestion, seed data, migrations |

Storage:

  • PostgreSQL + pgvector via LangChain's PGVectorStore adapter
  • Embeddings: Google gemini-embedding-001 for all content, producing 768-dimensional vectors
  • Batch embedding: 100 texts per batch
  • Batch insert: 200 records per batch
  • MD5 dedup on content hash in batch operations
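The dedup and chunking steps can be sketched as below; `dedupeBatch` and `chunk` are illustrative helper names, with the batch sizes taken from the text (100 per embedding batch, 200 per insert batch):

```typescript
import { createHash } from "node:crypto";

interface RawMemory { content: string; user_id: string; }

function md5(text: string): string {
  return createHash("md5").update(text).digest("hex");
}

// Drop records whose content hash was already seen or already stored.
function dedupeBatch(
  memories: RawMemory[],
  existingHashes: Set<string>,
): { toInsert: RawMemory[]; skipped: number } {
  const toInsert: RawMemory[] = [];
  let skipped = 0;
  for (const m of memories) {
    const h = md5(m.content);
    if (existingHashes.has(h)) { skipped++; continue; }
    existingHashes.add(h);
    toInsert.push(m);
  }
  return { toInsert, skipped };
}

// Split work into the described batch sizes: chunk(texts, 100) for embedding,
// chunk(records, 200) for insertion.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}
```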

Search:

  • Vector similarity search with configurable top-k
  • Agent-scoped queries (each agent has its own memory space)

3.4 Agent Definitions (kaze-agent-ops)

The Internal Ops vertical (V0) is defined entirely in YAML — no TypeScript handlers.

Vertical definition (vertical.yaml):

```yaml
id: internal-ops
name: Internal Ops
model: fast
supervision: supervised
capabilities:
  - github_api
  - file_glob
  - file_read
  - kaze_knowledge_add
  - kaze_knowledge_search
skills:
  - github
  - seed
  - digest
  - triage
  - docs-sync
  - meeting-notes
```

Skills:

| Skill | Mode | Description |
| --- | --- | --- |
| github | prompt (single-shot) | GitHub operations — issues, PRs, CI, labels, releases |
| seed | agentic (multi-turn) | Knowledge ingestion — bulk-ingest documents via Docling |
| digest | agentic (multi-turn) | Daily/weekly digest of activity across repos |
| triage | prompt (single-shot) | Auto-triage issues — label, assign, dedup, stale PR nudge |
| docs-sync | prompt (single-shot) | Log decisions/insights to knowledge, detect contradictions |
| meeting-notes | agentic (multi-turn) | Transcript → structured minutes, GitHub issues for actions |

Skill definition pattern:

```yaml
id: digest
name: Daily Digest
description: Summarize recent activity across GitHub repos
inputSchema:
  type: object
  properties:
    repos:
      type: array
      items: { type: string }
    period:
      type: string
  required: [repos]
prompt: |
  Generate a concise digest of recent activity for these repos: $REPOS
  Period: $PERIOD
  Use github_api to fetch recent issues, PRs, and commits.
tools:
  - github_api
agentic: true
maxSteps: 6
```

4. Data Flows

4.1 Synchronous Task Dispatch

```
User message (Slack/WhatsApp/Telegram)
        ▼
OpenClaw → kaze_dispatch_task tool
        ▼
POST /dispatch {vertical, skill, input}
        ▼
Runtime: VerticalAgent.dispatch()
  ├─ 1. Check supervision level for this skill
  ├─ 2. Spawn SubAgent
  ├─ 3. SubAgent searches knowledge for context
  ├─ 4. SubAgent renders prompt template with input + context
  ├─ 5. SubAgent calls gateway (LLM generate + tool execute loop)
  ├─ 6. SubAgent stores learnings to knowledge
  ├─ 7. SubAgent returns result
  └─ 8. VerticalAgent updates supervision stats
        ▼
Response → OpenClaw → User channel
```

4.2 Agentic Multi-Turn Loop

When a skill has agentic: true, the SubAgent enters a tool-use loop:

```
1. Render prompt with input + knowledge context
2. Call gateway POST /llm/generate with tools
3. If response has tool_calls:
   a. For each tool_call: POST /tools/execute
   b. Append tool results to conversation
   c. Call gateway again (step 2)
4. If response has no tool_calls (or maxSteps reached):
   → Return final text as result
```

The gateway handles tool execution within generateText — Vercel AI SDK's maxSteps parameter controls the loop on the gateway side. The runtime's agentic loop wraps this for context overflow recovery and completion detection.
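The runtime-side loop can be sketched with the two gateway calls injected so the transport can be stubbed. A simplified sketch: the actual loop also handles context overflow recovery and completion detection, and the type names here are illustrative:

```typescript
interface Message { role: "system" | "user" | "assistant" | "tool"; content: string; }
interface ToolCall { tool: string; input: unknown; }
interface GenerateResult { text: string; toolCalls?: ToolCall[]; }

// Call the gateway, execute any requested tools, feed results back, and stop when
// the model returns plain text or maxSteps is exhausted.
async function runAgenticLoop(
  messages: Message[],
  maxSteps: number,
  generate: (m: Message[]) => Promise<GenerateResult>, // POST /llm/generate
  execute: (c: ToolCall) => Promise<unknown>,          // POST /tools/execute
): Promise<string> {
  let lastText = "";
  for (let step = 0; step < maxSteps; step++) {
    const res = await generate(messages);
    lastText = res.text;
    if (!res.toolCalls || res.toolCalls.length === 0) return res.text; // done
    messages.push({ role: "assistant", content: res.text });
    for (const call of res.toolCalls) {
      const result = await execute(call);
      messages.push({ role: "tool", content: JSON.stringify(result) });
    }
  }
  return lastText; // maxSteps reached: return the most recent text
}
```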

4.3 Async Dispatch with Callback

```
POST /dispatch/async {vertical, skill, input, callbackUrl}
        ▼
Runtime: returns {taskId, status: "accepted"} immediately
        ▼ (background)
Runtime executes task (same flow as sync)
        ▼
POST callbackUrl {taskId, status, result}
```
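The accept-then-callback pattern can be sketched as below, with the executor and the HTTP POST injected so it runs without a network. `dispatchAsync`, the task-id format, and the `completed`/`failed` statuses are illustrative assumptions:

```typescript
interface AsyncDispatch { vertical: string; skill: string; input: unknown; callbackUrl: string; }

function dispatchAsync(
  req: AsyncDispatch,
  run: (req: AsyncDispatch) => Promise<unknown>,        // same flow as sync dispatch
  post: (url: string, body: unknown) => Promise<void>,  // e.g. fetch(url, { method: "POST", ... })
): { taskId: string; status: "accepted" } {
  const taskId = `task-${Date.now()}-${Math.random().toString(36).slice(2, 8)}`;
  // Fire and forget: the caller already has its acknowledgement.
  void run(req)
    .then((result) => post(req.callbackUrl, { taskId, status: "completed", result }))
    .catch((err) => post(req.callbackUrl, { taskId, status: "failed", result: String(err) }));
  return { taskId, status: "accepted" };
}
```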

4.4 Knowledge Search → LLM → Knowledge Store

```
SubAgent executes:
  1. POST /knowledge/search {query: task_input, agentId}
     → receives relevant memories
  2. Inject memories into prompt context
  3. POST /llm/generate {messages: [system + memories + user], tools}
     → LLM reasons with context
  4. POST /memory/add {content: result_summary, agentId}
     → key findings stored for future tasks
```
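The four steps can be sketched as one function with each HTTP call injected. This is illustrative, not the SubAgent's actual code; `runWithMemory` and the message shapes are assumptions:

```typescript
interface Memory { memory: string; score: number; }

async function runWithMemory(
  agentId: string,
  taskInput: string,
  search: (query: string, agentId: string) => Promise<Memory[]>, // POST /knowledge/search
  generate: (messages: string[]) => Promise<string>,             // POST /llm/generate
  store: (content: string, agentId: string) => Promise<void>,    // POST /memory/add
): Promise<string> {
  const memories = await search(taskInput, agentId);             // 1. recall
  const context = memories.map((m) => `- ${m.memory}`).join("\n");
  const result = await generate([`Relevant memories:\n${context}`, taskInput]); // 2–3. reason
  await store(`Task: ${taskInput}\nResult: ${result}`, agentId); // 4. remember
  return result;
}
```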

5. API Contracts

Runtime API (port 4100)

| Method | Path | Request | Response |
| --- | --- | --- | --- |
| POST | /dispatch | {vertical, skill, input} | {output, metrics} |
| POST | /dispatch/async | {vertical, skill, input, callbackUrl} | {taskId, status} |
| GET | /verticals | — | [{id, name, skills: [...]}] |
| GET | /agents | — | [{vertical, status, supervision}] |
| POST | /knowledge/search | {query, agentId} | {results: [...]} (proxied) |

Gateway API (port 4200)

| Method | Path | Request | Response |
| --- | --- | --- | --- |
| POST | /llm/generate | {messages, model, tools?, maxSteps?} | {text, toolCalls?, usage} |
| POST | /tools/execute | {tool, input} | {result} |
| GET | /tools/catalog | — | [{name, description, inputSchema}] |

Knowledge API (port 4300)

| Method | Path | Request | Response |
| --- | --- | --- | --- |
| POST | /memory/search | {query, user_id, limit?} | {results: [{memory, score}]} |
| POST | /memory/add | {content, user_id, metadata?} | {id} |
| POST | /memory/add-raw | {content, user_id, metadata?} | {id} |
| POST | /memory/add-raw-batch | {memories: [{content, user_id, metadata}]} | {added, skipped} |

6. Design Decisions & Tradeoffs

Direct HTTP calls vs. message bus

Decision: Direct HTTP calls between components (no NATS).

Rationale: All components run in one cluster. HTTP is simple, debuggable, and sufficient for the current scale (1 cell, ~15 agents). NATS adds operational overhead with no benefit at this stage. The API interfaces are designed so swapping to NATS later changes only the transport, not the message shapes.
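That seam might look like the sketch below: a `Transport` interface whose message shapes mirror the /dispatch contract, with HTTP as the only implementation today. The interface and class names are illustrative assumptions:

```typescript
interface DispatchRequest { vertical: string; skill: string; input: unknown; }
interface DispatchResponse { output: unknown; metrics?: Record<string, number>; }

// Callers depend only on this interface; the message shapes stay fixed.
interface Transport {
  dispatch(req: DispatchRequest): Promise<DispatchResponse>;
}

// Current implementation: direct HTTP to the runtime.
class HttpTransport implements Transport {
  constructor(private baseUrl: string) {}
  async dispatch(req: DispatchRequest): Promise<DispatchResponse> {
    const res = await fetch(`${this.baseUrl}/dispatch`, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(req),
    });
    return (await res.json()) as DispatchResponse;
  }
}
// A future NatsTransport would implement the same interface over request/reply
// subjects — callers and message shapes stay unchanged.
```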

Vercel AI SDK vs. direct provider SDKs

Decision: Vercel AI SDK for all LLM calls.

Rationale: Unified interface across Anthropic and Google. Built-in tool execution loop (maxSteps), streaming, and structured output. Avoids maintaining provider-specific code.

Mem0 vs. custom knowledge store

Decision: Mem0 OSS for episodic memory. Custom shared knowledge tier deferred.

Rationale: Mem0 provides fact extraction, embedding, and vector storage out of the box. Good enough for per-agent memory. Shared knowledge (quality gates, ABAC, cross-agent visibility) requires custom work on top.

YAML-only skill definitions

Decision: All V0 skills are YAML-only (prompt-based). No TypeScript handlers used.

Rationale: Prompts are the simplest skill definition — just a template and a list of tools. The runtime's agentic loop handles multi-turn execution. TypeScript handlers exist for cases that need procedural logic, but haven't been needed yet.

Zero-secret runtime

Decision: The runtime holds no API keys. All secrets live in the gateway.

Rationale: Reduces blast radius. If the runtime is compromised, the attacker cannot call LLM providers or external APIs directly. Credential injection happens via closures in the gateway — the tool function receives credentials as arguments, not from environment variables.


7. What's Not Yet Built

| Component | Status | Notes |
| --- | --- | --- |
| Task Scheduler | Not built | Cron + event triggers. Using manual dispatch only. |
| Observation Logger | Using Langfuse | No standalone logger. Langfuse provides tracing, not structured event storage. |
| Shared knowledge tier | Not built | Quality gates, ABAC, cross-agent knowledge sharing. |
| Orchestrator agents | Not built | Dynamic task decomposition, multi-agent coordination. |
| Supervisor agents | Not built | Layer 3 governance, fleet health monitoring. |
| NATS messaging | Not built | Using direct HTTP calls. |
| Budget tracking | Not built | Token usage logged in Langfuse but not enforced. |
| Additional tools | Not built | Calendar, SEMrush, Google Search Console, Toddle DB. |
| V1 SEO vertical | Not built | Agent definitions and tool integrations pending. |
| V2 Toddle vertical | Not built | Agent definitions and tool integrations pending. |