System Architecture
Part of Project Kaze Architecture
1. Architectural Pattern
Kaze follows an Agent-Oriented Architecture — a hybrid pattern shaped by the fact that its primary compute units are intelligent, autonomous agents, not passive services.
| Pattern | What Kaze borrows | Applied where |
|---|---|---|
| Actor Model | Autonomous entities with private state, message-passing | Agent runtime — each agent is an actor |
| Microservices | Independent deployment, own-your-data, clean API boundaries | Platform services (Gateway, Knowledge, Runtime) |
| Event-Driven | Loose coupling via async events | Inter-agent communication (direct calls now, NATS later) |
| Cell-Based | Self-contained isolated deployment units | Each tenant/deployment is a cell |
New to Kaze (no traditional equivalent):
- Components that learn from execution history
- A governance hierarchy where AI agents can supervise other AI agents
- Shared knowledge across agents while maintaining runtime isolation
- A supervision ramp as a trust model (supervised → sampling → autonomous)
2. Layer Model
┌──────────────────────────────────────────────────────────┐
│ KAZE PLATFORM │
│ │
│ Layer 3: GOVERNANCE & SELF-IMPROVEMENT [not yet built] │
│ ┌────────────┐ ┌──────────────┐ ┌─────────────────────┐ │
│ │ Supervisor │ │ Quality │ │ Improvement Agent │ │
│ │ Agents │ │ Monitor │ │ │ │
│ └────────────┘ └──────────────┘ └─────────────────────┘ │
│ │
│ Layer 2: ORCHESTRATION & KNOWLEDGE │
│ ┌──────────────┐ ┌──────────────────────────────────┐ │
│ │ Orchestrator │ │ Knowledge System (Mem0+pgvector) │ │
│ │ Agents │ │ Per-agent episodic memory ✓ │ │
│ │ [not built] │ │ Shared knowledge tiers [plan] │ │
│ └──────────────┘ └──────────────────────────────────┘ │
│ │
│ Layer 1: EXECUTION ✓ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Agent Skills (composable, reusable per vertical) │ │
│ │ ┌────────┐ ┌────────┐ ┌───────┐ ┌──────────────┐ │ │
│ │ │ github │ │ digest │ │ seed │ │meeting-notes │ │ │
│ │ └────────┘ └────────┘ └───────┘ └──────────────┘ │ │
│ └────────────────────────────────────────────────────┘ │
│ │
│ Layer 0.5: INTERACTION ✓ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ OpenClaw (Conversation Manager + Channel Adapters) │ │
│ │ ┌───────┐ ┌──────────┐ ┌────────┐ ┌───────┐ │ │
│ │ │ Slack │ │ WhatsApp │ │Telegram│ │ CLI │ │ │
│ │ └───────┘ └──────────┘ └────────┘ └───────┘ │ │
│ └────────────────────────────────────────────────────┘ │
│ │
│ Layer 0: PLATFORM INFRASTRUCTURE ✓ │
│ ┌──────────┐ ┌─────────┐ ┌──────┐ ┌───────────────┐ │
│ │ K8s │ │Postgres │ │Vault │ │ Langfuse │ │
│ │ │ │+pgvector│ │ │ │ (observability)│ │
│ └──────────┘ └─────────┘ └──────┘ └───────────────┘ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ LLM Gateway (multi-provider, model hints, tools) │ │
│ └────────────────────────────────────────────────────┘ │
│ │
│ ALL CONTAINERIZED · ALL IaC · ANY CLOUD │
└──────────────────────────────────────────────────────────┘

3. Component Architecture (As Implemented)
┌──────────────────────────┐
│ OpenClaw (Layer 0.5) │
│ kaze_dispatch_task │
│ kaze_list_verticals │
│ kaze_agent_status │
└────────────┬───────────────┘
│ HTTP POST /dispatch
│ or /dispatch/async
▼
┌─ kaze-runtime (port 4100) ────────────────────────────────────┐
│ │
│ POST /dispatch → sync task execution │
│ POST /dispatch/async → async with callbackUrl │
│ GET /verticals → list loaded verticals + skills │
│ GET /agents → list active agents + status │
│ POST /knowledge/search → proxy to kaze-knowledge │
│ GET /langfuse/* → proxy to Langfuse API │
│ GET /mcp/* → proxy to Langfuse MCP sidecar │
│ │
│ VerticalAgent (long-lived, one per vertical) │
│ └─ SubAgent (ephemeral, one per task) │
│ ├─ handler mode → TypeScript function │
│ ├─ prompt mode → single-shot LLM call │
│ └─ agentic mode → multi-turn tool-use loop │
│ │
│ Supervision ramp: 20 runs → sampling, 50 → autonomous │
│ 3 consecutive failures → demotion │
│ Template engine: double-brace substitution + conditionals │
│ Zero secrets — calls gateway + knowledge via HTTP │
└──────┬───────────────────────────────┬────────────────────────┘
│ HTTP │ HTTP
▼ ▼
┌─ kaze-gateway (port 4200) ──┐ ┌─ kaze-knowledge (port 4300) ──┐
│ │ │ │
│ POST /llm/generate │ │ POST /memory/search │
│ → Vercel AI SDK │ │ → vector similarity query │
│ → Anthropic / Google │ │ │
│ → model hints: │ │ POST /memory/add │
│ fast → Haiku 4.5 │ │ → Mem0 LLM fact extraction │
│ balanced → Sonnet 4.5 │ │ → embed → store │
│ best → Opus 4.6 │ │ │
│ │ │ POST /memory/add-raw │
│ POST /tools/execute │ │ → direct embed → store │
│ → credential injection │ │ → skip LLM extraction │
│ → 9 builtin tools │ │ │
│ │ │ POST /memory/add-raw-batch │
│ GET /tools/catalog │ │ → bulk embed (100/batch) │
│ │ │ → bulk insert (200/batch) │
│ GET /langfuse/* │ │ → MD5 dedup │
│ → proxy to Langfuse API │ │ │
│ │ │ Own LLM key (Gemini) for │
│ Secrets: LLM keys, GitHub │ │ fact extraction + embeddings │
│ Observability: Langfuse │ │ Storage: PostgreSQL + pgvector │
│ via OpenTelemetry │ │ Vectors: 768-dim │
└──────────────────────────────┘ └────────────────────────────────┘

3.1 Agent Runtime (kaze-runtime)
The core execution engine that manages agent lifecycles and task dispatch.
Two-layer agent model:
- VerticalAgent — Long-lived, one per vertical. Loaded from vertical.yaml. Manages supervision state, spawns SubAgents for tasks.
- SubAgent — Ephemeral, created per-task. Executes a single skill and terminates. Gets the vertical's capabilities, tools, and knowledge access.
Three execution modes:
| Mode | When | How |
|---|---|---|
| handler | Skill has a TypeScript handler function | Direct function call — no LLM involved |
| prompt (single-shot) | Skill has a prompt template, agentic: false | One LLM call: render prompt → call gateway → return result |
| agentic (multi-turn) | Skill has agentic: true | Loop: LLM call → tool use → LLM call → ... until done or maxSteps reached |
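The precedence implied by the table is: a handler wins, then the agentic flag, else single-shot prompt. As a small illustrative sketch (the names SkillDef and resolveMode are invented here, not runtime APIs):

```typescript
// Sketch of execution-mode resolution per the table above.
// `SkillDef` and `resolveMode` are illustrative names, not the runtime's real API.
interface SkillDef {
  handler?: (input: Record<string, unknown>) => Promise<unknown>; // TypeScript handler, if any
  agentic?: boolean; // from the skill YAML
}

type Mode = "handler" | "prompt" | "agentic";

function resolveMode(skill: SkillDef): Mode {
  if (skill.handler) return "handler"; // direct function call, no LLM
  if (skill.agentic) return "agentic"; // multi-turn tool-use loop
  return "prompt";                     // single-shot LLM call
}
```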
Template engine: Skill prompts use {{variable}} substitution and {{#if variable}}...{{/if}} conditionals. Variables come from the task input.
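For a sense of scale, a double-brace renderer of this kind fits in a few lines of TypeScript. This is a minimal sketch, not the runtime's actual engine, and renderTemplate is a hypothetical name:

```typescript
// Minimal sketch of a {{variable}} + {{#if variable}}...{{/if}} renderer.
// Hypothetical helper for illustration; the runtime's real engine may differ.
type Vars = Record<string, unknown>;

function renderTemplate(template: string, vars: Vars): string {
  // Resolve {{#if x}}...{{/if}} blocks first: keep the body only
  // when the variable is truthy, drop it otherwise.
  const withConditionals = template.replace(
    /\{\{#if\s+(\w+)\}\}([\s\S]*?)\{\{\/if\}\}/g,
    (_m, name, body) => (vars[name] ? body : ""),
  );
  // Then substitute plain {{x}} placeholders from the task input.
  return withConditionals.replace(
    /\{\{(\w+)\}\}/g,
    (_m, name) => String(vars[name] ?? ""),
  );
}
```

For example, renderTemplate("Hi {{name}}{{#if vip}} (VIP){{/if}}", { name: "Ada", vip: true }) yields "Hi Ada (VIP)".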
Memory integration:
- Before LLM reasoning, the SubAgent searches knowledge for relevant context (via kaze-knowledge)
- After task completion, key findings are stored back to knowledge
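The context-injection step can be sketched as a pure message-assembly helper. This is illustrative only: the actual context format, and names like buildMessages, are assumptions, not the runtime's code:

```typescript
// Sketch: fold retrieved memories into the message list sent to the gateway.
// The exact context format used by the runtime is an assumption here.
interface Memory { memory: string; score: number }
interface ChatMessage { role: "system" | "user"; content: string }

function buildMessages(system: string, memories: Memory[], task: string): ChatMessage[] {
  const context = memories.length
    ? "Relevant memories:\n" + memories.map((m) => `- ${m.memory}`).join("\n")
    : "";
  return [
    { role: "system", content: context ? `${system}\n\n${context}` : system },
    { role: "user", content: task },
  ];
}
```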
Supervision ramp (as implemented):
- Start at supervised — all outputs queued for review
- After 20 successful tasks → promote to sampling
- After 50 successful tasks → promote to autonomous
- 3 consecutive failures → demote one level
- Per-skill tracking, stored in-memory (resets on restart)
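The ramp above amounts to a small per-skill state machine. A sketch under stated assumptions (SupervisionTracker is an invented name; whether success counts reset on demotion is not specified in this document, so this sketch resets them):

```typescript
// Illustrative in-memory supervision tracker for one skill.
// Thresholds mirror the ramp above: 20 → sampling, 50 → autonomous,
// 3 consecutive failures → demote one level. In-memory only, as documented.
type Level = "supervised" | "sampling" | "autonomous";
const LEVELS: Level[] = ["supervised", "sampling", "autonomous"];

class SupervisionTracker {
  private successes = 0;
  private consecutiveFailures = 0;
  level: Level = "supervised";

  recordSuccess(): void {
    this.successes++;
    this.consecutiveFailures = 0;
    if (this.successes >= 50) this.level = "autonomous";
    else if (this.successes >= 20) this.level = "sampling";
  }

  recordFailure(): void {
    this.consecutiveFailures++;
    if (this.consecutiveFailures >= 3) {
      const idx = LEVELS.indexOf(this.level);
      if (idx > 0) this.level = LEVELS[idx - 1];
      // Assumption: counters restart after demotion (the doc does not specify).
      this.successes = 0;
      this.consecutiveFailures = 0;
    }
  }
}
```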
Async dispatch: POST /dispatch/async accepts a callbackUrl. The runtime executes the task in the background and POSTs the result to the callback URL when done.
3.2 LLM Gateway (kaze-gateway)
Abstraction layer between all agents and LLM providers. No agent ever holds an API key.
LLM generation:
- Uses Vercel AI SDK (generateText) for unified provider interface
- Model hints resolve to concrete models: fast → Haiku 4.5, balanced → Sonnet 4.5, best → Opus 4.6
- JSON Schema tool definitions are converted to Zod schemas at runtime (via jsonSchemaToZod)
- Langfuse tracing via OpenTelemetry span processor — every LLM call traced automatically
Tool execution:
- Tools are registered in a typed registry with JSON Schema input definitions
- Credentials are injected via closures at registration time — the tool function never sees raw secrets
- 9 builtin tools:
| Tool | Description |
|---|---|
| github_api | GitHub REST API — issues, PRs, commits, search, labels, releases |
| kaze_knowledge_search | Search agent memory via knowledge service |
| kaze_knowledge_add | Store facts to agent memory |
| kaze_knowledge_batch_add | Bulk-store facts to agent memory |
| workspace_list | List cloned repos in workspace (git clone/pull on demand) |
| workspace_read | Read file from workspace repo |
| file_glob | Glob pattern matching on workspace files |
| file_read | Read file contents from workspace |
| docling_convert | Convert documents (PDF, DOCX, etc.) to Markdown via Docling |
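The closure-based credential injection described above can be sketched as follows. This is a simplified illustration; registerTool, executeTool, and the handler signature are assumptions, not the gateway's real API:

```typescript
// Sketch: bake a credential into a closure at registration time.
// The stored function's public shape exposes only the task input;
// the secret is neither in the input schema nor read from env at call time.
type ToolFn = (input: Record<string, unknown>) => Promise<unknown>;
const registry = new Map<string, ToolFn>();

function registerTool(
  name: string,
  credential: string,
  impl: (input: Record<string, unknown>, credential: string) => Promise<unknown>,
): void {
  // `credential` is captured here and never leaves the closure.
  registry.set(name, (input) => impl(input, credential));
}

async function executeTool(name: string, input: Record<string, unknown>): Promise<unknown> {
  const fn = registry.get(name);
  if (!fn) throw new Error(`unknown tool: ${name}`);
  return fn(input); // caller supplies only name + input, never the secret
}
```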
3.3 Knowledge Service (kaze-knowledge)
Persistent memory layer for all agents, built on Mem0 OSS v2.2.3.
Three ingestion paths:
| Endpoint | LLM extraction | Use case |
|---|---|---|
| POST /memory/add | Yes — Mem0 extracts facts from content | Conversational context, meeting notes, rich text |
| POST /memory/add-raw | No — direct embed + store | Pre-processed facts, structured data |
| POST /memory/add-raw-batch | No — bulk embed + store | Document ingestion, seed data, migrations |
Storage:
- PostgreSQL + pgvector via LangChain's PGVectorStore adapter
- Google embeddings always: gemini-embedding-001, 768-dimensional vectors
- Batch embedding: 100 texts per batch
- Batch insert: 200 records per batch
- MD5 dedup on content hash in batch operations
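The chunking and hash-based dedup described above can be sketched like this (illustrative helpers only, with the embed/insert calls stubbed out):

```typescript
import { createHash } from "node:crypto";

// Sketch of the batch path: split inputs into batches and drop duplicates
// by MD5 content hash, reporting added vs skipped. Batch sizes follow the doc.
const EMBED_BATCH = 100;  // texts per embedding request
const INSERT_BATCH = 200; // records per bulk insert

function md5(content: string): string {
  return createHash("md5").update(content).digest("hex");
}

function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

function dedupe(contents: string[], seen: Set<string>): { kept: string[]; skipped: number } {
  const kept: string[] = [];
  let skipped = 0;
  for (const c of contents) {
    const h = md5(c);
    if (seen.has(h)) {
      skipped++; // duplicate content hash: skip the record
    } else {
      seen.add(h);
      kept.push(c);
    }
  }
  return { kept, skipped };
}
```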
Search:
- Vector similarity search with configurable top-k
- Agent-scoped queries (each agent has its own memory space)
3.4 Agent Definitions (kaze-agent-ops)
The Internal Ops vertical (V0) is defined entirely in YAML — no TypeScript handlers.
Vertical definition (vertical.yaml):

```yaml
id: internal-ops
name: Internal Ops
model: fast
supervision: supervised
capabilities:
  - github_api
  - file_glob
  - file_read
  - kaze_knowledge_add
  - kaze_knowledge_search
skills:
  - github
  - seed
  - digest
  - triage
  - docs-sync
  - meeting-notes
```

Skills:
| Skill | Mode | Description |
|---|---|---|
| github | prompt (single-shot) | GitHub operations — issues, PRs, CI, labels, releases |
| seed | agentic (multi-turn) | Knowledge ingestion — bulk-ingest documents via Docling |
| digest | agentic (multi-turn) | Daily/weekly digest of activity across repos |
| triage | prompt (single-shot) | Auto-triage issues — label, assign, dedup, stale PR nudge |
| docs-sync | prompt (single-shot) | Log decisions/insights to knowledge, detect contradictions |
| meeting-notes | agentic (multi-turn) | Transcript → structured minutes, GitHub issues for actions |
Skill definition pattern:

```yaml
id: digest
name: Daily Digest
description: Summarize recent activity across GitHub repos
inputSchema:
  type: object
  properties:
    repos:
      type: array
      items: { type: string }
    period:
      type: string
  required: [repos]
prompt: |
  Generate a concise digest of recent activity for these repos: $REPOS
  Period: $PERIOD
  Use github_api to fetch recent issues, PRs, and commits.
tools:
  - github_api
agentic: true
maxSteps: 6
```

4. Data Flows
4.1 Synchronous Task Dispatch
User message (Slack/WhatsApp/Telegram)
│
▼
OpenClaw → kaze_dispatch_task tool
│
▼
POST /dispatch {vertical, skill, input}
│
▼
Runtime: VerticalAgent.dispatch()
│
├─ 1. Check supervision level for this skill
├─ 2. Spawn SubAgent
├─ 3. SubAgent searches knowledge for context
├─ 4. SubAgent renders prompt template with input + context
├─ 5. SubAgent calls gateway (LLM generate + tool execute loop)
├─ 6. SubAgent stores learnings to knowledge
├─ 7. SubAgent returns result
└─ 8. VerticalAgent updates supervision stats
│
▼
Response → OpenClaw → User channel

4.2 Agentic Multi-Turn Loop
When a skill has agentic: true, the SubAgent enters a tool-use loop:
1. Render prompt with input + knowledge context
2. Call gateway POST /llm/generate with tools
3. If response has tool_calls:
a. For each tool_call: POST /tools/execute
b. Append tool results to conversation
c. Call gateway again (step 2)
4. If response has no tool_calls (or maxSteps reached):
   → Return final text as result

The gateway handles tool execution within generateText — Vercel AI SDK's maxSteps parameter controls the loop on the gateway side. The runtime's agentic loop wraps this for context overflow recovery and completion detection.
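A stripped-down version of the runtime-side loop might look like the following. The message and tool-call shapes are assumptions made for this sketch, not the runtime's actual types:

```typescript
// Simplified agentic loop: call the LLM, execute any requested tools,
// feed results back, and stop when no tools are requested or maxSteps is hit.
interface ToolCall { id: string; tool: string; input: Record<string, unknown> }
interface LlmResponse { text: string; toolCalls: ToolCall[] }
interface LoopMessage { role: "system" | "user" | "assistant" | "tool"; content: string }

async function agenticLoop(
  messages: LoopMessage[],
  maxSteps: number,
  generate: (msgs: LoopMessage[]) => Promise<LlmResponse>, // stands in for POST /llm/generate
  execute: (call: ToolCall) => Promise<unknown>,           // stands in for POST /tools/execute
): Promise<string> {
  let lastText = "";
  for (let step = 0; step < maxSteps; step++) {
    const res = await generate(messages);
    lastText = res.text;
    if (res.toolCalls.length === 0) return lastText; // no tool calls: done
    messages.push({ role: "assistant", content: res.text });
    for (const call of res.toolCalls) {
      const result = await execute(call); // run the tool, append its result
      messages.push({ role: "tool", content: JSON.stringify(result) });
    }
  }
  return lastText; // maxSteps reached: return whatever text we have
}
```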
4.3 Async Dispatch with Callback
POST /dispatch/async {vertical, skill, input, callbackUrl}
│
▼
Runtime: returns {taskId, status: "accepted"} immediately
│
▼ (background)
Runtime executes task (same flow as sync)
│
▼
POST callbackUrl {taskId, status, result}

4.4 Knowledge Search → LLM → Knowledge Store
SubAgent executes:
1. POST /knowledge/search {query: task_input, agentId}
→ receives relevant memories
2. Inject memories into prompt context
3. POST /llm/generate {messages: [system + memories + user], tools}
→ LLM reasons with context
4. POST /memory/add {content: result_summary, agentId}
   → key findings stored for future tasks

5. API Contracts
Runtime API (port 4100)
| Method | Path | Request | Response |
|---|---|---|---|
| POST | /dispatch | {vertical, skill, input} | {output, metrics} |
| POST | /dispatch/async | {vertical, skill, input, callbackUrl} | {taskId, status} |
| GET | /verticals | — | [{id, name, skills: [...]}] |
| GET | /agents | — | [{vertical, status, supervision}] |
| POST | /knowledge/search | {query, agentId} | {results: [...]} (proxied) |
Gateway API (port 4200)
| Method | Path | Request | Response |
|---|---|---|---|
| POST | /llm/generate | {messages, model, tools?, maxSteps?} | {text, toolCalls?, usage} |
| POST | /tools/execute | {tool, input} | {result} |
| GET | /tools/catalog | — | [{name, description, inputSchema}] |
Knowledge API (port 4300)
| Method | Path | Request | Response |
|---|---|---|---|
| POST | /memory/search | {query, user_id, limit?} | {results: [{memory, score}]} |
| POST | /memory/add | {content, user_id, metadata?} | {id} |
| POST | /memory/add-raw | {content, user_id, metadata?} | {id} |
| POST | /memory/add-raw-batch | {memories: [{content, user_id, metadata}]} | {added, skipped} |
6. Design Decisions & Tradeoffs
Direct HTTP calls vs. message bus
Decision: Direct HTTP calls between components (no NATS).
Rationale: All components run in one cluster. HTTP is simple, debuggable, and sufficient for the current scale (1 cell, ~15 agents). NATS adds operational overhead with no benefit at this stage. The API interfaces are designed so swapping to NATS later changes only the transport, not the message shapes.
Vercel AI SDK vs. direct provider SDKs
Decision: Vercel AI SDK for all LLM calls.
Rationale: Unified interface across Anthropic and Google. Built-in tool execution loop (maxSteps), streaming, and structured output. Avoids maintaining provider-specific code.
Mem0 vs. custom knowledge store
Decision: Mem0 OSS for episodic memory. Custom shared knowledge tier deferred.
Rationale: Mem0 provides fact extraction, embedding, and vector storage out of the box. Good enough for per-agent memory. Shared knowledge (quality gates, ABAC, cross-agent visibility) requires custom work on top.
YAML-only skill definitions
Decision: All V0 skills are YAML-only (prompt-based). No TypeScript handlers used.
Rationale: Prompts are the simplest skill definition — just a template and a list of tools. The runtime's agentic loop handles multi-turn execution. TypeScript handlers exist for cases that need procedural logic, but haven't been needed yet.
Zero-secret runtime
Decision: The runtime holds no API keys. All secrets live in the gateway.
Rationale: Reduces blast radius. If the runtime is compromised, the attacker cannot call LLM providers or external APIs directly. Credential injection happens via closures in the gateway — the tool function receives credentials as arguments, not from environment variables.
7. What's Not Yet Built
| Component | Status | Notes |
|---|---|---|
| Task Scheduler | Not built | Cron + event triggers. Using manual dispatch only. |
| Observation Logger | Using Langfuse | No standalone logger. Langfuse provides tracing, not structured event storage. |
| Shared knowledge tier | Not built | Quality gates, ABAC, cross-agent knowledge sharing. |
| Orchestrator agents | Not built | Dynamic task decomposition, multi-agent coordination. |
| Supervisor agents | Not built | Layer 3 governance, fleet health monitoring. |
| NATS messaging | Not built | Using direct HTTP calls. |
| Budget tracking | Not built | Token usage logged in Langfuse but not enforced. |
| Additional tools | Not built | Calendar, SEMrush, Google Search Console, Toddle DB. |
| V1 SEO vertical | Not built | Agent definitions and tool integrations pending. |
| V2 Toddle vertical | Not built | Agent definitions and tool integrations pending. |