Technical Design — Component Overview
Part of Project Kaze Architecture
High-level design for the 6 MVP platform components. Covers what each component does, its inputs/outputs, how components connect, and key workflows.
Implementation status: Components 1-3 are implemented (the Knowledge System only partially). See plan.md for details.
1. Agent Runtime
Implemented — kaze-runtime (port 4100)
The core execution engine that manages agent lifecycles and task execution.
What It Does
- Loads agent and skill definitions (YAML + optional TypeScript handlers)
- Spawns agent instances bound to a tenant
- Dispatches tasks to agents (one task at a time per agent, actor model)
- Manages agent lifecycle: initializing → ready → executing → idle → shutdown
- Enforces per-skill supervision levels (supervised / sampling / autonomous)
- Routes inter-agent communication (direct calls in MVP, NATS in Phase 2)
Inputs / Outputs
| Input | Output |
|---|---|
| Skill definition (YAML) | Loaded, validated skill ready for composition |
| Agent definition (YAML) | Running agent instance bound to a tenant |
| Task request (skill name + input data + initiator) | Task result (output data + metrics + approval status) |
Key Workflows
Agent Spawn:
Load agent YAML → Resolve skill definitions → Load TS handlers (if any)
→ Connect to LLM Gateway, Knowledge, Tools → Set state = ready
Task Execution:
Receive task → Check supervision level for this skill
→ If supervised: execute → queue output for human review → wait for approval
→ If sampling: execute → randomly sample X% for review → deliver rest immediately
→ If autonomous: execute → deliver
→ Log everything to Observation Logger
→ Update supervision ramp statistics
Supervision Ramp:
Track per-skill stats (success rate, approval rate, total runs)
→ When thresholds met (e.g., 50 runs at 95% approval) → promote to next level
→ If quality drops → demote back
Connections
- Uses: LLM Gateway (for agent reasoning), Knowledge System (for memory), Tool Framework (for external actions), Observation Logger (for all events)
- Used by: OpenClaw (dispatches tasks from user conversation), Task Scheduler (dispatches cron/event tasks)
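The supervision ramp above reduces to a small deterministic check. A minimal sketch in TypeScript — the three level names and the "50 runs at 95% approval" promotion example come from the design, but the demotion threshold (below 80%) and the names `nextLevel` / `SkillStats` are illustrative assumptions:

```typescript
type SupervisionLevel = "supervised" | "sampling" | "autonomous";

interface SkillStats {
  totalRuns: number;
  approvalRate: number; // fraction of reviewed outputs approved, 0..1
}

// Hypothetical thresholds: 50 runs at >=95% approval promotes one level;
// approval below 80% (assumed cutoff) demotes one level.
function nextLevel(level: SupervisionLevel, stats: SkillStats): SupervisionLevel {
  const order: SupervisionLevel[] = ["supervised", "sampling", "autonomous"];
  const i = order.indexOf(level);
  if (stats.totalRuns >= 50 && stats.approvalRate >= 0.95 && i < order.length - 1) {
    return order[i + 1]; // thresholds met → promote
  }
  if (stats.approvalRate < 0.8 && i > 0) {
    return order[i - 1]; // quality dropped → demote
  }
  return level; // otherwise hold the current level
}
```

Keeping this as a pure function over per-skill stats is what lets the runtime isolate it from the agents themselves (see the security controls: agents cannot read or modify their own supervision statistics).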
2. LLM Gateway
Implemented — kaze-gateway (port 4200)
Abstraction layer between all agents and all LLM providers.
What It Does
- Provides a unified complete() interface — agents never hold API keys or call providers directly
- Routes requests to the right provider/model/key based on tenant config
- Tracks token usage per tenant, per agent, per key
- Enforces budget limits (hard stops, not AI reasoning)
- Rate limits per provider to respect API quotas
- Falls back to alternative providers on failure
- Supports model hints (fast/balanced/best) that resolve to concrete models per tenant
Inputs / Outputs
| Input | Output |
|---|---|
| Messages + model hint + caller context (tenant, agent) | Completion response + token usage + cost + latency |
| Texts + caller context | Embedding vectors |
| Budget query (tenant, agent) | Remaining budget info |
Key Workflows
Request Routing:
Agent calls complete(messages, modelHint="balanced", context)
→ Resolve model hint to concrete model via tenant config
→ Look up tenant's key for that provider (Vault) → fall back to Speedrun key
→ Check budget → reject if exceeded
→ Check rate limit → queue if throttled
→ Send to provider → return response
→ Log usage (tokens, cost, latency) to budget tracker + Observation Logger
Fallback Chain:
Provider A fails (rate limit / down / error)
→ Try Provider B (if tenant config allows)
→ Try Provider C
→ After max attempts → return error to agent
Connections
- Uses: Vault (key retrieval), Observation Logger (usage logging)
- Used by: Agent Runtime (every LLM call from every agent), Knowledge System (embedding generation, quality gate evaluation)
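The fallback chain can be sketched as a loop over an ordered provider list. This is not the kaze-gateway API — the `completeWithFallback` name and the provider-as-closure shape are assumptions for illustration:

```typescript
type ProviderCall = () => Promise<string>;

// Try each provider in the tenant-configured order; first success wins.
// On rate limit / outage / error, fall through to the next provider.
async function completeWithFallback(
  chain: { name: string; call: ProviderCall }[],
): Promise<string> {
  const errors: string[] = [];
  for (const provider of chain) {
    try {
      return await provider.call();
    } catch (err) {
      errors.push(`${provider.name}: ${err}`); // record and move on
    }
  }
  // After max attempts, surface the aggregate error to the calling agent.
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}
```

Note that the budget check deliberately sits outside this loop in the design: a budget rejection is a hard stop with no fallback, so it must fire before any provider is tried.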
3. Knowledge System (Mem0 + pgvector)
Partially implemented — kaze-knowledge (port 4300)
MVP scope: Mem0 per-agent episodic memory only. Shared knowledge tiers, quality gates, ABAC, and graph traversal are deferred.
Persistent memory and knowledge layer for all agents.
What It Does
- Stores 4 types of memory: episodic (events/history), semantic (facts/relationships), procedural (skills/how-to), reflective (insights/learnings)
- Retrieves relevant memories using tri-factor scoring: recency × importance × relevance
- Versions all knowledge writes with provenance (who, when, why — git-inspired)
- Enforces access control: private tier (agent-scoped) and shared tier (vertical-scoped)
- Quality-gates shared knowledge entries before they're visible to other agents
- Routes storage: Mem0 for per-agent episodic memory, pgvector for shared knowledge
Inputs / Outputs
| Input | Output |
|---|---|
| Query (text + memory types + caller context) | Ranked memory entries with scores |
| Commit (content + memory type + access tier + metadata) | Version ID + acceptance status (accepted / pending review / rejected) |
| Memory ID | Full entry with version history |
Key Workflows
Knowledge Query:
Agent queries "what do we know about Client X's SEO strategy?"
→ Check ABAC: can this agent read from this domain?
→ Search pgvector for semantic similarity (relevance)
→ Score results with tri-factor scoring: recency × importance × relevance
→ Return top-N ranked results
→ Update last_accessed timestamps (for recency decay)
Knowledge Write (Shared Tier):
Agent commits a new insight to shared vertical knowledge
→ Check ABAC: can this agent write to shared tier?
→ Run quality gate: accuracy check, contradiction detection, duplicate check
→ If passes → accept, version, make visible to vertical
→ If fails → reject with reason, agent can store privately instead
Memory Routing:
Private + episodic → Mem0 (per-agent collection)
Private + other types → pgvector (tenant-scoped)
Shared + any type → pgvector (shared tables, quality-gated)
Connections
- Uses: LLM Gateway (embeddings, quality gate evaluation), Observation Logger
- Used by: Agent Runtime (agents query/commit during task execution)
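Tri-factor scoring could look like the sketch below. The design only names the three factors; the weights, the weighted-sum combination, the 7-day half-life, and the `triFactorScore` name are all assumptions here:

```typescript
interface MemoryEntry {
  importance: number;     // 0..1, assigned at write time
  relevance: number;      // 0..1, vector similarity to the query
  lastAccessedMs: number; // epoch millis, refreshed on each read
}

// Assumed decay constant: recency halves every 7 days.
const HALF_LIFE_MS = 7 * 24 * 3600 * 1000;

function triFactorScore(m: MemoryEntry, nowMs: number): number {
  const age = Math.max(0, nowMs - m.lastAccessedMs);
  const recency = Math.pow(0.5, age / HALF_LIFE_MS); // exponential decay in [0, 1]
  // Assumed weights — relevance dominates, recency matters least.
  return 0.2 * recency + 0.3 * m.importance + 0.5 * m.relevance;
}
```

Updating last_accessed on every read (as the query workflow does) means frequently used memories resist decay, while stale ones sink in the ranking without ever being deleted.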
4. Tool Integration Framework
Typed, auth-managed access to external services and APIs.
What It Does
- Defines tools with typed inputs/outputs, auth requirements, and retry policies
- Manages a registry of available tools, filterable by vertical
- Resolves auth credentials from Vault at execution time (agents never hold raw keys)
- Handles retries with exponential backoff for transient failures
- Rate limits tool calls to respect external API quotas
- Logs all tool executions for observability
Inputs / Outputs
| Input | Output |
|---|---|
| Tool name + input parameters + caller context | Typed result (success + data) or error (code + retryable flag) |
| Discovery query (vertical, category) | List of available tools with descriptions |
MVP Tools
| Tool | Vertical | What It Does |
|---|---|---|
| GitHub | V0 Internal Ops | List/create/update issues, PRs, comments |
| Calendar | V0 Internal Ops | List/create events, find free slots |
| SEMrush | V1 SEO | Keyword overview, keyword gap, domain analysis |
| Google Search Console | V1 SEO | Search performance, ranking data |
| Toddle DB | V2 Toddle | Query/update activities, check data freshness, get embeddings |
Key Workflow
Tool Execution:
Agent calls tool("semrush_keyword_overview", { keyword: "..." })
→ Look up tool definition in registry
→ Fetch credentials from Vault (scoped to tenant)
→ Execute with timeout
→ On failure: retry per policy (exponential backoff, max attempts)
→ Log execution to Observation Logger
→ Return typed result to agent
Connections
- Uses: Vault (credential retrieval), Observation Logger
- Used by: Agent Runtime (agents invoke tools during skill execution)
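The retry step of the workflow above is a standard exponential-backoff loop keyed on the error's retryable flag. A sketch under assumed names (`executeWithRetry`, the `ToolError` shape) and assumed defaults (3 attempts, 200ms base delay):

```typescript
interface ToolError extends Error {
  retryable: boolean; // set by the tool definition's error mapping
}

async function executeWithRetry<T>(
  run: () => Promise<T>,
  maxAttempts = 3,    // assumed default
  baseDelayMs = 200,  // assumed default
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await run();
    } catch (err) {
      const retryable = (err as ToolError).retryable === true;
      // Non-retryable errors and exhausted attempts go straight back to the agent.
      if (!retryable || attempt >= maxAttempts) throw err;
      // Exponential backoff: 200ms, 400ms, 800ms, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}
```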
5. Task Scheduler
Cron-based and event-triggered task execution for agents.
What It Does
- Registers cron schedules (from agent YAML definitions or programmatic)
- Registers event triggers (agent A completes → fire agent B)
- Dispatches tasks to Agent Runtime when triggers fire
- Ensures idempotency (no double-firing on scheduler restart)
- Tracks execution history per schedule
- Supports skip-if-running to prevent overlapping executions
Inputs / Outputs
| Input | Output |
|---|---|
| Schedule definition (cron expression + target agent + skill + input) | Registered schedule ID |
| Event trigger definition (event type + target agent + skill) | Registered trigger ID |
| Emitted event | Dispatched task (via Agent Runtime) |
Key Workflows
Cron Tick (every 10s):
Query schedules where next_run_at <= now
→ For each due schedule:
→ Check idempotency (not already fired for this timestamp)
→ Check skip-if-running (previous task still executing?)
→ Dispatch task to Agent Runtime
→ Update next_run_at
→ Record execution in history
Event Trigger (MVP — direct callbacks):
Component calls scheduler.emit({ type: "task_completed", ... })
→ Match against registered event triggers
→ Check idempotency key
→ Dispatch matched tasks to Agent Runtime
HA Safety: Multiple scheduler replicas use database-level locking (FOR UPDATE SKIP LOCKED) so only one replica picks up each due schedule.
Connections
- Uses: Agent Runtime (dispatches tasks)
- Used by: Agent definitions (declare cron triggers in YAML), other components (emit events)
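The HA claim and the idempotency check above can be sketched as follows. FOR UPDATE SKIP LOCKED is the mechanism the design names; the table and column names, and the key format, are assumptions:

```typescript
// Each replica's cron tick runs this inside a transaction; rows another
// replica has already locked are skipped rather than waited on, so each
// due schedule is claimed by exactly one replica.
const CLAIM_DUE_SCHEDULES = `
  SELECT id, agent_id, skill, input, next_run_at
  FROM schedules
  WHERE next_run_at <= now()
  ORDER BY next_run_at
  FOR UPDATE SKIP LOCKED
`;

// Idempotency: key a dispatch by (schedule id, scheduled timestamp), so a
// restarted replica re-deriving the same tick cannot fire a duplicate task.
function idempotencyKey(scheduleId: string, scheduledFor: Date): string {
  return `${scheduleId}:${scheduledFor.toISOString()}`;
}
```

The same key works for event triggers if the emitter supplies a stable event ID in place of the timestamp.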
6. Observation Logger
Structured logging of all agent activity for debugging, auditing, and future training.
What It Does
- Records every significant event: agent lifecycle, task execution, LLM calls, tool calls, knowledge operations, supervision decisions, budget warnings
- Provides a query interface for debugging (trace a task, view agent timeline, aggregate metrics)
- Batches writes internally (fire-and-forget, never blocks agent execution)
- Integrates automatically as middleware on LLM Gateway, Tool Executor, and Knowledge Client
Inputs / Outputs
| Input | Output |
|---|---|
| Observation event (type + payload + context) | (fire-and-forget, async write) |
| Query filter (tenant, agent, task, time range) | Matching events |
| Task ID | Full execution trace across all agents |
| Metrics filter (tenant, time range, group-by) | Aggregate metrics (tasks, tokens, costs, error rates) |
Event Types
| Category | Events |
|---|---|
| Agent lifecycle | spawned, shutdown, state change |
| Task execution | started, completed, failed, timeout |
| LLM calls | start, complete, error (with provider, model, tokens, cost, latency) |
| Tool calls | start, complete, error (with tool name, duration, retry count) |
| Knowledge | query, commit (with memory type, domain, result count) |
| Supervision | review required, decision made (approved/rejected/modified) |
| Budget | warning (80% threshold), exceeded (hard stop) |
| Scheduler | cron triggered, event triggered |
Key Design Choice
Logging is fire-and-forget with internal batching — events buffer in memory and flush to storage in batches (100 events or every 1 second). This ensures logging never becomes a bottleneck for agent execution. If the database is temporarily unavailable, events buffer up to a limit, dropping oldest debug-level events first.
Connections
- Uses: PostgreSQL (event storage)
- Used by: Every other component (automatic middleware integration)
Cross-Component Integration
As Implemented (MVP)
┌─ kaze-gateway (port 4200) ──────────────────────────────┐
│ POST /llm/generate → Vercel AI SDK → Gemini/Claude │
│ POST /tools/execute → credential injection → APIs │
│ GET /tools/catalog │
│ Secrets: LLM keys, GitHub token │
│ Observability: Langfuse tracing │
└──────────────────────────▲──────────────────────────────┘
│ HTTP
┌──────────────────────────┴──────────────────────────────┐
│ kaze-runtime (port 4100) │
│ VerticalAgent → SubAgent (per-task, per-skill) │
│ Memory: search before LLM, store after LLM │
│ Zero secrets — calls gateway + knowledge via HTTP │
└──────┬───────────────────────────────────▲──────────────┘
│ HTTP │ HTTP
┌──────▼───────────────────────────────────┴──────────────┐
│ kaze-knowledge (port 4300) │
│ POST /memory/search → vector similarity │
│ POST /memory/add → Mem0 fact extraction + store │
│ Own LLM key (Gemini) for fact extraction + embeddings │
│ Storage: PostgreSQL + pgvector │
└─────────────────────────────────────────────────────────┘
Full Design (Target)
┌──────────────────────────────┐
│ OpenClaw (Layer 0.5) │
│ User ↔ Agent conversation │
└──────────────┬───────────────┘
│ dispatches tasks
▼
┌───────────────────────────────────────────────────────────────┐
│ Agent Runtime (1) │
│ │
│ Spawns agents · Dispatches tasks · Manages lifecycle │
│ Enforces supervision · Routes inter-agent messages │
│ │
│ Uses: LLM Gateway (2), Knowledge (3), Tools (4), Logger (6) │
└───────┬──────────────┬────────────────┬───────────────────────┘
│ │ │
┌────▼────┐ ┌─────▼──────┐ ┌────▼──────────┐
│ LLM │ │ Knowledge │ │ Tool │
│ Gateway │ │ System │ │ Framework │
│ (2) │ │ (3) │ │ (4) │
│ │ │ │ │ │
│ Multi- │ │ Mem0 + │ │ GitHub, │
│ provider│ │ pgvector │ │ SEMrush, │
│ routing │ │ Tri-factor │ │ Calendar, │
│ Budget │ │ ABAC │ │ Toddle DB │
└─────────┘ └────────────┘ └───────────────┘
│ │ │
└──────────────┼────────────────┘
│ all events logged
┌──────▼──────┐ ┌───────────────┐
│ Observation │ │ Task │
│ Logger (6) │ │ Scheduler (5) │
│ │ │ │
│ All events │ │ Cron + Event │
│ Batched │ │ → dispatch() │
│ writes │ │ to Runtime │
└─────────────┘ └───────────────┘
Error Handling Summary
| Component | Key Failure Mode | Recovery |
|---|---|---|
| Agent Runtime | Agent crash during task | Mark task failed, log error, transition agent to error state. After cooldown, return to ready. 3 consecutive errors → alert ops. |
| Agent Runtime | Task timeout | Abort via signal, mark timeout, agent returns to ready if healthy. |
| LLM Gateway | Provider rate limited / down | Fallback to next provider in chain. Queue with backoff. Return error after max attempts. |
| LLM Gateway | Budget exceeded | Hard reject. No fallback, no retry. Deterministic code check. |
| Knowledge System | Quality gate rejects shared write | Return rejection reason. Agent can store privately instead. |
| Knowledge System | Mem0 unavailable | Graceful degradation: buffer episodic writes to Postgres directly. Reads return empty for Mem0-backed memories. |
| Tool Framework | External API error (retryable) | Exponential backoff per retry policy. After max attempts, return error to agent. |
| Tool Framework | Auth failure | Re-fetch from Vault (cache may be stale). Retry once. If still failing, return auth error. |
| Task Scheduler | Missed cron tick (restart) | On startup, scan for overdue schedules. Execute missed runs (up to 1hr lookback). Mark older as skipped. |
| Observation Logger | Database write failure | Buffer in memory (up to limit). Drop oldest debug events first. Never block agent execution. |
Security Controls Per Component
Each component enforces security boundaries independently — no single component failure should compromise the system.
| Component | Security Control | What It Enforces |
|---|---|---|
| Agent Runtime | Capability manifest enforcement | Agent can only invoke tools, knowledge domains, and channels declared in its manifest |
| Agent Runtime | Per-agent resource quotas | Max concurrent tasks, max tool calls per task, max subagent depth |
| Agent Runtime | Supervision state isolation | Agents cannot read or modify their own supervision statistics |
| LLM Gateway | Data classification check | Tags on knowledge entries ("safe for LLM" vs "internal only") respected before inclusion in prompts |
| LLM Gateway | Secret scanning in prompts | Detect and redact credentials/keys before sending to provider |
| LLM Gateway | Per-tenant request fairness | No single tenant can saturate the provider queue |
| Knowledge System | Shared knowledge quarantine | Writes to shared tier enter quarantine before becoming visible (configurable: time-based or review-based) |
| Knowledge System | Write rate limiting | Flag agents that write excessively to shared tier |
| Knowledge System | Provenance chain | Every shared entry traces to originating observation, not just agent identity |
| Tool Framework | Egress whitelist | Tool calls validated against per-tenant, per-vertical allowed endpoints |
| Tool Framework | Output scanning | Detect client-specific data in tool call parameters before sending externally |
| Observation Logger | Secret redaction | Scan all event payloads for credential patterns before storage |
| Observation Logger | Append-only audit | Observation events are immutable — no update or delete |
| Task Scheduler | Idempotency enforcement | Prevents duplicate task dispatch from replay or restart |
Full threat model and attack surface analysis in research/threat-model.md.
Phase 2 Migration Notes
- NATS: The inter-agent message envelope is designed so swapping DirectCallTransport for NatsTransport changes only the transport layer. Agent code, task definitions, and message shapes stay identical.
- Apache AGE: The knowledge query/commit interface gains a graph traversal retrieval strategy alongside vector search. Same interface, new strategy option.
- Layer 3 Agents: Supervisor and Quality Monitor agents consume the Observation Logger's event stream. Improvement agents write new versions of agent/skill definitions.
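The NATS note relies on a transport seam like the one sketched below. The `Transport` and `Envelope` shapes are assumptions; DirectCallTransport and NatsTransport are the names the design itself uses:

```typescript
interface Envelope {
  to: string;       // target agent ID
  type: string;     // message type, stable across transports
  payload: unknown;
}

// Agents and the runtime depend only on this interface, so swapping the
// implementation changes nothing above the transport layer.
interface Transport {
  send(msg: Envelope): Promise<void>;
}

// MVP: route directly to in-process handlers.
class DirectCallTransport implements Transport {
  constructor(private handlers: Map<string, (msg: Envelope) => Promise<void>>) {}
  async send(msg: Envelope): Promise<void> {
    const handler = this.handlers.get(msg.to);
    if (!handler) throw new Error(`no handler registered for ${msg.to}`);
    await handler(msg);
  }
}

// Phase 2: a NatsTransport implementing the same interface would publish the
// envelope to a NATS subject instead (body omitted).
```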