Project Kaze — Philosophy & Vision
Context
Speedrun Ventures is an AI-native venture studio building AI agents and automated workflows for SMEs to optimize their P&L. The studio operates a portfolio of products — each generating domain-specific data and knowledge — with a shared agent platform (Kaze) that compounds learnings across all of them.
What is Kaze?
Kaze is an operating system for AI agents. It provides the runtime, knowledge, tooling, and governance layers that enable Speedrun to:
- Define agents from composable, reusable skills
- Execute agents across multiple business verticals
- Accumulate and share knowledge across agents and verticals
- Earn trust through a supervision ramp (supervised → sampling → autonomous)
- Scale agent fleets without per-agent operational burden
Kaze is not a chatbot platform or an LLM wrapper. It is a system where AI agents are the primary units of computation — reasoning about tasks, using tools, learning from outcomes, and operating under governance.
Core Principles
| Principle | What it means in practice |
|---|---|
| AI-Native | AI agents are the primary compute units, not add-ons to traditional software. The system is designed around agent reasoning, memory, and tool use as first-class concepts. |
| Vertical-First | Value comes from going deep into specific business domains (internal ops, SEO, content), not from building a generic horizontal platform. Each vertical creates compounding domain knowledge. |
| Security & Privacy First | Every design decision prioritizes data isolation, tenant security, and credential management. Zero-secret runtime — agents never hold API keys. |
| Meet Humans Where They Are | Users interact via their existing tools (Slack, WhatsApp, Telegram) through OpenClaw. No specialized dashboard required for day-to-day use. |
| Cloud-Agnostic | All infrastructure is containerized and defined as IaC. The same artifacts deploy on any cloud or on-premises with only configuration differences. |
| Portable by Default | One build, different configurations. No SaaS-only or on-prem-only code paths. |
The AI-Native Paradigm
Most "AI platforms" place humans as operators with AI as a tool:
Human defines → Human deploys → Agent executes → Human monitors → Human improves

Kaze's target state inverts this. AI is the operating layer. Humans are governors:
Human sets goals & guardrails
│
▼
AI orchestrates → AI executes → AI evaluates
↑ │
└──── AI improves & adapts ────┘
Human intervenes only at:
- Goal setting
- Guardrail violations
- Escalation thresholds
- Approval gates (when configured)

Current state: The platform today follows the first pattern — humans define skills, deploy agents, monitor via Langfuse, and manually improve prompts. The AI-native self-improvement loop (Layer 3: Governance) is a design-phase target, not yet implemented, but the architecture is built to support this evolution.
AI Monitors AI (Target State)
The design calls for a Supervisor Agent Layer where AI agents monitor other agents:
- Health Monitor Agent — Watches fleet health, detects failures, restarts stuck agents
- Cost Monitor Agent — Tracks token spend, detects budget anomalies, throttles proactively
- Quality Monitor Agent — Evaluates outputs, catches hallucinations, scores task completion
These would reason about novel failure modes rather than following static rules. Hard circuit breakers remain deterministic code — budget limits, error rate thresholds, and permission boundaries are enforced by platform code, not AI reasoning.
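The split above can be sketched in code: monitor agents reason about novel failures, but the stop decision stays in plain, rule-based platform code that no LLM output can override. This is an illustrative sketch only; `BudgetBreaker` and its fields are invented names, not Kaze APIs.

```typescript
// Illustrative hard circuit breaker: deterministic platform code,
// independent of any AI reasoning. All names are hypothetical.
interface BreakerLimits {
  maxDailySpendUsd: number;
  maxErrorRate: number; // 0..1, over the current window
}

class BudgetBreaker {
  private spendUsd = 0;
  private runs = 0;
  private failures = 0;

  constructor(private limits: BreakerLimits) {}

  recordRun(costUsd: number, failed: boolean): void {
    this.spendUsd += costUsd;
    this.runs += 1;
    if (failed) this.failures += 1;
  }

  /** Pure, rule-based check: no model call can override the result. */
  tripped(): boolean {
    const errorRate = this.runs === 0 ? 0 : this.failures / this.runs;
    return (
      this.spendUsd >= this.limits.maxDailySpendUsd ||
      errorRate >= this.limits.maxErrorRate
    );
  }
}
```

A monitor agent might flag an anomaly earlier, but only this kind of deterministic check gets to halt execution.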
AI Improves AI (Target State)
Every agent execution produces signals that can feed a continuous improvement cycle:
| Layer | What improves | How |
|---|---|---|
| Prompts | System prompts, few-shot examples | A/B testing, measuring output quality, auto-selecting winners |
| Tool usage | Which tools, in what order | Analyzing successful vs. failed runs, optimizing patterns |
| Model selection | Which LLM for which task | Cost vs. quality tracking per model per task, auto-routing |
| Knowledge | What context an agent receives | Learning which knowledge is useful, pruning noise |
Safeguard: All self-improvements would be versioned, canaried, and reversible. An agent never modifies itself for all traffic simultaneously.
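The "never for all traffic simultaneously" rule can be made concrete with a small sketch: a candidate prompt version only ever receives a bounded canary share of traffic, and the stable version stays addressable for instant rollback. Class and field names here are assumptions for illustration, not the platform's actual rollout mechanism.

```typescript
// Hypothetical canaried rollout for a self-improved prompt version.
interface PromptVersion {
  id: string;
  systemPrompt: string;
}

class CanaryRollout {
  constructor(
    private stable: PromptVersion,
    private canary: PromptVersion,
    private canaryShare: number // e.g. 0.05 = 5% of traffic
  ) {
    // Invariant from the text: a self-improvement never takes 100% of traffic.
    if (canaryShare < 0 || canaryShare >= 1) {
      throw new Error("canary share must stay below 100% of traffic");
    }
  }

  /** Deterministic routing by request id keeps assignment stable. */
  pick(requestId: number): PromptVersion {
    const bucket = (requestId % 100) / 100;
    return bucket < this.canaryShare ? this.canary : this.stable;
  }

  /** The previous version is always retained, so rollback is trivial. */
  rollback(): PromptVersion {
    return this.stable;
  }
}
```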
The Vertical-First Flywheel
Kaze's moat is domain knowledge accumulated through vertical depth:
┌─────────────────┐
┌───▶│ More Agents │───┐
│ │ (new verticals) │ │
│ └─────────────────┘ │
│ ▼
┌──┴───────────────┐    ┌───────────────────┐
│ Better Knowledge │◀───│  More Executions  │
│ (shared patterns)│    │ (tasks completed) │
└──────────────────┘    └───────────────────┘

Each vertical produces domain knowledge. Cross-vertical patterns (communication, data analysis, reporting) compound. A new vertical starts with shared knowledge from day one.
Current Verticals
| Vertical | Status | Agent Focus |
|---|---|---|
| V0: Internal Ops | Active — 6 skills | Research, GitHub operations, digests, triage, docs-sync, meeting notes |
| V1: SEO Automation | Planned | Keyword research, content optimization, technical audits, reporting |
| V2: Toddle Activity Enrichment | Planned | Content enrichment, data quality, recommendation tuning |
V0 is strategic. It's Speedrun's own internal operations — fast feedback, no external coordination, every platform component gets exercised before external verticals use it.
The Supervision Ramp
Trust is earned, not assumed. Every agent-skill pair progresses through three levels:
supervised ──────▶ sampling ──────▶ autonomous
│ │ │
│ Every output │ Random X% │ All outputs
│ reviewed by │ reviewed. │ delivered
│ human before │ Rest delivered │ directly.
│ delivery. │ immediately. │ Async quality
│ │ │ monitoring.
│ │ │
└── demotion ◀─────┴── demotion ◀────┘
  if quality drops     if quality drops

Key design decisions:
- Per-skill, not per-agent. An agent can be autonomous for one skill and supervised for another.
- Read-only to agents. Only the platform's supervision ramp logic can promote or demote. Agents cannot query or influence their own supervision state.
- Automatic promotion at thresholds: 20 successful runs → sampling, 50 → autonomous. 3 consecutive failures → demotion.
- Hard boundaries. Safety-critical operations remain supervised regardless of performance history.
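The promotion and demotion rules above amount to a small state machine per agent-skill pair. The thresholds (20 → sampling, 50 → autonomous, 3 consecutive failures → demotion) come from the text; the class shape, and the choice to reset the success counter on demotion, are illustrative assumptions.

```typescript
// Sketch of the supervision ramp for one agent-skill pair.
type Level = "supervised" | "sampling" | "autonomous";

class SupervisionRamp {
  private level: Level = "supervised";
  private successes = 0;
  private consecutiveFailures = 0;

  constructor(private safetyCritical = false) {}

  record(success: boolean): Level {
    if (success) {
      this.successes += 1;
      this.consecutiveFailures = 0;
      // Safety-critical skills never leave supervised, regardless of history.
      if (!this.safetyCritical) {
        if (this.successes >= 50) this.level = "autonomous";
        else if (this.successes >= 20) this.level = "sampling";
      }
    } else {
      this.consecutiveFailures += 1;
      if (this.consecutiveFailures >= 3) {
        // Demote one step and restart both counters (assumption: trust is
        // re-earned from scratch after a demotion).
        this.level = this.level === "autonomous" ? "sampling" : "supervised";
        this.consecutiveFailures = 0;
        this.successes = 0;
      }
    }
    return this.level;
  }
}
```

Note that nothing here is callable by the agent itself: per the read-only rule, only platform code records outcomes and transitions levels.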
Knowledge Architecture
The knowledge system provides persistent memory for all agents. The design has three tiers, with MVP implementing the first:
Implemented: Per-Agent Episodic Memory
Each agent has its own memory space via Mem0. Before reasoning, an agent searches for relevant context; after completing a task, it stores key learnings. Facts are extracted by an LLM and embedded with Google's gemini-embedding-001 (768-dimensional vectors).
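The search-before / store-after loop can be sketched as follows. `MemoryStore` is a stand-in interface invented for this example; in Kaze it would be backed by Mem0, whose real client API differs from this simplified shape.

```typescript
// Illustrative episodic-memory loop around one task execution.
interface MemoryStore {
  search(agentId: string, query: string): Promise<string[]>;
  add(agentId: string, fact: string): Promise<void>;
}

async function runWithMemory(
  memory: MemoryStore,
  agentId: string,
  task: string,
  execute: (task: string, context: string[]) => Promise<string>
): Promise<string> {
  // 1. Retrieve relevant episodic context before reasoning.
  const context = await memory.search(agentId, task);
  // 2. Execute the task with that context injected.
  const result = await execute(task, context);
  // 3. Persist a key learning for future runs (LLM-based fact
  //    extraction is elided here).
  await memory.add(agentId, `Task "${task}" -> ${result}`);
  return result;
}
```

Because each agent passes its own `agentId`, memory spaces stay isolated per agent, matching the per-agent tier described above.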
Designed (Not Yet Built): Shared Knowledge
┌──────────────────────────────────────────────────┐
│ Knowledge Architecture │
│ │
│ Vertical Knowledge (shared within a vertical) │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ SEO │ │ Ops │ │ Toddle │ ... │
│ └────┬────┘ └────┬────┘ └────┬────┘ │
│ └────────────┼────────────┘ │
│ ▼ │
│ Cross-Vertical Knowledge │
│ ┌─────────────────────────────────────┐ │
│ │ - Business operations patterns │ │
│ │ - Communication best practices │ │
│ │ - Data analysis methods │ │
│ └─────────────────────────────────────┘ │
│ │
│ Client-Specific Knowledge (isolated) │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ Client A │ │ Client B │ │ Client C │ │
│ └───────────┘ └───────────┘ └───────────┘ │
└──────────────────────────────────────────────────┘

Isolation rules: Vertical knowledge is shared within a vertical. Cross-vertical patterns apply everywhere. Client-specific knowledge never leaves the client boundary.
Knowledge provenance: Every entry is tagged with a source class (public, speedrun_internal, client_private, etc.) that determines visibility.
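A minimal sketch of provenance-based filtering: the source classes are from the text, while the function and the policy for internal knowledge (shown permissive here) are illustrative assumptions.

```typescript
// Hypothetical visibility check driven by knowledge provenance tags.
type SourceClass = "public" | "speedrun_internal" | "client_private";

interface KnowledgeEntry {
  content: string;
  sourceClass: SourceClass;
  clientId?: string; // set only for client_private entries
}

function visibleTo(entry: KnowledgeEntry, servingClientId: string | null): boolean {
  if (entry.sourceClass === "client_private") {
    // Hard rule from the text: client knowledge never crosses the boundary.
    return entry.clientId === servingClientId;
  }
  // Policy choice (assumed permissive here): public and internal knowledge
  // is visible to all agents.
  return true;
}
```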
Agent Safety Boundaries
AI autonomy requires hard boundaries enforced by deterministic platform code, not agent self-discipline.
Capability Manifests
Every agent declares exactly what it can do — whitelisted tools, knowledge domains, and communication channels. The runtime enforces this on every action.
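A manifest and its runtime check might look like the following sketch. The field names and deny-by-default helper are assumptions for illustration, not the actual manifest schema.

```typescript
// Illustrative capability manifest, enforced before every action.
interface CapabilityManifest {
  tools: string[];            // whitelisted tool names
  knowledgeDomains: string[]; // e.g. "ops", "seo"
  channels: string[];         // e.g. "slack"
}

function assertToolAllowed(manifest: CapabilityManifest, tool: string): void {
  // Deny-by-default: anything not declared is rejected before execution.
  if (!manifest.tools.includes(tool)) {
    throw new Error(`tool "${tool}" not in capability manifest`);
  }
}
```

The point of the pattern is that the check runs in the runtime on every action, so an agent talked into calling an undeclared tool simply fails.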
Instruction Hierarchy
When processing context, a strict trust ordering applies:
System prompt (platform-defined, immutable)
> Skill definition (admin-authored, versioned)
> Retrieved knowledge (quality-gated, provenance-tracked)
> User input (untrusted by default)
> Tool output (untrusted, external)

Higher levels override lower levels. This is the primary defense against prompt injection.
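One way to apply the ordering in practice, sketched here with invented names: assemble the model context in fixed trust order and delimit each tier, so the system prompt can instruct the model that lower tiers never override higher ones.

```typescript
// Illustrative context assembly in trust order (highest first).
const TRUST_ORDER = [
  "system",    // platform-defined, immutable
  "skill",     // admin-authored, versioned
  "knowledge", // quality-gated, provenance-tracked
  "user",      // untrusted by default
  "tool",      // untrusted, external
] as const;
type Tier = (typeof TRUST_ORDER)[number];

function assembleContext(blocks: Partial<Record<Tier, string>>): string {
  return TRUST_ORDER
    .filter((tier) => blocks[tier] !== undefined)
    .map((tier) => `<${tier}>\n${blocks[tier]}\n</${tier}>`)
    .join("\n");
}
```

Delimiting tiers does not make injected text harmless on its own; it gives the higher-trust instructions an unambiguous way to refer to, and discount, the untrusted spans.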
Subagent Privilege Inheritance
When an agent spawns a subagent:
- The subagent inherits at most the parent's capabilities
- The subagent inherits at most the parent's supervision level
- Parent's resource quotas are shared, not duplicated
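The "at most the parent's" rules reduce to a capability intersection and a supervision-level minimum, sketched below with invented names (resource quotas are not modeled here, since they are shared with the parent rather than derived):

```typescript
// Illustrative derivation of a subagent's effective privileges.
type SupervisionLevel = "supervised" | "sampling" | "autonomous";
const LEVEL_RANK: Record<SupervisionLevel, number> = {
  supervised: 0,
  sampling: 1,
  autonomous: 2,
};

function deriveSubagent(
  parentTools: string[],
  parentLevel: SupervisionLevel,
  requestedTools: string[],
  requestedLevel: SupervisionLevel
): { tools: string[]; level: SupervisionLevel } {
  return {
    // The subagent can only use tools the parent already holds.
    tools: requestedTools.filter((t) => parentTools.includes(t)),
    // The subagent can never be more autonomous than its parent.
    level:
      LEVEL_RANK[requestedLevel] <= LEVEL_RANK[parentLevel]
        ? requestedLevel
        : parentLevel,
  };
}
```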
Relationship with OpenClaw
OpenClaw (the Claude Code fork) serves as the communication and conversation layer. Kaze builds its own agent design, memory system, and knowledge architecture on top:
┌──────────────────────────────────────────────────┐
│ User Interface (WhatsApp / Telegram / Slack) │
└──────────────┬───────────────────────────────────┘
│
┌──────────────▼───────────────────────────────────┐
│ OpenClaw Layer │
│ • Conversation management │
│ • Simple orchestration & tool routing │
│ • Subagent spawning │
│ • Multi-channel support (built-in) │
└──────────────┬───────────────────────────────────┘
│ kaze_dispatch_task / kaze_list_verticals
┌──────────────▼───────────────────────────────────┐
│ Kaze Platform Layer │
│ • Agent runtime (YAML + TypeScript hybrid) │
│ • Memory system (Mem0 + pgvector) │
│ • LLM Gateway (multi-provider) │
│ • Skill framework (composable, vertical-specific) │
│ • Supervision ramp & quality monitoring │
└──────────────────────────────────────────────────┘

Why this split: OpenClaw provides a mature conversation layer and multi-channel support. Kaze's differentiation is in agent architecture, memory, knowledge accumulation, and the self-improvement loop — not chat interfaces.
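The boundary between the layers is narrow by design: OpenClaw only sees the two dispatch tools named in the diagram. A possible shape for that contract is sketched below; the tool names are from the source, but every field and status value is an assumption, not the actual interface.

```typescript
// Hypothetical contract for the OpenClaw -> Kaze boundary.
interface DispatchTaskRequest {
  vertical: string; // e.g. "internal-ops" (illustrative value)
  skill: string;    // which skill should handle the task
  input: string;    // natural-language task from the conversation
  channel: string;  // where the result should be delivered
}

interface DispatchTaskResponse {
  taskId: string;
  status: "queued" | "running" | "completed" | "failed";
}

interface KazeTools {
  kaze_list_verticals(): Promise<string[]>;
  kaze_dispatch_task(req: DispatchTaskRequest): Promise<DispatchTaskResponse>;
}
```

Keeping the surface this small means the conversation layer can evolve (or be replaced) without touching agent runtime, memory, or supervision internals.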
Document Map
| Document | Description |
|---|---|
| System Architecture | Component architecture — runtime, gateway, knowledge, agents, API contracts, data flows |
| Infrastructure | Kubernetes topology, CI/CD, GitOps, secrets management, sidecars, networking |
| Non-Functional Assessment | Security posture, threat model, cost model, scalability analysis |