Kaze — Product Overview
Comprehensive product brief for Project Kaze by Speedrun Ventures.
1. Product Overview
Kaze is an operating system for AI agents built by Speedrun Ventures. It enables defining, orchestrating, and operating fleets of AI agents that automate business operations for SME clients.
Kaze is not another LLM wrapper or chatbot platform. It is an AI-native system where AI is the core operating layer — AI monitors AI, AI improves AI, and human involvement is minimized to governance and exception handling.
Core thesis: SMEs need outcomes, not tools. "Your SEO is handled" is fundamentally different from "here's an API." Kaze delivers automated business operations, not developer infrastructure.
Business model: Tiered SaaS subscription. Clients get fleets of domain-expert AI agents that handle their business operations across SEO, content, data quality, project management, and more. Agents learn from every interaction, building compounding vertical knowledge.
2. What We're Solving
The Problem
SMEs are caught between two bad options:
- Hire specialists — expensive ($5-15k/month per role), hard to find, slow to onboard, limited by human throughput
- Use generic AI tools — ChatGPT, Copilot, etc. are powerful but require expertise to use, produce inconsistent results, don't learn, and aren't integrated into business workflows
No existing solution gives SMEs access to persistent, domain-expert AI agents that operate autonomously within their business context, learn over time, and deliver consistent results across channels (Slack, Email, WhatsApp).
Why Now
- Frontier models have reached the quality bar. Claude, Gemini, and GPT-4 can now reliably perform complex reasoning, tool use, and multi-step workflows — the foundation for autonomous agents.
- Cost has collapsed. A full agent task costs $0.02–$0.17 fully loaded. An agent handling 10 tasks/day costs $10–30/month in LLM costs. This is 100-500x cheaper than a human specialist.
- Tool ecosystems are maturing. MCP, function calling, and API standards make it tractable to build agents that interact with real business systems.
- Trust mechanisms are emerging. Per-skill supervision ramps, quality monitoring, and deterministic safety boundaries make it possible to gradually hand autonomy to agents — not all-or-nothing.
What We Uniquely Solve
| Existing solutions | Gap Kaze fills |
|---|---|
| ChatGPT / Claude (horizontal AI) | No persistence, no domain expertise, no business integration, no autonomy management |
| Zapier / Make (workflow automation) | No intelligence, no learning, brittle rules, can't handle ambiguity |
| Custom AI development | Too expensive for SMEs ($50-200k+), long timelines, no reusable platform |
| AI agent frameworks (LangChain, CrewAI) | Developer tools, not business solutions. No vertical expertise, no multi-tenant, no compliance |
3. Product Architecture
Design Philosophy
Agent-Oriented Architecture — a hybrid of Actor Model (autonomous entities with private state), Event-Driven Architecture (loose coupling via async events), Microservices (independent deployment), and Cell-Based Architecture (isolated deployments per tenant/VPC).
What's new to Kaze (no traditional equivalent):
- Components that learn and self-modify their behavior over time
- A governance hierarchy where AI agents supervise other AI agents
- Shared knowledge across agents while maintaining runtime isolation
- A supervision ramp (supervised → sampling → autonomous) as a trust model
5-Layer Architecture
Layer 3: GOVERNANCE & SELF-IMPROVEMENT
Supervisor Agents · Quality Monitor · Improvement Agent
Layer 2: ORCHESTRATION & KNOWLEDGE
Orchestrator Agents · Shared Knowledge Graph (per-vertical + cross-vertical + per-client)
Layer 1: EXECUTION
Agent Skills (composable, reusable per vertical): Keyword Research, Content Optimize, etc.
Layer 0.5: INTERACTION
Conversation Manager → Slack · Email · WhatsApp · Telegram
Layer 0: PLATFORM INFRASTRUCTURE
LLM Gateway · PostgreSQL + pgvector · Mem0 · Vault · Observability

Three-Service Topology (Implemented)
┌─ kaze-gateway (port 4200) ──────────────────────────────┐
│ LLM calls (Vercel AI SDK → Gemini/Claude) │
│ Tool execution (credential injection → external APIs) │
│ Holds: all LLM keys, tool API tokens │
│ Observability: Langfuse tracing │
└──────────────────────────▲──────────────────────────────┘
│ HTTP
┌──────────────────────────┴──────────────────────────────┐
│ kaze-runtime (port 4100) │
│ VerticalAgent → SubAgent (per-task, per-skill) │
│ Memory: search before LLM call, store after LLM call │
│ Zero secrets — pure orchestration │
└──────┬───────────────────────────────────▲──────────────┘
│ HTTP │ HTTP
┌──────▼───────────────────────────────────┴──────────────┐
│ kaze-knowledge (port 4300) │
│ Per-agent episodic memory (Mem0 + pgvector) │
│ LLM fact extraction + vector similarity search │
│ Own LLM key — independent of gateway │
└─────────────────────────────────────────────────────────┘

Secret isolation principle: Gateway holds LLM + tool keys, Knowledge holds its own LLM key for fact extraction, Runtime holds zero secrets. No single service compromise exposes all credentials.
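The secret-isolation principle can be made concrete at the type level. The sketch below is illustrative, not the actual Kaze runtime contract — `GatewayRequest` and `buildGatewayRequest` are hypothetical names — but it shows the key idea: the runtime-side request shape has no field for a credential, so the runtime cannot leak what it never holds.

```typescript
// Hypothetical runtime→gateway request shape. The runtime names a quality
// hint and tools by name only; keys and tokens exist solely gateway-side.
interface GatewayRequest {
  modelHint: "fast" | "balanced" | "best"; // gateway resolves to a concrete model
  prompt: string;
  tools?: string[]; // tool *names* — credentials are injected by the gateway
}

function buildGatewayRequest(
  prompt: string,
  hint: GatewayRequest["modelHint"] = "balanced"
): GatewayRequest {
  // Note: there is no apiKey field in this type — pure orchestration.
  return { modelHint: hint, prompt };
}
```

Compromising the runtime therefore yields orchestration state but no credentials, which is the point of the three-service split.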
4. Core Product Capabilities
4.1 Agent Skills — The Composable Unit
Skills are the atomic reusable unit of agent capability. YAML-defined with optional TypeScript handlers.
skill: keyword-research
inputs: [business_context, current_rankings, competitors]
tools_required: [semrush_api, google_search_console, llm]
outputs: [keyword_opportunities, priority_ranking, reasoning]
knowledge_dependencies: [seo/domain-concepts, seo/best-practices]
quality_criteria: [relevance_score > 0.8, search_volume validation]

An agent is a composition of skills + role + context. Skills transfer across verticals where applicable.
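A minimal sketch of what the YAML above might deserialize into on the TypeScript side. The `SkillDefinition` and `AgentSpec` shapes are assumptions for illustration, not Kaze's actual types; the field names simply mirror the keyword-research example.

```typescript
// Hypothetical deserialized form of a YAML skill definition.
interface SkillDefinition {
  skill: string;
  inputs: string[];
  tools_required: string[];
  outputs: string[];
  knowledge_dependencies: string[];
  quality_criteria: string[];
}

const keywordResearch: SkillDefinition = {
  skill: "keyword-research",
  inputs: ["business_context", "current_rankings", "competitors"],
  tools_required: ["semrush_api", "google_search_console", "llm"],
  outputs: ["keyword_opportunities", "priority_ranking", "reasoning"],
  knowledge_dependencies: ["seo/domain-concepts", "seo/best-practices"],
  quality_criteria: ["relevance_score > 0.8", "search_volume validation"],
};

// An agent is then a composition of skills + role + context:
interface AgentSpec {
  role: string;
  context: string;
  skills: SkillDefinition[];
}
```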
4.2 Supervision Ramp
The transition from human control to agent autonomy happens in three phases, configured per skill × client × risk level:
| Phase | What happens | Signal |
|---|---|---|
| Supervised | Agent works, human reviews every output. Corrections feed back into learning. | Building training data |
| Sampling | Random 10-20% gets human review. Quality score maintained. Auto-rollback if quality drops. | Statistical confidence |
| Autonomous | AI quality check on all outputs. Auto-delivers unless confidence below threshold. Escalates only exceptions. | Self-correcting |
Example: An SEO agent might simultaneously be autonomous at keyword research (measurable), sampling on content optimization (subjective), and supervised on client communication (high-stakes).
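The phase table above reduces to a small routing decision per output. The sketch below is a minimal illustration under assumed names (`needsHumanReview`, a 15% default sampling rate, a 0.8 confidence threshold) — the real per-skill × per-client configuration is richer than this.

```typescript
type Phase = "supervised" | "sampling" | "autonomous";

// Decide whether a given output goes to a human, per the ramp:
// supervised → always; sampling → a random 10-20%; autonomous → exceptions only.
function needsHumanReview(
  phase: Phase,
  random: number,      // uniform draw in [0, 1) for sampling
  confidence: number,  // AI quality-check confidence for this output
  samplingRate = 0.15,
  threshold = 0.8
): boolean {
  switch (phase) {
    case "supervised":
      return true; // every output reviewed; corrections feed learning
    case "sampling":
      return random < samplingRate; // statistical confidence building
    case "autonomous":
      return confidence < threshold; // escalate only low-confidence outputs
  }
}
```

Because the phase is configured per skill, the SEO agent in the example runs all three branches at once, one per skill.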
4.3 Multi-Channel Interaction
Agents meet humans where they are — Slack, Email, WhatsApp, Telegram. No dashboards for end-users.
Slack #seo-updates:
Agent: "Found 12 new keyword opportunities. Top 3: [X, Y, Z] — ~5k/mo combined traffic. Drafted content briefs. Proceed?"
Human: "Skip Z, we dropped that product."
Agent: "Got it — I'll remember that. Proceeding with X and Y."
That correction feeds back into the client knowledge graph automatically.
4.4 Knowledge System
Per-agent episodic memory via Mem0 (implemented), with shared vertical knowledge via pgvector (planned):
| Knowledge tier | What it stores | Isolation |
|---|---|---|
| Per-agent (private) | Conversation history, task outcomes, client preferences | Agent-scoped |
| Vertical (shared) | Domain expertise, best practices, tool knowledge | Shared across clients in a vertical |
| Cross-vertical | Business operations patterns, communication practices | Platform-wide |
| Client-specific | Brand voice, industry quirks, preferences, history | Never leaves client boundary |
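The isolation column of this table is enforceable as a simple visibility predicate at retrieval time. This is an illustrative sketch (the `Entry` shape and `visibleTo` function are assumptions; actual storage is Mem0 + pgvector), but it captures the invariant: client-specific entries never cross the client boundary.

```typescript
type Tier = "per-agent" | "vertical" | "cross-vertical" | "client";

// Hypothetical knowledge entry, tagged with its tier and owner scope.
interface Entry {
  tier: Tier;
  text: string;
  agentId?: string;
  clientId?: string;
  vertical?: string;
}

function visibleTo(e: Entry, agentId: string, clientId: string, vertical: string): boolean {
  switch (e.tier) {
    case "per-agent":
      return e.agentId === agentId;     // private to one agent
    case "vertical":
      return e.vertical === vertical;   // shared across clients in a vertical
    case "cross-vertical":
      return true;                      // platform-wide
    case "client":
      return e.clientId === clientId;   // never leaves the client boundary
  }
}
```

Applying this filter before any entry reaches a prompt is what lets knowledge be shared across agents while runtime isolation holds.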
4.5 LLM Gateway
Multi-provider abstraction (Gemini, Claude, OpenAI, local models). Agents never hold API keys.
- Model hints — agents request quality level (fast/balanced/best), gateway resolves to concrete model
- Dual-key BYOK — clients can bring their own LLM keys, dropping Speedrun's variable cost to near-zero
- Tool execution — credential injection at runtime, agents never see raw tokens
- Langfuse observability — every LLM call traced, cost tracked, latency measured
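Model hints can be pictured as a small gateway-side lookup. The mapping below is purely illustrative — the model IDs are placeholders, not the gateway's real routing table — but it shows why agents stay provider-agnostic: they only ever speak in quality levels.

```typescript
// Hypothetical hint → model routing table, resolved entirely gateway-side.
const modelByHint: Record<"fast" | "balanced" | "best", string> = {
  fast: "gemini-flash",     // cheapest adequate model
  balanced: "gemini-pro",
  best: "claude-sonnet",
};

function resolveModel(hint: "fast" | "balanced" | "best"): string {
  // Agents never name concrete models or hold keys; swapping providers
  // means editing this table, not any agent code.
  return modelByHint[hint];
}
```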
5. Verticals & Portfolio
The Kaze Flywheel
Pick a vertical → Encode expertise into skills → Deploy with supervision
→ Quality loop (supervised → sampling → autonomous) → Agents build knowledge graph
→ Apply to new clients (knowledge transfers, agents get smarter) → Repeat

Each vertical makes the platform smarter, not just individual agents. The moat is accumulated vertical knowledge graphs and proven agent skills.
Active & Planned Verticals
| Vertical | Status | Agents | Portfolio Project |
|---|---|---|---|
| V0: Internal Ops | Active (testbed) | Research, PM, Issue Tracking, Scheduling, Docs | Speedrun's own operations |
| V1: SEO Automation | Planned | Keyword Research, Content Optimization, Technical Audit, Reporting | SEO clients |
| V2: Toddle Enrichment | Planned | Content Enrichment, Data Quality, Recommendation Tuning | toddle.sg |
| Punkga | Future | Content moderation, artist support, community | punkga.me |
| TrueSight | Future | TBD | truesight.trade |
Why V0 Internal Ops First
- Dogfooding — Speedrun is the first client. Every pain point we feel, our clients will feel.
- Fast feedback — No external client coordination. Iterate in hours, not weeks.
- Foundation testbed — Every platform component gets exercised before external verticals use it.
6. Self-Improvement / Monitoring Loop
AI Monitors AI
The first responder for system health is not a human looking at dashboards — it's an AI agent.
- Health Monitor Agent — Watches fleet health, detects failures, restarts stuck agents, takes corrective action
- Cost Monitor Agent — Tracks token spend, detects anomalies, throttles agents proactively
- Quality Monitor Agent — Evaluates outputs for quality, catches hallucinations, scores task completion
Hard circuit breakers remain deterministic code (budget limits, error rates, permissions). AI supervision augments, never replaces safety-critical rules.
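What "deterministic code, not AI reasoning" means in practice: the breaker is a plain boolean check with no LLM anywhere in its path. The limits and names below are illustrative, not Kaze's actual defaults.

```typescript
// Hypothetical hard circuit breaker. No model call can occur in this path;
// either breach halts the agent unconditionally.
interface Budget {
  spentUsd: number;
  limitUsd: number;
  errorRate: number;     // rolling failure fraction, 0..1
  maxErrorRate: number;
}

function circuitOpen(b: Budget): boolean {
  return b.spentUsd >= b.limitUsd || b.errorRate >= b.maxErrorRate;
}
```

The monitor agents above can tune thresholds or raise alerts, but tripping the breaker itself never depends on a model's judgment.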
AI Improves AI
Every execution produces signals feeding a continuous improvement cycle:
| Layer | What improves | How |
|---|---|---|
| Prompts | System prompts, few-shot examples | A/B testing, quality measurement, auto-selecting winners |
| Tool usage | Tool selection, call order | Analyzing successful vs failed runs |
| Orchestration | Workflow structure, parallelism | Bottleneck identification, step reordering |
| Model selection | Which LLM for which task | Cost vs quality tracking, auto-routing to cheapest adequate model |
| Knowledge | What context agents receive | Learning which knowledge is useful, pruning noise |
All self-improvements are versioned, canaried (10% traffic), and reversible. No agent modifies itself for all traffic simultaneously.
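The 10% canary can be routed with a stable hash of the task ID, so any given task consistently sees one prompt version for the duration of the canary. A minimal sketch under assumed names (`hashToUnit`, `promptVersion` are illustrative):

```typescript
// Map a task ID to a stable pseudo-uniform value in [0, 1).
function hashToUnit(id: string): number {
  let h = 0;
  for (const c of id) h = (h * 31 + c.charCodeAt(0)) >>> 0; // simple 32-bit hash
  return (h % 1000) / 1000;
}

// Route ~canaryFraction of traffic to the canaried improvement; the rest
// stays on the current stable version, keeping the change reversible.
function promptVersion(taskId: string, canaryFraction = 0.1): "canary" | "stable" {
  return hashToUnit(taskId) < canaryFraction ? "canary" : "stable";
}
```

Hash-based routing (rather than a random draw per call) matters here: quality comparisons between canary and stable cohorts stay clean because no task straddles both versions.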
7. Unit Economics & Cost Structure
Cost Breakdown
VARIABLE (60-80%): LLM Tokens (dominant) → External APIs → Embeddings
SEMI-FIXED: Compute / K8s → Database → Storage
FIXED: Control plane → CI/CD → Monitoring

Key insight: LLM token cost dominates everything. A 20% reduction in tokens per task saves more money than halving infrastructure costs.
Cost per Task (Fully Loaded)
| Task Type | LLM | Compute | Tools | Total |
|---|---|---|---|---|
| Simple extraction | $0.011 | $0.002 | $0 | ~$0.013 |
| Keyword research | $0.042 | $0.005 | $0.05 | ~$0.10 |
| Content optimization | $0.063 | $0.005 | $0.02 | ~$0.09 |
| Research synthesis | $0.084 | $0.008 | $0 | ~$0.09 |
| Technical audit | $0.105 | $0.010 | $0.05 | ~$0.17 |
Most tasks cost $0.02–$0.17 fully loaded. 100-500x cheaper than human specialists.
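A back-of-envelope check that these per-task figures are consistent with the monthly numbers cited in "Why Now" (an agent handling 10 tasks/day at $10–30/month). This is illustrative arithmetic, not a cost model:

```typescript
// 10 tasks/day over a 30-day month, at the per-task bounds above.
const tasksPerMonth = 10 * 30;               // 300 tasks
const lowUsd = tasksPerMonth * 0.02;         // ≈ $6/mo, all-cheap-task floor
const highUsd = tasksPerMonth * 0.17;        // ≈ $51/mo, all-expensive-task ceiling
// A realistic task mix lands between these bounds — consistent with the
// $10-30/month range cited earlier.
```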
Cost per Tenant per Month
| Type | Agents | LLM | Infra | Tools | Total | Suggested Price |
|---|---|---|---|---|---|---|
| Small (BYOK) | 3 | $0 | $20-35 | $0-20 | $50 | $200/mo |
| Medium (BYOK) | 5 | $0 | $25-40 | $20-50 | $90 | $500/mo |
| Medium (Speedrun keys) | 5 | $50-120 | $25-40 | $20-50 | $210 | $900/mo |
| Large (dedicated) | 8+ | $100-250 | $270-330 | $50-100 | $680 | Custom |
Gross Margins
| Tier | Revenue | Cost | Gross Margin |
|---|---|---|---|
| Small @ $200/mo (BYOK) | $200 | $50 | 75% |
| Medium @ $500/mo (BYOK) | $500 | $90 | 82% |
| Medium @ $900/mo (Speedrun keys) | $900 | $210 | 77% |
| Large @ $1,500/mo | $1,500 | $680 | 55-72% |
Target: 65-80% gross margins. Achievable with BYOK as default.
Cost Optimization Levers
- Model selection optimization — route tasks to cheapest adequate model (40-50% LLM savings)
- Prompt caching — 90% discount on repeated context (Anthropic)
- Client BYOK — clients bring own LLM keys, Speedrun's variable cost drops 60-80%
- Batch API — 50% discount for non-urgent tasks (quality evaluation, knowledge consolidation)
Scale Economics
| Stage | Tenants | Total Cost/mo | Per Tenant |
|---|---|---|---|
| Stage 0 (MVP) | 1 (internal) | $1,350 | — |
| Stage 1 | 10 | $2,100-2,600 | $210-260 |
| Stage 2 | 50 | $7,100-12,100 | $142-242 |
| Stage 3 | 200 | $31,700-56,700 | $159-284 |
Cost scales with usage (variable-heavy), not ahead of it. No cliff edges.
8. Non-Functional Assessment
Security
- Tenant isolation — cell architecture, namespace separation, VPC deployment option
- Secret isolation — services hold only the keys they need; runtime holds zero secrets
- Data classification — knowledge entries tagged "safe for LLM" vs "internal only"
- Instruction hierarchy — system > skill > knowledge > user > tool output (prompt injection defense)
- Capability manifests — agents can only invoke declared tools, knowledge domains, and communication targets
- Provenance chain — every knowledge entry traces to source with consent classification
- Deterministic safety — budget limits, error thresholds, permissions enforced by code, not AI reasoning
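The capability-manifest bullet above amounts to a whitelist check before any tool invocation. A minimal sketch with assumed names (`Manifest`, `canInvoke` are illustrative, not the actual enforcement code):

```typescript
// Hypothetical capability manifest declared per agent at deploy time.
interface Manifest {
  tools: string[];
  knowledgeDomains: string[];
  targets: string[]; // allowed communication channels/recipients
}

// Deny-by-default: an undeclared tool is refused regardless of what the
// agent's reasoning asks for — enforcement is code, not prompting.
function canInvoke(m: Manifest, tool: string): boolean {
  return m.tools.includes(tool);
}
```

Like the circuit breakers, this check sits outside the model: a prompt-injected instruction to call an undeclared tool fails at the manifest, not at the LLM's discretion.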
Scalability
Metric-triggered scaling, not pre-optimization:
- pgvector p95 >200ms → add Qdrant
- Direct calls hitting throughput limit → add NATS
- Agent code never changes when infrastructure scales
Deployment
- Cloud-agnostic — containerized, IaC, deployable on any cloud or on-premises
- Two modes: Agency (multi-tenant SaaS) and Customer VPC (single-tenant, data stays in client boundary)
- Current MVP: GitHub Actions → Tailscale SSH → Docker on EC2
9. Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Complexity overwhelming small team | High | High | Modular monolith start, extract services only when needed |
| Non-determinism makes debugging hard | High | Medium | Immutable versioning of everything, full execution traces |
| Evaluation accuracy insufficient for autonomy | Medium | High | Start with measurable tasks, multi-signal evaluation |
| Supervisor agents unreliable (AI supervising AI) | Medium | High | Deterministic circuit breakers, governance layer last to autonomy |
| Prompt injection manipulates agents | High | High | Instruction hierarchy + output scanning (no complete solution exists) |
| Client data cross-pollination (legal) | High | Critical | Default isolation, tiered consent model, provenance classification |
| Frontier labs ship managed agent OS | Very High | Medium | Build thin on commodity, deep on boundaries (see Competitive Positioning) |
10. Competitive Positioning
Why Frontier Labs Won't Build What Kaze Builds
Frontier labs (Anthropic, OpenAI, Google) will commoditize generic agent infrastructure (execution loops, scheduling, basic memory). They will NOT build:
| Kaze Capability | Why Labs Won't |
|---|---|
| Multi-provider LLM Gateway | Anthropic won't route to OpenAI, and vice versa |
| Cell isolation + VPC deployment | Their model is centralized SaaS |
| Data classification + provenance | They want data flowing through their models, not gated |
| BYOK across providers | They want clients on their keys |
| Domain-calibrated supervision ramp | They build horizontal, not vertical |
| Vertical knowledge flywheel | They sell tools, not outcomes |
Kaze gets more valuable as frontier labs get more powerful. More powerful AI on client data creates MORE need for sovereignty, provider independence, budget controls, audit trails, and graduated trust.
Historical Parallels
| Generic platform | Boundary/compliance layer that thrived |
|---|---|
| AWS/GCP/Azure | Snowflake, Databricks (data governance + multi-cloud) |
| Public cloud | HashiCorp (multi-cloud abstraction + security) |
| LLM APIs | AI gateways (Portkey, Helicone — routing, compliance) |
| Stripe (payments) | Plaid (financial data boundaries) |
| Salesforce (CRM) | Veeva (vertical CRM with pharma compliance) |
Pattern: Generic platforms commoditize execution. Boundary-enforcement and vertical-expertise layers capture value on top.
Build Thin vs Build Deep
| Build thin (use commodity) | Build deep (this is the moat) |
|---|---|
| Agent execution loop | Multi-provider gateway with BYOK + budget |
| Basic scheduling | Data classification and compliance boundaries |
| Generic tool wrappers | Cell architecture with VPC deployment |
| Conversation persistence | Supervision ramp calibrated per domain |
| Single-agent memory | Cross-agent knowledge with provenance + ABAC |
| — | Vertical skills and domain expertise |
11. Current Status & Roadmap
What's Built (Core Platform)
| Component | Repo | Status |
|---|---|---|
| Agent Runtime | kaze-runtime | Implemented — two-layer agent model, YAML+TS skills, HTTP dispatch, supervision ramp |
| LLM Gateway | kaze-gateway | Implemented — Vercel AI SDK, multi-provider (Gemini/Claude), Langfuse observability |
| Knowledge Service | kaze-knowledge | Implemented — Mem0 + pgvector, fact extraction, per-agent episodic memory |
| Internal Ops (V0) | kaze-agent-ops | In progress — GitHub skill operational |
| CI/CD | All repos | GitHub Actions → Tailscale SSH → Docker on EC2 |
What's Next
- Additional V0 skills — Calendar, Research, Project Management, Documentation
- V1 SEO vertical — Keyword Research, Content Optimization, Technical Audit, Reporting
- V2 Toddle vertical — Content Enrichment, Data Quality, Recommendation Tuning
- Task Scheduler — Cron + event triggers for automated workflows
- Shared knowledge tier — Quality gates, ABAC, cross-agent knowledge
- Self-improvement loop — Quality monitoring, prompt optimization, canary deployment
Parallel Team Structure
Lead: Foundation Platform + V0 Internal Ops (dogfooding)
Team 2: V2 Toddle (content enrichment, data quality)
Team 3: V1 SEO (keyword research, content optimization)

12. Key Design Decisions
52 design decisions documented (D1-D52). Key decisions:
| # | Decision | Choice |
|---|---|---|
| D6 | LLM provider strategy | Multi-provider, abstracted behind LLM Gateway |
| D7 | Key management | Dual-key (Speedrun keys + client BYOK) |
| D11 | Architecture pattern | Agent-Oriented (Actor + EDA + Cell hybrid) |
| D14 | Supervision model | Per-skill ramp: supervised → sampling → autonomous |
| D18 | Knowledge storage | PostgreSQL + pgvector (+ Apache AGE later) |
| D19 | Per-agent memory | Mem0 |
| D30 | MVP knowledge | Mem0 + pgvector only (defer graph DB) |
| D43 | Data rights | Tiered consent model with provenance classification |
| D44 | Scaling strategy | Metric-triggered, not pre-optimized |
| D46 | Service topology | Three repos: gateway, runtime, knowledge (secret isolation) |
| D47 | LLM SDK | Vercel AI SDK |
| D51 | Vector store | PostgreSQL + pgvector via LangChain adapter |
Full log: decisions.md
13. Open Questions
| # | Question | Impact |
|---|---|---|
| Q5 | Cross-channel context management (unified thread model) | Medium — UX |
| Q6 | Supervision queue UX (how ops reviews agent outputs) | Medium — operational efficiency |
| Q8 | Billing model (per-agent, per-task, subscription?) | Medium — business model |
| Q9 | Cross-cell agent communication | Low (Phase 3) |
| Q10 | Canary deployment for agent improvements | Medium — safety |
Source Documents
All source material lives in the kaze repo:
- architecture/overview.md — Vision, principles, system architecture
- strategy/product-strategy.md — Verticals, supervision ramp, multi-channel
- architecture/ai-native.md — Self-improvement loop, knowledge graph, agent safety
- architecture/infrastructure.md — Deployment modes, cells, cloud strategy
- architecture/technical-design.md — 6 MVP component designs
- strategy/tradeoffs.md — 9 risks with mitigations
- strategy/decisions.md — D1-D52 design decisions
- strategy/mvp.md — MVP scope, build plan
- research/cost-model.md — Full unit economics
- research/frontier-lab-competitive-analysis.md — Competitive positioning