Kaze — Product Overview
Comprehensive product brief for Project Kaze by Speedrun Ventures.
1. Product Overview
Kaze is an operating system for AI agents built by Speedrun Ventures. It enables defining, orchestrating, and operating fleets of AI agents that automate business operations for SME clients.
Kaze is not another LLM wrapper or chatbot platform. It is an AI-native system where AI is the core operating layer — AI monitors AI, AI improves AI, and human involvement is minimized to governance and exception handling.
Core thesis: SMEs need outcomes, not tools. "Your SEO is handled" is fundamentally different from "here's an API." Kaze delivers automated business operations, not developer infrastructure.
Business model: Tiered SaaS subscription. Clients get fleets of domain-expert AI agents that handle their business operations across SEO, content, data quality, project management, and more. Agents learn from every interaction, building compounding vertical knowledge.
2. What We're Solving
The Problem
SMEs are caught between two bad options:
- Hire specialists — expensive ($5-15k/month per role), hard to find, slow to onboard, limited by human throughput
- Use generic AI tools — ChatGPT, Copilot, etc. are powerful but require expertise to use, produce inconsistent results, don't learn, and aren't integrated into business workflows
No existing solution gives SMEs access to persistent, domain-expert AI agents that operate autonomously within their business context, learn over time, and deliver consistent results across channels (Slack, Email, WhatsApp).
Why Now
- Frontier models have reached the quality bar. Claude, Gemini, and GPT-4 can now reliably perform complex reasoning, tool use, and multi-step workflows — the foundation for autonomous agents.
- Cost has collapsed. A full agent task costs $0.02–$0.17 fully loaded. An agent handling 10 tasks/day costs $10–30/month in LLM costs. This is 100-500x cheaper than a human specialist.
- Tool ecosystems are maturing. MCP, function calling, and API standards make it tractable to build agents that interact with real business systems.
- Trust mechanisms are emerging. Per-skill supervision ramps, quality monitoring, and deterministic safety boundaries make it possible to gradually hand autonomy to agents — not all-or-nothing.
What We Uniquely Solve
| Existing solutions | Gap Kaze fills |
|---|---|
| ChatGPT / Claude (horizontal AI) | No persistence, no domain expertise, no business integration, no autonomy management |
| Zapier / Make (workflow automation) | No intelligence, no learning, brittle rules, can't handle ambiguity |
| Custom AI development | Too expensive for SMEs ($50-200k+), long timelines, no reusable platform |
| AI agent frameworks (LangChain, CrewAI) | Developer tools, not business solutions. No vertical expertise, no multi-tenant, no compliance |
3. Product Architecture
Design Philosophy
Agent-Oriented Architecture — a hybrid of Actor Model (autonomous entities with private state), Event-Driven Architecture (loose coupling via async events), Microservices (independent deployment), and Cell-Based Architecture (isolated deployments per tenant/VPC).
What's new to Kaze (no traditional equivalent):
- Components that learn and self-modify their behavior over time
- A governance hierarchy where AI agents supervise other AI agents
- Shared knowledge across agents while maintaining runtime isolation
- A supervision ramp (supervised → sampling → autonomous) as a trust model
5-Layer Architecture
Layer 3: GOVERNANCE & SELF-IMPROVEMENT
Supervisor Agents · Quality Monitor · Improvement Agent
Layer 2: ORCHESTRATION & KNOWLEDGE
Orchestrator Agents · Shared Knowledge Graph (per-vertical + cross-vertical + per-client)
Layer 1: EXECUTION
Agent Skills (composable, reusable per vertical): Keyword Research, Content Optimize, etc.
Layer 0.5: INTERACTION
Conversation Manager → Slack · Email · WhatsApp · Telegram
Layer 0: PLATFORM INFRASTRUCTURE
LLM Gateway · PostgreSQL + pgvector · Mem0 · Vault · Observability

Three-Service Topology (Implemented)
┌─ kaze-gateway (port 4200) ──────────────────────────────┐
│ LLM calls (Vercel AI SDK → Gemini/Claude) │
│ Tool execution (credential injection → external APIs) │
│ Holds: all LLM keys, tool API tokens │
│ Observability: Langfuse tracing │
└──────────────────────────▲──────────────────────────────┘
│ HTTP
┌──────────────────────────┴──────────────────────────────┐
│ kaze-runtime (port 4100) │
│ VerticalAgent → SubAgent (per-task, per-skill) │
│ Memory: search before LLM call, store after LLM call │
│ Zero secrets — pure orchestration │
└──────┬───────────────────────────────────▲──────────────┘
│ HTTP │ HTTP
┌──────▼───────────────────────────────────┴──────────────┐
│ kaze-knowledge (port 4300) │
│ Per-agent episodic memory (Mem0 + pgvector) │
│ LLM fact extraction + vector similarity search │
│ Own LLM key — independent of gateway │
└─────────────────────────────────────────────────────────┘

Secret isolation principle: Gateway holds LLM + tool keys, Knowledge holds its own LLM key for fact extraction, Runtime holds zero secrets. No single service compromise exposes all credentials.
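The secret-isolation principle can be made concrete at the type level. The sketch below is illustrative, not the actual Kaze runtime contract — `GatewayRequest` and `buildGatewayRequest` are hypothetical names — but it shows the key idea: the runtime-side request shape has no field for a credential, so the runtime cannot leak what it never holds.

```typescript
// Hypothetical runtime→gateway request shape. The runtime names a quality
// hint and tools by name only; keys and tokens exist solely gateway-side.
interface GatewayRequest {
  modelHint: "fast" | "balanced" | "best"; // gateway resolves to a concrete model
  prompt: string;
  tools?: string[]; // tool *names* — credentials are injected by the gateway
}

function buildGatewayRequest(
  prompt: string,
  hint: GatewayRequest["modelHint"] = "balanced"
): GatewayRequest {
  // Note: there is no apiKey field in this type — pure orchestration.
  return { modelHint: hint, prompt };
}
```

Compromising the runtime therefore yields orchestration state but no credentials, which is the point of the three-service split.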
4. Core Product Capabilities
4.1 Agent Skills — The Composable Unit
Skills are the atomic reusable unit of agent capability. YAML-defined with optional TypeScript handlers.
skill: keyword-research
inputs: [business_context, current_rankings, competitors]
tools_required: [semrush_api, google_search_console, llm]
outputs: [keyword_opportunities, priority_ranking, reasoning]
knowledge_dependencies: [seo/domain-concepts, seo/best-practices]
quality_criteria: [relevance_score > 0.8, search_volume validation]

An agent is a composition of skills + role + context. Skills transfer across verticals where applicable.
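A minimal sketch of what the YAML above might deserialize into on the TypeScript side. The `SkillDefinition` and `AgentSpec` shapes are assumptions for illustration, not Kaze's actual types; the field names simply mirror the keyword-research example.

```typescript
// Hypothetical deserialized form of a YAML skill definition.
interface SkillDefinition {
  skill: string;
  inputs: string[];
  tools_required: string[];
  outputs: string[];
  knowledge_dependencies: string[];
  quality_criteria: string[];
}

const keywordResearch: SkillDefinition = {
  skill: "keyword-research",
  inputs: ["business_context", "current_rankings", "competitors"],
  tools_required: ["semrush_api", "google_search_console", "llm"],
  outputs: ["keyword_opportunities", "priority_ranking", "reasoning"],
  knowledge_dependencies: ["seo/domain-concepts", "seo/best-practices"],
  quality_criteria: ["relevance_score > 0.8", "search_volume validation"],
};

// An agent is then a composition of skills + role + context:
interface AgentSpec {
  role: string;
  context: string;
  skills: SkillDefinition[];
}
```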
4.2 Supervision Ramp
The transition from human control to agent autonomy happens in three phases, configured per skill × client × risk level:
| Phase | What happens | Signal |
|---|---|---|
| Supervised | Agent works, human reviews every output. Corrections feed back into learning. | Building training data |
| Sampling | Random 10-20% gets human review. Quality score maintained. Auto-rollback if quality drops. | Statistical confidence |
| Autonomous | AI quality check on all outputs. Auto-delivers unless confidence below threshold. Escalates only exceptions. | Self-correcting |
Example: An SEO agent might simultaneously be autonomous at keyword research (measurable), sampling on content optimization (subjective), and supervised on client communication (high-stakes).
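The phase table above reduces to a small routing decision per output. The sketch below is a minimal illustration under assumed names (`needsHumanReview`, a 15% default sampling rate, a 0.8 confidence threshold) — the real per-skill × per-client configuration is richer than this.

```typescript
type Phase = "supervised" | "sampling" | "autonomous";

// Decide whether a given output goes to a human, per the ramp:
// supervised → always; sampling → a random 10-20%; autonomous → exceptions only.
function needsHumanReview(
  phase: Phase,
  random: number,      // uniform draw in [0, 1) for sampling
  confidence: number,  // AI quality-check confidence for this output
  samplingRate = 0.15,
  threshold = 0.8
): boolean {
  switch (phase) {
    case "supervised":
      return true; // every output reviewed; corrections feed learning
    case "sampling":
      return random < samplingRate; // statistical confidence building
    case "autonomous":
      return confidence < threshold; // escalate only low-confidence outputs
  }
}
```

Because the phase is configured per skill, the SEO agent in the example runs all three branches at once, one per skill.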
4.3 Multi-Channel Interaction
Agents meet humans where they are — Slack, Email, WhatsApp, Telegram. No dashboards for end-users.
Slack #seo-updates:
Agent: "Found 12 new keyword opportunities. Top 3: [X, Y, Z] — ~5k/mo combined traffic. Drafted content briefs. Proceed?"
Human: "Skip Z, we dropped that product."
Agent: "Got it — I'll remember that. Proceeding with X and Y."
That correction feeds back into the client knowledge graph automatically.
4.4 Knowledge System
Per-agent episodic memory via Mem0 (implemented), with shared vertical knowledge via pgvector (planned):
| Knowledge tier | What it stores | Isolation |
|---|---|---|
| Per-agent (private) | Conversation history, task outcomes, client preferences | Agent-scoped |
| Vertical (shared) | Domain expertise, best practices, tool knowledge | Shared across clients in a vertical |
| Cross-vertical | Business operations patterns, communication practices | Platform-wide |
| Client-specific | Brand voice, industry quirks, preferences, history | Never leaves client boundary |
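The isolation column of this table is enforceable as a simple visibility predicate at retrieval time. This is an illustrative sketch (the `Entry` shape and `visibleTo` function are assumptions; actual storage is Mem0 + pgvector), but it captures the invariant: client-specific entries never cross the client boundary.

```typescript
type Tier = "per-agent" | "vertical" | "cross-vertical" | "client";

// Hypothetical knowledge entry, tagged with its tier and owner scope.
interface Entry {
  tier: Tier;
  text: string;
  agentId?: string;
  clientId?: string;
  vertical?: string;
}

function visibleTo(e: Entry, agentId: string, clientId: string, vertical: string): boolean {
  switch (e.tier) {
    case "per-agent":
      return e.agentId === agentId;     // private to one agent
    case "vertical":
      return e.vertical === vertical;   // shared across clients in a vertical
    case "cross-vertical":
      return true;                      // platform-wide
    case "client":
      return e.clientId === clientId;   // never leaves the client boundary
  }
}
```

Applying this filter before any entry reaches a prompt is what lets knowledge be shared across agents while runtime isolation holds.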
4.5 LLM Gateway
Multi-provider abstraction (Gemini, Claude, OpenAI, local models). Agents never hold API keys.
- Model hints — agents request quality level (fast/balanced/best), gateway resolves to concrete model
- Dual-key BYOK — clients can bring their own LLM keys, dropping Speedrun's variable cost to near-zero
- Tool execution — credential injection at runtime, agents never see raw tokens
- Langfuse observability — every LLM call traced, cost tracked, latency measured
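Model hints can be pictured as a small gateway-side lookup. The mapping below is purely illustrative — the model IDs are placeholders, not the gateway's real routing table — but it shows why agents stay provider-agnostic: they only ever speak in quality levels.

```typescript
// Hypothetical hint → model routing table, resolved entirely gateway-side.
const modelByHint: Record<"fast" | "balanced" | "best", string> = {
  fast: "gemini-flash",     // cheapest adequate model
  balanced: "gemini-pro",
  best: "claude-sonnet",
};

function resolveModel(hint: "fast" | "balanced" | "best"): string {
  // Agents never name concrete models or hold keys; swapping providers
  // means editing this table, not any agent code.
  return modelByHint[hint];
}
```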
5. Verticals & Portfolio
The Kaze Flywheel
Pick a vertical → Encode expertise into skills → Deploy with supervision
→ Quality loop (supervised → sampling → autonomous) → Agents build knowledge graph
→ Apply to new clients (knowledge transfers, agents get smarter) → Repeat

Each vertical makes the platform smarter, not just individual agents. The moat is accumulated vertical knowledge graphs and proven agent skills.
Active & Planned Verticals
| Vertical | Status | Agents | Portfolio Project |
|---|---|---|---|
| V0: Internal Ops | Active (testbed) | Research, PM, Issue Tracking, Scheduling, Docs | Speedrun's own operations |
| V1: SEO Automation | Planned | Keyword Research, Content Optimization, Technical Audit, Reporting | SEO clients |
| V2: Toddle Enrichment | Planned | Content Enrichment, Data Quality, Recommendation Tuning | toddle.sg |
| Punkga | Future | Content moderation, artist support, community | punkga.me |
| TrueSight | Future | TBD | truesight.trade |
Why V0 Internal Ops First
- Dogfooding — Speedrun is the first client. Every pain point we feel, our clients will feel.
- Fast feedback — No external client coordination. Iterate in hours, not weeks.
- Foundation testbed — Every platform component gets exercised before external verticals use it.
6. Self-Improvement / Monitoring Loop
AI Monitors AI
The first responder for system health is not a human looking at dashboards — it's an AI agent.
- Health Monitor Agent — Watches fleet health, detects failures, restarts stuck agents, takes corrective action
- Cost Monitor Agent — Tracks token spend, detects anomalies, throttles agents proactively
- Quality Monitor Agent — Evaluates outputs for quality, catches hallucinations, scores task completion
Hard circuit breakers remain deterministic code (budget limits, error rates, permissions). AI supervision augments, never replaces safety-critical rules.
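What "deterministic code, not AI reasoning" means in practice: the breaker is a plain boolean check with no LLM anywhere in its path. The limits and names below are illustrative, not Kaze's actual defaults.

```typescript
// Hypothetical hard circuit breaker. No model call can occur in this path;
// either breach halts the agent unconditionally.
interface Budget {
  spentUsd: number;
  limitUsd: number;
  errorRate: number;     // rolling failure fraction, 0..1
  maxErrorRate: number;
}

function circuitOpen(b: Budget): boolean {
  return b.spentUsd >= b.limitUsd || b.errorRate >= b.maxErrorRate;
}
```

The monitor agents above can tune thresholds or raise alerts, but tripping the breaker itself never depends on a model's judgment.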
AI Improves AI
Every execution produces signals feeding a continuous improvement cycle:
| Layer | What improves | How |
|---|---|---|
| Prompts | System prompts, few-shot examples | A/B testing, quality measurement, auto-selecting winners |
| Tool usage | Tool selection, call order | Analyzing successful vs failed runs |
| Orchestration | Workflow structure, parallelism | Bottleneck identification, step reordering |
| Model selection | Which LLM for which task | Cost vs quality tracking, auto-routing to cheapest adequate model |
| Knowledge | What context agents receive | Learning which knowledge is useful, pruning noise |
All self-improvements are versioned, canaried (10% traffic), and reversible. No agent modifies itself for all traffic simultaneously.
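The 10% canary can be routed with a stable hash of the task ID, so any given task consistently sees one prompt version for the duration of the canary. A minimal sketch under assumed names (`hashToUnit`, `promptVersion` are illustrative):

```typescript
// Map a task ID to a stable pseudo-uniform value in [0, 1).
function hashToUnit(id: string): number {
  let h = 0;
  for (const c of id) h = (h * 31 + c.charCodeAt(0)) >>> 0; // simple 32-bit hash
  return (h % 1000) / 1000;
}

// Route ~canaryFraction of traffic to the canaried improvement; the rest
// stays on the current stable version, keeping the change reversible.
function promptVersion(taskId: string, canaryFraction = 0.1): "canary" | "stable" {
  return hashToUnit(taskId) < canaryFraction ? "canary" : "stable";
}
```

Hash-based routing (rather than a random draw per call) matters here: quality comparisons between canary and stable cohorts stay clean because no task straddles both versions.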
7. Unit Economics & Cost Structure
Cost Breakdown
VARIABLE (60-80%): LLM Tokens (dominant) → External APIs → Embeddings
SEMI-FIXED: Compute / K8s → Database → Storage
FIXED: Control plane → CI/CD → Monitoring

Key insight: LLM token cost dominates everything. A 20% reduction in tokens per task saves more money than halving infrastructure costs.
Cost per Task (Fully Loaded)
| Task Type | LLM | Compute | Tools | Total |
|---|---|---|---|---|
| Simple extraction | $0.011 | $0.002 | $0 | ~$0.013 |
| Keyword research | $0.042 | $0.005 | $0.05 | ~$0.10 |
| Content optimization | $0.063 | $0.005 | $0.02 | ~$0.09 |
| Research synthesis | $0.084 | $0.008 | $0 | ~$0.09 |
| Technical audit | $0.105 | $0.010 | $0.05 | ~$0.17 |
Most tasks cost $0.02–$0.17 fully loaded. 100-500x cheaper than human specialists.
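A back-of-envelope check that these per-task figures are consistent with the monthly numbers cited in "Why Now" (an agent handling 10 tasks/day at $10–30/month). This is illustrative arithmetic, not a cost model:

```typescript
// 10 tasks/day over a 30-day month, at the per-task bounds above.
const tasksPerMonth = 10 * 30;               // 300 tasks
const lowUsd = tasksPerMonth * 0.02;         // ≈ $6/mo, all-cheap-task floor
const highUsd = tasksPerMonth * 0.17;        // ≈ $51/mo, all-expensive-task ceiling
// A realistic task mix lands between these bounds — consistent with the
// $10-30/month range cited earlier.
```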
Cost per Tenant per Month
| Type | Agents | LLM | Infra | Tools | Total | Suggested Price |
|---|---|---|---|---|---|---|
| Small (BYOK) | 3 | $0 | $20-35 | $0-20 | $50 | $200/mo |
| Medium (BYOK) | 5 | $0 | $25-40 | $20-50 | $90 | $500/mo |
| Medium (Speedrun keys) | 5 | $50-120 | $25-40 | $20-50 | $210 | $900/mo |
| Large (dedicated) | 8+ | $100-250 | $270-330 | $50-100 | $680 | Custom |
Gross Margins
| Tier | Revenue | Cost | Gross Margin |
|---|---|---|---|
| Small @ $200/mo (BYOK) | $200 | $50 | 75% |
| Medium @ $500/mo (BYOK) | $500 | $90 | 82% |
| Medium @ $900/mo (Speedrun keys) | $900 | $210 | 77% |
| Large @ $1,500/mo | $1,500 | $680 | 55-72% |
Target: 65-80% gross margins. Achievable with BYOK as default.
Cost Optimization Levers
- Model selection optimization — route tasks to cheapest adequate model (40-50% LLM savings)
- Prompt caching — 90% discount on repeated context (Anthropic)
- Client BYOK — clients bring own LLM keys, Speedrun's variable cost drops 60-80%
- Batch API — 50% discount for non-urgent tasks (quality evaluation, knowledge consolidation)
Scale Economics
| Stage | Tenants | Total Cost/mo | Per Tenant |
|---|---|---|---|
| Stage 0 (MVP) | 1 (internal) | $1,350 | — |
| Stage 1 | 10 | $2,100-2,600 | $210-260 |
| Stage 2 | 50 | $7,100-12,100 | $142-242 |
| Stage 3 | 200 | $31,700-56,700 | $159-284 |
Cost scales with usage (variable-heavy), not ahead of it. No cliff edges.
8. Non-Functional Assessment
Security
- Tenant isolation — cell architecture, namespace separation, VPC deployment option
- Secret isolation — services hold only the keys they need; runtime holds zero secrets
- Data classification — knowledge entries tagged "safe for LLM" vs "internal only"
- Instruction hierarchy — system > skill > knowledge > user > tool output (prompt injection defense)
- Capability manifests — agents can only invoke declared tools, knowledge domains, and communication targets
- Provenance chain — every knowledge entry traces to source with consent classification
- Deterministic safety — budget limits, error thresholds, permissions enforced by code, not AI reasoning
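The capability-manifest bullet above amounts to a whitelist check before any tool invocation. A minimal sketch with assumed names (`Manifest`, `canInvoke` are illustrative, not the actual enforcement code):

```typescript
// Hypothetical capability manifest declared per agent at deploy time.
interface Manifest {
  tools: string[];
  knowledgeDomains: string[];
  targets: string[]; // allowed communication channels/recipients
}

// Deny-by-default: an undeclared tool is refused regardless of what the
// agent's reasoning asks for — enforcement is code, not prompting.
function canInvoke(m: Manifest, tool: string): boolean {
  return m.tools.includes(tool);
}
```

Like the circuit breakers, this check sits outside the model: a prompt-injected instruction to call an undeclared tool fails at the manifest, not at the LLM's discretion.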
Scalability
Metric-triggered scaling, not pre-optimization:
- pgvector p95 >200ms → add Qdrant
- Direct calls hitting throughput limit → add NATS
- Agent code never changes when infrastructure scales
Deployment
- Cloud-agnostic — containerized, IaC, deployable on any cloud or on-premises
- Two modes: Agency (multi-tenant SaaS) and Customer VPC (single-tenant, data stays in client boundary)
- Current MVP: GitHub Actions → Tailscale SSH → Docker on EC2
9. Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Complexity overwhelming small team | High | High | Modular monolith start, extract services only when needed |
| Non-determinism makes debugging hard | High | Medium | Immutable versioning of everything, full execution traces |
| Evaluation accuracy insufficient for autonomy | Medium | High | Start with measurable tasks, multi-signal evaluation |
| Supervisor agents unreliable (AI supervising AI) | Medium | High | Deterministic circuit breakers, governance layer last to autonomy |
| Prompt injection manipulates agents | High | High | Instruction hierarchy + output scanning (no complete solution exists) |
| Client data cross-pollination (legal) | High | Critical | Default isolation, tiered consent model, provenance classification |
| Frontier labs ship managed agent OS | Very High | Medium | Build thin on commodity, deep on boundaries (see Competitive Positioning) |
10. Competitive Positioning
Why Frontier Labs Won't Build What Kaze Builds
Frontier labs (Anthropic, OpenAI, Google) will commoditize generic agent infrastructure (execution loops, scheduling, basic memory). They will NOT build:
| Kaze Capability | Why Labs Won't |
|---|---|
| Multi-provider LLM Gateway | Anthropic won't route to OpenAI, and vice versa |
| Cell isolation + VPC deployment | Their model is centralized SaaS |
| Data classification + provenance | They want data flowing through their models, not gated |
| BYOK across providers | They want clients on their keys |
| Domain-calibrated supervision ramp | They build horizontal, not vertical |
| Vertical knowledge flywheel | They sell tools, not outcomes |
Kaze gets more valuable as frontier labs get more powerful. More powerful AI on client data creates MORE need for sovereignty, provider independence, budget controls, audit trails, and graduated trust.
Historical Parallels
| Generic platform | Boundary/compliance layer that thrived |
|---|---|
| AWS/GCP/Azure | Snowflake, Databricks (data governance + multi-cloud) |
| Public cloud | HashiCorp (multi-cloud abstraction + security) |
| LLM APIs | AI gateways (Portkey, Helicone — routing, compliance) |
| Stripe (payments) | Plaid (financial data boundaries) |
| Salesforce (CRM) | Veeva (vertical CRM with pharma compliance) |
Pattern: Generic platforms commoditize execution. Boundary-enforcement and vertical-expertise layers capture value on top.
Build Thin vs Build Deep
| Build thin (use commodity) | Build deep (this is the moat) |
|---|---|
| Agent execution loop | Multi-provider gateway with BYOK + budget |
| Basic scheduling | Data classification and compliance boundaries |
| Generic tool wrappers | Cell architecture with VPC deployment |
| Conversation persistence | Supervision ramp calibrated per domain |
| Single-agent memory | Cross-agent knowledge with provenance + ABAC |
| — | Vertical skills and domain expertise |
11. Current Status & Roadmap
What's Built (Core Platform)
| Component | Repo | Status |
|---|---|---|
| Agent Runtime | kaze-runtime | Implemented — two-layer agent model, YAML+TS skills, HTTP dispatch, supervision ramp |
| LLM Gateway | kaze-gateway | Implemented — Vercel AI SDK, multi-provider (Gemini/Claude), Langfuse observability |
| Knowledge Service | kaze-knowledge | Implemented — Mem0 + pgvector, fact extraction, per-agent episodic memory |
| Internal Ops (V0) | kaze-agent-ops | In progress — GitHub skill operational |
| CI/CD | All repos | GitHub Actions → Tailscale SSH → Docker on EC2 |
What's Next
- Additional V0 skills — Calendar, Research, Project Management, Documentation
- V1 SEO vertical — Keyword Research, Content Optimization, Technical Audit, Reporting
- V2 Toddle vertical — Content Enrichment, Data Quality, Recommendation Tuning
- Task Scheduler — Cron + event triggers for automated workflows
- Shared knowledge tier — Quality gates, ABAC, cross-agent knowledge
- Self-improvement loop — Quality monitoring, prompt optimization, canary deployment
Parallel Team Structure
Lead: Foundation Platform + V0 Internal Ops (dogfooding)
Team 2: V2 Toddle (content enrichment, data quality)
Team 3: V1 SEO (keyword research, content optimization)

12. Key Design Decisions
52 design decisions documented (D1-D52). Key decisions:
| # | Decision | Choice |
|---|---|---|
| D6 | LLM provider strategy | Multi-provider, abstracted behind LLM Gateway |
| D7 | Key management | Dual-key (Speedrun keys + client BYOK) |
| D11 | Architecture pattern | Agent-Oriented (Actor + EDA + Cell hybrid) |
| D14 | Supervision model | Per-skill ramp: supervised → sampling → autonomous |
| D18 | Knowledge storage | PostgreSQL + pgvector (+ Apache AGE later) |
| D19 | Per-agent memory | Mem0 |
| D30 | MVP knowledge | Mem0 + pgvector only (defer graph DB) |
| D43 | Data rights | Tiered consent model with provenance classification |
| D44 | Scaling strategy | Metric-triggered, not pre-optimized |
| D46 | Service topology | Three repos: gateway, runtime, knowledge (secret isolation) |
| D47 | LLM SDK | Vercel AI SDK |
| D51 | Vector store | PostgreSQL + pgvector via LangChain adapter |
Full log: decisions.md
13. Open Questions
| # | Question | Impact |
|---|---|---|
| Q5 | Cross-channel context management (unified thread model) | Medium — UX |
| Q6 | Supervision queue UX (how ops reviews agent outputs) | Medium — operational efficiency |
| Q8 | Billing model (per-agent, per-task, subscription?) | Medium — business model |
| Q9 | Cross-cell agent communication | Low (Phase 3) |
| Q10 | Canary deployment for agent improvements | Medium — safety |
Source Documents
All source material lives in the kaze repo:
- architecture/overview.md — Vision, principles, system architecture
- strategy/product-strategy.md — Verticals, supervision ramp, multi-channel
- architecture/ai-native.md — Self-improvement loop, knowledge graph, agent safety
- architecture/infrastructure.md — Deployment modes, cells, cloud strategy
- architecture/technical-design.md — 6 MVP component designs
- strategy/tradeoffs.md — 9 risks with mitigations
- strategy/decisions.md — D1-D52 design decisions
- strategy/mvp.md — MVP scope, build plan
- research/cost-model.md — Full unit economics
- research/frontier-lab-competitive-analysis.md — Competitive positioning