
Kaze — Product Overview

Comprehensive product brief for Project Kaze by Speedrun Ventures.


1. Product Overview

Kaze is an operating system for AI agents built by Speedrun Ventures. It enables defining, orchestrating, and operating fleets of AI agents that automate business operations for SME clients.

Kaze is not another LLM wrapper or chatbot platform. It is an AI-native system where AI is the core operating layer — AI monitors AI, AI improves AI, and human involvement is limited to governance and exception handling.

Core thesis: SMEs need outcomes, not tools. "Your SEO is handled" is fundamentally different from "here's an API." Kaze delivers automated business operations, not developer infrastructure.

Business model: Tiered SaaS subscription. Clients get fleets of domain-expert AI agents that handle their business operations across SEO, content, data quality, project management, and more. Agents learn from every interaction, building compounding vertical knowledge.


2. What We're Solving

The Problem

SMEs are caught between two bad options:

  1. Hire specialists — expensive ($5-15k/month per role), hard to find, slow to onboard, limited by human throughput
  2. Use generic AI tools — ChatGPT, Copilot, etc. are powerful but require expertise to use, produce inconsistent results, don't learn, and aren't integrated into business workflows

No existing solution gives SMEs access to persistent, domain-expert AI agents that operate autonomously within their business context, learn over time, and deliver consistent results across channels (Slack, Email, WhatsApp).

Why Now

  • Frontier models have reached the quality bar. Claude, Gemini, and GPT-4 can now reliably perform complex reasoning, tool use, and multi-step workflows — the foundation for autonomous agents.
  • Cost has collapsed. A full agent task costs $0.02–$0.17 fully loaded. An agent handling 10 tasks/day costs $10–30/month in LLM costs. This is 100-500x cheaper than a human specialist.
  • Tool ecosystems are maturing. MCP, function calling, and API standards make it tractable to build agents that interact with real business systems.
  • Trust mechanisms are emerging. Per-skill supervision ramps, quality monitoring, and deterministic safety boundaries make it possible to gradually hand autonomy to agents — not all-or-nothing.

What We Uniquely Solve

| Existing solutions | Gap Kaze fills |
|---|---|
| ChatGPT / Claude (horizontal AI) | No persistence, no domain expertise, no business integration, no autonomy management |
| Zapier / Make (workflow automation) | No intelligence, no learning, brittle rules, can't handle ambiguity |
| Custom AI development | Too expensive for SMEs ($50-200k+), long timelines, no reusable platform |
| AI agent frameworks (LangChain, CrewAI) | Developer tools, not business solutions. No vertical expertise, no multi-tenant, no compliance |

3. Product Architecture

Design Philosophy

Agent-Oriented Architecture — a hybrid of Actor Model (autonomous entities with private state), Event-Driven Architecture (loose coupling via async events), Microservices (independent deployment), and Cell-Based Architecture (isolated deployments per tenant/VPC).

What's new to Kaze (no traditional equivalent):

  • Components that learn and self-modify their behavior over time
  • A governance hierarchy where AI agents supervise other AI agents
  • Shared knowledge across agents while maintaining runtime isolation
  • A supervision ramp (supervised → sampling → autonomous) as a trust model

5-Layer Architecture

Layer 3: GOVERNANCE & SELF-IMPROVEMENT
  Supervisor Agents · Quality Monitor · Improvement Agent

Layer 2: ORCHESTRATION & KNOWLEDGE
  Orchestrator Agents · Shared Knowledge Graph (per-vertical + cross-vertical + per-client)

Layer 1: EXECUTION
  Agent Skills (composable, reusable per vertical): Keyword Research, Content Optimize, etc.

Layer 0.5: INTERACTION
  Conversation Manager → Slack · Email · WhatsApp · Telegram

Layer 0: PLATFORM INFRASTRUCTURE
  LLM Gateway · PostgreSQL + pgvector · Mem0 · Vault · Observability

Three-Service Topology (Implemented)

┌─ kaze-gateway (port 4200) ──────────────────────────────┐
│  LLM calls (Vercel AI SDK → Gemini/Claude)               │
│  Tool execution (credential injection → external APIs)   │
│  Holds: all LLM keys, tool API tokens                    │
│  Observability: Langfuse tracing                         │
└──────────────────────────▲──────────────────────────────┘
                           │ HTTP
┌──────────────────────────┴──────────────────────────────┐
│  kaze-runtime (port 4100)                               │
│  VerticalAgent → SubAgent (per-task, per-skill)         │
│  Memory: search before LLM call, store after LLM call   │
│  Zero secrets — pure orchestration                      │
└──────┬───────────────────────────────────▲──────────────┘
       │ HTTP                              │ HTTP
┌──────▼───────────────────────────────────┴──────────────┐
│  kaze-knowledge (port 4300)                             │
│  Per-agent episodic memory (Mem0 + pgvector)            │
│  LLM fact extraction + vector similarity search         │
│  Own LLM key — independent of gateway                   │
└─────────────────────────────────────────────────────────┘

Secret isolation principle: Gateway holds LLM + tool keys, Knowledge holds its own LLM key for fact extraction, Runtime holds zero secrets. No single service compromise exposes all credentials.


4. Core Product Capabilities

4.1 Agent Skills — The Composable Unit

Skills are the atomic reusable unit of agent capability. YAML-defined with optional TypeScript handlers.

```yaml
skill: keyword-research
inputs: [business_context, current_rankings, competitors]
tools_required: [semrush_api, google_search_console, llm]
outputs: [keyword_opportunities, priority_ranking, reasoning]
knowledge_dependencies: [seo/domain-concepts, seo/best-practices]
quality_criteria: ["relevance_score > 0.8", "search_volume validation"]
```

An agent is a composition of skills + role + context. Skills transfer across verticals where applicable.
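That composition can be sketched in TypeScript. This is an illustration only: the type and field names below are assumptions for the sketch, not Kaze's actual API.

```typescript
// Illustrative sketch only: type and field names are assumptions,
// not Kaze's actual API.
interface Skill {
  name: string;            // e.g. "keyword-research"
  inputs: string[];
  toolsRequired: string[];
  outputs: string[];
}

interface Agent {
  role: string;                      // e.g. "SEO specialist"
  context: Record<string, string>;   // client/business context injected at runtime
  skills: Skill[];
}

// An agent is just skills + role + context, composed at configuration time.
function composeAgent(
  role: string,
  context: Record<string, string>,
  skills: Skill[]
): Agent {
  return { role, context, skills };
}

const keywordResearch: Skill = {
  name: "keyword-research",
  inputs: ["business_context", "current_rankings", "competitors"],
  toolsRequired: ["semrush_api", "google_search_console", "llm"],
  outputs: ["keyword_opportunities", "priority_ranking", "reasoning"],
};

const seoAgent = composeAgent("SEO specialist", { client: "acme" }, [keywordResearch]);
```

Because skills are plain data plus an optional handler, the same `keyword-research` definition can be reused by any agent in any vertical where it applies.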

4.2 Supervision Ramp

The transition from human control to agent autonomy happens in three phases, configured per skill × client × risk level:

| Phase | What happens | Signal |
|---|---|---|
| Supervised | Agent works, human reviews every output. Corrections feed back into learning. | Building training data |
| Sampling | Random 10-20% gets human review. Quality score maintained. Auto-rollback if quality drops. | Statistical confidence |
| Autonomous | AI quality check on all outputs. Auto-delivers unless confidence below threshold. Escalates only exceptions. | Self-correcting |

Example: An SEO agent might simultaneously be autonomous at keyword research (measurable), sampling on content optimization (subjective), and supervised on client communication (high-stakes).
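The phase logic can be sketched as a small routing function. This is a hedged illustration; the names, rates, and thresholds are assumptions, not Kaze's real configuration.

```typescript
// Hedged sketch of the per-skill supervision ramp; names, rates, and
// thresholds are illustrative assumptions, not Kaze's configuration.
type Phase = "supervised" | "sampling" | "autonomous";

interface RampConfig {
  phase: Phase;
  sampleRate: number;        // fraction human-reviewed while in "sampling"
  qualityThreshold: number;  // below this, autonomous outputs escalate
}

// Decide whether a given output goes to a human reviewer.
function needsHumanReview(
  cfg: RampConfig,
  qualityScore: number,
  rand: () => number = Math.random
): boolean {
  switch (cfg.phase) {
    case "supervised":
      return true;                                 // every output reviewed
    case "sampling":
      return rand() < cfg.sampleRate;              // e.g. random 10-20%
    case "autonomous":
      return qualityScore < cfg.qualityThreshold;  // escalate exceptions only
  }
}
```

In this sketch, one SEO agent would simply carry three configs at once: autonomous for keyword research, sampling for content optimization, supervised for client communication.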

4.3 Multi-Channel Interaction

Agents meet humans where they are — Slack, Email, WhatsApp, Telegram. No dashboards for end-users.

In Slack, #seo-updates:

  Agent: "Found 12 new keyword opportunities. Top 3: [X, Y, Z] — ~5k/mo combined traffic. Drafted content briefs. Proceed?"
  Human: "Skip Z, we dropped that product."
  Agent: "Got it — I'll remember that. Proceeding with X and Y."

That correction feeds back into the client knowledge graph automatically.

4.4 Knowledge System

Per-agent episodic memory via Mem0 (implemented), with shared vertical knowledge via pgvector (planned):

| Knowledge tier | What it stores | Isolation |
|---|---|---|
| Per-agent (private) | Conversation history, task outcomes, client preferences | Agent-scoped |
| Vertical (shared) | Domain expertise, best practices, tool knowledge | Shared across clients in a vertical |
| Cross-vertical | Business operations patterns, communication practices | Platform-wide |
| Client-specific | Brand voice, industry quirks, preferences, history | Never leaves client boundary |
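One way to picture the tier boundaries is as a visibility check. The types and function below are hypothetical, not Kaze's actual schema.

```typescript
// Hypothetical illustration of knowledge-tier boundaries; the types and
// visibility function are assumptions, not Kaze's actual schema.
type Tier = "per-agent" | "vertical" | "cross-vertical" | "client";

interface KnowledgeEntry {
  tier: Tier;
  content: string;
  agentId?: string;   // set for per-agent entries
  clientId?: string;  // set for client-specific entries
}

// Client-specific knowledge never leaves the client boundary;
// per-agent memory is private to its agent.
function visibleTo(entry: KnowledgeEntry, agentId: string, clientId: string): boolean {
  switch (entry.tier) {
    case "per-agent":      return entry.agentId === agentId;
    case "client":         return entry.clientId === clientId;
    case "vertical":
    case "cross-vertical": return true;  // shared tiers (behind quality gates)
  }
}
```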

4.5 LLM Gateway

Multi-provider abstraction (Gemini, Claude, OpenAI, local models). Agents never hold API keys.

  • Model hints — agents request quality level (fast/balanced/best), gateway resolves to concrete model
  • Dual-key BYOK — clients can bring their own LLM keys, dropping Speedrun's variable cost to near-zero
  • Tool execution — credential injection at runtime, agents never see raw tokens
  • Langfuse observability — every LLM call traced, cost tracked, latency measured
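Model-hint resolution can be sketched minimally. The hint names come from the text above; the concrete model IDs are placeholders and the function is an illustration, not the gateway's actual code.

```typescript
// Minimal sketch of model-hint resolution in the gateway. The hint names
// come from the product text; model IDs are placeholders.
type ModelHint = "fast" | "balanced" | "best";

const DEFAULT_MODELS: Record<ModelHint, string> = {
  fast: "provider/small-model",      // cheap, low-latency
  balanced: "provider/mid-model",
  best: "provider/frontier-model",
};

// Agents send only a quality hint; the gateway maps it to a concrete model
// (per-client overrides support BYOK routing), so agents never hold keys.
function resolveModel(
  hint: ModelHint,
  clientOverrides: Partial<Record<ModelHint, string>> = {}
): string {
  return clientOverrides[hint] ?? DEFAULT_MODELS[hint];
}
```

Keeping the hint-to-model table inside the gateway means model swaps and provider changes never touch agent code.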

5. Verticals & Portfolio

The Kaze Flywheel

Pick a vertical → Encode expertise into skills → Deploy with supervision
→ Quality loop (supervised → sampling → autonomous) → Agents build knowledge graph
→ Apply to new clients (knowledge transfers, agents get smarter) → Repeat

Each vertical makes the platform smarter, not just individual agents. The moat is accumulated vertical knowledge graphs and proven agent skills.

Active & Planned Verticals

| Vertical | Status | Agents | Portfolio Project |
|---|---|---|---|
| V0: Internal Ops | Active (testbed) | Research, PM, Issue Tracking, Scheduling, Docs | Speedrun's own operations |
| V1: SEO Automation | Planned | Keyword Research, Content Optimization, Technical Audit, Reporting | SEO clients |
| V2: Toddle Enrichment | Planned | Content Enrichment, Data Quality, Recommendation Tuning | toddle.sg |
| Punkga | Future | Content moderation, artist support, community | punkga.me |
| TrueSight | Future | TBD | truesight.trade |

Why V0 Internal Ops First

  • Dogfooding — Speedrun is the first client. Every pain point we feel, our clients will feel.
  • Fast feedback — No external client coordination. Iterate in hours, not weeks.
  • Foundation testbed — Every platform component gets exercised before external verticals use it.

6. Self-Improvement / Monitoring Loop

AI Monitors AI

The first responder for system health is not a human looking at dashboards — it's an AI agent.

  • Health Monitor Agent — Watches fleet health, detects failures, restarts stuck agents, takes corrective action
  • Cost Monitor Agent — Tracks token spend, detects anomalies, throttles agents proactively
  • Quality Monitor Agent — Evaluates outputs for quality, catches hallucinations, scores task completion

Hard circuit breakers remain deterministic code (budget limits, error rates, permissions). AI supervision augments, never replaces safety-critical rules.
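A deterministic circuit breaker of this kind is ordinary code. The sketch below is illustrative, not Kaze's implementation.

```typescript
// Illustrative sketch of a deterministic circuit breaker: a hard budget
// limit enforced by plain code, outside any AI reasoning path.
class BudgetBreaker {
  private spentUsd = 0;

  constructor(private readonly limitUsd: number) {}

  // Record spend; returns false once the hard limit is exceeded,
  // at which point callers must stop dispatching agent work.
  record(costUsd: number): boolean {
    this.spentUsd += costUsd;
    return this.spentUsd <= this.limitUsd;
  }

  get tripped(): boolean {
    return this.spentUsd > this.limitUsd;
  }
}
```

The point of the design is that no monitor agent, however capable, sits between this check and the decision to halt spending.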

AI Improves AI

Every execution produces signals feeding a continuous improvement cycle:

| Layer | What improves | How |
|---|---|---|
| Prompts | System prompts, few-shot examples | A/B testing, quality measurement, auto-selecting winners |
| Tool usage | Tool selection, call order | Analyzing successful vs failed runs |
| Orchestration | Workflow structure, parallelism | Bottleneck identification, step reordering |
| Model selection | Which LLM for which task | Cost vs quality tracking, auto-routing to cheapest adequate model |
| Knowledge | What context agents receive | Learning which knowledge is useful, pruning noise |

All self-improvements are versioned, canaried (10% traffic), and reversible. No agent modifies itself for all traffic simultaneously.
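The canary split itself can be very small. In this sketch, the 10% figure comes from the text; everything else is an assumption, not Kaze's implementation.

```typescript
// Illustrative canary router for versioned self-improvements.
// The 10% default comes from the product text; the rest is an assumption.
interface Versioned<T> {
  stable: T;      // current production version (instant-rollback path)
  candidate?: T;  // proposed self-improvement under canary
}

// Route a small fraction of traffic to the candidate; everything else
// (and all traffic, if the candidate is withdrawn) uses stable.
function pickVersion<T>(
  v: Versioned<T>,
  canaryFraction = 0.1,
  rand: () => number = Math.random
): T {
  if (v.candidate !== undefined && rand() < canaryFraction) return v.candidate;
  return v.stable;
}
```

Rollback is then just clearing `candidate`: no traffic ever depends on the improved version being present.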


7. Unit Economics & Cost Structure

Cost Breakdown

VARIABLE (60-80%): LLM Tokens (dominant) → External APIs → Embeddings
SEMI-FIXED:        Compute / K8s → Database → Storage
FIXED:             Control plane → CI/CD → Monitoring

Key insight: LLM token cost dominates everything. A 20% reduction in tokens per task saves more money than halving infrastructure costs.

Cost per Task (Fully Loaded)

| Task Type | LLM | Compute | Tools | Total |
|---|---|---|---|---|
| Simple extraction | $0.011 | $0.002 | $0 | ~$0.013 |
| Keyword research | $0.042 | $0.005 | $0.05 | ~$0.10 |
| Content optimization | $0.063 | $0.005 | $0.02 | ~$0.09 |
| Research synthesis | $0.084 | $0.008 | $0 | ~$0.09 |
| Technical audit | $0.105 | $0.010 | $0.05 | ~$0.17 |

Most tasks cost $0.02–$0.17 fully loaded. 100-500x cheaper than human specialists.
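The monthly figures quoted in section 2 roll up directly from these per-task costs. A worked sketch, where the task volume of 10/day is an illustrative assumption:

```typescript
// Worked sketch of how per-task costs roll up to monthly figures.
// The 10 tasks/day volume is an illustrative assumption.
function monthlyCostUsd(costPerTask: number, tasksPerDay: number, days = 30): number {
  return costPerTask * tasksPerDay * days;
}

// At 10 tasks/day, a mid-range task mix lands in the $10-30/mo band:
const low = monthlyCostUsd(0.033, 10);  // ~ $10/mo
const high = monthlyCostUsd(0.10, 10);  // ~ $30/mo
```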

Cost per Tenant per Month

| Type | Agents | LLM | Infra | Tools | Total | Suggested Price |
|---|---|---|---|---|---|---|
| Small (BYOK) | 3 | $0 | $20-35 | $0-20 | $50 | $200/mo |
| Medium (BYOK) | 5 | $0 | $25-40 | $20-50 | $90 | $500/mo |
| Medium (Speedrun keys) | 5 | $50-120 | $25-40 | $20-50 | $210 | $900/mo |
| Large (dedicated) | 8+ | $100-250 | $270-330 | $50-100 | $680 | Custom |

Gross Margins

| Tier | Revenue | Cost | Gross Margin |
|---|---|---|---|
| Small @ $200/mo (BYOK) | $200 | $50 | 75% |
| Medium @ $500/mo (BYOK) | $500 | $90 | 82% |
| Medium @ $900/mo (Speedrun keys) | $900 | $210 | 77% |
| Large @ $1,500/mo | $1,500 | $680 | 55-72% |

Target: 65-80% gross margins. Achievable with BYOK as default.

Cost Optimization Levers

  1. Model selection optimization — route tasks to cheapest adequate model (40-50% LLM savings)
  2. Prompt caching — 90% discount on repeated context (Anthropic)
  3. Client BYOK — clients bring own LLM keys, Speedrun's variable cost drops 60-80%
  4. Batch API — 50% discount for non-urgent tasks (quality evaluation, knowledge consolidation)

Scale Economics

| Stage | Tenants | Total Cost/mo | Per Tenant |
|---|---|---|---|
| Stage 0 (MVP) | 1 (internal) | $1,350 | $1,350 |
| Stage 1 | 10 | $2,100-2,600 | $210-260 |
| Stage 2 | 50 | $7,100-12,100 | $142-242 |
| Stage 3 | 200 | $31,700-56,700 | $159-284 |

Cost scales with usage (variable-heavy), not ahead of it. No cliff edges.


8. Non-Functional Assessment

Security

  • Tenant isolation — cell architecture, namespace separation, VPC deployment option
  • Secret isolation — services hold only the keys they need; runtime holds zero secrets
  • Data classification — knowledge entries tagged "safe for LLM" vs "internal only"
  • Instruction hierarchy — system > skill > knowledge > user > tool output (prompt injection defense)
  • Capability manifests — agents can only invoke declared tools, knowledge domains, and communication targets
  • Provenance chain — every knowledge entry traces to source with consent classification
  • Deterministic safety — budget limits, error thresholds, permissions enforced by code, not AI reasoning
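Capability manifests reduce to default-deny lookups in plain code. A hypothetical sketch, with field names that are assumptions:

```typescript
// Hypothetical sketch of capability-manifest enforcement: default-deny
// lookups in plain code. Field names are assumptions, not Kaze's schema.
interface CapabilityManifest {
  tools: string[];             // tools this agent may invoke
  knowledgeDomains: string[];  // knowledge it may read
  commTargets: string[];       // channels/recipients it may message
}

// Anything not declared in the manifest is denied.
function canInvokeTool(m: CapabilityManifest, tool: string): boolean {
  return m.tools.includes(tool);
}
```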

Scalability

Metric-triggered scaling, not pre-optimization:

  • pgvector p95 >200ms → add Qdrant
  • Direct calls hitting throughput limit → add NATS
  • Agent code never changes when infrastructure scales

Deployment

  • Cloud-agnostic — containerized, IaC, deployable on any cloud or on-premises
  • Two modes: Agency (multi-tenant SaaS) and Customer VPC (single-tenant, data stays in client boundary)
  • Current MVP: GitHub Actions → Tailscale SSH → Docker on EC2

9. Risks & Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Complexity overwhelming small team | High | High | Modular monolith start, extract services only when needed |
| Non-determinism makes debugging hard | High | Medium | Immutable versioning of everything, full execution traces |
| Evaluation accuracy insufficient for autonomy | Medium | High | Start with measurable tasks, multi-signal evaluation |
| Supervisor agents unreliable (AI supervising AI) | Medium | High | Deterministic circuit breakers, governance layer last to autonomy |
| Prompt injection manipulates agents | High | High | Instruction hierarchy + output scanning (no complete solution exists) |
| Client data cross-pollination (legal) | High | Critical | Default isolation, tiered consent model, provenance classification |
| Frontier labs ship managed agent OS | Very High | Medium | Build thin on commodity, deep on boundaries (see Competitive Positioning) |

10. Competitive Positioning

Why Frontier Labs Won't Build What Kaze Builds

Frontier labs (Anthropic, OpenAI, Google) will commoditize generic agent infrastructure (execution loops, scheduling, basic memory). They will NOT build:

| Kaze Capability | Why Labs Won't |
|---|---|
| Multi-provider LLM Gateway | Anthropic won't route to OpenAI, and vice versa |
| Cell isolation + VPC deployment | Their model is centralized SaaS |
| Data classification + provenance | They want data flowing through their models, not gated |
| BYOK across providers | They want clients on their keys |
| Domain-calibrated supervision ramp | They build horizontal, not vertical |
| Vertical knowledge flywheel | They sell tools, not outcomes |

Kaze gets more valuable as frontier labs get more powerful. More powerful AI on client data creates MORE need for sovereignty, provider independence, budget controls, audit trails, and graduated trust.

Historical Parallels

| Generic platform | Boundary/compliance layer that thrived |
|---|---|
| AWS/GCP/Azure | Snowflake, Databricks (data governance + multi-cloud) |
| Public cloud | HashiCorp (multi-cloud abstraction + security) |
| LLM APIs | AI gateways (Portkey, Helicone — routing, compliance) |
| Stripe (payments) | Plaid (financial data boundaries) |
| Salesforce (CRM) | Veeva (vertical CRM with pharma compliance) |

Pattern: Generic platforms commoditize execution. Boundary-enforcement and vertical-expertise layers capture value on top.

Build Thin vs Build Deep

| Build thin (use commodity) | Build deep (this is the moat) |
|---|---|
| Agent execution loop | Multi-provider gateway with BYOK + budget |
| Basic scheduling | Data classification and compliance boundaries |
| Generic tool wrappers | Cell architecture with VPC deployment |
| Conversation persistence | Supervision ramp calibrated per domain |
| Single-agent memory | Cross-agent knowledge with provenance + ABAC |
| | Vertical skills and domain expertise |

11. Current Status & Roadmap

What's Built (Core Platform)

| Component | Repo | Status |
|---|---|---|
| Agent Runtime | kaze-runtime | Implemented — two-layer agent model, YAML+TS skills, HTTP dispatch, supervision ramp |
| LLM Gateway | kaze-gateway | Implemented — Vercel AI SDK, multi-provider (Gemini/Claude), Langfuse observability |
| Knowledge Service | kaze-knowledge | Implemented — Mem0 + pgvector, fact extraction, per-agent episodic memory |
| Internal Ops (V0) | kaze-agent-ops | In progress — GitHub skill operational |
| CI/CD | All repos | GitHub Actions → Tailscale SSH → Docker on EC2 |

What's Next

  1. Additional V0 skills — Calendar, Research, Project Management, Documentation
  2. V1 SEO vertical — Keyword Research, Content Optimization, Technical Audit, Reporting
  3. V2 Toddle vertical — Content Enrichment, Data Quality, Recommendation Tuning
  4. Task Scheduler — Cron + event triggers for automated workflows
  5. Shared knowledge tier — Quality gates, ABAC, cross-agent knowledge
  6. Self-improvement loop — Quality monitoring, prompt optimization, canary deployment

Parallel Team Structure

Lead:    Foundation Platform + V0 Internal Ops (dogfooding)
Team 2:  V2 Toddle (content enrichment, data quality)
Team 3:  V1 SEO (keyword research, content optimization)

12. Key Design Decisions

52 design decisions documented (D1-D52). Key decisions:

| # | Decision | Choice |
|---|---|---|
| D6 | LLM provider strategy | Multi-provider, abstracted behind LLM Gateway |
| D7 | Key management | Dual-key (Speedrun keys + client BYOK) |
| D11 | Architecture pattern | Agent-Oriented (Actor + EDA + Cell hybrid) |
| D14 | Supervision model | Per-skill ramp: supervised → sampling → autonomous |
| D18 | Knowledge storage | PostgreSQL + pgvector (+ Apache AGE later) |
| D19 | Per-agent memory | Mem0 |
| D30 | MVP knowledge | Mem0 + pgvector only (defer graph DB) |
| D43 | Data rights | Tiered consent model with provenance classification |
| D44 | Scaling strategy | Metric-triggered, not pre-optimized |
| D46 | Service topology | Three repos: gateway, runtime, knowledge (secret isolation) |
| D47 | LLM SDK | Vercel AI SDK |
| D51 | Vector store | PostgreSQL + pgvector via LangChain adapter |

Full log: decisions.md


13. Open Questions

| # | Question | Impact |
|---|---|---|
| Q5 | Cross-channel context management (unified thread model) | Medium — UX |
| Q6 | Supervision queue UX (how ops reviews agent outputs) | Medium — operational efficiency |
| Q8 | Billing model (per-agent, per-task, subscription?) | Medium — business model |
| Q9 | Cross-cell agent communication | Low (Phase 3) |
| Q10 | Canary deployment for agent improvements | Medium — safety |

Source Documents

All source material lives in the kaze repo.