
# Design Decisions & Technology Selection

*Part of Project Kaze Architecture*


## 1. Design Decisions Log

Record of architectural decisions made, with rationale.

| # | Decision | Choice | Rationale | Alternatives considered |
|---|----------|--------|-----------|-------------------------|
| D1 | Initial deployment model | Self-hosted, evolving to mesh | Fastest path to first agents operational. Mesh adds value only with multiple cells. | Start with mesh (too complex), SaaS-only (limits VPC clients) |
| D2 | Cloud strategy | Cloud-agnostic from day one | SME clients span multiple clouds. Customer VPC must work anywhere. | AWS-first with portability later (risks lock-in in architecture) |
| D3 | Containerization | Everything containerized, K8s as universal runtime | K8s is the only compute abstraction that works across all clouds and on-prem. | Docker Compose (too simple), Nomad (less ecosystem) |
| D4 | IaC approach | Terraform/OpenTofu (infra) + K8s manifests with Kustomize (app) | Terraform handles cloud-specific provisioning; K8s handles the universal app layer. | Pulumi (smaller ecosystem), CDK (AWS-locked) |
| D5 | Deployment modes | Agency (multi-tenant SaaS) + Customer VPC (single-tenant) | Must support both for the SME market — some trust Speedrun with data, others require sovereignty. | SaaS-only (loses enterprise clients), VPC-only (too expensive to operate) |
| D6 | LLM provider strategy | Multi-provider, abstracted behind LLM Gateway | Clients have varying provider credits and preferences. Provider competition benefits pricing. | Single provider (simpler but limiting) |
| D7 | LLM key management | Dual-key model (Speedrun keys + client keys) | Clients may have their own discounts/credits. Speedrun keys provide a baseline. | Speedrun-only keys (limits flexibility), client-only keys (no fallback) |
| D8 | Inter-agent messaging | NATS | Lightweight, portable, supports pub/sub + request/reply + streaming. Zero cloud deps. | Kafka (heavier, better for event sourcing), RabbitMQ (less cloud-native) |
| D9 | Observability in customer VPC | Full stack in VPC + VPN-in for ops | Data stays in the client boundary. Health beacon out for alerting. VPN for investigation. | Centralized telemetry (privacy concern), client self-monitors (support burden) |
| D10 | Air-gapped deployments | Out of scope for now | Adds significant complexity (local models, offline operation). No immediate client demand. | Support from day one (too complex) |
| D11 | Architecture pattern | Agent-Oriented (Actor + EDA + Cell hybrid) | Agents are intelligent, autonomous entities that don't fit cleanly into any single traditional pattern. | Pure microservices (agents aren't services), pure actor model (missing knowledge sharing) |
| D12 | AI-native operating model | AI as core operator, humans as governors | Differentiator from every other agent platform. Enables scaling without linear human headcount. | AI as tool (standard SaaS approach, no moat) |
| D13 | Vertical strategy | Deep vertical-first, not horizontal platform | Creates a compounding knowledge moat. Easier to prove quality in well-understood domains. | Horizontal platform (spread thin, no depth) |
| D14 | Supervision model | Per-skill ramp: supervised → sampling → autonomous | Granular trust building. A single agent can be autonomous at one skill and supervised at another. | Per-agent binary (too coarse), fully autonomous from the start (too risky) |
| D15 | Human interaction model | Multi-channel (Slack, Email, WhatsApp, Telegram) | Meet SME clients where they already work. Minimize adoption friction. | Dashboard-only (adoption barrier), single channel (too limiting) |
| D16 | Primary language | TypeScript | Team strength, LLM SDK ecosystem, full-stack capability. | |
| D17 | Agent definition model | Hybrid (YAML + TypeScript) | YAML for structure/config, TypeScript for custom logic. Balances ease of creation with expressiveness. | |
| D18 | Knowledge storage (Phase 1) | PostgreSQL + pgvector + Apache AGE | One database, already in stack, relational + vector + graph. Minimum ops. | |
| D19 | Per-agent memory | Mem0 | Production-proven, saves dev time, K8s deployable. Evaluate custom replacement in Phase 2. | |
| D20 | KG construction pipeline | Cognee (incremental) + GraphRAG (bulk) | Cognee for ongoing updates with feedback loops. GraphRAG for initial vertical onboarding from documents. | |
| D21 | Retrieval strategy | Tri-factor + graph traversal + agent-initiated | Layered approach: baseline ranking + structured traversal + agent autonomy. | |
| D22 | Knowledge versioning | Git-inspired commits with provenance | Every write is versioned with agent identity, timestamp, source. Letta pattern. | |
| D23 | Knowledge access control | Private/Shared tiers with ABAC | Client isolation + vertical sharing. Collaborative Memory pattern. | |
| D24 | IaC tooling | OpenTofu + Kustomize | Open source, cloud-agnostic provisioning + overlay-based K8s configuration. | |
| D25 | Container registry | GitHub Container Registry | Integrated with the GitHub Actions workflow. | |
| D26 | Testing | Jest | Team familiarity, mature TypeScript testing ecosystem. | |
| D27 | First vertical (testbed) | Internal Ops (Vertical 0) | Dogfooding — fastest feedback loop, no external coordination. Exercises every platform component before external verticals consume it. | |
| D28 | Communication layer | OpenClaw | Mature conversation management and tool routing. Build agent/memory design on top rather than reinventing the conversation interface. | |
| D29 | MVP messaging | Direct calls (no NATS) | NATS adds operational complexity. Start with direct function calls between agents. Migrate to NATS when scale demands it. | |
| D30 | MVP knowledge store | Mem0 + pgvector only (defer Apache AGE) | A full graph database is not needed for MVP. Vector search + per-agent memory cover initial needs. Add graph when cross-vertical knowledge sharing becomes a real requirement. | |
| D31 | Build structure | Parallel teams: Lead (Foundation + V0), Team 2 (Toddle), Team 3 (SEO) | Different people own different verticals. Foundation lead also dogfoods as V0 owner, surfacing issues before external teams hit them. | |
| D32 | Skill definition format | YAML schema (`*.skill.yaml`) + optional TypeScript handler | YAML for declarative structure (inputs, outputs, tools, quality criteria, supervision ramp). TS for custom logic. Parsed into `SkillDefinition` at load time. | |
| D33 | Agent runtime engine | `AgentRuntime.spawn()` / `dispatch()` / `shutdown()` | Actor-based lifecycle. Agents are long-lived, stateful, and process one task at a time. `TaskRequest` → `TaskHandle` → `TaskResult`. | |
| D34 | Inter-agent messaging | `MessageTransport` interface with `AgentMessage` envelope | `DirectCallTransport` for MVP (in-process Map routing). The same message shape serializes to NATS in Phase 2. Zero agent code changes on migration. | |
| D35 | LLM model hints | fast / balanced / best / embed / judge → per-tenant model mapping | Agents request a quality level; the gateway resolves it to a concrete model. Decouples agent logic from model selection. | |
| D36 | Budget enforcement | Pre-request estimate + hard stop + post-request actual update | `FOR UPDATE SKIP LOCKED` for distributed safety. Deterministic code check, not AI reasoning. | |
| D37 | Observation logging | Fire-and-forget with batched writes (100 events or 1 s flush) | 18 structured event types. Automatic middleware on LLM Gateway, Tool Executor, Knowledge Client. Never blocks agent execution. | |
| D38 | Agent privilege model | Capability manifests (whitelist per agent) | Agents declare tools + knowledge domains + communication targets. Runtime enforces — an agent cannot discover or invoke anything outside its manifest. | Open access with per-vertical filtering (too permissive) |
| D39 | Prompt injection defense | Instruction hierarchy + output validation | System > skill > knowledge > user > tool output. Sensitive operations require deterministic validation, not just agent reasoning. No complete solution exists — defense in depth. | Input sanitization only (insufficient), fine-tuned classifiers (premature) |
| D40 | Shared knowledge integrity | Quarantine + multi-signal quality gate | Writes to the shared tier enter quarantine before visibility. The quality gate uses LLM-as-judge + cross-reference + source verification. Not LLM-only. | Immediate publish (risky), human review for all (doesn't scale) |
| D41 | LLM data classification | Per-entry tags controlling LLM exposure | Knowledge entries tagged "safe for LLM" or "internal only". The gateway respects tags before including entries in prompts. Sensitive data is routed to zero-retention providers or local models. | No classification (sends everything), blanket redaction (loses utility) |
| D42 | Credential lifecycle | Automated rotation + anomaly detection + blast radius containment | Vault-managed rotation on schedule. Usage spikes trigger alert + auto-freeze. One compromised key affects only that client's agents. | Manual rotation (error-prone), no anomaly detection (slow response) |
| D43 | Knowledge provenance & data rights | Tiered consent model with provenance classification | Default: strict isolation (no client data in shared knowledge). Opt-in contributor tier with consent addendum. Every entry tagged with a source class (public, speedrun_internal, speedrun_research, client_contributed, client_private). ABAC enforces visibility. Protects against trade secret claims and GDPR purpose limitation. See `research/data-rights-knowledge-sharing.md`. | Strict isolation only (kills flywheel), consent + anonymization only (re-identification risk), aggregate-only (shallow knowledge) |
| D44 | Scaling strategy | Metric-triggered scaling over pre-optimization | No pre-sharding, no NATS at MVP, no Qdrant at MVP. Each scaling action has a concrete metric trigger (e.g., pgvector p95 > 200 ms → add Qdrant). Scale vertically first, horizontally when the vertical limit is hit. Agent code never changes when infrastructure scales. See `research/scalability-model.md`. | Pre-optimized architecture (complexity too early), manual scaling decisions (reactive, error-prone) |
| D45 | Cost optimization strategy | BYOK-first + model selection optimization + tiered pricing | Client BYOK as default (drops Speedrun's LLM cost to ~$0). Model selection routing (cheapest model meeting the quality bar, 40-50% savings). Tiered subscription pricing (Starter/Growth/Enterprise). Prompt caching for a 90% discount on repeated context. Batch API for background tasks. See `research/cost-model.md`. | Flat per-task pricing (too complex for SMEs), Speedrun-key-only (thin margins), single model tier (wasteful) |
| D46 | Service topology | Three separate repos/services: gateway, runtime, knowledge | Secret isolation (gateway holds LLM + tool keys, knowledge holds its own LLM key, runtime holds zero secrets). Independent scaling and deployment. | Monorepo (simpler, but secrets leak across concerns), gateway + knowledge merged (couples LLM routing with memory) |
| D47 | LLM SDK | Vercel AI SDK (`ai` package) | Unified interface for Gemini + Claude. Built-in tool-use loop, streaming, structured output. Actively maintained. | Direct provider SDKs (more boilerplate), LangChain (heavier than needed for a gateway) |
| D48 | Observability (MVP) | Langfuse (hosted SaaS) | LLM trace visualization, cost tracking, prompt management. Free tier sufficient for MVP. Integrated in the gateway via the Vercel AI SDK. | Custom Observation Logger (more work), no observability (blind) |
| D49 | Knowledge service LLM key | Dedicated key, not shared with gateway | The knowledge service needs an LLM for Mem0 fact extraction + Google embeddings. Its own key avoids coupling to the gateway for secrets. | Share gateway key via env (coupling), route through gateway (unnecessary latency) |
| D50 | Embedding model | Google gemini-embedding-001 (768 dims) | Free tier available, good quality, native Mem0 support. | OpenAI ada-002 (paid only), Cohere (less Mem0 integration) |
| D51 | MVP vector store | PostgreSQL + pgvector via LangChain adapter | Matches D18 (PostgreSQL for knowledge storage). Mem0 connects via the LangChain `PGVectorStore` wrapper. Single database for vectors + future shared knowledge. | Qdrant (separate infra), SQLite (not production-grade), Mem0 built-in memory store (no persistence guarantees) |
| D52 | CI/CD (MVP) | GitHub Actions + Tailscale SSH deploy to EC2 | Simple, no K8s overhead for MVP. Tailscale provides secure private networking. Docker containers on EC2. | K8s (too complex for 3 services), manual deploy (error-prone) |
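To make the transport abstraction of D34 concrete, here is a minimal TypeScript sketch. The names `MessageTransport`, `AgentMessage`, and `DirectCallTransport` come from the decision itself; the envelope fields and handler signature are illustrative assumptions, not the actual contract:

```typescript
// Sketch of D34: one envelope shape that works for in-process routing (MVP)
// and serializes unchanged onto NATS subjects in Phase 2. Fields are illustrative.
interface AgentMessage {
  id: string;
  from: string;                          // sender agent id
  to: string;                            // target agent id (maps to a subject later)
  type: "request" | "reply" | "event";
  payload: unknown;                      // kept JSON-serializable on purpose
}

type Handler = (msg: AgentMessage) => Promise<AgentMessage | void>;

interface MessageTransport {
  register(agentId: string, handler: Handler): void;
  send(msg: AgentMessage): Promise<AgentMessage | void>;
}

// MVP transport: in-process Map routing, no broker involved.
class DirectCallTransport implements MessageTransport {
  private handlers = new Map<string, Handler>();

  register(agentId: string, handler: Handler): void {
    this.handlers.set(agentId, handler);
  }

  async send(msg: AgentMessage): Promise<AgentMessage | void> {
    const handler = this.handlers.get(msg.to);
    if (!handler) throw new Error(`no agent registered for ${msg.to}`);
    return handler(msg);
  }
}
```

Because agents depend only on `MessageTransport`, a Phase 2 `NatsTransport` that publishes the same serialized envelope can replace `DirectCallTransport` with zero agent code changes, which is the migration property D34 claims.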
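The model-hint indirection in D35 reduces to a small lookup with per-tenant overrides. A minimal sketch, assuming placeholder model identifiers and a flat override map (the real per-tenant config shape is not specified in this document):

```typescript
// Sketch of D35: agents ask for a quality level, the gateway resolves it to a
// concrete model per tenant. Hint names come from D35; the model ids below are
// placeholder assumptions, except "embed" which follows D50.
type ModelHint = "fast" | "balanced" | "best" | "embed" | "judge";

const defaultModels: Record<ModelHint, string> = {
  fast: "provider-a/small",
  balanced: "provider-a/medium",
  best: "provider-b/large",
  embed: "google/gemini-embedding-001",
  judge: "provider-b/large",
};

function resolveModel(
  hint: ModelHint,
  tenantOverrides: Partial<Record<ModelHint, string>> = {},
): string {
  // Tenant override wins; otherwise fall back to the platform default.
  return tenantOverrides[hint] ?? defaultModels[hint];
}
```

The point of the indirection is that agent logic only ever names a hint, so swapping a tenant to a cheaper "fast" model is a config change, not a code change.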
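D36's hard stop is a deterministic check around a locked budget row, not an LLM judgment. The sketch below shows the shape under stated assumptions: the table and column names (`tenant_budgets`, `remaining_usd`) are hypothetical, and a row that could not be read (for example, skipped because another worker holds the lock) is conservatively treated as a deny:

```typescript
// Sketch of D36: pre-request budget gate. The SQL string shows the row-locking
// idea from the decision; decide() is the deterministic hard-stop logic.
// Schema names are assumptions for illustration only.
const RESERVE_SQL = `
  SELECT remaining_usd FROM tenant_budgets
  WHERE tenant_id = $1
  FOR UPDATE SKIP LOCKED`; // per D36: skip rather than queue behind a locked row

interface BudgetRow {
  remaining_usd: number;
}

// Deterministic code check, not AI reasoning: deny when the estimate exceeds
// what is left, or when the budget row could not be locked/read at all.
function decide(row: BudgetRow | undefined, estimatedUsd: number): "allow" | "deny" {
  if (!row) return "deny"; // row missing or locked by another worker
  return estimatedUsd <= row.remaining_usd ? "allow" : "deny";
}
```

After the request completes, the same row would be updated with the actual spend (the "post-request actual update" half of D36), inside the same style of transaction.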

## 2. Technology Selection

### 2.1 Core Stack

| Concern | Choice | Rationale |
|---------|--------|-----------|
| Primary language | TypeScript | Team strength, rich LLM SDK ecosystem, full-stack (platform + dashboard + agents in one language) |
| Secondary language | Python (scripting) | Preferred by some developers for scripting tasks; strong AI/data library ecosystem |
| Agent runtime model | Hybrid (YAML structure + TypeScript code) | YAML defines agent structure (skills, knowledge deps, config). TypeScript for custom logic. Balances configurability with flexibility. |
| Package manager | To be determined | |
| Testing | Jest | Team familiarity, mature ecosystem |

### 2.2 Infrastructure & DevOps

| Concern | Choice | Rationale |
|---------|--------|-----------|
| Container orchestration | Kubernetes | Universal compute abstraction, cloud-agnostic (decided in architecture) |
| IaC (infrastructure) | OpenTofu | Open-source Terraform fork, cloud-agnostic provisioning |
| IaC (application) | Kustomize | Base + overlay model for environment-specific configs without templating complexity |
| GitOps | ArgoCD or Flux | Continuous delivery to K8s; enables remote upgrades of customer VPC deployments |
| Container registry | GitHub Container Registry | Integrated with GitHub workflows, no additional infrastructure |
| CI/CD | GitHub Actions | Integrated with GHCR and the GitOps pipeline |

### 2.3 Knowledge System

Detailed research is documented in `research/knowledge-system.md`.

#### Storage Layer (Phased)

| Phase | Choice | Rationale |
|-------|--------|-----------|
| Phase 1 (MVP) | PostgreSQL + pgvector + Apache AGE | One database for relational + vector + graph. Already in our stack. Minimum ops burden. Good enough for early-to-mid scale. |
| Phase 2 (if vector bottleneck) | Add Qdrant | Best performance for pure vector search. Complements Postgres for hot-path queries. |
| Phase 3 (if graph bottleneck) | Add FalkorDB or evaluate SurrealDB | FalkorDB for dedicated graph performance. SurrealDB as a potential single-system consolidation. |

#### Knowledge System Components

| Component | Choice | Rationale |
|-----------|--------|-----------|
| Memory types | Episodic + Semantic + Procedural + Reflective | Aligned with the CoALA framework and MIRIX taxonomy. Proven across the academic literature. |
| Per-agent memory | Mem0 | Production-proven (186M API calls/quarter), handles agent-level working + episodic memory. Saves development time. K8s deployable. |
| KG construction (incremental) | Cognee | Automated knowledge graph building from documents. Feedback loops for quality improvement. Pluggable backends. |
| KG construction (bulk) | Microsoft GraphRAG | Pipeline for converting large document collections into structured knowledge graphs. MIT licensed. For initial vertical onboarding. |
| Retrieval strategy | Tri-factor scoring + graph traversal + agent-initiated search | Layered: Generative Agents formula as baseline, AriGraph-style graph traversal for structured knowledge, MemGPT-style agent-initiated search for autonomy. |
| Versioning model | Git-inspired (Letta Context Repositories pattern) | Every knowledge write is a versioned commit with agent identity, timestamp, and source attribution. Branchable, mergeable, diffable. |
| Access control | Private/Shared tiers with ABAC | Collaborative Memory pattern. Private tier for agent/client-specific knowledge. Shared tier for vertical knowledge. Attribute-based policies for flexible permissions. |
| Quality gates | Verification before shared knowledge entry | Voyager self-verification pattern. New knowledge is verified/reviewed before entering the shared store. |
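The tri-factor baseline above can be illustrated with the Generative Agents-style combination of recency, importance, and relevance. The weights, normalization, and decay constant below are illustrative choices for the sketch, not tuned values from the design:

```typescript
// Sketch of tri-factor scoring: rank each candidate memory by a weighted sum of
// recency (exponential decay since last access), importance (assigned at write
// time), and relevance (similarity to the query). All inputs normalized to 0..1.
interface Candidate {
  hoursSinceAccess: number; // drives recency decay
  importance: number;       // 0..1
  relevance: number;        // 0..1, e.g. cosine similarity to the query embedding
}

function triFactorScore(
  c: Candidate,
  w = { recency: 1, importance: 1, relevance: 1 }, // illustrative unit weights
): number {
  // exp(-0.05 * t) gives a recency half-life of roughly 14 hours.
  const recency = Math.exp(-0.05 * c.hoursSinceAccess);
  return w.recency * recency + w.importance * c.importance + w.relevance * c.relevance;
}
```

Graph traversal and agent-initiated search then layer on top: this score only ranks the baseline candidate pool, it does not replace structured lookup.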
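The Private/Shared split with ABAC (together with D43's source classes) reduces to an attribute check at read time. A minimal sketch; the attribute names and the exact policy below are assumptions for illustration, not the real policy engine:

```typescript
// Sketch of Private/Shared ABAC: visibility is decided purely from attributes
// on the entry and on the requesting agent. Source classes follow D43.
interface KnowledgeEntry {
  tier: "private" | "shared";
  ownerClientId: string;
  vertical: string;
  sourceClass:
    | "public"
    | "speedrun_internal"
    | "speedrun_research"
    | "client_contributed"
    | "client_private";
}

interface AgentContext {
  clientId: string;
  vertical: string;
}

function canRead(entry: KnowledgeEntry, agent: AgentContext): boolean {
  // Private tier: only the owning client's agents, ever.
  if (entry.tier === "private") return entry.ownerClientId === agent.clientId;
  // Shared tier: client_private never crosses client boundaries even if shared.
  if (entry.sourceClass === "client_private") return entry.ownerClientId === agent.clientId;
  // Otherwise shared knowledge is visible within the same vertical.
  return entry.vertical === agent.vertical;
}
```

Real policies would carry more attributes (consent tier, quarantine state from the quality gate), but the shape stays the same: a pure function over attributes, enforceable at the storage boundary.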

#### Knowledge Architecture Diagram

```
┌────────────────────────────────────────────────────────────┐
│                   KAZE KNOWLEDGE SYSTEM                    │
│                                                            │
│  ┌──────────────────────────────────────────────────────┐  │
│  │          KNOWLEDGE LAYER (TypeScript)                │  │
│  │                                                      │  │
│  │  Memory Type Router (MIRIX-inspired)                 │  │
│  │    ├── Episodic: events, logs, history               │  │
│  │    ├── Semantic: facts, graph, relationships         │  │
│  │    ├── Procedural: skills, how-to, code              │  │
│  │    └── Reflective: insights, learnings               │  │
│  │                                                      │  │
│  │  Retrieval Engine                                    │  │
│  │    ├── Tri-factor scoring (recency+importance+rel.)  │  │
│  │    ├── Graph traversal (knowledge graph)             │  │
│  │    ├── Spreading activation (linked notes)           │  │
│  │    └── Agent-initiated search (tool calls)           │  │
│  │                                                      │  │
│  │  Write Pipeline                                      │  │
│  │    ├── Provenance tagging (AriGraph-inspired)        │  │
│  │    ├── Version control (Letta-inspired)              │  │
│  │    ├── Quality gate (Voyager self-verification)      │  │
│  │    └── Access control (Collaborative Memory ABAC)    │  │
│  │                                                      │  │
│  │  Consolidation Engine                                │  │
│  │    ├── Episodic → Semantic distillation              │  │
│  │    ├── Reflection synthesis                          │  │
│  │    ├── Contradiction detection                       │  │
│  │    └── Importance-based retention                    │  │
│  └──────────────────────────────────────────────────────┘  │
│                                                            │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  STORAGE: PostgreSQL + pgvector + Apache AGE         │  │
│  └──────────────────────────────────────────────────────┘  │
│                                                            │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  CONSTRUCTION: Cognee (incremental) + GraphRAG (bulk)│  │
│  └──────────────────────────────────────────────────────┘  │
│                                                            │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  PER-AGENT MEMORY: Mem0                              │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘
```

## 3. Open Questions

Questions that require further exploration before implementation.

### Resolved

| # | Question | Resolution |
|---|----------|------------|
| Q1 | What are the first verticals to build? | Resolved (D27-D31). Three verticals in parallel: Vertical 0 (Internal Ops), Vertical 1 (SEO), Vertical 2 (Toddle). Internal Ops is the primary testbed. OpenClaw as communication layer. Parallel team structure. See `mvp.md`. |
| Q2 | What backs the knowledge graph? | Resolved (D18-D23). PostgreSQL + pgvector + Apache AGE for Phase 1. Mem0 for per-agent memory. Cognee + GraphRAG for KG construction. Git-inspired versioning. ABAC access control. See Section 2.3 and `research/knowledge-system.md`. |
| Q3 | What language/stack for the platform? | Resolved (D16-D17). TypeScript primary, Python for scripting. Hybrid agent definition (YAML + TypeScript). See Section 2.1. |
| Q4 | What is the agent runtime contract? | Resolved (D32-D37). YAML skill/agent definitions + TypeScript `SkillHandler` interface. Actor-based `AgentRuntime` with spawn/dispatch/shutdown lifecycle. `MessageTransport` abstraction (`DirectCallTransport` MVP, `NatsTransport` Phase 2). See `technical-design.md`. |
| Q7 | How is vertical knowledge curated and versioned? | Resolved (D22-D23). Git-inspired versioned commits with provenance. Quality gates before shared knowledge entry. ABAC access control. See Section 2.3. |

### Open

| # | Question | Context | Impact |
|---|----------|---------|--------|
| Q5 | How does the Conversation Manager maintain cross-channel context? | Unified thread model, message deduplication, channel-specific formatting. | Medium — UX-critical |
| Q6 | What does the supervision queue UX look like? | How do Speedrun ops review, approve, and correct agent outputs efficiently? | Medium — operational efficiency |
| Q8 | What is the billing model? | Per-agent, per-task, per-token, subscription? Affects LLM Gateway design. | Medium — business model |
| Q9 | How do you handle agent-to-agent communication across cells? | Message format, discovery, auth, latency tolerance. | Low (Phase 3) — mesh feature |
| Q10 | What is the canary deployment mechanism for agent improvements? | Traffic splitting, A/B infrastructure, rollback triggers. | Medium — self-improvement safety |