Design Decisions & Technology Selection
Part of Project Kaze Architecture
1. Design Decisions Log
Record of architectural decisions made, with rationale.
| # | Decision | Choice | Rationale | Alternatives considered |
|---|---|---|---|---|
| D1 | Initial deployment model | Self-hosted, evolving to mesh | Fastest path to getting the first agents operational. Mesh adds value only with multiple cells. | Start with mesh (too complex), SaaS-only (limits VPC clients) |
| D2 | Cloud strategy | Cloud-agnostic from day one | SME clients span multiple clouds. Customer VPC must work anywhere. | AWS-first with portability later (risks lock-in in architecture) |
| D3 | Containerization | Everything containerized, K8s as universal runtime | K8s is the only compute abstraction that works across all clouds and on-prem. | Docker Compose (too simple), Nomad (less ecosystem) |
| D4 | IaC approach | Terraform/OpenTofu (infra) + K8s manifests with Kustomize (app) | Terraform handles cloud-specific provisioning; K8s handles universal app layer. | Pulumi (smaller ecosystem), CDK (AWS-locked) |
| D5 | Deployment modes | Agency (multi-tenant SaaS) + Customer VPC (single-tenant) | Must support both for SME market — some trust Speedrun with data, others require sovereignty. | SaaS-only (loses enterprise clients), VPC-only (too expensive to operate) |
| D6 | LLM provider strategy | Multi-provider, abstracted behind LLM Gateway | Clients have varying provider credits and preferences. Provider competition benefits pricing. | Single provider (simpler but limiting) |
| D7 | LLM key management | Dual-key model (Speedrun keys + client keys) | Clients may have their own discounts/credits. Speedrun keys provide baseline. | Speedrun-only keys (limits flexibility), client-only keys (no fallback) |
| D8 | Inter-agent messaging | NATS (post-MVP target; see D29, D34) | Lightweight, portable, supports pub/sub + request/reply + streaming. Zero cloud deps. | Kafka (heavier, better for event sourcing), RabbitMQ (less cloud-native) |
| D9 | Observability in customer VPC | Full stack in VPC + VPN-in for ops | Data stays in client boundary. Health beacon out for alerting. VPN for investigation. | Centralized telemetry (privacy concern), client self-monitors (support burden) |
| D10 | Air-gapped deployments | Out of scope for now | Adds significant complexity (local models, offline operation). No immediate client demand. | Support from day one (too complex) |
| D11 | Architecture pattern | Agent-Oriented (Actor + EDA + Cell hybrid) | Agents are intelligent, autonomous entities that don't fit cleanly into any single traditional pattern. | Pure microservices (agents aren't services), pure actor model (missing knowledge sharing) |
| D12 | AI-native operating model | AI as core operator, humans as governors | Differentiator from every other agent platform. Enables scaling without linear human headcount. | AI as tool (standard SaaS approach, no moat) |
| D13 | Vertical strategy | Deep vertical-first, not horizontal platform | Creates compounding knowledge moat. Easier to prove quality in well-understood domains. | Horizontal platform (spread thin, no depth) |
| D14 | Supervision model | Per-skill ramp: supervised → sampling → autonomous | Granular trust building. A single agent can be autonomous at one skill and supervised at another. | Per-agent binary (too coarse), full autonomous from start (too risky) |
| D15 | Human interaction model | Multi-channel (Slack, Email, WhatsApp, Telegram) | Meet SME clients where they already work. Minimize adoption friction. | Dashboard-only (adoption barrier), single channel (too limiting) |
| D16 | Primary language | TypeScript | Team strength, LLM SDK ecosystem, full-stack capability | |
| D17 | Agent definition model | Hybrid (YAML + TypeScript) | YAML for structure/config, TypeScript for custom logic. Balances ease of creation with expressiveness. | |
| D18 | Knowledge storage (Phase 1) | PostgreSQL + pgvector + Apache AGE | One database, already in stack, relational + vector + graph. Minimum ops. | |
| D19 | Per-agent memory | Mem0 | Production-proven, saves dev time, K8s deployable. Evaluate custom replacement in Phase 2. | |
| D20 | KG construction pipeline | Cognee (incremental) + GraphRAG (bulk) | Cognee for ongoing updates with feedback loops. GraphRAG for initial vertical onboarding from documents. | |
| D21 | Retrieval strategy | Tri-factor + graph traversal + agent-initiated | Layered approach: baseline ranking + structured traversal + agent autonomy. | |
| D22 | Knowledge versioning | Git-inspired commits with provenance | Every write is versioned with agent identity, timestamp, source. Letta pattern. | |
| D23 | Knowledge access control | Private/Shared tiers with ABAC | Client isolation + vertical sharing. Collaborative Memory pattern. | |
| D24 | IaC tooling | OpenTofu + Kustomize | Open source, cloud-agnostic provisioning + overlay-based K8s configuration. Narrows D4 to concrete tooling. | |
| D25 | Container registry | GitHub Container Registry | Integrated with GitHub Actions workflow. | |
| D26 | Testing | Jest | Team familiarity, mature TypeScript testing ecosystem. | |
| D27 | First vertical (testbed) | Internal Ops (Vertical 0) | Dogfooding — fastest feedback loop, no external coordination. Exercises every platform component before external verticals consume it. | |
| D28 | Communication layer | OpenClaw | Mature conversation management and tool routing. Build agent/memory design on top rather than reinventing the conversation interface. | |
| D29 | MVP messaging | Direct calls (no NATS) | NATS adds operational complexity. Start with direct function calls between agents. Migrate to NATS when scale demands it. | |
| D30 | MVP knowledge store | Mem0 + pgvector only (defer Apache AGE) | Full graph database is not needed for MVP. Vector search + per-agent memory covers initial needs. Add graph when cross-vertical knowledge sharing becomes a real requirement. | |
| D31 | Build structure | Parallel teams: Lead (Foundation + V0), Team 2 (Toddle), Team 3 (SEO) | Different people own different verticals. Foundation lead also dogfoods as V0 owner, surfacing issues before external teams hit them. | |
| D32 | Skill definition format | YAML schema (*.skill.yaml) + optional TypeScript handler | YAML for declarative structure (inputs, outputs, tools, quality criteria, supervision ramp). TS for custom logic. Parsed into SkillDefinition at load time. | |
| D33 | Agent runtime engine | AgentRuntime.spawn() / dispatch() / shutdown() | Actor-based lifecycle. Agents are long-lived, stateful, process one task at a time. TaskRequest → TaskHandle → TaskResult. | |
| D34 | Inter-agent messaging | MessageTransport interface with AgentMessage envelope | DirectCallTransport for MVP (in-process Map routing). Same message shape serializes to NATS in Phase 2. Zero agent code changes on migration. | |
| D35 | LLM model hints | fast / balanced / best / embed / judge → per-tenant model mapping | Agents request quality level, gateway resolves to concrete model. Decouples agent logic from model selection. | |
| D36 | Budget enforcement | Pre-request estimate + hard stop + post-request actual update | FOR UPDATE SKIP LOCKED for distributed safety. Deterministic code check, not AI reasoning. | |
| D37 | Observation logging | Fire-and-forget with batched writes (100 events or 1s flush) | 18 structured event types. Automatic middleware on LLM Gateway, Tool Executor, Knowledge Client. Never blocks agent execution. | |
| D38 | Agent privilege model | Capability manifests (whitelist per agent) | Agents declare tools + knowledge domains + communication targets. Runtime enforces — agent cannot discover or invoke anything outside its manifest. | Open access with per-vertical filtering (too permissive) |
| D39 | Prompt injection defense | Instruction hierarchy + output validation | System > skill > knowledge > user > tool output. Sensitive operations require deterministic validation, not just agent reasoning. No complete solution exists — defense in depth. | Input sanitization only (insufficient), fine-tuned classifiers (premature) |
| D40 | Shared knowledge integrity | Quarantine + multi-signal quality gate | Writes to shared tier enter quarantine before visibility. Quality gate uses LLM-as-judge + cross-reference + source verification. Not LLM-only. | Immediate publish (risky), human review for all (doesn't scale) |
| D41 | LLM data classification | Per-entry tags controlling LLM exposure | Knowledge entries tagged "safe for LLM" or "internal only." Gateway respects tags before including in prompts. Sensitive data routed to zero-retention providers or local models. | No classification (sends everything), blanket redaction (loses utility) |
| D42 | Credential lifecycle | Automated rotation + anomaly detection + blast radius containment | Vault-managed rotation on schedule. Usage spikes trigger alert + auto-freeze. One compromised key affects only that client's agents. | Manual rotation (error-prone), no anomaly detection (slow response) |
| D43 | Knowledge provenance & data rights | Tiered consent model with provenance classification | Default: strict isolation (no client data in shared knowledge). Opt-in contributor tier with consent addendum. Every entry tagged with source class (public, speedrun_internal, speedrun_research, client_contributed, client_private). ABAC enforces visibility. Protects against trade secret claims and GDPR purpose limitation. See research/data-rights-knowledge-sharing.md. | Strict isolation only (kills flywheel), consent + anonymization only (re-identification risk), aggregate-only (shallow knowledge) |
| D44 | Scaling strategy | Metric-triggered scaling over pre-optimization | No pre-sharding, no NATS at MVP, no Qdrant at MVP. Each scaling action has a concrete metric trigger (e.g., pgvector p95 >200ms → add Qdrant). Scale vertically first, horizontally when vertical limit hit. Agent code never changes when infrastructure scales. See research/scalability-model.md. | Pre-optimized architecture (complexity too early), manual scaling decisions (reactive, error-prone) |
| D45 | Cost optimization strategy | BYOK-first + model selection optimization + tiered pricing | Client BYOK as default (drops Speedrun's LLM cost to ~$0). Model selection routing (cheapest model meeting quality bar, 40-50% savings). Tiered subscription pricing (Starter/Growth/Enterprise). Prompt caching for 90% discount on repeated context. Batch API for background tasks. See research/cost-model.md. | Flat per-task pricing (too complex for SMEs), Speedrun-key-only (thin margins), single model tier (wasteful) |
| D46 | Service topology | Three separate repos/services: gateway, runtime, knowledge | Secret isolation (gateway holds LLM+tool keys, knowledge holds own LLM key, runtime holds zero secrets). Independent scaling and deployment. | Monorepo (simpler but secrets leak across concerns), gateway+knowledge merged (couples LLM routing with memory) |
| D47 | LLM SDK | Vercel AI SDK (ai package) | Unified interface for Gemini + Claude. Built-in tool-use loop, streaming, structured output. Actively maintained. | Direct provider SDKs (more boilerplate), LangChain (heavier than needed for gateway) |
| D48 | Observability (MVP) | Langfuse (hosted SaaS) | LLM trace visualization, cost tracking, prompt management. Free tier sufficient for MVP. Integrated in gateway via Vercel AI SDK. | Custom Observation Logger (more work), no observability (blind) |
| D49 | Knowledge service LLM key | Dedicated key, not shared with gateway | Knowledge service needs LLM for Mem0 fact extraction + Google embeddings. Own key avoids coupling to gateway for secrets. | Share gateway key via env (coupling), route through gateway (unnecessary latency) |
| D50 | Embedding model | Google gemini-embedding-001 (768 dims) | Free tier available, good quality, native Mem0 support. | OpenAI ada-002 (paid only), Cohere (less Mem0 integration) |
| D51 | MVP vector store | PostgreSQL + pgvector via LangChain adapter | Matches D18 (PostgreSQL for knowledge storage). Mem0 connects via LangChain PGVectorStore wrapper. Single database for vectors + future shared knowledge. | Qdrant (separate infra), SQLite (not production-grade), Mem0 built-in memory store (no persistence guarantees) |
| D52 | CI/CD (MVP) | GitHub Actions + Tailscale SSH deploy to EC2 | Simple, no K8s overhead for MVP. Tailscale provides secure private networking. Docker containers on EC2. | K8s (too complex for 3 services), manual deploy (error-prone) |
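The migration path behind D29 and D34 can be sketched in TypeScript. The names MessageTransport, AgentMessage, and DirectCallTransport come from the decision log above; the envelope fields and method signatures are illustrative assumptions, not the actual implementation.

```typescript
// Sketch of the D34 transport abstraction. Only the type/class names come
// from the decision log; field names and signatures are assumptions.

interface AgentMessage {
  from: string;                          // sender agent id
  to: string;                            // recipient agent id
  type: "request" | "reply" | "event";
  payload: unknown;                      // serializable body — the same shape a NATS subject would carry
}

interface MessageTransport {
  register(agentId: string, handler: (msg: AgentMessage) => Promise<void>): void;
  send(msg: AgentMessage): Promise<void>;
}

// MVP transport per D29: in-process Map routing, no broker. A Phase 2
// NatsTransport would replace only this class — agents keep calling send().
class DirectCallTransport implements MessageTransport {
  private handlers = new Map<string, (msg: AgentMessage) => Promise<void>>();

  register(agentId: string, handler: (msg: AgentMessage) => Promise<void>): void {
    this.handlers.set(agentId, handler);
  }

  async send(msg: AgentMessage): Promise<void> {
    const handler = this.handlers.get(msg.to);
    if (!handler) throw new Error(`No agent registered for ${msg.to}`);
    await handler(msg);
  }
}
```

Because agents depend only on the MessageTransport interface, the "zero agent code changes on migration" claim in D34 reduces to swapping the injected transport instance.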
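The D36 budget flow (pre-request estimate, hard stop, post-request reconciliation) is deterministic enough to sketch. The in-memory ledger below stands in for the Postgres row that production would lock with FOR UPDATE SKIP LOCKED; the class and method names are hypothetical.

```typescript
// Illustrative sketch of the D36 budget flow, assuming a hypothetical
// BudgetLedger. Production would back this with a Postgres row locked via
// FOR UPDATE SKIP LOCKED rather than in-process state.

class BudgetLedger {
  constructor(private remainingUsd: number) {}

  // Pre-request: deterministic code check, not AI reasoning (D36).
  // Reserves the estimated cost or hard-stops before the LLM call is made.
  reserve(estimateUsd: number): void {
    if (estimateUsd > this.remainingUsd) {
      throw new Error("Budget exceeded — hard stop before the LLM call");
    }
    this.remainingUsd -= estimateUsd;
  }

  // Post-request: replace the reserved estimate with the measured actual.
  settle(estimateUsd: number, actualUsd: number): void {
    this.remainingUsd += estimateUsd - actualUsd;
  }

  get remaining(): number {
    return this.remainingUsd;
  }
}
```

Reserving before the call and settling after means a crash mid-request leaves the budget conservatively under-counted rather than overspent.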
2. Technology Selection
2.1 Core Stack
| Concern | Choice | Rationale |
|---|---|---|
| Primary language | TypeScript | Team strength, rich LLM SDK ecosystem, full-stack (platform + dashboard + agents in one language) |
| Secondary language | Python (scripting) | Preferred by some developers for scripting tasks; strong AI/data library ecosystem |
| Agent runtime model | Hybrid (YAML structure + TypeScript code) | YAML defines agent structure (skills, knowledge deps, config). TypeScript for custom logic. Balances configurability with flexibility. |
| Package manager | To be determined | — |
| Testing | Jest | Team familiarity, mature ecosystem |
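The hybrid agent model above (YAML structure, TypeScript logic) implies a typed shape that each *.skill.yaml file parses into. The sketch below is an assumption about that shape — only the names SkillDefinition and SkillHandler, and the YAML-plus-optional-handler split, come from the decision log (D17, D32).

```typescript
// Hypothetical shape of what a *.skill.yaml file parses into at load time.
// Field names are illustrative assumptions, not the project's actual schema.

type SupervisionStage = "supervised" | "sampling" | "autonomous"; // per-skill ramp (D14)

interface SkillDefinition {
  name: string;
  inputs: Record<string, string>;    // param name → type hint
  outputs: Record<string, string>;
  tools: string[];                   // tool ids this skill may invoke
  qualityCriteria: string[];
  supervision: SupervisionStage;
  handler?: string;                  // optional path to a TypeScript handler module
}

// Custom logic escapes to TypeScript when declarative YAML is not enough.
type SkillHandler = (
  input: Record<string, unknown>
) => Promise<Record<string, unknown>>;
```

Keeping supervision a per-skill field (rather than per-agent) is what lets one agent be autonomous at one skill and supervised at another, as D14 requires.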
2.2 Infrastructure & DevOps
| Concern | Choice | Rationale |
|---|---|---|
| Container orchestration | Kubernetes | Universal compute abstraction, cloud-agnostic (decided in architecture) |
| IaC (infrastructure) | OpenTofu | Open source Terraform fork, cloud-agnostic provisioning |
| IaC (application) | Kustomize | Base + overlay model for environment-specific configs without templating complexity |
| GitOps | ArgoCD or Flux | Continuous delivery to K8s, enables remote upgrades of customer VPC deployments |
| Container registry | GitHub Container Registry | Integrated with GitHub workflows, no additional infrastructure |
| CI/CD | GitHub Actions | Integrated with GHCR and GitOps pipeline |
2.3 Knowledge System
Detailed research documented in research/knowledge-system.md.
Storage Layer (Phased)
| Phase | Choice | Rationale |
|---|---|---|
| Phase 1 (MVP) | PostgreSQL + pgvector + Apache AGE | One database for relational + vector + graph. Already in our stack. Minimum ops burden. Good enough for early-to-mid scale. |
| Phase 2 (if vector bottleneck) | Add Qdrant | Best performance for pure vector search. Complements Postgres for hot-path queries. |
| Phase 3 (if graph bottleneck) | Add FalkorDB or evaluate SurrealDB | FalkorDB for dedicated graph performance. SurrealDB as potential single-system consolidation. |
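Each phase transition above is gated by a concrete metric, per D44 (e.g. pgvector p95 above 200 ms triggers adding Qdrant). A minimal sketch of that trigger check, with hypothetical function names and a nearest-rank percentile:

```typescript
// Sketch of a D44-style metric trigger. The 200 ms threshold comes from the
// decision log; function names and the sampling source are assumptions.

function p95(latenciesMs: number[]): number {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  // nearest-rank percentile: smallest value covering 95% of samples
  const rank = Math.ceil(0.95 * sorted.length) - 1;
  return sorted[Math.max(rank, 0)];
}

function shouldAddQdrant(latenciesMs: number[], thresholdMs = 200): boolean {
  return p95(latenciesMs) > thresholdMs;
}
```

The point of the deterministic trigger is that the Phase 2 decision becomes a dashboard alert rather than a judgment call, and agent code is untouched either way.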
Knowledge System Components
| Component | Choice | Rationale |
|---|---|---|
| Memory types | Episodic + Semantic + Procedural + Reflective | Aligned with CoALA framework and MIRIX taxonomy. Proven across academic literature. |
| Per-agent memory | Mem0 | Production-proven (186M API calls/quarter), handles agent-level working + episodic memory. Saves development time. K8s deployable. |
| KG construction (incremental) | Cognee | Automated knowledge graph building from documents. Feedback loops for quality improvement. Pluggable backends. |
| KG construction (bulk) | Microsoft GraphRAG | Pipeline for converting large document collections into structured knowledge graphs. MIT licensed. For initial vertical onboarding. |
| Retrieval strategy | Tri-factor scoring + graph traversal + agent-initiated search | Layered: Generative Agents formula as baseline, AriGraph-style graph traversal for structured knowledge, MemGPT-style agent-initiated for autonomy. |
| Versioning model | Git-inspired (Letta Context Repositories pattern) | Every knowledge write is a versioned commit with agent identity, timestamp, source attribution. Branchable, mergeable, diffable. |
| Access control | Private/Shared tiers with ABAC | Collaborative Memory pattern. Private tier for agent/client-specific. Shared tier for vertical knowledge. Attribute-based policies for flexible permissions. |
| Quality gates | Verification before shared knowledge entry | Voyager self-verification pattern. New knowledge is verified/reviewed before entering the shared store. |
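The tri-factor baseline in the retrieval row above follows the Generative Agents pattern: a sum of recency, importance, and relevance, each normalized to [0, 1], with graph traversal and agent-initiated search layered on top. A minimal sketch, where the decay constant and equal weighting are illustrative defaults rather than tuned values from this project:

```typescript
// Tri-factor baseline score in the Generative Agents style. Interface and
// constant values are illustrative assumptions.

interface MemoryEntry {
  lastAccessMs: number;   // epoch ms of last retrieval
  importance: number;     // 0..1, assigned at write time
  relevance: number;      // 0..1, e.g. cosine similarity to the query embedding
}

function triFactorScore(
  entry: MemoryEntry,
  nowMs: number,
  decayPerHour = 0.995,   // exponential recency decay, per hour
): number {
  const hours = Math.max(0, (nowMs - entry.lastAccessMs) / 3_600_000);
  const recency = Math.pow(decayPerHour, hours);
  // Equal weights here; graph traversal and agent-initiated search re-rank on top.
  return recency + entry.importance + entry.relevance;
}
```

Recently accessed entries score near recency 1.0 and decay exponentially, so a stale but important fact can still outrank a fresh trivial one.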
Knowledge Architecture Diagram
┌─────────────────────────────────────────────────────────┐
│ KAZE KNOWLEDGE SYSTEM │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ KNOWLEDGE LAYER (TypeScript) │ │
│ │ │ │
│ │ Memory Type Router (MIRIX-inspired) │ │
│ │ ├── Episodic: events, logs, history │ │
│ │ ├── Semantic: facts, graph, relationships │ │
│ │ ├── Procedural: skills, how-to, code │ │
│ │ └── Reflective: insights, learnings │ │
│ │ │ │
│ │ Retrieval Engine │ │
│ │ ├── Tri-factor scoring (recency+importance+rel.) │ │
│ │ ├── Graph traversal (knowledge graph) │ │
│ │ ├── Spreading activation (linked notes) │ │
│ │ └── Agent-initiated search (tool calls) │ │
│ │ │ │
│ │ Write Pipeline │ │
│ │ ├── Provenance tagging (AriGraph-inspired) │ │
│ │ ├── Version control (Letta-inspired) │ │
│ │ ├── Quality gate (Voyager self-verification) │ │
│ │ └── Access control (Collaborative Memory ABAC) │ │
│ │ │ │
│ │ Consolidation Engine │ │
│ │ ├── Episodic → Semantic distillation │ │
│ │ ├── Reflection synthesis │ │
│ │ ├── Contradiction detection │ │
│ │ └── Importance-based retention │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ STORAGE: PostgreSQL + pgvector + Apache AGE │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ CONSTRUCTION: Cognee (incremental) + GraphRAG (bulk)│ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ PER-AGENT MEMORY: Mem0 │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘

3. Open Questions
Questions that require further exploration before implementation.
Resolved
| # | Question | Resolution |
|---|---|---|
| Q1 | What are the first verticals to build? | Resolved (D27-D31). Three verticals in parallel: Vertical 0 (Internal Ops), Vertical 1 (SEO), Vertical 2 (Toddle). Internal Ops is the primary testbed. OpenClaw as communication layer. Parallel team structure. See mvp.md. |
| Q2 | What backs the knowledge graph? | Resolved (D18-D23). PostgreSQL + pgvector + Apache AGE for Phase 1. Mem0 for per-agent memory. Cognee + GraphRAG for KG construction. Git-inspired versioning. ABAC access control. See Section 2.3 and research/knowledge-system.md. |
| Q3 | What language/stack for the platform? | Resolved (D16-D17). TypeScript primary, Python for scripting. Hybrid agent definition (YAML + TypeScript). See Section 2.1. |
| Q4 | What is the agent runtime contract? | Resolved (D32-D37). YAML skill/agent definitions + TypeScript SkillHandler interface. Actor-based AgentRuntime with spawn/dispatch/shutdown lifecycle. MessageTransport abstraction (DirectCallTransport MVP, NatsTransport Phase 2). See technical-design.md. |
| Q7 | How is vertical knowledge curated and versioned? | Resolved (D22-D23). Git-inspired versioned commits with provenance. Quality gates before shared knowledge entry. ABAC access control. See Section 2.3. |
Open
| # | Question | Context | Impact |
|---|---|---|---|
| Q5 | How does the Conversation Manager maintain cross-channel context? | Unified thread model, message deduplication, channel-specific formatting. | Medium — UX-critical |
| Q6 | What does the supervision queue UX look like? | How do Speedrun ops review, approve, and correct agent outputs efficiently? | Medium — operational efficiency |
| Q8 | What is the billing model? | Per-agent, per-task, per-token, subscription? Affects LLM Gateway design. | Medium — business model |
| Q9 | How do you handle agent-to-agent communication across cells? | Message format, discovery, auth, latency tolerance. | Low (Phase 3) — mesh feature |
| Q10 | What is the canary deployment mechanism for agent improvements? | Traffic splitting, A/B infrastructure, rollback triggers. | Medium — self-improvement safety |