Design Decisions & Technology Selection
Part of Project Kaze Architecture
1. Design Decisions Log
Record of architectural decisions made, with rationale.
| # | Decision | Choice | Rationale | Alternatives considered |
|---|---|---|---|---|
| D1 | Initial deployment model | Self-hosted, evolving to mesh | Fastest path to getting the first agents operational. Mesh adds value only with multiple cells. | Start with mesh (too complex), SaaS-only (limits VPC clients) |
| D2 | Cloud strategy | Cloud-agnostic from day one | SME clients span multiple clouds. Customer VPC must work anywhere. | AWS-first with portability later (risks lock-in in architecture) |
| D3 | Containerization | Everything containerized, K8s as universal runtime | K8s is the only compute abstraction that works across all clouds and on-prem. | Docker Compose (too simple), Nomad (less ecosystem) |
| D4 | IaC approach | Terraform/OpenTofu (infra) + K8s manifests with Kustomize (app) | Terraform handles cloud-specific provisioning; K8s handles universal app layer. | Pulumi (smaller ecosystem), CDK (AWS-locked) |
| D5 | Deployment modes | Agency (multi-tenant SaaS) + Customer VPC (single-tenant) | Must support both for SME market — some trust Speedrun with data, others require sovereignty. | SaaS-only (loses enterprise clients), VPC-only (too expensive to operate) |
| D6 | LLM provider strategy | Multi-provider, abstracted behind LLM Gateway | Clients have varying provider credits and preferences. Provider competition benefits pricing. | Single provider (simpler but limiting) |
| D7 | LLM key management | Dual-key model (Speedrun keys + client keys) | Clients may have their own discounts/credits. Speedrun keys provide baseline. | Speedrun-only keys (limits flexibility), client-only keys (no fallback) |
| D8 | Inter-agent messaging | NATS (post-MVP target; see D29, D34) | Lightweight, portable, supports pub/sub + request/reply + streaming. Zero cloud deps. | Kafka (heavier, better for event sourcing), RabbitMQ (less cloud-native) |
| D9 | Observability in customer VPC | Full stack in VPC + VPN-in for ops | Data stays in client boundary. Health beacon out for alerting. VPN for investigation. | Centralized telemetry (privacy concern), client self-monitors (support burden) |
| D10 | Air-gapped deployments | Out of scope for now | Adds significant complexity (local models, offline operation). No immediate client demand. | Support from day one (too complex) |
| D11 | Architecture pattern | Agent-Oriented (Actor + EDA + Cell hybrid) | Agents are intelligent, autonomous entities that don't fit cleanly into any single traditional pattern. | Pure microservices (agents aren't services), pure actor model (missing knowledge sharing) |
| D12 | AI-native operating model | AI as core operator, humans as governors | Differentiator from every other agent platform. Enables scaling without linear human headcount. | AI as tool (standard SaaS approach, no moat) |
| D13 | Vertical strategy | Deep vertical-first, not horizontal platform | Creates compounding knowledge moat. Easier to prove quality in well-understood domains. | Horizontal platform (spread thin, no depth) |
| D14 | Supervision model | Per-skill ramp: supervised → sampling → autonomous | Granular trust building. A single agent can be autonomous at one skill and supervised at another. | Per-agent binary (too coarse), full autonomous from start (too risky) |
| D15 | Human interaction model | Multi-channel (Slack, Email, WhatsApp, Telegram) | Meet SME clients where they already work. Minimize adoption friction. | Dashboard-only (adoption barrier), single channel (too limiting) |
| D16 | Primary language | TypeScript | Team strength, LLM SDK ecosystem, full-stack capability | |
| D17 | Agent definition model | Hybrid (YAML + TypeScript) | YAML for structure/config, TypeScript for custom logic. Balances ease of creation with expressiveness. | |
| D18 | Knowledge storage (Phase 1) | PostgreSQL + pgvector + Apache AGE | One database, already in stack, relational + vector + graph. Minimum ops. | |
| D19 | Per-agent memory | Mem0 | Production-proven, saves dev time, K8s deployable. Evaluate custom replacement in Phase 2. | |
| D20 | KG construction pipeline | Cognee (incremental) + GraphRAG (bulk) | Cognee for ongoing updates with feedback loops. GraphRAG for initial vertical onboarding from documents. | |
| D21 | Retrieval strategy | Tri-factor + graph traversal + agent-initiated | Layered approach: baseline ranking + structured traversal + agent autonomy. | |
| D22 | Knowledge versioning | Git-inspired commits with provenance | Every write is versioned with agent identity, timestamp, source. Letta pattern. | |
| D23 | Knowledge access control | Private/Shared tiers with ABAC | Client isolation + vertical sharing. Collaborative Memory pattern. | |
| D24 | IaC tooling | OpenTofu + Kustomize | Open source, cloud-agnostic provisioning + overlay-based K8s configuration. Narrows D4 to concrete tooling. | |
| D25 | Container registry | GitHub Container Registry | Integrated with GitHub Actions workflow. | |
| D26 | Testing | Jest | Team familiarity, mature TypeScript testing ecosystem. | |
| D27 | First vertical (testbed) | Internal Ops (Vertical 0) | Dogfooding — fastest feedback loop, no external coordination. Exercises every platform component before external verticals consume it. | |
| D28 | Communication layer | OpenClaw | Mature conversation management and tool routing. Build agent/memory design on top rather than reinventing the conversation interface. | |
| D29 | MVP messaging | Direct calls (no NATS) | NATS adds operational complexity. Start with direct function calls between agents. Migrate to NATS when scale demands it. | |
| D30 | MVP knowledge store | Mem0 + pgvector only (defer Apache AGE) | Full graph database is not needed for MVP. Vector search + per-agent memory covers initial needs. Add graph when cross-vertical knowledge sharing becomes a real requirement. | |
| D31 | Build structure | Parallel teams: Lead (Foundation + V0), Team 2 (Toddle), Team 3 (SEO) | Different people own different verticals. Foundation lead also dogfoods as V0 owner, surfacing issues before external teams hit them. | |
| D32 | Skill definition format | YAML schema (*.skill.yaml) + optional TypeScript handler | YAML for declarative structure (inputs, outputs, tools, quality criteria, supervision ramp). TS for custom logic. Parsed into SkillDefinition at load time. | |
| D33 | Agent runtime engine | AgentRuntime.spawn() / dispatch() / shutdown() | Actor-based lifecycle. Agents are long-lived, stateful, process one task at a time. TaskRequest → TaskHandle → TaskResult. | |
| D34 | Inter-agent messaging | MessageTransport interface with AgentMessage envelope | DirectCallTransport for MVP (in-process Map routing). Same message shape serializes to NATS in Phase 2. Zero agent code changes on migration. | |
| D35 | LLM model hints | fast / balanced / best / embed / judge → per-tenant model mapping | Agents request quality level, gateway resolves to concrete model. Decouples agent logic from model selection. | |
| D36 | Budget enforcement | Pre-request estimate + hard stop + post-request actual update | FOR UPDATE SKIP LOCKED for distributed safety. Deterministic code check, not AI reasoning. | |
| D37 | Observation logging | Fire-and-forget with batched writes (100 events or 1s flush) | 18 structured event types. Automatic middleware on LLM Gateway, Tool Executor, Knowledge Client. Never blocks agent execution. | |
| D38 | Agent privilege model | Capability manifests (whitelist per agent) | Agents declare tools + knowledge domains + communication targets. Runtime enforces — agent cannot discover or invoke anything outside its manifest. | Open access with per-vertical filtering (too permissive) |
| D39 | Prompt injection defense | Instruction hierarchy + output validation | System > skill > knowledge > user > tool output. Sensitive operations require deterministic validation, not just agent reasoning. No complete solution exists — defense in depth. | Input sanitization only (insufficient), fine-tuned classifiers (premature) |
| D40 | Shared knowledge integrity | Quarantine + multi-signal quality gate | Writes to shared tier enter quarantine before visibility. Quality gate uses LLM-as-judge + cross-reference + source verification. Not LLM-only. | Immediate publish (risky), human review for all (doesn't scale) |
| D41 | LLM data classification | Per-entry tags controlling LLM exposure | Knowledge entries tagged "safe for LLM" or "internal only." Gateway respects tags before including in prompts. Sensitive data routed to zero-retention providers or local models. | No classification (sends everything), blanket redaction (loses utility) |
| D42 | Credential lifecycle | Automated rotation + anomaly detection + blast radius containment | Vault-managed rotation on schedule. Usage spikes trigger alert + auto-freeze. One compromised key affects only that client's agents. | Manual rotation (error-prone), no anomaly detection (slow response) |
| D43 | Knowledge provenance & data rights | Tiered consent model with provenance classification | Default: strict isolation (no client data in shared knowledge). Opt-in contributor tier with consent addendum. Every entry tagged with source class (public, speedrun_internal, speedrun_research, client_contributed, client_private). ABAC enforces visibility. Protects against trade secret claims and GDPR purpose limitation. See research/data-rights-knowledge-sharing.md. | Strict isolation only (kills flywheel), consent + anonymization only (re-identification risk), aggregate-only (shallow knowledge) |
| D44 | Scaling strategy | Metric-triggered scaling over pre-optimization | No pre-sharding, no NATS at MVP, no Qdrant at MVP. Each scaling action has a concrete metric trigger (e.g., pgvector p95 >200ms → add Qdrant). Scale vertically first, horizontally when vertical limit hit. Agent code never changes when infrastructure scales. See research/scalability-model.md. | Pre-optimized architecture (complexity too early), manual scaling decisions (reactive, error-prone) |
| D45 | Cost optimization strategy | BYOK-first + model selection optimization + tiered pricing | Client BYOK as default (drops Speedrun's LLM cost to ~$0). Model selection routing (cheapest model meeting quality bar, 40-50% savings). Tiered subscription pricing (Starter/Growth/Enterprise). Prompt caching for 90% discount on repeated context. Batch API for background tasks. See research/cost-model.md. | Flat per-task pricing (too complex for SMEs), Speedrun-key-only (thin margins), single model tier (wasteful) |
| D46 | Service topology | Three separate repos/services: gateway, runtime, knowledge | Secret isolation (gateway holds LLM+tool keys, knowledge holds own LLM key, runtime holds zero secrets). Independent scaling and deployment. | Monorepo (simpler but secrets leak across concerns), gateway+knowledge merged (couples LLM routing with memory) |
| D47 | LLM SDK | Vercel AI SDK (ai package) | Unified interface for Gemini + Claude. Built-in tool-use loop, streaming, structured output. Actively maintained. | Direct provider SDKs (more boilerplate), LangChain (heavier than needed for gateway) |
| D48 | Observability (MVP) | Langfuse (hosted SaaS) | LLM trace visualization, cost tracking, prompt management. Free tier sufficient for MVP. Integrated in gateway via Vercel AI SDK. | Custom Observation Logger (more work), no observability (blind) |
| D49 | Knowledge service LLM key | Dedicated key, not shared with gateway | Knowledge service needs LLM for Mem0 fact extraction + Google embeddings. Own key avoids coupling to gateway for secrets. | Share gateway key via env (coupling), route through gateway (unnecessary latency) |
| D50 | Embedding model | Google gemini-embedding-001 (768 dims) | Free tier available, good quality, native Mem0 support. | OpenAI ada-002 (paid only), Cohere (less Mem0 integration) |
| D51 | MVP vector store | PostgreSQL + pgvector via LangChain adapter | Matches D18 (PostgreSQL for knowledge storage). Mem0 connects via LangChain PGVectorStore wrapper. Single database for vectors + future shared knowledge. | Qdrant (separate infra), SQLite (not production-grade), Mem0 built-in memory store (no persistence guarantees) |
| D52 | CI/CD (MVP) | GitHub Actions + Tailscale SSH deploy to EC2 | Simple, no K8s overhead for MVP. Tailscale provides secure private networking. Docker containers on EC2. | K8s (too complex for 3 services), manual deploy (error-prone) |
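The migration path behind D29 and D34 can be sketched in TypeScript. The names MessageTransport, AgentMessage, and DirectCallTransport come from the decision log above; the envelope fields and method signatures are illustrative assumptions, not the actual implementation.

```typescript
// Sketch of the D34 transport abstraction. Only the type/class names come
// from the decision log; field names and signatures are assumptions.

interface AgentMessage {
  from: string;                          // sender agent id
  to: string;                            // recipient agent id
  type: "request" | "reply" | "event";
  payload: unknown;                      // serializable body — the same shape a NATS subject would carry
}

interface MessageTransport {
  register(agentId: string, handler: (msg: AgentMessage) => Promise<void>): void;
  send(msg: AgentMessage): Promise<void>;
}

// MVP transport per D29: in-process Map routing, no broker. A Phase 2
// NatsTransport would replace only this class — agents keep calling send().
class DirectCallTransport implements MessageTransport {
  private handlers = new Map<string, (msg: AgentMessage) => Promise<void>>();

  register(agentId: string, handler: (msg: AgentMessage) => Promise<void>): void {
    this.handlers.set(agentId, handler);
  }

  async send(msg: AgentMessage): Promise<void> {
    const handler = this.handlers.get(msg.to);
    if (!handler) throw new Error(`No agent registered for ${msg.to}`);
    await handler(msg);
  }
}
```

Because agents depend only on the MessageTransport interface, the "zero agent code changes on migration" claim in D34 reduces to swapping the injected transport instance.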
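The D36 budget flow (pre-request estimate, hard stop, post-request reconciliation) is deterministic enough to sketch. The in-memory ledger below stands in for the Postgres row that production would lock with FOR UPDATE SKIP LOCKED; the class and method names are hypothetical.

```typescript
// Illustrative sketch of the D36 budget flow, assuming a hypothetical
// BudgetLedger. Production would back this with a Postgres row locked via
// FOR UPDATE SKIP LOCKED rather than in-process state.

class BudgetLedger {
  constructor(private remainingUsd: number) {}

  // Pre-request: deterministic code check, not AI reasoning (D36).
  // Reserves the estimated cost or hard-stops before the LLM call is made.
  reserve(estimateUsd: number): void {
    if (estimateUsd > this.remainingUsd) {
      throw new Error("Budget exceeded — hard stop before the LLM call");
    }
    this.remainingUsd -= estimateUsd;
  }

  // Post-request: replace the reserved estimate with the measured actual.
  settle(estimateUsd: number, actualUsd: number): void {
    this.remainingUsd += estimateUsd - actualUsd;
  }

  get remaining(): number {
    return this.remainingUsd;
  }
}
```

Reserving before the call and settling after means a crash mid-request leaves the budget conservatively under-counted rather than overspent.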
2. Technology Selection
2.1 Core Stack
| Concern | Choice | Rationale |
|---|---|---|
| Primary language | TypeScript | Team strength, rich LLM SDK ecosystem, full-stack (platform + dashboard + agents in one language) |
| Secondary language | Python (scripting) | Preferred by some developers for scripting tasks; strong AI/data library ecosystem |
| Agent runtime model | Hybrid (YAML structure + TypeScript code) | YAML defines agent structure (skills, knowledge deps, config). TypeScript for custom logic. Balances configurability with flexibility. |
| Package manager | To be determined | — |
| Testing | Jest | Team familiarity, mature ecosystem |
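The hybrid agent model above (YAML structure, TypeScript logic) implies a typed shape that each *.skill.yaml file parses into. The sketch below is an assumption about that shape — only the names SkillDefinition and SkillHandler, and the YAML-plus-optional-handler split, come from the decision log (D17, D32).

```typescript
// Hypothetical shape of what a *.skill.yaml file parses into at load time.
// Field names are illustrative assumptions, not the project's actual schema.

type SupervisionStage = "supervised" | "sampling" | "autonomous"; // per-skill ramp (D14)

interface SkillDefinition {
  name: string;
  inputs: Record<string, string>;    // param name → type hint
  outputs: Record<string, string>;
  tools: string[];                   // tool ids this skill may invoke
  qualityCriteria: string[];
  supervision: SupervisionStage;
  handler?: string;                  // optional path to a TypeScript handler module
}

// Custom logic escapes to TypeScript when declarative YAML is not enough.
type SkillHandler = (
  input: Record<string, unknown>
) => Promise<Record<string, unknown>>;
```

Keeping supervision a per-skill field (rather than per-agent) is what lets one agent be autonomous at one skill and supervised at another, as D14 requires.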
2.2 Infrastructure & DevOps
| Concern | Choice | Rationale |
|---|---|---|
| Container orchestration | Kubernetes | Universal compute abstraction, cloud-agnostic (decided in architecture) |
| IaC (infrastructure) | OpenTofu | Open source Terraform fork, cloud-agnostic provisioning |
| IaC (application) | Kustomize | Base + overlay model for environment-specific configs without templating complexity |
| GitOps | ArgoCD or Flux | Continuous delivery to K8s, enables remote upgrades of customer VPC deployments |
| Container registry | GitHub Container Registry | Integrated with GitHub workflows, no additional infrastructure |
| CI/CD | GitHub Actions | Integrated with GHCR and GitOps pipeline |
2.3 Knowledge System
Detailed research documented in research/knowledge-system.md.
Storage Layer (Phased)
| Phase | Choice | Rationale |
|---|---|---|
| Phase 1 (MVP) | PostgreSQL + pgvector + Apache AGE | One database for relational + vector + graph. Already in our stack. Minimum ops burden. Good enough for early-to-mid scale. |
| Phase 2 (if vector bottleneck) | Add Qdrant | Best performance for pure vector search. Complements Postgres for hot-path queries. |
| Phase 3 (if graph bottleneck) | Add FalkorDB or evaluate SurrealDB | FalkorDB for dedicated graph performance. SurrealDB as potential single-system consolidation. |
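Each phase transition above is gated by a concrete metric, per D44 (e.g. pgvector p95 above 200 ms triggers adding Qdrant). A minimal sketch of that trigger check, with hypothetical function names and a nearest-rank percentile:

```typescript
// Sketch of a D44-style metric trigger. The 200 ms threshold comes from the
// decision log; function names and the sampling source are assumptions.

function p95(latenciesMs: number[]): number {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  // nearest-rank percentile: smallest value covering 95% of samples
  const rank = Math.ceil(0.95 * sorted.length) - 1;
  return sorted[Math.max(rank, 0)];
}

function shouldAddQdrant(latenciesMs: number[], thresholdMs = 200): boolean {
  return p95(latenciesMs) > thresholdMs;
}
```

The point of the deterministic trigger is that the Phase 2 decision becomes a dashboard alert rather than a judgment call, and agent code is untouched either way.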
Knowledge System Components
| Component | Choice | Rationale |
|---|---|---|
| Memory types | Episodic + Semantic + Procedural + Reflective | Aligned with CoALA framework and MIRIX taxonomy. Proven across academic literature. |
| Per-agent memory | Mem0 | Production-proven (186M API calls/quarter), handles agent-level working + episodic memory. Saves development time. K8s deployable. |
| KG construction (incremental) | Cognee | Automated knowledge graph building from documents. Feedback loops for quality improvement. Pluggable backends. |
| KG construction (bulk) | Microsoft GraphRAG | Pipeline for converting large document collections into structured knowledge graphs. MIT licensed. For initial vertical onboarding. |
| Retrieval strategy | Tri-factor scoring + graph traversal + agent-initiated search | Layered: Generative Agents formula as baseline, AriGraph-style graph traversal for structured knowledge, MemGPT-style agent-initiated for autonomy. |
| Versioning model | Git-inspired (Letta Context Repositories pattern) | Every knowledge write is a versioned commit with agent identity, timestamp, source attribution. Branchable, mergeable, diffable. |
| Access control | Private/Shared tiers with ABAC | Collaborative Memory pattern. Private tier for agent/client-specific. Shared tier for vertical knowledge. Attribute-based policies for flexible permissions. |
| Quality gates | Verification before shared knowledge entry | Voyager self-verification pattern. New knowledge is verified/reviewed before entering the shared store. |
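The tri-factor baseline in the retrieval row above follows the Generative Agents pattern: a sum of recency, importance, and relevance, each normalized to [0, 1], with graph traversal and agent-initiated search layered on top. A minimal sketch, where the decay constant and equal weighting are illustrative defaults rather than tuned values from this project:

```typescript
// Tri-factor baseline score in the Generative Agents style. Interface and
// constant values are illustrative assumptions.

interface MemoryEntry {
  lastAccessMs: number;   // epoch ms of last retrieval
  importance: number;     // 0..1, assigned at write time
  relevance: number;      // 0..1, e.g. cosine similarity to the query embedding
}

function triFactorScore(
  entry: MemoryEntry,
  nowMs: number,
  decayPerHour = 0.995,   // exponential recency decay, per hour
): number {
  const hours = Math.max(0, (nowMs - entry.lastAccessMs) / 3_600_000);
  const recency = Math.pow(decayPerHour, hours);
  // Equal weights here; graph traversal and agent-initiated search re-rank on top.
  return recency + entry.importance + entry.relevance;
}
```

Recently accessed entries score near recency 1.0 and decay exponentially, so a stale but important fact can still outrank a fresh trivial one.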
Knowledge Architecture Diagram
┌─────────────────────────────────────────────────────────┐
│ KAZE KNOWLEDGE SYSTEM │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ KNOWLEDGE LAYER (TypeScript) │ │
│ │ │ │
│ │ Memory Type Router (MIRIX-inspired) │ │
│ │ ├── Episodic: events, logs, history │ │
│ │ ├── Semantic: facts, graph, relationships │ │
│ │ ├── Procedural: skills, how-to, code │ │
│ │ └── Reflective: insights, learnings │ │
│ │ │ │
│ │ Retrieval Engine │ │
│ │ ├── Tri-factor scoring (recency+importance+rel.) │ │
│ │ ├── Graph traversal (knowledge graph) │ │
│ │ ├── Spreading activation (linked notes) │ │
│ │ └── Agent-initiated search (tool calls) │ │
│ │ │ │
│ │ Write Pipeline │ │
│ │ ├── Provenance tagging (AriGraph-inspired) │ │
│ │ ├── Version control (Letta-inspired) │ │
│ │ ├── Quality gate (Voyager self-verification) │ │
│ │ └── Access control (Collaborative Memory ABAC) │ │
│ │ │ │
│ │ Consolidation Engine │ │
│ │ ├── Episodic → Semantic distillation │ │
│ │ ├── Reflection synthesis │ │
│ │ ├── Contradiction detection │ │
│ │ └── Importance-based retention │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ STORAGE: PostgreSQL + pgvector + Apache AGE │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ CONSTRUCTION: Cognee (incremental) + GraphRAG (bulk)│ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ PER-AGENT MEMORY: Mem0 │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘

3. Open Questions
Questions that require further exploration before implementation.
Resolved
| # | Question | Resolution |
|---|---|---|
| Q1 | What are the first verticals to build? | Resolved (D27-D31). Three verticals in parallel: Vertical 0 (Internal Ops), Vertical 1 (SEO), Vertical 2 (Toddle). Internal Ops is the primary testbed. OpenClaw as communication layer. Parallel team structure. See mvp.md. |
| Q2 | What backs the knowledge graph? | Resolved (D18-D23). PostgreSQL + pgvector + Apache AGE for Phase 1. Mem0 for per-agent memory. Cognee + GraphRAG for KG construction. Git-inspired versioning. ABAC access control. See Section 2.3 and research/knowledge-system.md. |
| Q3 | What language/stack for the platform? | Resolved (D16-D17). TypeScript primary, Python for scripting. Hybrid agent definition (YAML + TypeScript). See Section 2.1. |
| Q4 | What is the agent runtime contract? | Resolved (D32-D37). YAML skill/agent definitions + TypeScript SkillHandler interface. Actor-based AgentRuntime with spawn/dispatch/shutdown lifecycle. MessageTransport abstraction (DirectCallTransport MVP, NatsTransport Phase 2). See technical-design.md. |
| Q7 | How is vertical knowledge curated and versioned? | Resolved (D22-D23). Git-inspired versioned commits with provenance. Quality gates before shared knowledge entry. ABAC access control. See Section 2.3. |
Open
| # | Question | Context | Impact |
|---|---|---|---|
| Q5 | How does the Conversation Manager maintain cross-channel context? | Unified thread model, message deduplication, channel-specific formatting. | Medium — UX-critical |
| Q6 | What does the supervision queue UX look like? | How do Speedrun ops review, approve, and correct agent outputs efficiently? | Medium — operational efficiency |
| Q8 | What is the billing model? | Per-agent, per-task, per-token, subscription? Affects LLM Gateway design. | Medium — business model |
| Q9 | How do you handle agent-to-agent communication across cells? | Message format, discovery, auth, latency tolerance. | Low (Phase 3) — mesh feature |
| Q10 | What is the canary deployment mechanism for agent improvements? | Traffic splitting, A/B infrastructure, rollback triggers. | Medium — self-improvement safety |