Deployment & Infrastructure
Part of Project Kaze Architecture
Deployment Modes
Kaze supports two deployment modes from a single codebase:
| | Agency Model (SaaS) | Customer VPC |
|---|---|---|
| Who hosts | Speedrun Ventures | Client's cloud account |
| Who operates | Speedrun | Speedrun (managed) or Client |
| Data residency | Speedrun's infrastructure | Client's infrastructure |
| Network boundary | Shared (multi-tenant) | Isolated (single-tenant) |
| Trust level | Client trusts Speedrun with data | Client keeps all data in-house |
| Tenant isolation | Namespace or vCluster | Full cluster |
Critical constraint: The exact same container images and IaC definitions deploy in both modes. There is no separate SaaS version and on-prem version: one build, different configurations.
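A minimal sketch of what "one build, different configurations" implies at runtime: the same image reads its deployment mode from injected configuration rather than being compiled per mode. The environment variable names (`KAZE_DEPLOYMENT_MODE`, `KAZE_TENANT_ISOLATION`) are hypothetical, standing in for whatever the real overlays inject.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeploymentConfig:
    """Runtime configuration injected per environment; the image is identical."""
    mode: str              # "agency" or "customer-vpc"
    tenant_isolation: str  # "namespace", "vcluster", or "cluster"

def load_config(env: dict[str, str]) -> DeploymentConfig:
    """Build the deployment config from environment variables.

    Variable names here are illustrative assumptions; the real keys would be
    set by the per-environment IaC overlays.
    """
    mode = env.get("KAZE_DEPLOYMENT_MODE", "agency")
    if mode not in ("agency", "customer-vpc"):
        raise ValueError(f"unknown deployment mode: {mode}")
    isolation = env.get("KAZE_TENANT_ISOLATION", "namespace")
    return DeploymentConfig(mode=mode, tenant_isolation=isolation)
```

The point of the sketch is that mode is data, not a build variant: a customer-VPC cell differs from an agency cell only in what its environment injects.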
Cell-Based Deployment
The fundamental deployment unit is a cell — a self-contained, isolated deployment of the entire Kaze stack.
- Each tenant (in agency mode) or each customer VPC is a cell
- Cells are identical in structure, different in configuration
- A cell can operate independently if disconnected from the mesh
- Blast radius is contained — a failure in Cell 1 cannot impact Cell 2
- Performance scaling analysis and bottleneck triggers are documented in research/scalability-model.md
Cell density tiers (agency model):
| Tier | Isolation | Use case |
|---|---|---|
| Dedicated cell | Full cluster per tenant | Large or security-sensitive clients |
| Shared cell, namespace isolation | Shared cluster, K8s namespace per tenant | Small clients, cost-optimized |
| Customer VPC cell | Full stack in client's cloud | Clients requiring data sovereignty |
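The tier table above can be read as a decision rule. A hedged sketch, with illustrative thresholds that are assumptions rather than Kaze policy:

```python
def select_tier(monthly_spend_usd: int,
                requires_data_sovereignty: bool,
                security_sensitive: bool) -> str:
    """Map a client's requirements to a cell density tier.

    Ordering matters: data sovereignty forces a customer-VPC cell regardless
    of size; security sensitivity or scale forces a dedicated cell; everyone
    else lands in a cost-optimized shared cell. The spend threshold is an
    illustrative assumption.
    """
    if requires_data_sovereignty:
        return "customer-vpc-cell"
    if security_sensitive or monthly_spend_usd >= 10_000:
        return "dedicated-cell"
    return "shared-cell-namespace"
```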
Hybrid approach for cost optimization (agency model):
┌─────────────────────────────────────────┐
│ Shared plane (stateless) │
│ ┌───────────┐ ┌──────────────────────┐ │
│ │ LLM │ │ Agent Runtime Pool │ │
│ │ Gateway │ │ (tenant-aware) │ │
│ └───────────┘ └──────────────────────┘ │
├─────────────────────────────────────────┤
│ Isolated plane (stateful, per-tenant) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Tenant A │ │Tenant B │ │Tenant C │ │
│ │ DB │ │ DB │ │ DB │ │
│ │ Knowledge│ │ Knowledge│ │ Knowledge│ │
│ │ Secrets │ │ Secrets │ │ Secrets │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────┘

Stateless components (LLM Gateway, Agent Runtime) can be shared with tenant context passed per-request. Stateful components (databases, knowledge graphs, secrets) are always isolated per tenant.
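The split between the shared stateless plane and the isolated stateful plane can be sketched as a runtime that resolves per-tenant backends from the tenant context on each request. All names here are hypothetical illustrations, not the actual runtime API:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TenantBackends:
    """Per-tenant endpoints in the isolated plane (DSNs are placeholders)."""
    db_dsn: str
    knowledge_url: str

@dataclass
class AgentRuntimePool:
    """Shared stateless runtime: holds no tenant data itself, only a registry
    mapping tenant IDs to their isolated backends."""
    backends: dict[str, TenantBackends] = field(default_factory=dict)

    def handle(self, tenant_id: str, task: str) -> str:
        be = self.backends.get(tenant_id)
        if be is None:
            # Fail closed: an unregistered tenant never falls through to
            # another tenant's state.
            raise PermissionError(f"unknown tenant: {tenant_id}")
        return f"ran {task!r} against {be.db_dsn}"
```

The design point is the fail-closed lookup: because the shared plane owns no state, a routing bug surfaces as a refused request rather than cross-tenant data access.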
Mesh Network Evolution
The deployment model evolves over time:
Phase 1: Single Cell — One deployment (Speedrun's infrastructure) running the full stack. Fast path to first agents operational.
Phase 2: Multi-Cell — Multiple cells for different clients and deployment modes. Hub-and-spoke topology with Speedrun's cell as the hub.
Phase 3: Federated Mesh — Cells can discover and communicate with each other. Knowledge syncs across cells (with policy controls). Evolving toward federated coordination.
Phase 3 topology:
┌──────┐ ┌──────┐ ┌──────┐
│Cell 1│◀──▶│Cell 2│◀──▶│Cell 3│
│Agency│ │VPC-A │ │VPC-B │
└──────┘ └──────┘ └──────┘
Cross-cell capabilities:
- Vertical knowledge sync (opt-in, anonymized)
- Agent discovery across cells
- Federated monitoring
- Coordinated upgrades

True peer-to-peer mesh is explicitly deferred — the coordination complexity is not justified at early scale. Hub-and-spoke evolving toward federation is the pragmatic path.
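A minimal sketch of the opt-in, anonymized knowledge sync in the hub-and-spoke phase. The record fields and the anonymization rule (dropping the client identifier) are illustrative assumptions about what "anonymized" means here:

```python
from dataclasses import dataclass, field

@dataclass
class Cell:
    name: str
    opted_in: bool  # vertical knowledge sync is opt-in per cell
    knowledge: list[dict] = field(default_factory=list)

def sync_via_hub(hub: Cell, spokes: list[Cell]) -> None:
    """Pull knowledge from opted-in spokes into the hub.

    Two policy controls from the text are encoded: cells that have not
    opted in contribute nothing, and client-identifying fields are stripped
    before a record leaves its cell.
    """
    for spoke in spokes:
        if not spoke.opted_in:
            continue
        for record in spoke.knowledge:
            anonymized = {k: v for k, v in record.items() if k != "client"}
            hub.knowledge.append(anonymized)
```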
Cloud-Agnostic Strategy
No hard dependencies on managed cloud services. Every infrastructure dependency uses either a portable open-source equivalent or a provider abstraction.
Portable component choices:
| Concern | Portable Choice | Rationale |
|---|---|---|
| Compute | Kubernetes | Universal across all clouds and on-prem |
| Messaging | NATS | Lightweight, built for distributed systems, zero cloud deps |
| Database | PostgreSQL (CloudNativePG) | K8s-native operator, runs anywhere |
| Object Storage | S3-compatible API (MinIO) | MinIO implements the S3 API; GCS offers S3 interoperability |
| Secrets | HashiCorp Vault | Cloud-agnostic, runs as container |
| Observability | OpenTelemetry + Prometheus + Grafana + Loki | Full open-source stack |
| Service Mesh | Linkerd or Cilium | mTLS, traffic management, zero cloud lock-in |
| GitOps | ArgoCD or Flux | Continuous delivery for K8s |
| Ingress | Nginx Ingress or Envoy Gateway | Portable load balancing |
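The table's pattern generalizes: application code depends on a portable interface, never on a provider SDK. A sketch for the object storage case, using an in-memory double so it runs without credentials; the interface and function names are hypothetical:

```python
from typing import Protocol

class ObjectStore(Protocol):
    """Minimal S3-shaped interface. AWS S3, MinIO, or GCS (via its S3
    interoperability layer) can each sit behind it."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...

class InMemoryStore:
    """Test double: satisfies ObjectStore without any cloud dependency."""
    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}
    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data
    def get(self, key: str) -> bytes:
        return self._objects[key]

def archive_transcript(store: ObjectStore, cell: str, transcript: bytes) -> str:
    """Application code sees only the interface; swapping clouds means
    swapping the ObjectStore implementation, not this function."""
    key = f"{cell}/transcripts/latest"
    store.put(key, transcript)
    return key
```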
IaC structure:
infra/
├── terraform/
│   ├── modules/
│   │   ├── kubernetes-cluster/   # Abstract "give me a cluster"
│   │   ├── networking/           # Abstract "give me a VPC + subnets"
│   │   └── storage/              # Abstract "give me a bucket"
│   └── providers/
│       ├── aws/                  # AWS-specific implementations
│       ├── gcp/                  # GCP-specific implementations
│       ├── azure/                # Azure-specific implementations
│       └── bare-metal/           # For on-prem
├── kubernetes/
│   ├── base/                     # The platform (cloud-agnostic)
│   └── overlays/
│       ├── agency-aws/
│       ├── agency-gcp/
│       ├── customer-vpc-aws/
│       └── customer-vpc-azure/

Strategy: Terraform/OpenTofu handles cloud-specific provisioning (one-time per environment). Kubernetes handles the application layer (universal). Once a cluster exists, everything above it is identical regardless of cloud.
Pragmatic approach: Build and validate on one cloud first (likely AWS). Ensure the architecture allows portability (containers, IaC, no proprietary services) but defer proving portability until a client requires it.