Deployment & Infrastructure
Part of Project Kaze Architecture
Deployment Modes
Kaze supports two deployment modes from a single codebase:
| | Agency Model (SaaS) | Customer VPC |
|---|---|---|
| Who hosts | Speedrun Ventures | Client's cloud account |
| Who operates | Speedrun | Speedrun (managed) or Client |
| Data residency | Speedrun's infrastructure | Client's infrastructure |
| Network boundary | Shared (multi-tenant) | Isolated (single-tenant) |
| Trust level | Client trusts Speedrun with data | Client keeps all data in-house |
| Tenant isolation | Namespace or vCluster | Full cluster |
Critical constraint: The exact same container images and IaC definitions deploy in both modes. There is no separate SaaS version and on-prem version: one build, different configurations.
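A minimal sketch of what "one build, different configurations" implies at runtime: the same image reads its deployment mode from injected configuration rather than being compiled per mode. The environment variable names (`KAZE_DEPLOYMENT_MODE`, `KAZE_TENANT_ISOLATION`) are hypothetical, standing in for whatever the real overlays inject.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeploymentConfig:
    """Runtime configuration injected per environment; the image is identical."""
    mode: str              # "agency" or "customer-vpc"
    tenant_isolation: str  # "namespace", "vcluster", or "cluster"

def load_config(env: dict[str, str]) -> DeploymentConfig:
    """Build the deployment config from environment variables.

    Variable names here are illustrative assumptions; the real keys would be
    set by the per-environment IaC overlays.
    """
    mode = env.get("KAZE_DEPLOYMENT_MODE", "agency")
    if mode not in ("agency", "customer-vpc"):
        raise ValueError(f"unknown deployment mode: {mode}")
    isolation = env.get("KAZE_TENANT_ISOLATION", "namespace")
    return DeploymentConfig(mode=mode, tenant_isolation=isolation)
```

The point of the sketch is that mode is data, not a build variant: a customer-VPC cell differs from an agency cell only in what its environment injects.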
Cell-Based Deployment
The fundamental deployment unit is a cell — a self-contained, isolated deployment of the entire Kaze stack.
- Each tenant (in agency mode) or each customer VPC is a cell
- Cells are identical in structure, different in configuration
- A cell can operate independently if disconnected from the mesh
- Blast radius is contained — a failure in Cell 1 cannot impact Cell 2
- Performance scaling analysis and bottleneck triggers are documented in research/scalability-model.md
Cell density tiers (agency model):
| Tier | Isolation | Use case |
|---|---|---|
| Dedicated cell | Full cluster per tenant | Large or security-sensitive clients |
| Shared cell, namespace isolation | Shared cluster, K8s namespace per tenant | Small clients, cost-optimized |
| Customer VPC cell | Full stack in client's cloud | Clients requiring data sovereignty |
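The tier table above can be read as a decision rule. A hedged sketch, with illustrative thresholds that are assumptions rather than Kaze policy:

```python
def select_tier(monthly_spend_usd: int,
                requires_data_sovereignty: bool,
                security_sensitive: bool) -> str:
    """Map a client's requirements to a cell density tier.

    Ordering matters: data sovereignty forces a customer-VPC cell regardless
    of size; security sensitivity or scale forces a dedicated cell; everyone
    else lands in a cost-optimized shared cell. The spend threshold is an
    illustrative assumption.
    """
    if requires_data_sovereignty:
        return "customer-vpc-cell"
    if security_sensitive or monthly_spend_usd >= 10_000:
        return "dedicated-cell"
    return "shared-cell-namespace"
```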
Hybrid approach for cost optimization (agency model):
┌─────────────────────────────────────────┐
│ Shared plane (stateless) │
│ ┌───────────┐ ┌──────────────────────┐ │
│ │ LLM │ │ Agent Runtime Pool │ │
│ │ Gateway │ │ (tenant-aware) │ │
│ └───────────┘ └──────────────────────┘ │
├─────────────────────────────────────────┤
│ Isolated plane (stateful, per-tenant) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Tenant A │ │Tenant B │ │Tenant C │ │
│ │ DB │ │ DB │ │ DB │ │
│ │ Knowledge│ │ Knowledge│ │ Knowledge│ │
│ │ Secrets │ │ Secrets │ │ Secrets │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────┘

Stateless components (LLM Gateway, Agent Runtime) can be shared with tenant context passed per-request. Stateful components (databases, knowledge graphs, secrets) are always isolated per tenant.
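The split between the shared stateless plane and the isolated stateful plane can be sketched as a runtime that resolves per-tenant backends from the tenant context on each request. All names here are hypothetical illustrations, not the actual runtime API:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TenantBackends:
    """Per-tenant endpoints in the isolated plane (DSNs are placeholders)."""
    db_dsn: str
    knowledge_url: str

@dataclass
class AgentRuntimePool:
    """Shared stateless runtime: holds no tenant data itself, only a registry
    mapping tenant IDs to their isolated backends."""
    backends: dict[str, TenantBackends] = field(default_factory=dict)

    def handle(self, tenant_id: str, task: str) -> str:
        be = self.backends.get(tenant_id)
        if be is None:
            # Fail closed: an unregistered tenant never falls through to
            # another tenant's state.
            raise PermissionError(f"unknown tenant: {tenant_id}")
        return f"ran {task!r} against {be.db_dsn}"
```

The design point is the fail-closed lookup: because the shared plane owns no state, a routing bug surfaces as a refused request rather than cross-tenant data access.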
Mesh Network Evolution
The deployment model evolves over time:
Phase 1: Single Cell — One deployment (Speedrun's infrastructure) running the full stack. Fast path to first agents operational.
Phase 2: Multi-Cell — Multiple cells for different clients and deployment modes. Hub-and-spoke topology with Speedrun's cell as the hub.
Phase 3: Federated Mesh — Cells can discover and communicate with each other. Knowledge syncs across cells (with policy controls). Evolving toward federated coordination.
Phase 3 topology:
┌──────┐ ┌──────┐ ┌──────┐
│Cell 1│◀──▶│Cell 2│◀──▶│Cell 3│
│Agency│ │VPC-A │ │VPC-B │
└──────┘ └──────┘ └──────┘
Cross-cell capabilities:
- Vertical knowledge sync (opt-in, anonymized)
- Agent discovery across cells
- Federated monitoring
- Coordinated upgrades

True peer-to-peer mesh is explicitly deferred — the coordination complexity is not justified at early scale. Hub-and-spoke evolving toward federation is the pragmatic path.
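A minimal sketch of the opt-in, anonymized knowledge sync in the hub-and-spoke phase. The record fields and the anonymization rule (dropping the client identifier) are illustrative assumptions about what "anonymized" means here:

```python
from dataclasses import dataclass, field

@dataclass
class Cell:
    name: str
    opted_in: bool  # vertical knowledge sync is opt-in per cell
    knowledge: list[dict] = field(default_factory=list)

def sync_via_hub(hub: Cell, spokes: list[Cell]) -> None:
    """Pull knowledge from opted-in spokes into the hub.

    Two policy controls from the text are encoded: cells that have not
    opted in contribute nothing, and client-identifying fields are stripped
    before a record leaves its cell.
    """
    for spoke in spokes:
        if not spoke.opted_in:
            continue
        for record in spoke.knowledge:
            anonymized = {k: v for k, v in record.items() if k != "client"}
            hub.knowledge.append(anonymized)
```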
Cloud-Agnostic Strategy
No hard dependencies on managed cloud services. Every infrastructure dependency uses either a portable open-source equivalent or a provider abstraction.
Portable component choices:
| Concern | Portable Choice | Rationale |
|---|---|---|
| Compute | Kubernetes | Universal across all clouds and on-prem |
| Messaging | NATS | Lightweight, built for distributed systems, zero cloud deps |
| Database | PostgreSQL (CloudNativePG) | K8s-native operator, runs anywhere |
| Object Storage | S3-compatible API (MinIO) | MinIO implements the S3 API; GCS offers S3 interoperability |
| Secrets | HashiCorp Vault | Cloud-agnostic, runs as container |
| Observability | OpenTelemetry + Prometheus + Grafana + Loki | Full open-source stack |
| Service Mesh | Linkerd or Cilium | mTLS, traffic management, zero cloud lock-in |
| GitOps | ArgoCD or Flux | Continuous delivery for K8s |
| Ingress | Nginx Ingress or Envoy Gateway | Portable load balancing |
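The table's pattern generalizes: application code depends on a portable interface, never on a provider SDK. A sketch for the object storage case, using an in-memory double so it runs without credentials; the interface and function names are hypothetical:

```python
from typing import Protocol

class ObjectStore(Protocol):
    """Minimal S3-shaped interface. AWS S3, MinIO, or GCS (via its S3
    interoperability layer) can each sit behind it."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...

class InMemoryStore:
    """Test double: satisfies ObjectStore without any cloud dependency."""
    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}
    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data
    def get(self, key: str) -> bytes:
        return self._objects[key]

def archive_transcript(store: ObjectStore, cell: str, transcript: bytes) -> str:
    """Application code sees only the interface; swapping clouds means
    swapping the ObjectStore implementation, not this function."""
    key = f"{cell}/transcripts/latest"
    store.put(key, transcript)
    return key
```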
IaC structure:
infra/
├── terraform/
│   ├── modules/
│   │   ├── kubernetes-cluster/   # Abstract "give me a cluster"
│   │   ├── networking/           # Abstract "give me a VPC + subnets"
│   │   └── storage/              # Abstract "give me a bucket"
│   └── providers/
│       ├── aws/                  # AWS-specific implementations
│       ├── gcp/                  # GCP-specific implementations
│       ├── azure/                # Azure-specific implementations
│       └── bare-metal/           # For on-prem
├── kubernetes/
│   ├── base/                     # The platform (cloud-agnostic)
│   └── overlays/
│       ├── agency-aws/
│       ├── agency-gcp/
│       ├── customer-vpc-aws/
│       └── customer-vpc-azure/

Strategy: Terraform/OpenTofu handles cloud-specific provisioning (one-time per environment). Kubernetes handles the application layer (universal). Once a cluster exists, everything above it is identical regardless of cloud.
Pragmatic approach: Build and validate on one cloud first (likely AWS). Ensure the architecture allows portability (containers, IaC, no proprietary services) but defer proving portability until a client requires it.