Deployment & Infrastructure

Part of Project Kaze Architecture

Deployment Modes

Kaze supports two deployment modes from a single codebase:

|                  | Agency Model (SaaS)             | Customer VPC                   |
|------------------|---------------------------------|--------------------------------|
| Who hosts        | Speedrun Ventures               | Client's cloud account         |
| Who operates     | Speedrun                        | Speedrun (managed) or Client   |
| Data residency   | Speedrun's infrastructure       | Client's infrastructure        |
| Network boundary | Shared (multi-tenant)           | Isolated (single-tenant)       |
| Trust level      | Client trusts Speedrun with data | Client keeps all data in-house |
| Tenant isolation | Namespace or vCluster           | Full cluster                   |

Critical constraint: The exact same container images and IaC definitions deploy in both modes. There is no separate SaaS build and on-prem build: one build, different configurations.
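As a minimal sketch of "one build, different configurations": a Deployment can pin a single image and pull all mode-specific settings from a ConfigMap. The component name, image path, and ConfigMap name below are hypothetical, not taken from the Kaze codebase.

```yaml
# Hypothetical sketch: the same pinned image deploys in both modes;
# only the referenced ConfigMap differs between agency and customer VPC.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-runtime                     # hypothetical component name
spec:
  replicas: 2
  selector:
    matchLabels: { app: agent-runtime }
  template:
    metadata:
      labels: { app: agent-runtime }
    spec:
      containers:
        - name: agent-runtime
          # Identical image tag/digest in SaaS and customer-VPC deployments
          image: registry.example.com/kaze/agent-runtime:1.4.2
          envFrom:
            - configMapRef:
                name: deployment-mode-config  # agency vs. customer-vpc values live here
```

This keeps the mode split entirely in configuration, so a single CI pipeline produces the artifact for every deployment target.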

Cell-Based Deployment

The fundamental deployment unit is a cell — a self-contained, isolated deployment of the entire Kaze stack.

  • Each tenant (in agency mode) or each customer VPC is a cell
  • Cells are identical in structure, different in configuration
  • A cell can operate independently if disconnected from the mesh
  • Blast radius is contained — a failure in Cell 1 cannot impact Cell 2
  • Performance scaling analysis and bottleneck triggers are documented in research/scalability-model.md

Cell density tiers (agency model):

| Tier                             | Isolation                                | Use case                            |
|----------------------------------|------------------------------------------|-------------------------------------|
| Dedicated cell                   | Full cluster per tenant                  | Large or security-sensitive clients |
| Shared cell, namespace isolation | Shared cluster, K8s namespace per tenant | Small clients, cost-optimized       |
| Customer VPC cell                | Full stack in client's cloud             | Clients requiring data sovereignty  |
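For the namespace-isolation tier, a default-deny NetworkPolicy per tenant namespace is one way to enforce the boundary inside a shared cluster. The namespace name below is hypothetical; this is a sketch of the pattern, not the project's actual policy.

```yaml
# Hypothetical sketch: namespace isolation in a shared cell.
# Denies all ingress to the tenant's pods except traffic from
# pods in the tenant's own namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tenant-default-deny
  namespace: tenant-a            # one namespace per tenant
spec:
  podSelector: {}                # applies to every pod in the namespace
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector: {}        # same-namespace pods only
```

A real deployment would add allowances for shared-plane components (e.g. the gateway) and for DNS egress, but the default-deny baseline is what makes the namespace a meaningful isolation unit.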

Hybrid approach for cost optimization (agency model):

┌─────────────────────────────────────────┐
│ Shared plane (stateless)                │
│ ┌───────────┐ ┌──────────────────────┐  │
│ │ LLM       │ │ Agent Runtime Pool   │  │
│ │ Gateway   │ │ (tenant-aware)       │  │
│ └───────────┘ └──────────────────────┘  │
├─────────────────────────────────────────┤
│ Isolated plane (stateful, per-tenant)   │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Tenant A  │ │Tenant B  │ │Tenant C  │ │
│ │ DB       │ │ DB       │ │ DB       │ │
│ │ Knowledge│ │ Knowledge│ │ Knowledge│ │
│ │ Secrets  │ │ Secrets  │ │ Secrets  │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────┘

Stateless components (LLM Gateway, Agent Runtime) can be shared with tenant context passed per-request. Stateful components (databases, knowledge graphs, secrets) are always isolated per-tenant.
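One way the shared stateless plane can receive tenant context per-request is at the ingress layer: a per-tenant route matches the tenant's hostname and injects a tenant header before forwarding to the shared pool. The hostname scheme, header name, and backend names below are assumptions for illustration (Gateway API style).

```yaml
# Hypothetical sketch: tenant context injected per-request at the edge,
# so the shared Agent Runtime Pool stays stateless and tenant-aware.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: tenant-a-route
spec:
  parentRefs:
    - name: kaze-gateway                    # hypothetical shared gateway
  hostnames: ["tenant-a.kaze.example.com"]  # hypothetical hostname scheme
  rules:
    - filters:
        - type: RequestHeaderModifier
          requestHeaderModifier:
            set:
              - name: X-Tenant-ID           # read by the tenant-aware runtime
                value: tenant-a
      backendRefs:
        - name: agent-runtime-pool          # shared, stateless backend
          port: 8080
```

The runtime then uses the header to select the tenant's isolated database, knowledge graph, and secrets, never caching one tenant's state across requests.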

Mesh Network Evolution

The deployment model evolves over time:

Phase 1: Single Cell — One deployment (Speedrun's infrastructure) running the full stack. Fast path to first agents operational.

Phase 2: Multi-Cell — Multiple cells for different clients and deployment modes. Hub-and-spoke topology with Speedrun's cell as the hub.

Phase 3: Federated Mesh — Cells can discover and communicate with each other. Knowledge syncs across cells (with policy controls). Evolving toward federated coordination.

Phase 3 topology:

┌──────┐    ┌──────┐    ┌──────┐
│Cell 1│◀──▶│Cell 2│◀──▶│Cell 3│
│Agency│    │VPC-A │    │VPC-B │
└──────┘    └──────┘    └──────┘

Cross-cell capabilities:
- Vertical knowledge sync (opt-in, anonymized)
- Agent discovery across cells
- Federated monitoring
- Coordinated upgrades

True peer-to-peer mesh is explicitly deferred — the coordination complexity is not justified at early scale. Hub-and-spoke evolving toward federation is the pragmatic path.

Cloud-Agnostic Strategy

No hard dependencies on managed cloud services. Every infrastructure dependency uses either a portable open-source equivalent or a provider abstraction.

Portable component choices:

| Concern        | Portable Choice                              | Rationale                                              |
|----------------|----------------------------------------------|--------------------------------------------------------|
| Compute        | Kubernetes                                   | Universal across all clouds and on-prem                |
| Messaging      | NATS                                         | Lightweight, built for distributed systems, zero cloud deps |
| Database       | PostgreSQL (CloudNativePG)                   | K8s-native operator, runs anywhere                     |
| Object Storage | S3-compatible API (MinIO)                    | MinIO implements the S3 API; GCS offers S3 interop     |
| Secrets        | HashiCorp Vault                              | Cloud-agnostic, runs as a container                    |
| Observability  | OpenTelemetry + Prometheus + Grafana + Loki  | Full open-source stack                                 |
| Service Mesh   | Linkerd or Cilium                            | mTLS, traffic management, zero cloud lock-in           |
| GitOps         | ArgoCD or Flux                               | Continuous delivery for K8s                            |
| Ingress        | Nginx Ingress or Envoy Gateway               | Portable load balancing                                |
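To make the "runs anywhere" claim concrete for the database row: CloudNativePG expresses a Postgres cluster as a plain Kubernetes resource, so the same manifest works on any cloud or bare metal. The cluster name and size below are illustrative only.

```yaml
# Sketch: a minimal CloudNativePG cluster. Nothing here is
# cloud-specific; storage comes from whatever StorageClass
# the underlying cluster provides.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: tenant-a-db       # hypothetical: one isolated database per tenant
spec:
  instances: 3            # primary plus two replicas
  storage:
    size: 20Gi
```

The operator handles replication, failover, and backups from this declaration, which is what keeps the stateful tier portable rather than tied to a managed database service.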

IaC structure:

infra/
├── terraform/
│   ├── modules/
│   │   ├── kubernetes-cluster/   # Abstract "give me a cluster"
│   │   ├── networking/           # Abstract "give me a VPC + subnets"
│   │   └── storage/              # Abstract "give me a bucket"
│   └── providers/
│       ├── aws/                  # AWS-specific implementations
│       ├── gcp/                  # GCP-specific implementations
│       ├── azure/                # Azure-specific implementations
│       └── bare-metal/           # For on-prem
├── kubernetes/
│   ├── base/                     # The platform (cloud-agnostic)
│   └── overlays/
│       ├── agency-aws/
│       ├── agency-gcp/
│       ├── customer-vpc-aws/
│       └── customer-vpc-azure/

Strategy: Terraform/OpenTofu handles cloud-specific provisioning (one-time per environment). Kubernetes handles the application layer (universal). Once a cluster exists, everything above it is identical regardless of cloud.
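The base/overlays split in the tree above is standard Kustomize layering. As a sketch, an overlay's kustomization.yaml only references the cloud-agnostic base and applies small environment-specific patches; the patch file name is a hypothetical example.

```yaml
# Hypothetical sketch: infra/kubernetes/overlays/customer-vpc-aws/kustomization.yaml
# The overlay layers configuration onto the identical platform base.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                  # the cloud-agnostic platform layer
patches:
  - path: storage-class.yaml    # hypothetical AWS-specific tweak (e.g. gp3 volumes)
```

Because overlays only patch, the application layer stays byte-for-byte shared across clouds, which is exactly the property the one-build constraint depends on.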

Pragmatic approach: Build and validate on one cloud first (likely AWS). Ensure the architecture allows portability (containers, IaC, no proprietary services) but defer proving portability until a client requires it.