
Infrastructure & Deployment

Part of Project Kaze Architecture


1. Deployment Modes

Kaze supports two deployment modes from a single codebase:

| | Agency Model (SaaS) | Customer VPC |
|---|---|---|
| Who hosts | Speedrun Ventures | Client's cloud account |
| Who operates | Speedrun | Speedrun (managed) or Client |
| Data residency | Speedrun's infrastructure | Client's infrastructure |
| Network boundary | Shared (multi-tenant) | Isolated (single-tenant) |
| Trust level | Client trusts Speedrun with data | Client keeps all data in-house |
| Tenant isolation | Namespace or vCluster | Full cluster |

Critical constraint: the exact same container images and IaC definitions deploy in both modes. There is no separate SaaS build and no separate on-prem build: one build, different configurations.

2. Cell-Based Deployment

The fundamental deployment unit is a cell — a self-contained, isolated deployment of the entire Kaze stack.

  • Each tenant (in agency mode) or each customer VPC is a cell
  • Cells are identical in structure, different in configuration
  • A cell can operate independently if disconnected from the mesh
  • Blast radius is contained — a failure in Cell 1 cannot impact Cell 2
  • Performance scaling analysis and bottleneck triggers documented in research/scalability-model.md

Cell density tiers (agency model):

| Tier | Isolation | Use case |
|---|---|---|
| Dedicated cell | Full cluster per tenant | Large or security-sensitive clients |
| Shared cell, namespace isolation | Shared cluster, K8s namespace per tenant | Small clients, cost-optimized |
| Customer VPC cell | Full stack in client's cloud | Clients requiring data sovereignty |

Hybrid approach for cost optimization (agency model):

┌─────────────────────────────────────────┐
│ Shared plane (stateless)                │
│ ┌───────────┐ ┌──────────────────────┐  │
│ │ LLM       │ │ Agent Runtime Pool   │  │
│ │ Gateway   │ │ (tenant-aware)       │  │
│ └───────────┘ └──────────────────────┘  │
├─────────────────────────────────────────┤
│ Isolated plane (stateful, per-tenant)   │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Tenant A  │ │Tenant B  │ │Tenant C  │ │
│ │ DB       │ │ DB       │ │ DB       │ │
│ │ Knowledge│ │ Knowledge│ │ Knowledge│ │
│ │ Secrets  │ │ Secrets  │ │ Secrets  │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────┘

Stateless components (LLM Gateway, Agent Runtime) can be shared with tenant context passed per-request. Stateful components (databases, knowledge graphs, secrets) are always isolated per-tenant.
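One way the shared plane stays tenant-aware is to carry a tenant identifier on every request and resolve the isolated stateful backends from it. A minimal sketch; the registry shape and connection strings are assumptions, not Kaze's actual API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TenantContext:
    """Per-request tenant identity carried through the shared plane."""
    tenant_id: str

# Hypothetical registry mapping tenants to their isolated stateful backends.
TENANT_DATABASES = {
    "tenant-a": "postgres://tenant-a-db.tenant-a.svc:5432/kaze",
    "tenant-b": "postgres://tenant-b-db.tenant-b.svc:5432/kaze",
}

def resolve_database(ctx: TenantContext) -> str:
    """Shared, stateless code path: the only per-tenant input is the context."""
    try:
        return TENANT_DATABASES[ctx.tenant_id]
    except KeyError:
        # Fail closed: an unknown tenant gets nothing, not a default database.
        raise PermissionError(f"unknown tenant: {ctx.tenant_id}")
```

The key property is that shared components hold no tenant state of their own; everything per-tenant is looked up from the request context and discarded afterwards.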

3. Mesh Network Evolution

The deployment model evolves over time:

Phase 1: Single Cell — One deployment (Speedrun's infrastructure) running the full stack. Fast path to first agents operational.

Phase 2: Multi-Cell — Multiple cells for different clients and deployment modes. Hub-and-spoke topology with Speedrun's cell as the hub.

Phase 3: Federated Mesh — Cells can discover and communicate with each other. Knowledge syncs across cells (with policy controls). Evolving toward federated coordination.

Phase 3 topology:

┌──────┐    ┌──────┐    ┌──────┐
│Cell 1│◀──▶│Cell 2│◀──▶│Cell 3│
│Agency│    │VPC-A │    │VPC-B │
└──────┘    └──────┘    └──────┘

Cross-cell capabilities:
- Vertical knowledge sync (opt-in, anonymized)
- Agent discovery across cells
- Federated monitoring
- Coordinated upgrades

True peer-to-peer mesh is explicitly deferred — the coordination complexity is not justified at early scale. Hub-and-spoke evolving toward federation is the pragmatic path.

4. Cloud-Agnostic Strategy

No hard dependencies on managed cloud services. Every infrastructure dependency uses either a portable open-source equivalent or a provider abstraction.

Portable component choices:

| Concern | Portable Choice | Rationale |
|---|---|---|
| Compute | Kubernetes | Universal across all clouds and on-prem |
| Messaging | NATS | Lightweight, built for distributed systems, zero cloud dependencies |
| Database | PostgreSQL (CloudNativePG) | K8s-native operator, runs anywhere |
| Object Storage | S3-compatible API (MinIO) | MinIO implements the S3 API; GCS has interop |
| Secrets | HashiCorp Vault | Cloud-agnostic, runs as a container |
| Observability | OpenTelemetry + Prometheus + Grafana + Loki | Full open-source stack |
| Service Mesh | Linkerd or Cilium | mTLS, traffic management, zero cloud lock-in |
| GitOps | ArgoCD or Flux | Continuous delivery for K8s |
| Ingress | Nginx Ingress or Envoy Gateway | Portable load balancing |

IaC structure:

infra/
├── terraform/
│   ├── modules/
│   │   ├── kubernetes-cluster/   # Abstract "give me a cluster"
│   │   ├── networking/           # Abstract "give me a VPC + subnets"
│   │   └── storage/              # Abstract "give me a bucket"
│   └── providers/
│       ├── aws/                  # AWS-specific implementations
│       ├── gcp/                  # GCP-specific implementations
│       ├── azure/                # Azure-specific implementations
│       └── bare-metal/           # For on-prem
├── kubernetes/
│   ├── base/                     # The platform (cloud-agnostic)
│   └── overlays/
│       ├── agency-aws/
│       ├── agency-gcp/
│       ├── customer-vpc-aws/
│       └── customer-vpc-azure/

Strategy: Terraform/OpenTofu handles cloud-specific provisioning (one-time per environment). Kubernetes handles the application layer (universal). Once a cluster exists, everything above it is identical regardless of cloud.

Pragmatic approach: Build and validate on one cloud first (likely AWS). Ensure the architecture allows portability (containers, IaC, no proprietary services) but defer proving portability until a client requires it.

5. LLM Provider & Key Management

Dual-key model:

  • Speedrun keys — Speedrun's own API keys across multiple LLM providers, centrally managed and monitored.
  • Client keys — Clients bring their own keys (e.g., Azure OpenAI credits, Anthropic volume discount, Google Cloud credits).

Key routing logic:

  • Agent X for Client A → use Client A's Anthropic key
  • Agent Y for Client A → use Speedrun's OpenAI key (client has no OpenAI credits)
  • Agent Z for Client B → use Client B's Azure OpenAI endpoint
  • Fallback: if Client A's key hits a rate limit → fall back to Speedrun's key (if policy allows)

Routing is configured per tenant + agent + provider, not hardcoded.
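The routing rules above can be modeled as a lookup from (tenant, agent, provider) to a Vault key path, with fallback gated by per-tenant policy. A sketch under assumed names; the table contents mirror the examples above but the function and its signature are illustrative:

```python
# Hypothetical routing table: (tenant, agent, provider) -> Vault key path.
ROUTES = {
    ("client-a", "agent-x", "anthropic"): "clients/client-a/anthropic-key",
    ("client-a", "agent-y", "openai"): "speedrun/openai-key-1",
    ("client-b", "agent-z", "azure-openai"): "clients/client-b/azure-openai",
}

# Per-tenant policy: may this tenant fall back to Speedrun's keys?
FALLBACK_ALLOWED = {"client-a": True, "client-b": False}

SPEEDRUN_KEYS = {"anthropic": "speedrun/anthropic-key-1",
                 "openai": "speedrun/openai-key-1"}

def resolve_key(tenant: str, agent: str, provider: str,
                rate_limited: bool = False) -> str:
    """Return the Vault path of the key this LLM call should use."""
    primary = ROUTES.get((tenant, agent, provider))
    if primary and not rate_limited:
        return primary
    # Primary is rate-limited (or absent): fall back only if policy allows.
    if FALLBACK_ALLOWED.get(tenant) and provider in SPEEDRUN_KEYS:
        return SPEEDRUN_KEYS[provider]
    if primary:
        return primary  # no fallback permitted; caller retries the primary key
    raise LookupError(f"no key route for {tenant}/{agent}/{provider}")
```

Keeping the table and policy as configuration (rather than code) is what makes routing adjustable per tenant + agent + provider without redeploying.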

Key storage & security:

Vault paths:
  speedrun/
    ├── anthropic-key-1
    ├── openai-key-1
    └── google-key-1
  clients/
    ├── client-a/
    │   ├── anthropic-key
    │   └── azure-openai
    └── client-b/
        └── anthropic-key

Security rules:

  • Client keys are encrypted at rest and access-scoped — only agents running for that client can access their keys
  • In customer VPC mode, client keys never leave their VPC
  • In agency mode, client keys are stored in Speedrun's Vault with strict tenant-scoped access policies
  • Every key usage is logged with full attribution — clients can see exactly which agent used their key, when, and token count

6. Security Architecture

Network Security

  • Zero-trust networking between all components. mTLS everywhere, even inside the cluster.
  • In agency mode: strict Kubernetes network policies — Tenant A's agents can never reach Tenant B's resources.
  • In customer VPC mode: clearly defined ingress/egress rules.

Secrets Management

  • No reliance on a single secrets provider. Vault is primary, with the ability to integrate with cloud-native secret managers where needed.
  • Agent credentials (API keys to client systems) never leave the deployment boundary.
  • In customer VPC mode, Speedrun operators have no access to client secrets.

Audit & Compliance

  • Every agent action is logged with full attribution.
  • Immutable audit logs that the client can export and own.
  • Required for SMEs in regulated industries (finance, healthcare).

Supply Chain Security

  • Signed container images.
  • SBOM (Software Bill of Materials) for customer VPC deployments.
  • Reproducible builds so customers can verify what's running in their VPC.

Identity & Trust

  • Agent-to-agent authentication via capability-based tokens.
  • Cross-cell communication secured via mTLS with signed agent manifests.
  • Compromised node containment — a single cell breach cannot propagate to others.

7. Observability in Customer VPC

The full monitoring stack deploys inside every customer VPC as part of the Kaze stack:

Customer VPC                          Speedrun Central
┌──────────────────┐                  ┌──────────────┐
│ Kaze Stack       │                  │              │
│ Monitoring Stack │                  │              │
│  - Prometheus    │                  │              │
│  - Grafana       │  health beacon   │ PagerDuty /  │
│  - Loki          │─────────────────▶│ Slack / Ops  │
│  - Alertmanager  │  (minimal, no    │              │
│                  │   PII)           │              │
│ WireGuard VPN    │                  │              │
│  endpoint        │◀─ ── ── ── ── ──│ Ops team     │
└──────────────────┘  VPN for deep    │ VPN access   │
                      investigation   └──────────────┘

Health beacon (outbound, minimal): Alertmanager sends alert name + severity to Speedrun ops. No PII, no sensitive data. This enables proactive incident detection without requiring VPN access.
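The "minimal, no PII" property is easiest to guarantee with an explicit field whitelist on the beacon payload, so anything not named simply cannot leave the VPC. A sketch; the field names are assumptions, not the actual beacon schema:

```python
# Only these fields may ever leave the customer VPC.
ALLOWED_FIELDS = {"alert_name", "severity", "cell_id", "timestamp"}

def build_beacon(alert: dict) -> dict:
    """Strip an Alertmanager-style alert down to the minimal, PII-free beacon."""
    beacon = {k: v for k, v in alert.items() if k in ALLOWED_FIELDS}
    # Fail closed: refuse to send a beacon missing its required fields.
    if not {"alert_name", "severity"} <= beacon.keys():
        raise ValueError("beacon missing required fields")
    return beacon
```

Dropping unknown fields by default (rather than redacting known-bad ones) means a new annotation added to alerts later cannot silently start leaking.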

VPN (inbound, on-demand): Speedrun ops team VPNs into customer monitoring dashboards for investigation and deep dives. WireGuard-based, deployed as part of the stack, with SSO authentication and short-lived sessions.

Data classification:

| Data | Stays in VPC | Can flow out |
|---|---|---|
| Agent logs (may contain client data) | Yes | No |
| LLM request/response content | Yes | No |
| Metrics (CPU, memory, latency, error rates) | Yes | Aggregated health score only |
| Token usage counts | Yes | Aggregated totals (for billing) |
| Alert triggers | Yes | Alert name + severity only |
| Traces (OpenTelemetry) | Yes | No |

This is a configurable policy per client. Some clients may allow anonymized metrics export; others want nothing out.

Upgrade path: GitOps (ArgoCD/Flux) pointing at Speedrun's release channel enables rolling out new versions across customer VPC deployments without logging into each one. Clients approve and apply updates through the GitOps workflow.

8. Security Controls

Threat model and full attack surface analysis documented in research/threat-model.md.

Tenant Isolation Enforcement

  • Database layer: Every query includes a tenant_id filter enforced by a query wrapper at the data access layer — not just application logic. No query can execute without tenant scoping.
  • Shared runtime: LLM Gateway and Agent Runtime flush all state between requests from different tenants. No context bleed.
  • Network: K8s network policies per namespace verified and tested. Tenant A's pods cannot reach Tenant B's resources.
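The database-layer rule can be illustrated as a thin data-access wrapper that cannot emit a query without a tenant scope. A simplified sketch (class and method names are hypothetical):

```python
class TenantScopedQuery:
    """Data-access wrapper: every SELECT is forced to carry a tenant_id filter."""

    def __init__(self, tenant_id: str):
        if not tenant_id:
            # Constructing an unscoped query object is impossible.
            raise ValueError("queries must be tenant-scoped")
        self.tenant_id = tenant_id

    def select(self, table: str, where: str = "") -> tuple[str, list]:
        # tenant_id is injected as a bind parameter, never string-interpolated.
        clause = f"tenant_id = %s{' AND ' + where if where else ''}"
        return (f"SELECT * FROM {table} WHERE {clause}", [self.tenant_id])
```

Because the scoping lives in the wrapper rather than in each call site, forgetting the filter in application code is a type of bug that cannot occur.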

Egress Filtering

  • Agents can only reach whitelisted external endpoints, configured per tenant and per vertical.
  • K8s network policies enforce egress restrictions — agents cannot open arbitrary outbound connections.
  • Tool Framework validates target URLs against the whitelist before executing external API calls.
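The Tool Framework check can be sketched as URL validation against a per-tenant whitelist of (scheme, host) pairs; the whitelist contents and function name here are illustrative assumptions:

```python
from urllib.parse import urlparse

# Hypothetical per-tenant egress whitelist: allowed (scheme, host) pairs.
EGRESS_WHITELIST = {
    "client-a": {("https", "api.stripe.com"), ("https", "api.hubspot.com")},
}

def check_egress(tenant: str, url: str) -> None:
    """Raise before the tool call executes if the target is not whitelisted."""
    parsed = urlparse(url)
    allowed = EGRESS_WHITELIST.get(tenant, set())
    if (parsed.scheme, parsed.hostname) not in allowed:
        raise PermissionError(f"egress to {parsed.hostname!r} denied for {tenant}")
```

Matching on scheme as well as host blocks downgrade tricks like pointing an agent at `http://api.stripe.com`; the K8s egress policies then enforce the same boundary at the network layer, so a bypassed application check still cannot connect.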

Credential Lifecycle

  • Rotation: Automated Vault key rotation on schedule. Immediate rotation on suspected compromise.
  • Short-lived tokens are preferred over long-lived API keys where providers support them (OAuth2 token refresh).
  • Anomaly detection: Usage spike on a key (e.g., 10x normal) triggers alert + auto-freeze pending review.
  • Blast radius: One compromised client key affects only that client's agents. Speedrun keys are separate.
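The spike rule can be as simple as comparing a key's current usage rate to its rolling baseline and freezing on a 10x jump. A sketch; the baseline source and the fail-closed choice for keys without history are assumptions:

```python
def check_key_usage(baseline_tokens_per_hour: float,
                    current_tokens_per_hour: float,
                    threshold: float = 10.0) -> str:
    """Return the action for this key: 'ok' or 'freeze' pending human review."""
    if baseline_tokens_per_hour <= 0:
        # Assumed policy: no usage history means we cannot judge, so fail closed.
        return "freeze"
    if current_tokens_per_hour / baseline_tokens_per_hour >= threshold:
        return "freeze"
    return "ok"
```

In practice the freeze would also fire the alert and record which tenant and agent drove the spike, so the review has the same attribution trail as normal key usage.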

Operator Access Controls

  • VPN sessions: Time-limited (4hr max), require ticket justification, logged with operator identity.
  • Vault audit: Every secret read logged — who, when, which secret, from where.
  • Database access: Via bastion host with session recording. No direct DB access from developer machines.
  • Separation of duties: No single person can both deploy code and access production secrets.