Infrastructure & Deployment
Part of Project Kaze Architecture
1. Deployment Modes
Kaze supports two deployment modes from a single codebase:
| | Agency Model (SaaS) | Customer VPC |
|---|---|---|
| Who hosts | Speedrun Ventures | Client's cloud account |
| Who operates | Speedrun | Speedrun (managed) or Client |
| Data residency | Speedrun's infrastructure | Client's infrastructure |
| Network boundary | Shared (multi-tenant) | Isolated (single-tenant) |
| Trust level | Client trusts Speedrun with data | Client keeps all data in-house |
| Tenant isolation | Namespace or vCluster | Full cluster |
Critical constraint: The exact same container images and IaC definitions deploy in both modes. There is no separate SaaS version and no separate on-prem version. One build, different configurations.
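A minimal sketch of the "one build, different configurations" rule: the container image never branches on a build-time SaaS/on-prem flag; the deployment mode arrives as configuration injected by the per-environment overlay. All names (`KAZE_DEPLOYMENT_MODE`, `CellConfig`, field names) are illustrative assumptions, not the actual Kaze schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellConfig:
    """Runtime configuration for one cell; field names are illustrative."""
    deployment_mode: str   # "agency" or "customer_vpc"
    tenant_isolation: str  # "namespace", "vcluster", or "cluster"
    data_plane_url: str


def load_cell_config(env: dict) -> CellConfig:
    """Build the cell config from environment variables injected by the
    per-environment overlay. The image itself contains no SaaS-vs-on-prem
    code path: only this configuration differs between modes."""
    mode = env.get("KAZE_DEPLOYMENT_MODE", "agency")
    if mode not in ("agency", "customer_vpc"):
        raise ValueError(f"unknown deployment mode: {mode}")
    return CellConfig(
        deployment_mode=mode,
        tenant_isolation=env.get("KAZE_TENANT_ISOLATION", "namespace"),
        data_plane_url=env["KAZE_DATA_PLANE_URL"],
    )
```

The same validation runs in both modes, so a misconfigured overlay fails fast at startup rather than producing a half-configured cell.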
2. Cell-Based Deployment
The fundamental deployment unit is a cell — a self-contained, isolated deployment of the entire Kaze stack.
- Each tenant (in agency mode) or each customer VPC is a cell
- Cells are identical in structure, different in configuration
- A cell can operate independently if disconnected from the mesh
- Blast radius is contained — a failure in Cell 1 cannot impact Cell 2
- Performance scaling analysis and bottleneck triggers documented in research/scalability-model.md
Cell density tiers (agency model):
| Tier | Isolation | Use case |
|---|---|---|
| Dedicated cell | Full cluster per tenant | Large or security-sensitive clients |
| Shared cell, namespace isolation | Shared cluster, K8s namespace per tenant | Small clients, cost-optimized |
| Customer VPC cell | Full stack in client's cloud | Clients requiring data sovereignty |
Hybrid approach for cost optimization (agency model):
┌─────────────────────────────────────────┐
│ Shared plane (stateless) │
│ ┌───────────┐ ┌──────────────────────┐ │
│ │ LLM │ │ Agent Runtime Pool │ │
│ │ Gateway │ │ (tenant-aware) │ │
│ └───────────┘ └──────────────────────┘ │
├─────────────────────────────────────────┤
│ Isolated plane (stateful, per-tenant) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Tenant A │ │Tenant B │ │Tenant C │ │
│ │ DB │ │ DB │ │ DB │ │
│ │ Knowledge│ │ Knowledge│ │ Knowledge│ │
│ │ Secrets │ │ Secrets │ │ Secrets │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────┘
Stateless components (LLM Gateway, Agent Runtime) can be shared with tenant context passed per-request. Stateful components (databases, knowledge graphs, secrets) are always isolated per-tenant.
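A sketch of how a shared stateless component stays tenant-safe: every request carries a tenant context, the handler resolves that tenant's isolated stateful plane per call, and nothing is cached across requests. The registry and field names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantContext:
    """Per-request tenant identity, resolved from the authenticated caller."""
    tenant_id: str


# Illustrative registry mapping each tenant to its isolated stateful plane.
TENANT_PLANES = {
    "tenant-a": {"db_dsn": "postgres://tenant-a-db/kaze", "vault_path": "clients/tenant-a"},
    "tenant-b": {"db_dsn": "postgres://tenant-b-db/kaze", "vault_path": "clients/tenant-b"},
}


def handle_request(ctx: TenantContext, prompt: str) -> dict:
    """Stateless handler: everything tenant-specific is looked up per call
    and discarded when the function returns, so no context can bleed
    between tenants sharing the same runtime pool."""
    plane = TENANT_PLANES[ctx.tenant_id]  # unknown tenant -> KeyError, request rejected
    return {
        "tenant_id": ctx.tenant_id,
        "db_dsn": plane["db_dsn"],
        "vault_path": plane["vault_path"],
        "prompt": prompt,
    }
```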
3. Mesh Network Evolution
The deployment model evolves over time:
Phase 1: Single Cell — One deployment (Speedrun's infrastructure) running the full stack. Fast path to first agents operational.
Phase 2: Multi-Cell — Multiple cells for different clients and deployment modes. Hub-and-spoke topology with Speedrun's cell as the hub.
Phase 3: Federated Mesh — Cells can discover and communicate with each other. Knowledge syncs across cells (with policy controls). Evolving toward federated coordination.
Phase 3 topology:
┌──────┐ ┌──────┐ ┌──────┐
│Cell 1│◀──▶│Cell 2│◀──▶│Cell 3│
│Agency│ │VPC-A │ │VPC-B │
└──────┘ └──────┘ └──────┘
Cross-cell capabilities:
- Vertical knowledge sync (opt-in, anonymized)
- Agent discovery across cells
- Federated monitoring
- Coordinated upgrades

True peer-to-peer mesh is explicitly deferred — the coordination complexity is not justified at early scale. Hub-and-spoke evolving toward federation is the pragmatic path.
4. Cloud-Agnostic Strategy
No hard dependencies on managed cloud services. Every infrastructure dependency uses either a portable open-source equivalent or a provider abstraction.
Portable component choices:
| Concern | Portable Choice | Rationale |
|---|---|---|
| Compute | Kubernetes | Universal across all clouds and on-prem |
| Messaging | NATS | Lightweight, built for distributed systems, zero cloud deps |
| Database | PostgreSQL (CloudNativePG) | K8s-native operator, runs anywhere |
| Object Storage | S3-compatible API (MinIO) | MinIO implements S3 API, GCS has interop |
| Secrets | HashiCorp Vault | Cloud-agnostic, runs as container |
| Observability | OpenTelemetry + Prometheus + Grafana + Loki | Full open-source stack |
| Service Mesh | Linkerd or Cilium | mTLS, traffic management, zero cloud lock-in |
| GitOps | ArgoCD or Flux | Continuous delivery for K8s |
| Ingress | Nginx Ingress or Envoy Gateway | Portable load balancing |
IaC structure:
infra/
├── terraform/
│ ├── modules/
│ │ ├── kubernetes-cluster/ # Abstract "give me a cluster"
│ │ ├── networking/ # Abstract "give me a VPC + subnets"
│ │ └── storage/ # Abstract "give me a bucket"
│ └── providers/
│ ├── aws/ # AWS-specific implementations
│ ├── gcp/ # GCP-specific implementations
│ ├── azure/ # Azure-specific implementations
│ └── bare-metal/ # For on-prem
├── kubernetes/
│ ├── base/ # The platform (cloud-agnostic)
│ └── overlays/
│ ├── agency-aws/
│ ├── agency-gcp/
│ ├── customer-vpc-aws/
│       └── customer-vpc-azure/

Strategy: Terraform/OpenTofu handles cloud-specific provisioning (one-time per environment). Kubernetes handles the application layer (universal). Once a cluster exists, everything above it is identical regardless of cloud.
Pragmatic approach: Build and validate on one cloud first (likely AWS). Ensure the architecture allows portability (containers, IaC, no proprietary services) but defer proving portability until a client requires it.
5. LLM Provider & Key Management
Dual-key model:
- Speedrun keys — Speedrun's own API keys across multiple LLM providers, centrally managed and monitored.
- Client keys — Clients bring their own keys (e.g., Azure OpenAI credits, Anthropic volume discount, Google Cloud credits).
Key routing logic:
- Agent X for Client A → use Client A's Anthropic key
- Agent Y for Client A → use Speedrun's OpenAI key (client has no OpenAI credits)
- Agent Z for Client B → use Client B's Azure OpenAI endpoint
- Fallback: if Client A's key hits rate limit → fall back to Speedrun's key (if policy allows)
Routing is configured per tenant + agent + provider, not hardcoded.
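The routing rules above can be sketched as a lookup keyed on (tenant, agent, provider) with policy-controlled fallback to Speedrun's keys. The routing table, Vault paths beyond those in the source, and function signature are illustrative assumptions.

```python
# Illustrative routing table: (tenant, agent, provider) -> Vault key path.
ROUTES = {
    ("client-a", "agent-x", "anthropic"):    "clients/client-a/anthropic-key",
    ("client-a", "agent-y", "openai"):       "speedrun/openai-key-1",
    ("client-b", "agent-z", "azure-openai"): "clients/client-b/azure-openai",
}

# Speedrun-owned fallback keys, used only when policy allows it.
FALLBACKS = {"anthropic": "speedrun/anthropic-key-1"}


def resolve_key(tenant: str, agent: str, provider: str,
                client_key_exhausted: bool = False,
                allow_fallback: bool = True) -> str:
    """Return the Vault path of the key to use for this LLM call."""
    path = ROUTES.get((tenant, agent, provider))
    if path is None:
        raise LookupError(f"no key route for {tenant}/{agent}/{provider}")
    # Fall back to a Speedrun key only when the client's own key is
    # rate-limited AND the tenant's policy permits fallback.
    if client_key_exhausted and path.startswith("clients/"):
        if not allow_fallback or provider not in FALLBACKS:
            raise RuntimeError(f"{provider} key rate-limited and fallback not permitted")
        return FALLBACKS[provider]
    return path
```

Because routing is plain configuration, adding a new client key or changing fallback policy is a config change, not a code change.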
Key storage & security:
Vault paths:
speedrun/
├── anthropic-key-1
├── openai-key-1
└── google-key-1
clients/
├── client-a/
│ ├── anthropic-key
│ └── azure-openai
└── client-b/
    └── anthropic-key

Security rules:
- Client keys are encrypted at rest and access-scoped — only agents running for that client can access their keys
- In customer VPC mode, client keys never leave their VPC
- In agency mode, client keys are stored in Speedrun's Vault with strict tenant-scoped access policies
- Every key usage is logged with full attribution — clients can see exactly which agent used their key, when, and token count
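A sketch of the tenant-scoped access rule plus attribution logging, mirrored in application code. In practice the enforcement lives in Vault's path-scoped policies; this illustrative function (names and log shape are assumptions) shows the intended behavior: a tenant can read only its own `clients/<tenant>/...` paths, and every read is logged.

```python
AUDIT_LOG = []  # illustrative; a real deployment writes to an append-only audit sink


def read_client_key(requesting_tenant: str, path: str) -> str:
    """Enforce tenant-scoped access to key paths and log every read with
    attribution, so clients can see exactly which tenant touched which key."""
    if path.startswith("clients/"):
        owner = path.split("/")[1]
        if owner != requesting_tenant:
            raise PermissionError(f"{requesting_tenant} may not read {path}")
    AUDIT_LOG.append({"tenant": requesting_tenant, "path": path})
    return f"<secret:{path}>"
```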
6. Security Architecture
Network Security
- Zero-trust networking between all components. mTLS everywhere, even inside the cluster.
- In agency mode: strict Kubernetes network policies — Tenant A's agents can never reach Tenant B's resources.
- In customer VPC mode: clearly defined ingress/egress rules.
Secrets Management
- No reliance on a single secrets provider. Vault is primary, with the ability to integrate with cloud-native secret managers where needed.
- Agent credentials (API keys to client systems) never leave the deployment boundary.
- In customer VPC mode, Speedrun operators have no access to client secrets.
Audit & Compliance
- Every agent action is logged with full attribution.
- Immutable audit logs that the client can export and own.
- Required for SMEs in regulated industries (finance, healthcare).
Supply Chain Security
- Signed container images.
- SBOM (Software Bill of Materials) for customer VPC deployments.
- Reproducible builds so customers can verify what's running in their VPC.
Identity & Trust
- Agent-to-agent authentication via capability-based tokens.
- Cross-cell communication secured via mTLS with signed agent manifests.
- Compromised node containment — a single cell breach cannot propagate to others.
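One way capability-based agent tokens could work is an HMAC-signed claim set naming exactly what the holder may do, verified by the cell before honoring a request. This is a sketch under assumed primitives (symmetric HMAC with a cell-local key); the source does not specify the token format, and a real deployment might use Vault-issued or asymmetric credentials instead.

```python
import base64
import hashlib
import hmac
import json

# Illustrative cell-local signing key; a real cell would fetch this from Vault.
SIGNING_KEY = b"cell-local-signing-key"


def mint_capability(agent_id: str, capabilities: list) -> str:
    """Mint a token that names exactly what the holding agent may do."""
    claims = json.dumps({"agent": agent_id, "caps": sorted(capabilities)}).encode()
    sig = hmac.new(SIGNING_KEY, claims, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(claims).decode() + "." + sig


def verify_capability(token: str, required_cap: str) -> bool:
    """Check the signature, then check the token grants the required capability."""
    body, sig = token.rsplit(".", 1)
    claims = base64.urlsafe_b64decode(body.encode())
    expected = hmac.new(SIGNING_KEY, claims, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig):
        return False
    return required_cap in json.loads(claims)["caps"]
```

The point of capabilities over bearer identity: a stolen token grants only the actions it names, which bounds the blast radius of a compromised agent.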
7. Observability in Customer VPC
The full monitoring stack deploys inside every customer VPC as part of the Kaze stack:
Customer VPC Speedrun Central
┌──────────────────┐ ┌──────────────┐
│ Kaze Stack │ │ │
│ Monitoring Stack │ │ │
│ - Prometheus │ │ │
│ - Grafana │ health beacon │ PagerDuty / │
│ - Loki │─────────────────▶│ Slack / Ops │
│ - Alertmanager │ (minimal, no │ │
│ │ PII) │ │
│ WireGuard VPN │ │ │
│ endpoint │◀─ ── ── ── ── ──│ Ops team │
└──────────────────┘ VPN for deep │ VPN access │
 investigation      └──────────────┘

Health beacon (outbound, minimal): Alertmanager sends alert name + severity to Speedrun ops. No PII, no sensitive data. This enables proactive incident detection without requiring VPN access.
VPN (inbound, on-demand): Speedrun ops team VPNs into customer monitoring dashboards for investigation and deep dives. WireGuard-based, deployed as part of the stack, with SSO authentication and short-lived sessions.
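The beacon's "minimal, no PII" property can be enforced with a field whitelist: the full alert (whose labels may carry client data) is reduced to alert name + severity before anything leaves the VPC. The field names follow Alertmanager's label conventions but are illustrative here.

```python
# The only alert label fields permitted to leave the VPC.
OUTBOUND_FIELDS = ("alertname", "severity")


def beacon_payload(alert: dict, cell_id: str) -> dict:
    """Reduce a full alert to the minimal outbound beacon. Everything not
    explicitly whitelisted (pod names, client identifiers, annotations)
    stays inside the VPC."""
    labels = alert.get("labels", {})
    payload = {k: labels[k] for k in OUTBOUND_FIELDS if k in labels}
    payload["cell"] = cell_id
    return payload
```

Whitelisting (keep only named fields) rather than blacklisting (drop known-bad fields) means a newly added label leaks nothing by default.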
Data classification:
| Data | Stays in VPC | Can flow out |
|---|---|---|
| Agent logs (may contain client data) | Yes | No |
| LLM request/response content | Yes | No |
| Metrics (CPU, memory, latency, error rates) | Yes | Aggregated health score only |
| Token usage counts | Yes | Aggregated totals (for billing) |
| Alert triggers | Yes | Alert name + severity only |
| Traces (OpenTelemetry) | Yes | No |
This is a configurable policy per client. Some clients may allow anonymized metrics export; others want nothing out.
Upgrade path: GitOps (ArgoCD/Flux) pointing at Speedrun's release channel enables rolling out new versions across customer VPC deployments without logging into each one. Clients approve and apply updates through the GitOps workflow.
8. Security Controls
Threat model and full attack surface analysis documented in research/threat-model.md.
Tenant Isolation Enforcement
- Database layer: Every query includes a tenant_id filter enforced by a query wrapper at the data access layer — not just application logic. No query can execute without tenant scoping.
- Shared runtime: LLM Gateway and Agent Runtime flush all state between requests from different tenants. No context bleed.
- Network: K8s network policies per namespace verified and tested. Tenant A's pods cannot reach Tenant B's resources.
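A sketch of the database-layer rule: a query wrapper that refuses to execute any statement lacking a tenant filter and binds the tenant id itself, so callers cannot widen their own scope. The class and the textual `tenant_id` check are illustrative; a production wrapper would operate on a query builder rather than inspecting SQL strings.

```python
import sqlite3


class TenantScopedDB:
    """Data-access wrapper: no query executes without tenant scoping,
    and the tenant_id parameter is always supplied by the wrapper."""

    def __init__(self, conn, tenant_id: str):
        self.conn = conn
        self.tenant_id = tenant_id

    def execute(self, sql: str, params: dict):
        if "tenant_id" not in sql:
            raise PermissionError("query rejected: no tenant_id filter")
        # The caller's params can never override the wrapper's tenant scope.
        bound = dict(params, tenant_id=self.tenant_id)
        return self.conn.execute(sql, bound)


# Demo against an in-memory database:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agents (id INTEGER, tenant_id TEXT)")
conn.executemany("INSERT INTO agents VALUES (?, ?)", [(1, "a"), (2, "b")])
db = TenantScopedDB(conn, "a")
rows = db.execute("SELECT id FROM agents WHERE tenant_id = :tenant_id", {}).fetchall()
```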
Egress Filtering
- Agents can only reach whitelisted external endpoints, configured per tenant and per vertical.
- K8s network policies enforce egress restrictions — agents cannot open arbitrary outbound connections.
- Tool Framework validates target URLs against the whitelist before executing external API calls.
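The Tool Framework's application-level check can be sketched as a host-whitelist lookup before any outbound call (the K8s network policies enforce the same restriction one layer down). The whitelist contents and function name are illustrative assumptions.

```python
from urllib.parse import urlparse

# Illustrative per-tenant egress whitelist; real config is per tenant + vertical.
EGRESS_WHITELIST = {
    "client-a": {"api.anthropic.com", "api.stripe.com"},
}


def check_egress(tenant: str, url: str) -> None:
    """Raise before any outbound call whose host is not whitelisted for
    this tenant; agents cannot open arbitrary outbound connections."""
    host = urlparse(url).hostname or ""
    if host not in EGRESS_WHITELIST.get(tenant, set()):
        raise PermissionError(f"egress to {host!r} not permitted for {tenant}")
```

Matching on the parsed hostname (not a substring of the URL) avoids bypasses like `https://evil.example/api.stripe.com`.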
Credential Lifecycle
- Rotation: Automated Vault key rotation on schedule. Immediate rotation on suspected compromise.
- Short-lived tokens preferred over long-lived API keys where providers support it (OAuth2 token refresh).
- Anomaly detection: Usage spike on a key (e.g., 10x normal) triggers alert + auto-freeze pending review.
- Blast radius: One compromised client key affects only that client's agents. Speedrun keys are separate.
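The anomaly-detection rule above (usage spike triggers alert + auto-freeze) reduces to a threshold check against a rolling baseline. The function shape and the baseline source are illustrative; the 10x factor comes from the example in the text.

```python
def check_key_usage(key_id: str, tokens_last_hour: int,
                    baseline_hourly: float, spike_factor: float = 10.0):
    """Return ('freeze', reason) when usage exceeds spike_factor times the
    rolling baseline, else ('ok', None). A freeze holds the key pending
    human review rather than revoking it outright."""
    if baseline_hourly > 0 and tokens_last_hour > spike_factor * baseline_hourly:
        reason = (f"{key_id}: {tokens_last_hour} tokens/hr vs "
                  f"baseline {baseline_hourly:.0f}")
        return ("freeze", reason)
    return ("ok", None)
```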
Operator Access Controls
- VPN sessions: Time-limited (4hr max), require ticket justification, logged with operator identity.
- Vault audit: Every secret read logged — who, when, which secret, from where.
- Database access: Via bastion host with session recording. No direct DB access from developer machines.
- Separation of duties: No single person can both deploy code and access production secrets.