Infrastructure & Deployment
Part of Project Kaze Architecture
1. Deployment Modes
Kaze supports two deployment modes from a single codebase:
| | Agency Model (SaaS) | Customer VPC |
|---|---|---|
| Who hosts | Speedrun Ventures | Client's cloud account |
| Who operates | Speedrun | Speedrun (managed) or Client |
| Data residency | Speedrun's infrastructure | Client's infrastructure |
| Network boundary | Shared (multi-tenant) | Isolated (single-tenant) |
| Trust level | Client trusts Speedrun with data | Client keeps all data in-house |
| Tenant isolation | Namespace or vCluster | Full cluster |
Critical constraint: The exact same container images and IaC definitions deploy in both modes. There is no separate SaaS version and no separate on-prem version. One build, different configurations.
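A minimal sketch of the "one build, different configurations" rule: the container image never branches on a build-time SaaS/on-prem flag; the deployment mode arrives as configuration injected by the per-environment overlay. All names (`KAZE_DEPLOYMENT_MODE`, `CellConfig`, field names) are illustrative assumptions, not the actual Kaze schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellConfig:
    """Runtime configuration for one cell; field names are illustrative."""
    deployment_mode: str   # "agency" or "customer_vpc"
    tenant_isolation: str  # "namespace", "vcluster", or "cluster"
    data_plane_url: str


def load_cell_config(env: dict) -> CellConfig:
    """Build the cell config from environment variables injected by the
    per-environment overlay. The image itself contains no SaaS-vs-on-prem
    code path: only this configuration differs between modes."""
    mode = env.get("KAZE_DEPLOYMENT_MODE", "agency")
    if mode not in ("agency", "customer_vpc"):
        raise ValueError(f"unknown deployment mode: {mode}")
    return CellConfig(
        deployment_mode=mode,
        tenant_isolation=env.get("KAZE_TENANT_ISOLATION", "namespace"),
        data_plane_url=env["KAZE_DATA_PLANE_URL"],
    )
```

The same validation runs in both modes, so a misconfigured overlay fails fast at startup rather than producing a half-configured cell.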
2. Cell-Based Deployment
The fundamental deployment unit is a cell — a self-contained, isolated deployment of the entire Kaze stack.
- Each tenant (in agency mode) or each customer VPC is a cell
- Cells are identical in structure, different in configuration
- A cell can operate independently if disconnected from the mesh
- Blast radius is contained — a failure in Cell 1 cannot impact Cell 2
- Performance scaling analysis and bottleneck triggers documented in research/scalability-model.md
Cell density tiers (agency model):
| Tier | Isolation | Use case |
|---|---|---|
| Dedicated cell | Full cluster per tenant | Large or security-sensitive clients |
| Shared cell, namespace isolation | Shared cluster, K8s namespace per tenant | Small clients, cost-optimized |
| Customer VPC cell | Full stack in client's cloud | Clients requiring data sovereignty |
Hybrid approach for cost optimization (agency model):
┌─────────────────────────────────────────┐
│ Shared plane (stateless) │
│ ┌───────────┐ ┌──────────────────────┐ │
│ │ LLM │ │ Agent Runtime Pool │ │
│ │ Gateway │ │ (tenant-aware) │ │
│ └───────────┘ └──────────────────────┘ │
├─────────────────────────────────────────┤
│ Isolated plane (stateful, per-tenant) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Tenant A │ │Tenant B │ │Tenant C │ │
│ │ DB │ │ DB │ │ DB │ │
│ │ Knowledge│ │ Knowledge│ │ Knowledge│ │
│ │ Secrets │ │ Secrets │ │ Secrets │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────┘
Stateless components (LLM Gateway, Agent Runtime) can be shared with tenant context passed per-request. Stateful components (databases, knowledge graphs, secrets) are always isolated per-tenant.
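A sketch of how a shared stateless component stays tenant-safe: every request carries a tenant context, the handler resolves that tenant's isolated stateful plane per call, and nothing is cached across requests. The registry and field names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantContext:
    """Per-request tenant identity, resolved from the authenticated caller."""
    tenant_id: str


# Illustrative registry mapping each tenant to its isolated stateful plane.
TENANT_PLANES = {
    "tenant-a": {"db_dsn": "postgres://tenant-a-db/kaze", "vault_path": "clients/tenant-a"},
    "tenant-b": {"db_dsn": "postgres://tenant-b-db/kaze", "vault_path": "clients/tenant-b"},
}


def handle_request(ctx: TenantContext, prompt: str) -> dict:
    """Stateless handler: everything tenant-specific is looked up per call
    and discarded when the function returns, so no context can bleed
    between tenants sharing the same runtime pool."""
    plane = TENANT_PLANES[ctx.tenant_id]  # unknown tenant -> KeyError, request rejected
    return {
        "tenant_id": ctx.tenant_id,
        "db_dsn": plane["db_dsn"],
        "vault_path": plane["vault_path"],
        "prompt": prompt,
    }
```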
3. Mesh Network Evolution
The deployment model evolves over time:
Phase 1: Single Cell — One deployment (Speedrun's infrastructure) running the full stack. Fast path to first agents operational.
Phase 2: Multi-Cell — Multiple cells for different clients and deployment modes. Hub-and-spoke topology with Speedrun's cell as the hub.
Phase 3: Federated Mesh — Cells can discover and communicate with each other. Knowledge syncs across cells (with policy controls). Evolving toward federated coordination.
Phase 3 topology:
┌──────┐ ┌──────┐ ┌──────┐
│Cell 1│◀──▶│Cell 2│◀──▶│Cell 3│
│Agency│ │VPC-A │ │VPC-B │
└──────┘ └──────┘ └──────┘
Cross-cell capabilities:
- Vertical knowledge sync (opt-in, anonymized)
- Agent discovery across cells
- Federated monitoring
- Coordinated upgrades

True peer-to-peer mesh is explicitly deferred — the coordination complexity is not justified at early scale. Hub-and-spoke evolving toward federation is the pragmatic path.
4. Cloud-Agnostic Strategy
No hard dependencies on managed cloud services. Every infrastructure dependency uses either a portable open-source equivalent or a provider abstraction.
Portable component choices:
| Concern | Portable Choice | Rationale |
|---|---|---|
| Compute | Kubernetes | Universal across all clouds and on-prem |
| Messaging | NATS | Lightweight, built for distributed systems, zero cloud deps |
| Database | PostgreSQL (CloudNativePG) | K8s-native operator, runs anywhere |
| Object Storage | S3-compatible API (MinIO) | MinIO implements S3 API, GCS has interop |
| Secrets | HashiCorp Vault | Cloud-agnostic, runs as container |
| Observability | OpenTelemetry + Prometheus + Grafana + Loki | Full open-source stack |
| Service Mesh | Linkerd or Cilium | mTLS, traffic management, zero cloud lock-in |
| GitOps | ArgoCD or Flux | Continuous delivery for K8s |
| Ingress | Nginx Ingress or Envoy Gateway | Portable load balancing |
IaC structure:
infra/
├── terraform/
│ ├── modules/
│ │ ├── kubernetes-cluster/ # Abstract "give me a cluster"
│ │ ├── networking/ # Abstract "give me a VPC + subnets"
│ │ └── storage/ # Abstract "give me a bucket"
│ └── providers/
│ ├── aws/ # AWS-specific implementations
│ ├── gcp/ # GCP-specific implementations
│ ├── azure/ # Azure-specific implementations
│ └── bare-metal/ # For on-prem
├── kubernetes/
│ ├── base/ # The platform (cloud-agnostic)
│ └── overlays/
│ ├── agency-aws/
│ ├── agency-gcp/
│ ├── customer-vpc-aws/
│       └── customer-vpc-azure/

Strategy: Terraform/OpenTofu handles cloud-specific provisioning (one-time per environment). Kubernetes handles the application layer (universal). Once a cluster exists, everything above it is identical regardless of cloud.
Pragmatic approach: Build and validate on one cloud first (likely AWS). Ensure the architecture allows portability (containers, IaC, no proprietary services) but defer proving portability until a client requires it.
5. LLM Provider & Key Management
Dual-key model:
- Speedrun keys — Speedrun's own API keys across multiple LLM providers, centrally managed and monitored.
- Client keys — Clients bring their own keys (e.g., Azure OpenAI credits, Anthropic volume discount, Google Cloud credits).
Key routing logic:
- Agent X for Client A → use Client A's Anthropic key
- Agent Y for Client A → use Speedrun's OpenAI key (client has no OpenAI credits)
- Agent Z for Client B → use Client B's Azure OpenAI endpoint
- Fallback: if Client A's key hits rate limit → fall back to Speedrun's key (if policy allows)
Routing is configured per tenant + agent + provider, not hardcoded.
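The routing rules above can be sketched as a lookup keyed on (tenant, agent, provider) with policy-controlled fallback to Speedrun's keys. The routing table, Vault paths beyond those in the source, and function signature are illustrative assumptions.

```python
# Illustrative routing table: (tenant, agent, provider) -> Vault key path.
ROUTES = {
    ("client-a", "agent-x", "anthropic"):    "clients/client-a/anthropic-key",
    ("client-a", "agent-y", "openai"):       "speedrun/openai-key-1",
    ("client-b", "agent-z", "azure-openai"): "clients/client-b/azure-openai",
}

# Speedrun-owned fallback keys, used only when policy allows it.
FALLBACKS = {"anthropic": "speedrun/anthropic-key-1"}


def resolve_key(tenant: str, agent: str, provider: str,
                client_key_exhausted: bool = False,
                allow_fallback: bool = True) -> str:
    """Return the Vault path of the key to use for this LLM call."""
    path = ROUTES.get((tenant, agent, provider))
    if path is None:
        raise LookupError(f"no key route for {tenant}/{agent}/{provider}")
    # Fall back to a Speedrun key only when the client's own key is
    # rate-limited AND the tenant's policy permits fallback.
    if client_key_exhausted and path.startswith("clients/"):
        if not allow_fallback or provider not in FALLBACKS:
            raise RuntimeError(f"{provider} key rate-limited and fallback not permitted")
        return FALLBACKS[provider]
    return path
```

Because routing is plain configuration, adding a new client key or changing fallback policy is a config change, not a code change.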
Key storage & security:
Vault paths:
speedrun/
├── anthropic-key-1
├── openai-key-1
└── google-key-1
clients/
├── client-a/
│ ├── anthropic-key
│ └── azure-openai
└── client-b/
    └── anthropic-key

Security rules:
- Client keys are encrypted at rest and access-scoped — only agents running for that client can access their keys
- In customer VPC mode, client keys never leave their VPC
- In agency mode, client keys are stored in Speedrun's Vault with strict tenant-scoped access policies
- Every key usage is logged with full attribution — clients can see exactly which agent used their key, when, and token count
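A sketch of the tenant-scoped access rule plus attribution logging, mirrored in application code. In practice the enforcement lives in Vault's path-scoped policies; this illustrative function (names and log shape are assumptions) shows the intended behavior: a tenant can read only its own `clients/<tenant>/...` paths, and every read is logged.

```python
AUDIT_LOG = []  # illustrative; a real deployment writes to an append-only audit sink


def read_client_key(requesting_tenant: str, path: str) -> str:
    """Enforce tenant-scoped access to key paths and log every read with
    attribution, so clients can see exactly which tenant touched which key."""
    if path.startswith("clients/"):
        owner = path.split("/")[1]
        if owner != requesting_tenant:
            raise PermissionError(f"{requesting_tenant} may not read {path}")
    AUDIT_LOG.append({"tenant": requesting_tenant, "path": path})
    return f"<secret:{path}>"
```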
6. Security Architecture
Network Security
- Zero-trust networking between all components. mTLS everywhere, even inside the cluster.
- In agency mode: strict Kubernetes network policies — Tenant A's agents can never reach Tenant B's resources.
- In customer VPC mode: clearly defined ingress/egress rules.
Secrets Management
- No reliance on a single secrets provider. Vault is primary, with the ability to integrate with cloud-native secret managers where needed.
- Agent credentials (API keys to client systems) never leave the deployment boundary.
- In customer VPC mode, Speedrun operators have no access to client secrets.
Audit & Compliance
- Every agent action is logged with full attribution.
- Immutable audit logs that the client can export and own.
- Required for SMEs in regulated industries (finance, healthcare).
Supply Chain Security
- Signed container images.
- SBOM (Software Bill of Materials) for customer VPC deployments.
- Reproducible builds so customers can verify what's running in their VPC.
Identity & Trust
- Agent-to-agent authentication via capability-based tokens.
- Cross-cell communication secured via mTLS with signed agent manifests.
- Compromised node containment — a single cell breach cannot propagate to others.
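One way capability-based agent tokens could work is an HMAC-signed claim set naming exactly what the holder may do, verified by the cell before honoring a request. This is a sketch under assumed primitives (symmetric HMAC with a cell-local key); the source does not specify the token format, and a real deployment might use Vault-issued or asymmetric credentials instead.

```python
import base64
import hashlib
import hmac
import json

# Illustrative cell-local signing key; a real cell would fetch this from Vault.
SIGNING_KEY = b"cell-local-signing-key"


def mint_capability(agent_id: str, capabilities: list) -> str:
    """Mint a token that names exactly what the holding agent may do."""
    claims = json.dumps({"agent": agent_id, "caps": sorted(capabilities)}).encode()
    sig = hmac.new(SIGNING_KEY, claims, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(claims).decode() + "." + sig


def verify_capability(token: str, required_cap: str) -> bool:
    """Check the signature, then check the token grants the required capability."""
    body, sig = token.rsplit(".", 1)
    claims = base64.urlsafe_b64decode(body.encode())
    expected = hmac.new(SIGNING_KEY, claims, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig):
        return False
    return required_cap in json.loads(claims)["caps"]
```

The point of capabilities over bearer identity: a stolen token grants only the actions it names, which bounds the blast radius of a compromised agent.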
7. Observability in Customer VPC
The full monitoring stack deploys inside every customer VPC as part of the Kaze stack:
Customer VPC Speedrun Central
┌──────────────────┐ ┌──────────────┐
│ Kaze Stack │ │ │
│ Monitoring Stack │ │ │
│ - Prometheus │ │ │
│ - Grafana │ health beacon │ PagerDuty / │
│ - Loki │─────────────────▶│ Slack / Ops │
│ - Alertmanager │ (minimal, no │ │
│ │ PII) │ │
│ WireGuard VPN │ │ │
│ endpoint │◀─ ── ── ── ── ──│ Ops team │
└──────────────────┘ VPN for deep │ VPN access │
 investigation      └──────────────┘

Health beacon (outbound, minimal): Alertmanager sends alert name + severity to Speedrun ops. No PII, no sensitive data. This enables proactive incident detection without requiring VPN access.
VPN (inbound, on-demand): Speedrun ops team VPNs into customer monitoring dashboards for investigation and deep dives. WireGuard-based, deployed as part of the stack, with SSO authentication and short-lived sessions.
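The beacon's "minimal, no PII" property can be enforced with a field whitelist: the full alert (whose labels may carry client data) is reduced to alert name + severity before anything leaves the VPC. The field names follow Alertmanager's label conventions but are illustrative here.

```python
# The only alert label fields permitted to leave the VPC.
OUTBOUND_FIELDS = ("alertname", "severity")


def beacon_payload(alert: dict, cell_id: str) -> dict:
    """Reduce a full alert to the minimal outbound beacon. Everything not
    explicitly whitelisted (pod names, client identifiers, annotations)
    stays inside the VPC."""
    labels = alert.get("labels", {})
    payload = {k: labels[k] for k in OUTBOUND_FIELDS if k in labels}
    payload["cell"] = cell_id
    return payload
```

Whitelisting (keep only named fields) rather than blacklisting (drop known-bad fields) means a newly added label leaks nothing by default.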
Data classification:
| Data | Stays in VPC | Can flow out |
|---|---|---|
| Agent logs (may contain client data) | Yes | No |
| LLM request/response content | Yes | No |
| Metrics (CPU, memory, latency, error rates) | Yes | Aggregated health score only |
| Token usage counts | Yes | Aggregated totals (for billing) |
| Alert triggers | Yes | Alert name + severity only |
| Traces (OpenTelemetry) | Yes | No |
This is a configurable policy per client. Some clients may allow anonymized metrics export; others want nothing out.
Upgrade path: GitOps (ArgoCD/Flux) pointing at Speedrun's release channel enables rolling out new versions across customer VPC deployments without logging into each one. Clients approve and apply updates through the GitOps workflow.
8. Security Controls
Threat model and full attack surface analysis documented in research/threat-model.md.
Tenant Isolation Enforcement
- Database layer: Every query includes a tenant_id filter enforced by a query wrapper at the data access layer — not just application logic. No query can execute without tenant scoping.
- Shared runtime: LLM Gateway and Agent Runtime flush all state between requests from different tenants. No context bleed.
- Network: K8s network policies per namespace verified and tested. Tenant A's pods cannot reach Tenant B's resources.
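A sketch of the database-layer rule: a query wrapper that refuses to execute any statement lacking a tenant filter and binds the tenant id itself, so callers cannot widen their own scope. The class and the textual `tenant_id` check are illustrative; a production wrapper would operate on a query builder rather than inspecting SQL strings.

```python
import sqlite3


class TenantScopedDB:
    """Data-access wrapper: no query executes without tenant scoping,
    and the tenant_id parameter is always supplied by the wrapper."""

    def __init__(self, conn, tenant_id: str):
        self.conn = conn
        self.tenant_id = tenant_id

    def execute(self, sql: str, params: dict):
        if "tenant_id" not in sql:
            raise PermissionError("query rejected: no tenant_id filter")
        # The caller's params can never override the wrapper's tenant scope.
        bound = dict(params, tenant_id=self.tenant_id)
        return self.conn.execute(sql, bound)


# Demo against an in-memory database:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agents (id INTEGER, tenant_id TEXT)")
conn.executemany("INSERT INTO agents VALUES (?, ?)", [(1, "a"), (2, "b")])
db = TenantScopedDB(conn, "a")
rows = db.execute("SELECT id FROM agents WHERE tenant_id = :tenant_id", {}).fetchall()
```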
Egress Filtering
- Agents can only reach whitelisted external endpoints, configured per tenant and per vertical.
- K8s network policies enforce egress restrictions — agents cannot open arbitrary outbound connections.
- Tool Framework validates target URLs against the whitelist before executing external API calls.
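The Tool Framework's application-level check can be sketched as a host-whitelist lookup before any outbound call (the K8s network policies enforce the same restriction one layer down). The whitelist contents and function name are illustrative assumptions.

```python
from urllib.parse import urlparse

# Illustrative per-tenant egress whitelist; real config is per tenant + vertical.
EGRESS_WHITELIST = {
    "client-a": {"api.anthropic.com", "api.stripe.com"},
}


def check_egress(tenant: str, url: str) -> None:
    """Raise before any outbound call whose host is not whitelisted for
    this tenant; agents cannot open arbitrary outbound connections."""
    host = urlparse(url).hostname or ""
    if host not in EGRESS_WHITELIST.get(tenant, set()):
        raise PermissionError(f"egress to {host!r} not permitted for {tenant}")
```

Matching on the parsed hostname (not a substring of the URL) avoids bypasses like `https://evil.example/api.stripe.com`.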
Credential Lifecycle
- Rotation: Automated Vault key rotation on schedule. Immediate rotation on suspected compromise.
- Short-lived tokens preferred over long-lived API keys where providers support it (OAuth2 token refresh).
- Anomaly detection: Usage spike on a key (e.g., 10x normal) triggers alert + auto-freeze pending review.
- Blast radius: One compromised client key affects only that client's agents. Speedrun keys are separate.
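The anomaly-detection rule above (usage spike triggers alert + auto-freeze) reduces to a threshold check against a rolling baseline. The function shape and the baseline source are illustrative; the 10x factor comes from the example in the text.

```python
def check_key_usage(key_id: str, tokens_last_hour: int,
                    baseline_hourly: float, spike_factor: float = 10.0):
    """Return ('freeze', reason) when usage exceeds spike_factor times the
    rolling baseline, else ('ok', None). A freeze holds the key pending
    human review rather than revoking it outright."""
    if baseline_hourly > 0 and tokens_last_hour > spike_factor * baseline_hourly:
        reason = (f"{key_id}: {tokens_last_hour} tokens/hr vs "
                  f"baseline {baseline_hourly:.0f}")
        return ("freeze", reason)
    return ("ok", None)
```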
Operator Access Controls
- VPN sessions: Time-limited (4hr max), require ticket justification, logged with operator identity.
- Vault audit: Every secret read logged — who, when, which secret, from where.
- Database access: Via bastion host with session recording. No direct DB access from developer machines.
- Separation of duties: No single person can both deploy code and access production secrets.