Infrastructure & Deployment
Part of Project Kaze Architecture
1. Current Deployment Topology
All Kaze services run on a single Kubernetes cluster with a GitOps deployment model:
┌─ Kubernetes Cluster ──────────────────────────────────────────────┐
│ │
│ ┌─ kaze-runtime pod ─────────────────────────────────────────┐ │
│ │ │ │
│ │ ┌───────────────┐ ┌──────────────────┐ ┌─────────────┐ │ │
│ │ │ kaze-runtime │ │ git-sync │ │ langfuse-mcp│ │ │
│ │ │ (port 4100) │ │ (init+sidecar) │ │ (port 4101) │ │ │
│ │ │ │ │ clones agent repo│ │ SSE proxy │ │ │
│ │ └───────────────┘ └──────────────────┘ └─────────────┘ │ │
│ │ │ │
│ │ ┌─────────────┐ Shared volume: /agents/ (agent YAML defs)│ │
│ │ │ tailscale │ Shared volume: /workspace/ (5Gi PVC) │ │
│ │ │ (sidecar) │ │ │
│ │ └─────────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─ kaze-gateway pod ─────────────────────────────────────────┐ │
│ │ ┌───────────────┐ ┌─────────────┐ │ │
│ │ │ kaze-gateway │ │ tailscale │ │ │
│ │ │ (port 4200) │ │ (sidecar) │ │ │
│ │ └───────────────┘ └─────────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─ kaze-knowledge pod ───────────────────────────────────────┐ │
│ │ ┌───────────────┐ ┌─────────────┐ │ │
│ │ │ kaze-knowledge│ │ tailscale │ │ │
│ │ │ (port 4300) │ │ (sidecar) │ │ │
│ │ └───────────────┘ └─────────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─ PostgreSQL + pgvector ────────────────────────────────────┐ │
│ │ Knowledge vectors · Mem0 metadata │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─ Langfuse ────────────────────────────────────────────────┐ │
│ │ Observability · Tracing · LLM call analytics │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─ Vault (ExternalSecrets) ─────────────────────────────────┐ │
│ │ LLM API keys · GitHub token · Service credentials │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─ OpenClaw node (external, Tailscale-connected) ───────────┐ │
│ │ OpenClaw gateway · kaze-runtime plugin │ │
│ │ Channels: Slack, WhatsApp, Telegram │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────┘

2. Sidecars & Init Containers
git-sync (agent definitions)
Clones the kaze-agent-ops repo into a shared volume so the runtime can load vertical and skill YAML definitions without baking them into the container image.
- Init container: Clones the repo before the runtime starts
- Sidecar: Pulls every 60 seconds for live updates
- Mount: Read-only at /agents/kaze-agent-ops/
- Benefit: Skill changes deploy via git push + pod restart, not image rebuild
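A minimal sketch of how this init-plus-sidecar pattern could look in the runtime Deployment. The repo URL, volume name, and image tag are placeholders, and the flags follow git-sync v4 conventions (--repo, --root, --period, --one-time) — verify against the git-sync version actually deployed:

```yaml
# Illustrative fragment of the kaze-runtime pod spec (names are assumptions).
initContainers:
  - name: git-sync-init
    image: registry.k8s.io/git-sync/git-sync:v4.2.1   # pinned tag, assumed
    args:
      - --repo=https://github.com/<org>/kaze-agent-ops   # placeholder org
      - --root=/agents
      - --one-time=true          # clone once and exit, before the runtime starts
    volumeMounts:
      - name: agents
        mountPath: /agents
containers:
  - name: git-sync
    image: registry.k8s.io/git-sync/git-sync:v4.2.1
    args:
      - --repo=https://github.com/<org>/kaze-agent-ops
      - --root=/agents
      - --period=60s             # matches the 60-second live-update interval
    volumeMounts:
      - name: agents
        mountPath: /agents
  - name: kaze-runtime
    # ... runtime image, ports, env ...
    volumeMounts:
      - name: agents
        mountPath: /agents
        readOnly: true           # runtime only reads the synced YAML defs
```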
langfuse-mcp (MCP proxy)
Runs the Langfuse MCP server as a sidecar, exposing Langfuse data over the Model Context Protocol via SSE.
- Port: 4101 (internal)
- Proxied by runtime: GET /mcp/* → http://localhost:4101/mcp/*
- Use case: Enables LLM agents to query Langfuse observability data via MCP tools
tailscale (VPN mesh)
Provides secure networking between the K8s cluster and external nodes (OpenClaw).
- Runs as sidecar on each pod that needs VPN access
- Enables: Runtime ↔ OpenClaw communication, dashboard access, inter-cluster connectivity
- Auth: Tailscale auth keys stored in Vault, injected via ExternalSecrets
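A sketch of the tailscale sidecar wiring, showing how the auth key flows in from the Vault-derived Secret. The secret and label names are assumptions; TS_AUTHKEY and TS_USERSPACE are standard env vars of the tailscale/tailscale container image:

```yaml
# Illustrative tailscale sidecar fragment (secret name is an assumption).
containers:
  - name: tailscale
    image: tailscale/tailscale:v1.66.0   # pinned tag, illustrative
    env:
      - name: TS_AUTHKEY               # auth key sourced from Vault via ExternalSecrets
        valueFrom:
          secretKeyRef:
            name: kaze-runtime-secrets # assumed K8s Secret produced by an ExternalSecret
            key: TS_AUTHKEY
      - name: TS_USERSPACE             # userspace networking avoids requiring NET_ADMIN
        value: "true"
```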
3. CI/CD Pipeline
All repos follow the same pipeline: GitHub Actions → Container Image → GitOps → Kubernetes.
Developer pushes to main
│
▼
┌─ GitHub Actions ────────────────────────────────┐
│ 1. Checkout code │
│ 2. Docker buildx (multi-arch: amd64 + arm64) │
│ 3. Push image to GHCR (ghcr.io/speedrun-...) │
│ 4. Update GitOps repo: │
│ cd gitops/cluster-dev/<service>/ │
│ kustomize edit set image <new-tag> │
│ git commit + push │
└──────────────────────────────────────────────────┘
│
▼
┌─ GitOps Repo (cluster-dev) ─────────────────────┐
│ ArgoCD detects commit │
│ Syncs Kustomize overlays to cluster │
│ Rolling update of affected deployments │
└──────────────────────────────────────────────────┘

Key properties:
- No manual kubectl. All cluster state is defined in git. ArgoCD is the only entity that applies manifests.
- Multi-arch builds. Images target both amd64 and arm64 (dev machines use Apple Silicon).
- Image tags are git SHA or semver. Kustomize overlays reference specific tags — no latest.
- Agent definitions (kaze-agent-ops) don't need image builds. The git-sync sidecar pulls changes. Pod restart picks up new skills.
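The four pipeline steps can be sketched as a GitHub Actions workflow. Action versions, repo names, and the GITOPS_TOKEN secret are illustrative, not taken from the actual workflows:

```yaml
# Sketch of a per-service build-and-deploy workflow (names are assumptions).
name: build-and-deploy
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-qemu-action@v3        # emulation for arm64 builds
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v5
        with:
          platforms: linux/amd64,linux/arm64     # multi-arch: Apple Silicon dev machines
          push: true
          tags: ghcr.io/<org>/kaze-runtime:${{ github.sha }}   # git SHA tag, no `latest`
      - name: Update GitOps repo
        run: |
          git clone https://x-access-token:${{ secrets.GITOPS_TOKEN }}@github.com/<org>/gitops
          cd gitops/cluster-dev/kaze-runtime
          kustomize edit set image ghcr.io/<org>/kaze-runtime:${{ github.sha }}
          git config user.name ci && git config user.email ci@example.invalid
          git commit -am "deploy kaze-runtime ${{ github.sha }}"
          git push
```

ArgoCD then notices the GitOps commit and rolls the deployment — the workflow itself never touches the cluster.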
4. GitOps Repository Structure
gitops/cluster-dev/
├── base/
│ └── vault-secret-store.yaml # ClusterSecretStore for all services
├── kaze-runtime/
│ ├── kustomization.yaml
│ ├── deployment.yaml # Runtime + git-sync + langfuse-mcp + tailscale
│ ├── service.yaml
│ ├── external-secret.yaml # Pulls from Vault
│ └── workspace-pvc.yaml # 5Gi shared workspace
├── kaze-gateway/
│ ├── kustomization.yaml
│ ├── deployment.yaml # Gateway + tailscale
│ ├── service.yaml
│ └── external-secret.yaml
├── kaze-knowledge/
│ ├── kustomization.yaml
│ ├── deployment.yaml # Knowledge + tailscale
│ ├── service.yaml
│ └── external-secret.yaml
└── ...

Each service subdirectory is self-contained with its own Kustomization. Overlays can be added per environment (staging, production, customer VPC).
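A per-service kustomization.yaml might look like the following sketch (namespace, image name, and tag are illustrative):

```yaml
# Sketch of gitops/cluster-dev/kaze-runtime/kustomization.yaml (names are assumptions).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kaze
resources:
  - deployment.yaml
  - service.yaml
  - external-secret.yaml
  - workspace-pvc.yaml
images:
  - name: ghcr.io/<org>/kaze-runtime
    newTag: 3f1c9ab      # pinned git SHA, rewritten by CI via `kustomize edit set image`
```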
5. Secrets Management
Architecture
┌─ HashiCorp Vault ──────────────────────────────────────┐
│ │
│ kv/kaze-runtime/ │
│ ├── KAZE_GATEWAY_URL │
│ ├── KAZE_KNOWLEDGE_URL │
│ ├── LANGFUSE_SECRET_KEY │
│ ├── LANGFUSE_PUBLIC_KEY │
│ └── GIT_SYNC_TOKEN (for agent repo access) │
│ │
│ kv/kaze-gateway/ │
│ ├── ANTHROPIC_API_KEY │
│ ├── GOOGLE_GENERATIVE_AI_API_KEY │
│ ├── GITHUB_TOKEN │
│ ├── LANGFUSE_SECRET_KEY │
│ └── LANGFUSE_PUBLIC_KEY │
│ │
│ kv/kaze-knowledge/ │
│ ├── GOOGLE_API_KEY (embeddings) │
│ ├── DATABASE_URL │
│ ├── LANGFUSE_SECRET_KEY │
│ └── LANGFUSE_PUBLIC_KEY │
│ │
└─────────────────────────────────────────────────────────┘
│
│ Kubernetes auth method
▼
┌─ ExternalSecrets Operator ─────────────────────────────┐
│ ClusterSecretStore → Vault backend │
│ ExternalSecret per service → K8s Secret │
│ Refresh interval: 1 minute │
│ Secret rotation: automatic on Vault update │
└─────────────────────────────────────────────────────────┘
│
▼
┌─ K8s Secrets ──────────────────────────────────────────┐
│ Mounted as env vars in pod containers │
│ Never checked into git │
│ Never visible in container images │
└─────────────────────────────────────────────────────────┘

Key design:
- Vault is the source of truth for all secrets. K8s Secrets are derived copies.
- Kubernetes auth method — pods authenticate to Vault using their service account. No static Vault tokens.
- 1-minute refresh — ExternalSecrets polls Vault every 60s. Secret rotation propagates within a minute.
- Zero secrets in the runtime — the runtime has no LLM keys or GitHub tokens. It only knows gateway/knowledge URLs. The gateway holds all provider credentials.
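The Vault-to-K8s flow above maps onto one ExternalSecret per service. This sketch uses assumed store and secret names; the dataFrom/extract form pulls every key under the service's kv path in one go:

```yaml
# Illustrative ExternalSecret for kaze-runtime (store and target names are assumptions).
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: kaze-runtime
spec:
  refreshInterval: 1m              # matches the 60s rotation window above
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-backend            # assumed ClusterSecretStore name
  target:
    name: kaze-runtime-secrets     # derived K8s Secret, consumed as env vars
  dataFrom:
    - extract:
        key: kaze-runtime          # pulls every key under kv/kaze-runtime/
```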
6. OpenClaw Integration
OpenClaw runs on a separate node (not in K8s), connected to the cluster via Tailscale VPN.
Plugin Architecture
The Kaze plugin for OpenClaw is a thin TypeScript client that exposes three tools:
| Tool | Description |
|---|---|
| kaze_dispatch_task | Dispatch a task to a vertical/skill on the runtime |
| kaze_list_verticals | List available verticals and their skills |
| kaze_agent_status | Check agent health and supervision status |
Hooks:
- Pre-tool hook: Before any tool call, searches kaze-knowledge for relevant context and injects it into the conversation
- Post-message hook: After assistant messages, stores conversation summaries to kaze-knowledge
- Langfuse tracing: OpenClaw sessions are traced in Langfuse for end-to-end observability
Deployment
The plugin is deployed via Ansible playbook:
- Copies plugin files to the OpenClaw node
- Configures OpenClaw settings to register the plugin
- Sets environment variables for runtime URL and knowledge URL
7. Workspace Management
A shared PersistentVolumeClaim (workspace-pvc, 5Gi) provides persistent storage for git repositories that agents work with.
- Gateway's workspace_list tool clones repos on demand (git clone) or updates them (git pull)
- Gateway's workspace_read tool reads files from cloned repos
- Gateway's file_glob / file_read tools operate on the workspace
- Security: WORKSPACE_DENY_REPOS env var blocks access to sensitive repos (e.g., the infra repo itself)
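A sketch of workspace-pvc.yaml (storage class omitted, so the cluster default is assumed):

```yaml
# Illustrative 5Gi workspace claim (storageClassName assumed to be the cluster default).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-pvc
spec:
  accessModes:
    - ReadWriteOnce      # single-node access; sufficient while one runtime pod mounts it
  resources:
    requests:
      storage: 5Gi
```

Note that ReadWriteOnce becomes a constraint if runtime pods later scale horizontally across nodes; a ReadWriteMany-capable backend would be needed then.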
8. Observability Stack
Current: Langfuse
Langfuse provides LLM-specific observability:
- Tracing: Every LLM call traced with input/output, model, tokens, latency, cost
- Gateway integration: OpenTelemetry span processor auto-instruments all generateText calls
- Runtime integration: Langfuse API proxied at GET /langfuse/* for dashboard access
- MCP integration: Langfuse MCP sidecar enables agents to query their own traces
Future: Full Stack
The design calls for a comprehensive observability stack:
| Component | Role | Status |
|---|---|---|
| Langfuse | LLM-specific tracing and analytics | Deployed |
| Prometheus | Infrastructure metrics (CPU, memory, network) | Not yet deployed |
| Grafana | Dashboards and visualization | Not yet deployed |
| Loki | Log aggregation | Not yet deployed |
| OpenTelemetry | Distributed tracing | Partially (Langfuse span processor only) |
9. Networking
Internal (Cluster)
- Services communicate via K8s ClusterIP services on fixed ports (4100, 4200, 4300)
- No ingress controller currently — services are accessed via Tailscale or port-forward
- No network policies currently — all pods can reach all other pods within the namespace
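If network policies are introduced later, closing the all-pods-reach-all-pods gap could start with a policy like this hypothetical one, which restricts gateway ingress to the runtime (the app labels are assumptions):

```yaml
# Hypothetical policy: only kaze-runtime pods may reach kaze-gateway on its port.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: kaze-gateway-ingress
spec:
  podSelector:
    matchLabels:
      app: kaze-gateway          # assumed pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: kaze-runtime  # assumed pod label
      ports:
        - port: 4200
```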
External (Tailscale)
- Tailscale sidecar on each pod provides VPN connectivity
- OpenClaw node connects to the cluster via Tailscale
- Dashboard and development access via Tailscale DNS names
- No public internet exposure
10. Deployment Modes (Design)
The architecture supports two deployment modes from a single codebase:
| | Agency Model (SaaS) | Customer VPC |
|---|---|---|
| Who hosts | Speedrun Ventures | Client's cloud account |
| Data residency | Speedrun's infrastructure | Client's infrastructure |
| Tenant isolation | Namespace (shared cell) or dedicated cluster | Full cluster |
| Current status | Active (single cell) | Not yet deployed |
Cell architecture: Each deployment is a self-contained cell. Cells are identical in structure, different in configuration. A cell can operate independently.
Cloud-agnostic strategy: All infrastructure uses portable open-source components. No managed cloud services (no RDS, no Cloud SQL, no SQS). Once a K8s cluster exists, everything above it is identical regardless of cloud provider.
| Concern | Choice | Rationale |
|---|---|---|
| Compute | Kubernetes | Universal |
| Database | PostgreSQL (CloudNativePG) | K8s-native operator |
| Secrets | HashiCorp Vault | Cloud-agnostic |
| Messaging | NATS (future) | Lightweight, portable |
| Observability | Langfuse + OTel + Prometheus | Open-source |
| GitOps | ArgoCD | K8s-native CD |
11. Scaling Considerations
Current state (Stage 0): Single cell, 1 vertical, ~6 skills, 1 Postgres instance. Well within single-node capacity.
Next bottleneck (Stage 1, 5-10 clients):
- Postgres connection pooling (PgBouncer) needed when agent count grows
- LLM provider rate limits become relevant — key pooling across multiple keys
- Observation storage grows — table partitioning needed
Architecture allows horizontal scaling of:
- Runtime pods (stateless — scale with HPA)
- Gateway pods (stateless — scale with HPA)
- Knowledge pods (stateless — DB handles persistence)
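An HPA for one of the stateless services might look like this sketch (thresholds and replica bounds are illustrative; CPU-based scaling assumes metrics-server is installed, which the observability table above does not cover):

```yaml
# Illustrative HPA for the stateless runtime (numbers are assumptions).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kaze-runtime
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kaze-runtime
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU passes 70%
```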
What requires vertical scaling or sharding:
- PostgreSQL (read replicas at Stage 2, per-component split at Stage 3)
- pgvector indexes (evaluate Qdrant for hot-path at Stage 2)
See Non-Functional Assessment for detailed scalability model.