Infrastructure & Deployment
Part of Project Kaze Architecture
1. Current Deployment Topology
All Kaze services run on a single Kubernetes cluster with a GitOps deployment model:
┌─ Kubernetes Cluster ──────────────────────────────────────────────┐
│ │
│ ┌─ kaze-runtime pod ─────────────────────────────────────────┐ │
│ │ │ │
│ │ ┌───────────────┐ ┌──────────────────┐ ┌─────────────┐ │ │
│ │ │ kaze-runtime │ │ git-sync │ │ langfuse-mcp│ │ │
│ │ │ (port 4100) │ │ (init+sidecar) │ │ (port 4101) │ │ │
│ │ │ │ │ clones agent repo│ │ SSE proxy │ │ │
│ │ └───────────────┘ └──────────────────┘ └─────────────┘ │ │
│ │ │ │
│ │ ┌─────────────┐ Shared volume: /agents/ (agent YAML defs)│ │
│ │ │ tailscale │ Shared volume: /workspace/ (5Gi PVC) │ │
│ │ │ (sidecar) │ │ │
│ │ └─────────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─ kaze-gateway pod ─────────────────────────────────────────┐ │
│ │ ┌───────────────┐ ┌─────────────┐ │ │
│ │ │ kaze-gateway │ │ tailscale │ │ │
│ │ │ (port 4200) │ │ (sidecar) │ │ │
│ │ └───────────────┘ └─────────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─ kaze-knowledge pod ───────────────────────────────────────┐ │
│ │ ┌───────────────┐ ┌─────────────┐ │ │
│ │ │ kaze-knowledge│ │ tailscale │ │ │
│ │ │ (port 4300) │ │ (sidecar) │ │ │
│ │ └───────────────┘ └─────────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─ PostgreSQL + pgvector ────────────────────────────────────┐ │
│ │ Knowledge vectors · Mem0 metadata │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─ Langfuse ────────────────────────────────────────────────┐ │
│ │ Observability · Tracing · LLM call analytics │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─ Vault (ExternalSecrets) ─────────────────────────────────┐ │
│ │ LLM API keys · GitHub token · Service credentials │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─ OpenClaw node (external, Tailscale-connected) ───────────┐ │
│ │ OpenClaw gateway · kaze-runtime plugin │ │
│ │ Channels: Slack, WhatsApp, Telegram │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────┘

2. Sidecars & Init Containers
git-sync (agent definitions)
Clones the kaze-agent-ops repo into a shared volume so the runtime can load vertical and skill YAML definitions without baking them into the container image.
- Init container: Clones the repo before the runtime starts
- Sidecar: Pulls every 60 seconds for live updates
- Mount: Read-only at /agents/kaze-agent-ops/
- Benefit: Skill changes deploy via git push + pod restart, not image rebuild
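A minimal sketch of how this init-plus-sidecar pattern could look in the runtime Deployment. The repo URL, volume name, and image tag are placeholders, and the flags follow git-sync v4 conventions (--repo, --root, --period, --one-time) — verify against the git-sync version actually deployed:

```yaml
# Illustrative fragment of the kaze-runtime pod spec (names are assumptions).
initContainers:
  - name: git-sync-init
    image: registry.k8s.io/git-sync/git-sync:v4.2.1   # pinned tag, assumed
    args:
      - --repo=https://github.com/<org>/kaze-agent-ops   # placeholder org
      - --root=/agents
      - --one-time=true          # clone once and exit, before the runtime starts
    volumeMounts:
      - name: agents
        mountPath: /agents
containers:
  - name: git-sync
    image: registry.k8s.io/git-sync/git-sync:v4.2.1
    args:
      - --repo=https://github.com/<org>/kaze-agent-ops
      - --root=/agents
      - --period=60s             # matches the 60-second live-update interval
    volumeMounts:
      - name: agents
        mountPath: /agents
  - name: kaze-runtime
    # ... runtime image, ports, env ...
    volumeMounts:
      - name: agents
        mountPath: /agents
        readOnly: true           # runtime only reads the synced YAML defs
```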
langfuse-mcp (MCP proxy)
Runs the Langfuse MCP server as a sidecar, exposing Langfuse data over the Model Context Protocol via SSE.
- Port: 4101 (internal)
- Proxied by runtime: GET /mcp/* → http://localhost:4101/mcp/*
- Use case: Enables LLM agents to query Langfuse observability data via MCP tools
tailscale (VPN mesh)
Provides secure networking between the K8s cluster and external nodes (OpenClaw).
- Runs as sidecar on each pod that needs VPN access
- Enables: Runtime ↔ OpenClaw communication, dashboard access, inter-cluster connectivity
- Auth: Tailscale auth keys stored in Vault, injected via ExternalSecrets
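A sketch of the tailscale sidecar wiring, showing how the auth key flows in from the Vault-derived Secret. The secret and label names are assumptions; TS_AUTHKEY and TS_USERSPACE are standard env vars of the tailscale/tailscale container image:

```yaml
# Illustrative tailscale sidecar fragment (secret name is an assumption).
containers:
  - name: tailscale
    image: tailscale/tailscale:v1.66.0   # pinned tag, illustrative
    env:
      - name: TS_AUTHKEY               # auth key sourced from Vault via ExternalSecrets
        valueFrom:
          secretKeyRef:
            name: kaze-runtime-secrets # assumed K8s Secret produced by an ExternalSecret
            key: TS_AUTHKEY
      - name: TS_USERSPACE             # userspace networking avoids requiring NET_ADMIN
        value: "true"
```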
3. CI/CD Pipeline
All repos follow the same pipeline: GitHub Actions → Container Image → GitOps → Kubernetes.
Developer pushes to main
│
▼
┌─ GitHub Actions ────────────────────────────────┐
│ 1. Checkout code │
│ 2. Docker buildx (multi-arch: amd64 + arm64) │
│ 3. Push image to GHCR (ghcr.io/speedrun-...) │
│ 4. Update GitOps repo: │
│ cd gitops/cluster-dev/<service>/ │
│ kustomize edit set image <new-tag> │
│ git commit + push │
└──────────────────────────────────────────────────┘
│
▼
┌─ GitOps Repo (cluster-dev) ─────────────────────┐
│ ArgoCD detects commit │
│ Syncs Kustomize overlays to cluster │
│ Rolling update of affected deployments │
└──────────────────────────────────────────────────┘

Key properties:
- No manual kubectl. All cluster state is defined in git. ArgoCD is the only entity that applies manifests.
- Multi-arch builds. Images target both amd64 and arm64 (dev machines use Apple Silicon).
- Image tags are git SHA or semver. Kustomize overlays reference specific tags — no latest.
- Agent definitions (kaze-agent-ops) don't need image builds. The git-sync sidecar pulls changes. Pod restart picks up new skills.
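The four pipeline steps can be sketched as a GitHub Actions workflow. Action versions, repo names, and the GITOPS_TOKEN secret are illustrative, not taken from the actual workflows:

```yaml
# Sketch of a per-service build-and-deploy workflow (names are assumptions).
name: build-and-deploy
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-qemu-action@v3        # emulation for arm64 builds
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v5
        with:
          platforms: linux/amd64,linux/arm64     # multi-arch: Apple Silicon dev machines
          push: true
          tags: ghcr.io/<org>/kaze-runtime:${{ github.sha }}   # git SHA tag, no `latest`
      - name: Update GitOps repo
        run: |
          git clone https://x-access-token:${{ secrets.GITOPS_TOKEN }}@github.com/<org>/gitops
          cd gitops/cluster-dev/kaze-runtime
          kustomize edit set image ghcr.io/<org>/kaze-runtime:${{ github.sha }}
          git config user.name ci && git config user.email ci@example.invalid
          git commit -am "deploy kaze-runtime ${{ github.sha }}"
          git push
```

ArgoCD then notices the GitOps commit and rolls the deployment — the workflow itself never touches the cluster.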
4. GitOps Repository Structure
gitops/cluster-dev/
├── base/
│ └── vault-secret-store.yaml # ClusterSecretStore for all services
├── kaze-runtime/
│ ├── kustomization.yaml
│ ├── deployment.yaml # Runtime + git-sync + langfuse-mcp + tailscale
│ ├── service.yaml
│ ├── external-secret.yaml # Pulls from Vault
│ └── workspace-pvc.yaml # 5Gi shared workspace
├── kaze-gateway/
│ ├── kustomization.yaml
│ ├── deployment.yaml # Gateway + tailscale
│ ├── service.yaml
│ └── external-secret.yaml
├── kaze-knowledge/
│ ├── kustomization.yaml
│ ├── deployment.yaml # Knowledge + tailscale
│ ├── service.yaml
│ └── external-secret.yaml
└── ...

Each service subdirectory is self-contained with its own Kustomization. Overlays can be added per environment (staging, production, customer VPC).
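A per-service kustomization.yaml might look like the following sketch (namespace, image name, and tag are illustrative):

```yaml
# Sketch of gitops/cluster-dev/kaze-runtime/kustomization.yaml (names are assumptions).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kaze
resources:
  - deployment.yaml
  - service.yaml
  - external-secret.yaml
  - workspace-pvc.yaml
images:
  - name: ghcr.io/<org>/kaze-runtime
    newTag: 3f1c9ab      # pinned git SHA, rewritten by CI via `kustomize edit set image`
```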
5. Secrets Management
Architecture
┌─ HashiCorp Vault ──────────────────────────────────────┐
│ │
│ kv/kaze-runtime/ │
│ ├── KAZE_GATEWAY_URL │
│ ├── KAZE_KNOWLEDGE_URL │
│ ├── LANGFUSE_SECRET_KEY │
│ ├── LANGFUSE_PUBLIC_KEY │
│ └── GIT_SYNC_TOKEN (for agent repo access) │
│ │
│ kv/kaze-gateway/ │
│ ├── ANTHROPIC_API_KEY │
│ ├── GOOGLE_GENERATIVE_AI_API_KEY │
│ ├── GITHUB_TOKEN │
│ ├── LANGFUSE_SECRET_KEY │
│ └── LANGFUSE_PUBLIC_KEY │
│ │
│ kv/kaze-knowledge/ │
│ ├── GOOGLE_API_KEY (embeddings) │
│ ├── DATABASE_URL │
│ ├── LANGFUSE_SECRET_KEY │
│ └── LANGFUSE_PUBLIC_KEY │
│ │
└─────────────────────────────────────────────────────────┘
│
│ Kubernetes auth method
▼
┌─ ExternalSecrets Operator ─────────────────────────────┐
│ ClusterSecretStore → Vault backend │
│ ExternalSecret per service → K8s Secret │
│ Refresh interval: 1 minute │
│ Secret rotation: automatic on Vault update │
└─────────────────────────────────────────────────────────┘
│
▼
┌─ K8s Secrets ──────────────────────────────────────────┐
│ Mounted as env vars in pod containers │
│ Never checked into git │
│ Never visible in container images │
└─────────────────────────────────────────────────────────┘

Key design:
- Vault is the source of truth for all secrets. K8s Secrets are derived copies.
- Kubernetes auth method — pods authenticate to Vault using their service account. No static Vault tokens.
- 1-minute refresh — ExternalSecrets polls Vault every 60s. Secret rotation propagates within a minute.
- Zero secrets in the runtime — the runtime has no LLM keys or GitHub tokens. It only knows gateway/knowledge URLs. The gateway holds all provider credentials.
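The Vault-to-K8s flow above maps onto one ExternalSecret per service. This sketch uses assumed store and secret names; the dataFrom/extract form pulls every key under the service's kv path in one go:

```yaml
# Illustrative ExternalSecret for kaze-runtime (store and target names are assumptions).
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: kaze-runtime
spec:
  refreshInterval: 1m              # matches the 60s rotation window above
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-backend            # assumed ClusterSecretStore name
  target:
    name: kaze-runtime-secrets     # derived K8s Secret, consumed as env vars
  dataFrom:
    - extract:
        key: kaze-runtime          # pulls every key under kv/kaze-runtime/
```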
6. OpenClaw Integration
OpenClaw runs on a separate node (not in K8s), connected to the cluster via Tailscale VPN.
Plugin Architecture
The Kaze plugin for OpenClaw is a thin TypeScript client that exposes three tools:
| Tool | Description |
|---|---|
| kaze_dispatch_task | Dispatch a task to a vertical/skill on the runtime |
| kaze_list_verticals | List available verticals and their skills |
| kaze_agent_status | Check agent health and supervision status |
Hooks:
- Pre-tool hook: Before any tool call, searches kaze-knowledge for relevant context and injects it into the conversation
- Post-message hook: After assistant messages, stores conversation summaries to kaze-knowledge
- Langfuse tracing: OpenClaw sessions are traced in Langfuse for end-to-end observability
Deployment
The plugin is deployed via Ansible playbook:
- Copies plugin files to the OpenClaw node
- Configures OpenClaw settings to register the plugin
- Sets environment variables for runtime URL and knowledge URL
7. Workspace Management
A shared PersistentVolumeClaim (workspace-pvc, 5Gi) provides persistent storage for git repositories that agents work with.
- Gateway's workspace_list tool clones repos on demand (git clone) or updates them (git pull)
- Gateway's workspace_read tool reads files from cloned repos
- Gateway's file_glob / file_read tools operate on the workspace
- Security: WORKSPACE_DENY_REPOS env var blocks access to sensitive repos (e.g., the infra repo itself)
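A sketch of workspace-pvc.yaml (storage class omitted, so the cluster default is assumed):

```yaml
# Illustrative 5Gi workspace claim (storageClassName assumed to be the cluster default).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-pvc
spec:
  accessModes:
    - ReadWriteOnce      # single-node access; sufficient while one runtime pod mounts it
  resources:
    requests:
      storage: 5Gi
```

Note that ReadWriteOnce becomes a constraint if runtime pods later scale horizontally across nodes; a ReadWriteMany-capable backend would be needed then.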
8. Observability Stack
Current: Langfuse
Langfuse provides LLM-specific observability:
- Tracing: Every LLM call traced with input/output, model, tokens, latency, cost
- Gateway integration: OpenTelemetry span processor auto-instruments all generateText calls
- Runtime integration: Langfuse API proxied at GET /langfuse/* for dashboard access
- MCP integration: Langfuse MCP sidecar enables agents to query their own traces
Future: Full Stack
The design calls for a comprehensive observability stack:
| Component | Role | Status |
|---|---|---|
| Langfuse | LLM-specific tracing and analytics | Deployed |
| Prometheus | Infrastructure metrics (CPU, memory, network) | Not yet deployed |
| Grafana | Dashboards and visualization | Not yet deployed |
| Loki | Log aggregation | Not yet deployed |
| OpenTelemetry | Distributed tracing | Partially (Langfuse span processor only) |
9. Networking
Internal (Cluster)
- Services communicate via K8s ClusterIP services on fixed ports (4100, 4200, 4300)
- No ingress controller currently — services are accessed via Tailscale or port-forward
- No network policies currently — all pods can reach all other pods within the namespace
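If network policies are introduced later, closing the all-pods-reach-all-pods gap could start with a policy like this hypothetical one, which restricts gateway ingress to the runtime (the app labels are assumptions):

```yaml
# Hypothetical policy: only kaze-runtime pods may reach kaze-gateway on its port.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: kaze-gateway-ingress
spec:
  podSelector:
    matchLabels:
      app: kaze-gateway          # assumed pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: kaze-runtime  # assumed pod label
      ports:
        - port: 4200
```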
External (Tailscale)
- Tailscale sidecar on each pod provides VPN connectivity
- OpenClaw node connects to the cluster via Tailscale
- Dashboard and development access via Tailscale DNS names
- No public internet exposure
10. Deployment Modes (Design)
The architecture supports two deployment modes from a single codebase:
| | Agency Model (SaaS) | Customer VPC |
|---|---|---|
| Who hosts | Speedrun Ventures | Client's cloud account |
| Data residency | Speedrun's infrastructure | Client's infrastructure |
| Tenant isolation | Namespace (shared cell) or dedicated cluster | Full cluster |
| Current status | Active (single cell) | Not yet deployed |
Cell architecture: Each deployment is a self-contained cell. Cells are identical in structure, different in configuration. A cell can operate independently.
Cloud-agnostic strategy: All infrastructure uses portable open-source components. No managed cloud services (no RDS, no Cloud SQL, no SQS). Once a K8s cluster exists, everything above it is identical regardless of cloud provider.
| Concern | Choice | Rationale |
|---|---|---|
| Compute | Kubernetes | Universal |
| Database | PostgreSQL (CloudNativePG) | K8s-native operator |
| Secrets | HashiCorp Vault | Cloud-agnostic |
| Messaging | NATS (future) | Lightweight, portable |
| Observability | Langfuse + OTel + Prometheus | Open-source |
| GitOps | ArgoCD | K8s-native CD |
11. Scaling Considerations
Current state (Stage 0): Single cell, 1 vertical, ~6 skills, 1 Postgres instance. Well within single-node capacity.
Next bottleneck (Stage 1, 5-10 clients):
- Postgres connection pooling (PgBouncer) needed when agent count grows
- LLM provider rate limits become relevant — key pooling across multiple keys
- Observation storage grows — table partitioning needed
Architecture allows horizontal scaling of:
- Runtime pods (stateless — scale with HPA)
- Gateway pods (stateless — scale with HPA)
- Knowledge pods (stateless — DB handles persistence)
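An HPA for one of the stateless services might look like this sketch (thresholds and replica bounds are illustrative; CPU-based scaling assumes metrics-server is installed, which the observability table above does not cover):

```yaml
# Illustrative HPA for the stateless runtime (numbers are assumptions).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kaze-runtime
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kaze-runtime
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU passes 70%
```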
What requires vertical scaling or sharding:
- PostgreSQL (read replicas at Stage 2, per-component split at Stage 3)
- pgvector indexes (evaluate Qdrant for hot-path at Stage 2)
See Non-Functional Assessment for detailed scalability model.