
Infrastructure & Deployment

Part of Project Kaze Architecture


1. Current Deployment Topology

All Kaze services run on a single Kubernetes cluster with a GitOps deployment model:

┌─ Kubernetes Cluster ──────────────────────────────────────────────┐
│                                                                    │
│  ┌─ kaze-runtime pod ─────────────────────────────────────────┐   │
│  │                                                             │   │
│  │  ┌───────────────┐  ┌──────────────────┐  ┌─────────────┐  │   │
│  │  │ kaze-runtime  │  │ git-sync         │  │ langfuse-mcp│  │   │
│  │  │ (port 4100)   │  │ (init+sidecar)   │  │ (port 4101) │  │   │
│  │  │               │  │ clones agent repo│  │ SSE proxy   │  │   │
│  │  └───────────────┘  └──────────────────┘  └─────────────┘  │   │
│  │                                                             │   │
│  │  ┌─────────────┐  Shared volume: /agents/ (agent YAML defs)│   │
│  │  │ tailscale   │  Shared volume: /workspace/ (5Gi PVC)     │   │
│  │  │ (sidecar)   │                                           │   │
│  │  └─────────────┘                                           │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                                                    │
│  ┌─ kaze-gateway pod ─────────────────────────────────────────┐   │
│  │  ┌───────────────┐  ┌─────────────┐                        │   │
│  │  │ kaze-gateway  │  │ tailscale   │                        │   │
│  │  │ (port 4200)   │  │ (sidecar)   │                        │   │
│  │  └───────────────┘  └─────────────┘                        │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                                                    │
│  ┌─ kaze-knowledge pod ───────────────────────────────────────┐   │
│  │  ┌───────────────┐  ┌─────────────┐                        │   │
│  │  │ kaze-knowledge│  │ tailscale   │                        │   │
│  │  │ (port 4300)   │  │ (sidecar)   │                        │   │
│  │  └───────────────┘  └─────────────┘                        │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                                                    │
│  ┌─ PostgreSQL + pgvector ────────────────────────────────────┐   │
│  │  Knowledge vectors · Mem0 metadata                         │   │
│  └────────────────────────────────────────────────────────────┘   │
│                                                                    │
│  ┌─ Langfuse ────────────────────────────────────────────────┐    │
│  │  Observability · Tracing · LLM call analytics              │   │
│  └────────────────────────────────────────────────────────────┘   │
│                                                                    │
│  ┌─ Vault (ExternalSecrets) ─────────────────────────────────┐    │
│  │  LLM API keys · GitHub token · Service credentials         │   │
│  └────────────────────────────────────────────────────────────┘   │
│                                                                    │
│  ┌─ OpenClaw node (external, Tailscale-connected) ───────────┐   │
│  │  OpenClaw gateway · kaze-runtime plugin                    │   │
│  │  Channels: Slack, WhatsApp, Telegram                       │   │
│  └────────────────────────────────────────────────────────────┘   │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘

2. Sidecars & Init Containers

git-sync (agent definitions)

Clones the kaze-agent-ops repo into a shared volume so the runtime can load vertical and skill YAML definitions without baking them into the container image.

  • Init container: Clones the repo before the runtime starts
  • Sidecar: Pulls every 60 seconds for live updates
  • Mount: Read-only at /agents/kaze-agent-ops/
  • Benefit: Skill changes deploy via git push + pod restart, not image rebuild
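In the deployment spec this pattern looks roughly like the following sketch (the git-sync image tag, repo URL, and volume names are illustrative, not taken from the actual manifest):

```yaml
# Illustrative fragment of deployment.yaml — image tags, org name, and
# repo URL are placeholder assumptions, not the real values.
initContainers:
  - name: git-sync-init
    image: registry.k8s.io/git-sync/git-sync:v4.2.3
    args:
      - --repo=https://github.com/example-org/kaze-agent-ops  # hypothetical URL
      - --root=/agents
      - --one-time                 # clone once, then exit, before the runtime starts
    volumeMounts:
      - { name: agents, mountPath: /agents }
containers:
  - name: git-sync
    image: registry.k8s.io/git-sync/git-sync:v4.2.3
    args:
      - --repo=https://github.com/example-org/kaze-agent-ops  # hypothetical URL
      - --root=/agents
      - --period=60s               # pull every 60 seconds for live updates
    volumeMounts:
      - { name: agents, mountPath: /agents }
  - name: kaze-runtime
    image: ghcr.io/example-org/kaze-runtime:a1b2c3d   # tag pinned by Kustomize
    volumeMounts:
      - { name: agents, mountPath: /agents, readOnly: true }
```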

langfuse-mcp (MCP proxy)

Runs the Langfuse MCP server as a sidecar, exposing Langfuse data over the Model Context Protocol via SSE.

  • Port: 4101 (internal)
  • Proxied by runtime: GET /mcp/* → http://localhost:4101/mcp/*
  • Use case: Enables LLM agents to query Langfuse observability data via MCP tools

tailscale (VPN mesh)

Provides secure networking between the K8s cluster and external nodes (OpenClaw).

  • Runs as sidecar on each pod that needs VPN access
  • Enables: Runtime ↔ OpenClaw communication, dashboard access, inter-cluster connectivity
  • Auth: Tailscale auth keys stored in Vault, injected via ExternalSecrets
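A minimal sketch of the sidecar container spec, assuming the stock tailscale/tailscale image and a secret synced from Vault (the hostname and secret names are illustrative):

```yaml
# Illustrative tailscale sidecar — secret name and hostname are assumptions.
containers:
  - name: tailscale
    image: tailscale/tailscale:stable
    env:
      - name: TS_AUTHKEY              # auth key from Vault via ExternalSecrets
        valueFrom:
          secretKeyRef:
            name: kaze-runtime-secrets
            key: TS_AUTHKEY
      - name: TS_HOSTNAME
        value: kaze-runtime           # name the pod appears under in the tailnet
      - name: TS_USERSPACE            # userspace networking, no NET_ADMIN needed
        value: "true"
```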

3. CI/CD Pipeline

All repos follow the same pipeline: GitHub Actions → Container Image → GitOps → Kubernetes.

Developer pushes to main


┌─ GitHub Actions ────────────────────────────────┐
│  1. Checkout code                                │
│  2. Docker buildx (multi-arch: amd64 + arm64)   │
│  3. Push image to GHCR (ghcr.io/speedrun-...)   │
│  4. Update GitOps repo:                          │
│     cd gitops/cluster-dev/<service>/             │
│     kustomize edit set image <new-tag>           │
│     git commit + push                            │
└──────────────────────────────────────────────────┘


┌─ GitOps Repo (cluster-dev) ─────────────────────┐
│  ArgoCD detects commit                           │
│  Syncs Kustomize overlays to cluster             │
│  Rolling update of affected deployments          │
└──────────────────────────────────────────────────┘

Key properties:

  • No manual kubectl. All cluster state is defined in git. ArgoCD is the only entity that applies manifests.
  • Multi-arch builds. Images target both amd64 and arm64 (dev machines use Apple Silicon).
  • Image tags are git SHA or semver. Kustomize overlays reference specific tags — no latest.
  • Agent definitions (kaze-agent-ops) don't need image builds. The git-sync sidecar pulls changes. Pod restart picks up new skills.
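The pipeline above could be expressed as a workflow along these lines (the org name, repo paths, and action versions are placeholders; the real workflow may differ):

```yaml
# Illustrative GitHub Actions workflow — org, repo, and secret names are assumptions.
name: build-and-deploy
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          platforms: linux/amd64,linux/arm64   # multi-arch: cluster + Apple Silicon dev
          push: true
          tags: ghcr.io/example-org/kaze-runtime:${{ github.sha }}
      - name: Update GitOps repo
        run: |
          git clone "https://x-access-token:${GITOPS_TOKEN}@github.com/example-org/gitops"
          cd gitops/cluster-dev/kaze-runtime
          kustomize edit set image ghcr.io/example-org/kaze-runtime:${{ github.sha }}
          git config user.name ci-bot
          git config user.email ci-bot@example.org
          git commit -am "deploy kaze-runtime ${{ github.sha }}" && git push
        env:
          GITOPS_TOKEN: ${{ secrets.GITOPS_TOKEN }}
```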

4. GitOps Repository Structure

gitops/cluster-dev/
├── base/
│   └── vault-secret-store.yaml       # ClusterSecretStore for all services
├── kaze-runtime/
│   ├── kustomization.yaml
│   ├── deployment.yaml               # Runtime + git-sync + langfuse-mcp + tailscale
│   ├── service.yaml
│   ├── external-secret.yaml          # Pulls from Vault
│   └── workspace-pvc.yaml            # 5Gi shared workspace
├── kaze-gateway/
│   ├── kustomization.yaml
│   ├── deployment.yaml               # Gateway + tailscale
│   ├── service.yaml
│   └── external-secret.yaml
├── kaze-knowledge/
│   ├── kustomization.yaml
│   ├── deployment.yaml               # Knowledge + tailscale
│   ├── service.yaml
│   └── external-secret.yaml
└── ...

Each service subdirectory is self-contained with its own Kustomization. Overlays can be added per environment (staging, production, customer VPC).
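A per-service kustomization.yaml might look like this (the resource list mirrors the tree above; the image name and tag are placeholders):

```yaml
# Illustrative kustomization.yaml — org name and tag are assumptions.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kaze
resources:
  - deployment.yaml
  - service.yaml
  - external-secret.yaml
  - workspace-pvc.yaml
images:
  - name: ghcr.io/example-org/kaze-runtime
    newTag: a1b2c3d        # written by `kustomize edit set image` in CI — never `latest`
```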


5. Secrets Management

Architecture

┌─ HashiCorp Vault ──────────────────────────────────────┐
│                                                         │
│  kv/kaze-runtime/                                       │
│    ├── KAZE_GATEWAY_URL                                 │
│    ├── KAZE_KNOWLEDGE_URL                               │
│    ├── LANGFUSE_SECRET_KEY                              │
│    ├── LANGFUSE_PUBLIC_KEY                              │
│    └── GIT_SYNC_TOKEN (for agent repo access)           │
│                                                         │
│  kv/kaze-gateway/                                       │
│    ├── ANTHROPIC_API_KEY                                │
│    ├── GOOGLE_GENERATIVE_AI_API_KEY                    │
│    ├── GITHUB_TOKEN                                     │
│    ├── LANGFUSE_SECRET_KEY                              │
│    └── LANGFUSE_PUBLIC_KEY                              │
│                                                         │
│  kv/kaze-knowledge/                                     │
│    ├── GOOGLE_API_KEY (embeddings)                      │
│    ├── DATABASE_URL                                     │
│    ├── LANGFUSE_SECRET_KEY                              │
│    └── LANGFUSE_PUBLIC_KEY                              │
│                                                         │
└─────────────────────────────────────────────────────────┘

         │ Kubernetes auth method

┌─ ExternalSecrets Operator ─────────────────────────────┐
│  ClusterSecretStore → Vault backend                     │
│  ExternalSecret per service → K8s Secret                │
│  Refresh interval: 1 minute                             │
│  Secret rotation: automatic on Vault update             │
└─────────────────────────────────────────────────────────┘


┌─ K8s Secrets ──────────────────────────────────────────┐
│  Mounted as env vars in pod containers                  │
│  Never checked into git                                 │
│  Never visible in container images                      │
└─────────────────────────────────────────────────────────┘

Key design:

  • Vault is the source of truth for all secrets. K8s Secrets are derived copies.
  • Kubernetes auth method — pods authenticate to Vault using their service account. No static Vault tokens.
  • 1-minute refresh — ExternalSecrets polls Vault every 60s. Secret rotation propagates within a minute.
  • Zero secrets in the runtime — the runtime has no LLM keys or GitHub tokens. It only knows gateway/knowledge URLs. The gateway holds all provider credentials.
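An ExternalSecret tying a Vault path to its derived K8s Secret could look like this sketch (store and target names are assumptions):

```yaml
# Illustrative ExternalSecret — store name and target name are assumptions.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: kaze-gateway
spec:
  refreshInterval: 1m                  # rotation propagates within a minute
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-backend
  target:
    name: kaze-gateway-secrets         # resulting K8s Secret, mounted as env vars
  dataFrom:
    - extract:
        key: kv/kaze-gateway           # pulls every key under this Vault path
```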

6. OpenClaw Integration

OpenClaw runs on a separate node (not in K8s), connected to the cluster via Tailscale VPN.

Plugin Architecture

The Kaze plugin for OpenClaw is a thin TypeScript client that exposes three tools:

  Tool                  Description
  kaze_dispatch_task    Dispatch a task to a vertical/skill on the runtime
  kaze_list_verticals   List available verticals and their skills
  kaze_agent_status     Check agent health and supervision status

Hooks:

  • Pre-tool hook: Before any tool call, searches kaze-knowledge for relevant context and injects it into the conversation
  • Post-message hook: After assistant messages, stores conversation summaries to kaze-knowledge
  • Langfuse tracing: OpenClaw sessions are traced in Langfuse for end-to-end observability

Deployment

The plugin is deployed via Ansible playbook:

  • Copies plugin files to the OpenClaw node
  • Configures OpenClaw settings to register the plugin
  • Sets environment variables for runtime URL and knowledge URL
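A sketch of what such a playbook might contain (hosts, paths, and variable names are hypothetical):

```yaml
# Illustrative Ansible tasks — hosts, paths, and URLs are assumptions.
- hosts: openclaw
  tasks:
    - name: Copy plugin files to the OpenClaw node
      ansible.builtin.copy:
        src: dist/kaze-plugin/
        dest: /opt/openclaw/plugins/kaze/

    - name: Register the plugin in OpenClaw settings
      ansible.builtin.template:
        src: openclaw-settings.json.j2
        dest: /opt/openclaw/settings.json

    - name: Set runtime and knowledge URLs for the plugin
      ansible.builtin.lineinfile:
        path: /etc/openclaw/env
        line: "{{ item }}"
      loop:
        - "KAZE_RUNTIME_URL=http://kaze-runtime:4100"
        - "KAZE_KNOWLEDGE_URL=http://kaze-knowledge:4300"
```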

7. Workspace Management

A shared PersistentVolumeClaim (workspace-pvc, 5Gi) provides persistent storage for git repositories that agents work with.

  • Gateway's workspace_list tool clones repos on demand (git clone) or updates them (git pull)
  • Gateway's workspace_read tool reads files from cloned repos
  • Gateway's file_glob / file_read tools operate on the workspace
  • Security: WORKSPACE_DENY_REPOS env var blocks access to sensitive repos (e.g., the infra repo itself)
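The claim itself is small; a sketch of workspace-pvc.yaml (access mode and storage class are assumptions):

```yaml
# Illustrative PVC — access mode and storage class are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-pvc
spec:
  accessModes: [ReadWriteOnce]   # shared by the containers within the runtime pod
  resources:
    requests:
      storage: 5Gi
```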

8. Observability Stack

Current: Langfuse

Langfuse provides LLM-specific observability:

  • Tracing: Every LLM call traced with input/output, model, tokens, latency, cost
  • Gateway integration: OpenTelemetry span processor auto-instruments all generateText calls
  • Runtime integration: Langfuse API proxied at GET /langfuse/* for dashboard access
  • MCP integration: Langfuse MCP sidecar enables agents to query their own traces

Future: Full Stack

The design calls for a comprehensive observability stack:

  Component       Role                                            Status
  Langfuse        LLM-specific tracing and analytics              Deployed
  Prometheus      Infrastructure metrics (CPU, memory, network)   Not yet deployed
  Grafana         Dashboards and visualization                    Not yet deployed
  Loki            Log aggregation                                 Not yet deployed
  OpenTelemetry   Distributed tracing                             Partially (Langfuse span processor only)

9. Networking

Internal (Cluster)

  • Services communicate via K8s ClusterIP services on fixed ports (4100, 4200, 4300)
  • No ingress controller currently — services are accessed via Tailscale or port-forward
  • No network policies currently — all pods can reach all other pods within the namespace

External (Tailscale)

  • Tailscale sidecar on each pod provides VPN connectivity
  • OpenClaw node connects to the cluster via Tailscale
  • Dashboard and development access via Tailscale DNS names
  • No public internet exposure

10. Deployment Modes (Design)

The architecture supports two deployment modes from a single codebase:

                     Agency Model (SaaS)                            Customer VPC
  Who hosts          Speedrun Ventures                              Client's cloud account
  Data residency     Speedrun's infrastructure                      Client's infrastructure
  Tenant isolation   Namespace (shared cell) or dedicated cluster   Full cluster
  Current status     Active (single cell)                           Not yet deployed

Cell architecture: Each deployment is a self-contained cell. Cells are identical in structure, different in configuration. A cell can operate independently.

Cloud-agnostic strategy: All infrastructure uses portable open-source components. No managed cloud services (no RDS, no Cloud SQL, no SQS). Once a K8s cluster exists, everything above it is identical regardless of cloud provider.

  Concern         Choice                         Rationale
  Compute         Kubernetes                     Universal
  Database        PostgreSQL (CloudNativePG)     K8s-native operator
  Secrets         HashiCorp Vault                Cloud-agnostic
  Messaging       NATS (future)                  Lightweight, portable
  Observability   Langfuse + OTel + Prometheus   Open-source
  GitOps          ArgoCD                         K8s-native CD

11. Scaling Considerations

Current state (Stage 0): Single cell, 1 vertical, ~6 skills, 1 Postgres instance. Well within single-node capacity.

Next bottleneck (Stage 1, 5-10 clients):

  • Postgres connection pooling (PgBouncer) needed when agent count grows
  • LLM provider rate limits become relevant — key pooling across multiple keys
  • Observation storage grows — table partitioning needed

Architecture allows horizontal scaling of:

  • Runtime pods (stateless — scale with HPA)
  • Gateway pods (stateless — scale with HPA)
  • Knowledge pods (stateless — DB handles persistence)
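For the stateless services, an HPA manifest would look roughly like this (replica bounds and the CPU threshold are illustrative, not tuned values):

```yaml
# Illustrative HPA for a stateless service — thresholds are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kaze-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kaze-gateway
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70%
```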

What requires vertical scaling or sharding:

  • PostgreSQL (read replicas at Stage 2, per-component split at Stage 3)
  • pgvector indexes (evaluate Qdrant for hot-path at Stage 2)

See Non-Functional Assessment for detailed scalability model.