Non-Functional Assessment

Part of Project Kaze Architecture

Detailed models: cost-model.md · scalability-model.md


1. Security Posture

1.1 Zero-Secret Runtime

The runtime holds no API keys, no LLM credentials, no GitHub tokens. All secrets live in the gateway, injected via closures at tool registration time. If the runtime is compromised, the attacker cannot call LLM providers or external APIs directly.

┌─────────────┐     ┌──────────────────┐     ┌──────────────┐
│  Runtime    │────▶│  Gateway         │────▶│  Providers   │
│  (no keys)  │     │  (holds secrets) │     │              │
│             │     │  injects via     │     │              │
│             │     │  closures        │     │              │
└─────────────┘     └──────────────────┘     └──────────────┘

1.2 Secrets Management

  • Vault is the source of truth. K8s Secrets are derived copies via ExternalSecrets Operator.
  • Kubernetes auth — pods authenticate to Vault using service accounts. No static tokens.
  • 1-minute refresh — secret rotation propagates automatically.
  • Separation — each service has its own Vault path. Runtime cannot access gateway's secrets.

1.3 Capability Enforcement

Agents declare capabilities in vertical.yaml. The runtime enforces:

  • Only whitelisted tools can be invoked per vertical
  • Subagents inherit at most the parent's capability set
  • Supervision state is read-only to agents
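
These rules are small enough to sketch directly. Below is an illustrative TypeScript sketch, not actual runtime code; names like `isToolAllowed` and `deriveSubagentCaps` are hypothetical. It shows a whitelist check plus intersection-based capability inheritance:

```typescript
type Capabilities = Set<string>;

// A tool call is allowed only if the vertical's manifest whitelists it.
function isToolAllowed(allowed: Capabilities, tool: string): boolean {
  return allowed.has(tool);
}

// A subagent's capability set is the intersection of what it requests
// with what its parent holds, so it can never exceed the parent's scope.
function deriveSubagentCaps(
  parent: Capabilities,
  requested: Capabilities,
): Capabilities {
  return new Set([...requested].filter((tool) => parent.has(tool)));
}
```

The intersection makes the "at most the parent's capability set" rule structural: a subagent requesting an extra tool simply does not receive it.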

1.4 Credential Injection Pattern

```typescript
// Gateway registers tool with credentials bound via closure
registerTool("github_api", (input) => {
  // GITHUB_TOKEN is captured in closure scope at registration
  // Agent never sees the token value
  return callGitHub(input, GITHUB_TOKEN);
});
```

1.5 Current Security Gaps

| Gap | Risk | Mitigation path |
| --- | --- | --- |
| No network policies in K8s | Pods can reach any other pod | Add NetworkPolicy per namespace |
| No PII detection before LLM calls | Client data sent to providers | Add PII scanner in gateway |
| No egress filtering | Agents could reach arbitrary URLs | K8s egress policies + tool URL whitelist |
| Budget not enforced | Token spend not capped | Add budget tracking in gateway |
| Supervision stats in-memory | Reset on pod restart | Persist to database |
| No input sanitization | Prompt injection risk | Instruction hierarchy + sanitization layer |

2. Threat Model Summary

2.1 Trust Boundaries

                    TRUST BOUNDARY: Internet
┌───────────────────────────────────────────────────────────┐
│                                                           │
│   Speedrun Infrastructure (K8s cluster)                   │
│  ┌─────────────────────────────────────────────────────┐ │
│  │  Runtime · Gateway · Knowledge · Langfuse · Vault    │ │
│  └─────────────────────────────────────────────────────┘ │
│                                                           │
│   LLM Providers (Anthropic, Google)                       │
│  ┌─────────────────────────────────────────────────────┐ │
│  │  Data sent for inference — provider retention varies │ │
│  └─────────────────────────────────────────────────────┘ │
│                                                           │
│   External Tools (GitHub, future: SEMrush, Calendar)      │
│  ┌─────────────────────────────────────────────────────┐ │
│  │  Client data may flow to these services              │ │
│  └─────────────────────────────────────────────────────┘ │
│                                                           │
│   OpenClaw Channels (Slack, WhatsApp, Telegram)           │
│  ┌─────────────────────────────────────────────────────┐ │
│  │  User messages flow through channel providers        │ │
│  └─────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────┘

2.2 Key Threats

| # | Threat | Severity | Current status |
| --- | --- | --- | --- |
| T1 | Prompt injection — crafted input manipulates agent behavior | High | Not mitigated. Instruction hierarchy designed but not enforced. |
| T2 | Tenant isolation — one tenant accesses another's data | Critical | N/A — single tenant (Speedrun only) currently. Designed for namespace isolation. |
| T3 | Knowledge poisoning — false data enters knowledge store | Medium | Partially mitigated — Mem0 fact extraction filters noise. No quality gates on shared tier. |
| T4 | LLM data exposure — sensitive data sent to providers | Medium | Not mitigated. No PII detection, no data classification tags. |
| T5 | Credential theft — API keys stolen | High | Mitigated — Vault + zero-secret runtime + credential injection closures. |
| T6 | Agent privilege escalation — agent exceeds intended scope | Medium | Partially mitigated — capability whitelist in vertical.yaml. No per-task quotas. |
| T7 | Resource exhaustion — runaway agent burns resources | Medium | Partially mitigated — maxSteps limits agentic loops. No budget enforcement. |
| T8 | Supply chain attack — compromised dependency | Low-Medium | Partially mitigated — GHCR images, pinned deps. No dependency scanning in CI. |
| T9 | Data exfiltration — agent sends data to unauthorized endpoints | Medium | Not mitigated. No egress filtering. |
| T10 | Insider threat — operator abuses access | Low | Partially mitigated — Vault audit logging. No session recording, no bastion. |

2.3 MVP Security Priorities

Must have:

  • [x] LLM API keys in Vault, never in agent code or logs
  • [x] Zero-secret runtime (credential injection via gateway)
  • [x] Agent capability manifest (tool whitelist per vertical)
  • [x] Audit trail via Langfuse (all LLM calls traced)
  • [ ] Task timeouts and tool call loop detection (maxSteps exists, no hard timeout)
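
A hard wall-clock timeout would complement `maxSteps`, which bounds loop iterations but not elapsed time. A minimal sketch, assuming tasks are promises (`withTimeout` is a hypothetical helper, not existing runtime code):

```typescript
// Reject a task if it does not settle within `ms` milliseconds.
// maxSteps bounds the agentic loop; this bounds total elapsed time.
function withTimeout<T>(task: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`task exceeded ${ms}ms`)),
      ms,
    );
    task.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}
```

A production version would also cancel the underlying LLM/tool calls rather than merely abandoning them, which this sketch does not attempt.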

Next priority:

  • Network policies per namespace
  • PII detection before LLM calls
  • Budget enforcement (hard stops)
  • Egress filtering per tenant
  • Dependency scanning in CI

Full threat model details are in the original threat-model.md.


3. Cost Model

3.1 Cost Structure

┌────────────────────────────────────────────────────────┐
│  VARIABLE COSTS (scale with usage)                      │
│                                                         │
│  ████████████████████████████████  LLM Tokens (60-80%)  │
│  ████████                         External APIs (10-15%)│
│  ████                             Embedding Gen (3-5%)  │
│                                                         │
│  SEMI-FIXED COSTS (scale with tenants)                  │
│                                                         │
│  ██████████████                   Compute / K8s         │
│  ████████                         Database              │
│                                                         │
│  FIXED COSTS (exist regardless)                         │
│                                                         │
│  ██████                           Control plane         │
│  ████                             CI/CD + Registry      │
│  ████                             Monitoring            │
│  ██                               Vault                 │
└────────────────────────────────────────────────────────┘

Key insight: LLM token cost dominates. A 20% reduction in tokens per task saves more money than halving infrastructure costs would. Cost optimization should therefore focus on LLM efficiency.

3.2 Cost per Task

| Task Type | Fast (Haiku) | Balanced (Sonnet) | Best (Opus) | Cheapest (Gemini Flash-Lite) |
| --- | --- | --- | --- | --- |
| Simple extraction | $0.004 | $0.011 | $0.018 | $0.0004 |
| Keyword research | $0.014 | $0.042 | $0.070 | $0.002 |
| Content optimization | $0.021 | $0.063 | $0.105 | $0.002 |
| Research synthesis | $0.028 | $0.084 | $0.140 | $0.003 |
| Technical audit | $0.035 | $0.105 | $0.175 | $0.004 |

Even complex tasks cost under $0.20 with the most expensive model. The V0 Internal Ops vertical uses fast (Haiku) by default — most tasks cost $0.01-0.04.
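
The per-task figures above reduce to simple arithmetic on token counts and per-million-token prices. A sketch of that arithmetic (the prices and token counts below are illustrative assumptions, not quoted provider rates):

```typescript
// Cost of one task given token counts and per-million-token prices.
function taskCost(
  inputTokens: number,
  outputTokens: number,
  inPricePerM: number,   // $ per 1M input tokens (assumed)
  outPricePerM: number,  // $ per 1M output tokens (assumed)
): number {
  return (inputTokens / 1e6) * inPricePerM + (outputTokens / 1e6) * outPricePerM;
}

// e.g. a simple extraction at ~3k input / ~500 output tokens on a
// Haiku-class model priced around $1/M in and $5/M out lands in the
// same order of magnitude as the table's "fast" column.
```

This is why model routing dominates every other lever: the price-per-million term varies by 10-50x across tiers while token counts change far less.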

3.3 Cost Optimization Levers

| Lever | Savings | Implementation |
| --- | --- | --- |
| Model routing | 5-10x | Use fast for simple tasks, balanced only when needed, best rarely |
| Prompt caching (Anthropic) | 90% on cached portion | Stable system prompts cached across calls |
| Batch API (Anthropic/OpenAI) | 50% | Non-urgent tasks (quality evaluation, knowledge consolidation) |
| Gemini Flash-Lite | 10-50x vs Opus | Bulk processing, classification, data extraction |
| Knowledge context pruning | 20-30% token reduction | Only inject relevant memories, not all matches |
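
The model-routing lever can be sketched as a static complexity-to-tier map using the tier names from this document (the `Complexity` categories are illustrative assumptions, not an existing taxonomy):

```typescript
type Tier = "cheapest" | "fast" | "balanced" | "best";
type Complexity = "bulk" | "simple" | "moderate" | "complex";

// Map task complexity to the model tiers named in this document.
const routes: Record<Complexity, Tier> = {
  bulk: "cheapest",     // classification, extraction at volume
  simple: "fast",       // the V0 Internal Ops default
  moderate: "balanced",
  complex: "best",      // used rarely, per the lever above
};

function routeModel(complexity: Complexity): Tier {
  return routes[complexity];
}
```

A real router would likely classify complexity dynamically (e.g. via a cheap first-pass model) rather than taking it as input, but the cost structure is the same.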

3.4 Infrastructure Costs (Current Stage)

| Component | Monthly est. | Notes |
| --- | --- | --- |
| K8s cluster (3 nodes) | ~$150-300 | Depends on provider/instance type |
| PostgreSQL + pgvector | ~$50-100 | Small instance, single node |
| Langfuse | ~$0-50 | Self-hosted or free tier |
| Vault | ~$0 | Runs as pod in cluster |
| Container registry (GHCR) | ~$0 | Free for public/org repos |
| Total infra | ~$200-450/mo | |

Variable costs (LLM) at current usage (V0 Internal Ops, ~50-100 tasks/day):

  • Using fast (Haiku): ~$15-60/mo
  • Using balanced (Sonnet): ~$50-200/mo

4. Scalability Assessment

4.1 Scale Stages

| Stage | Scale | Architecture | First bottleneck |
| --- | --- | --- | --- |
| 0: MVP (current) | 1 cell, ~15 agents, 1 Postgres | All in one namespace, direct HTTP calls | None expected |
| 1: Early clients | 5-10 clients, ~50 agents | Shared cells, namespace isolation | LLM rate limits, Postgres connections |
| 2: Growth | 20-50 clients, ~200 agents | Mixed shared/dedicated cells, read replicas, NATS | pgvector latency, write contention |
| 3: Scale | 100+ clients, ~1000+ agents | Multi-region, sharded DBs, NATS clusters | Operational complexity |

4.2 Component Bottleneck Analysis

Agent Runtime:

  • ~50-100 agents per 8GB node (est. 50-150MB per agent with loaded context)
  • Almost never the bottleneck — agents spend most time waiting on LLM calls
  • Scales horizontally with HPA

LLM Gateway:

  • Hard ceiling: LLM provider rate limits (Anthropic ~4M tokens/min Tier 4)
  • Mitigation: multi-key pooling, multi-provider fallback, request queuing
  • Gateway itself is stateless, scales horizontally
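
The multi-key pooling mitigation can be sketched as a round-robin rotation over provider API keys, so per-key rate limits spread across the pool (illustrative only; a real pool would also track per-key token budgets and cooldowns):

```typescript
// Rotate provider API keys round-robin across requests.
class KeyPool {
  private i = 0;
  constructor(private keys: string[]) {
    if (keys.length === 0) throw new Error("empty key pool");
  }
  // Return the next key in rotation.
  next(): string {
    const key = this.keys[this.i];
    this.i = (this.i + 1) % this.keys.length;
    return key;
  }
}
```

Because the gateway is stateless across requests, a per-replica pool like this is the simplest form; coordinated pooling across replicas would need shared state.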

Knowledge Service:

  • Bottleneck: pgvector query latency as index grows
  • At Stage 2 (~200 agents, millions of vectors): evaluate Qdrant for hot-path queries
  • Embedding generation is batched (100/batch) — throughput adequate for current scale
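
The 100-per-batch grouping is a plain chunking step before the embedding call. A generic sketch (the embedding request itself is omitted; `chunk` is an illustrative helper, not existing service code):

```typescript
// Split a list of items into batches of at most `size` for embedding.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Each batch would then be sent as one embedding request,
// e.g. chunk(documents, 100) for the 100/batch figure above.
```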

PostgreSQL:

  • Stage 0-1: Single instance sufficient
  • Stage 2: Read replicas for knowledge queries and observation reads
  • Stage 3: Per-component database split (knowledge, observations, agent state)

4.3 Horizontal vs Vertical Scaling

| Component | Scales horizontally | Scales vertically | Notes |
| --- | --- | --- | --- |
| Runtime | Yes (stateless pods) | N/A | HPA on CPU/memory |
| Gateway | Yes (stateless pods) | N/A | HPA on request count |
| Knowledge | Yes (stateless pods) | Database grows | DB is the bottleneck, not the service |
| PostgreSQL | Read replicas | Bigger instance | Write master is vertical until sharding |
| pgvector | N/A | Index optimization | Consider Qdrant at scale |

4.4 Decision Triggers

| Trigger | When | Action |
| --- | --- | --- |
| Postgres connections > 100 | Stage 1 (~50 agents) | Add PgBouncer |
| LLM rate limit errors > 1% | Stage 1 (high throughput) | Multi-key pooling |
| Observation table > 100M rows | Stage 1-2 | Partition by month + tenant |
| pgvector query p99 > 500ms | Stage 2 (~1M vectors) | Evaluate Qdrant |
| Inter-agent calls cross nodes | Stage 2 | Introduce NATS |
| Write contention on hot tables | Stage 2 | Read replicas, batch writes |

5. Reliability

5.1 Current State

| Aspect | Status | Notes |
| --- | --- | --- |
| Redundancy | Single instance per service | No HA — acceptable for MVP |
| Backups | Manual | PostgreSQL backup not automated |
| Health checks | Basic HTTP liveness | No readiness probes beyond startup |
| Recovery | K8s restart policy | Pod restart on crash, no data recovery automation |
| Monitoring | Langfuse (LLM only) | No infrastructure monitoring (Prometheus/Grafana) |

5.2 Error Handling (Implemented)

| Component | Failure mode | Recovery |
| --- | --- | --- |
| Runtime | Agent crash during task | Mark task failed, log error. 3 consecutive → demote supervision. |
| Runtime | Gateway unreachable | Task fails with connection error. No retry. |
| Gateway | LLM provider error | Returns error to runtime. No automatic fallback between providers yet. |
| Gateway | Tool execution error | Returns error payload. Agent can reason about it and retry. |
| Knowledge | Database unavailable | Service returns 500. Runtime proceeds without memory context. |

5.3 What's Needed for Production

  • [ ] Health/readiness probes on all services
  • [ ] Automated PostgreSQL backups (daily minimum)
  • [ ] Multi-replica deployments (at least 2 per service)
  • [ ] Infrastructure monitoring (Prometheus + Grafana)
  • [ ] Alerting (PagerDuty/Slack integration)
  • [ ] Provider fallback in gateway (Anthropic → Google → etc.)
  • [ ] Graceful degradation when knowledge service is down
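
The provider-fallback item can be sketched as an ordered chain that returns the first success, e.g. Anthropic then Google (hypothetical helper, not existing gateway code):

```typescript
// A provider attempt is any async call that may fail.
type Provider<T> = () => Promise<T>;

// Try providers in priority order; return the first success,
// or rethrow the last error if every provider fails.
async function withFallback<T>(providers: Provider<T>[]): Promise<T> {
  let lastErr: unknown = new Error("no providers configured");
  for (const call of providers) {
    try {
      return await call();
    } catch (err) {
      lastErr = err; // record and try the next provider
    }
  }
  throw lastErr;
}
```

The same shape covers graceful degradation for the knowledge service: put a "proceed without memory context" provider last in the chain instead of failing the task.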

6. Performance Characteristics

6.1 Latency Profile

| Operation | Typical latency | Bottleneck |
| --- | --- | --- |
| Task dispatch (sync, simple) | 2-10s | LLM response time |
| Task dispatch (agentic, 3-6 steps) | 10-60s | Multiple LLM calls + tool execution |
| Knowledge search | 50-200ms | pgvector similarity search |
| Knowledge add | 1-3s | LLM fact extraction + embedding |
| Knowledge add-raw | 200-500ms | Embedding only |
| Knowledge add-raw-batch (100 items) | 5-15s | Batch embedding + insert |
| Tool execution (GitHub API) | 200-500ms | GitHub API response time |
| Tool execution (Docling) | 5-30s | Document conversion complexity |

6.2 Throughput

At current scale, throughput is not a concern. The system is designed for quality of agent reasoning, not high-throughput processing. Key constraints:

  • LLM concurrency: Limited by provider rate limits, not system architecture
  • Knowledge writes: Batch operations handle bulk ingestion efficiently
  • Tool calls: Rate-limited by external API quotas, not internal capacity

7. Compliance & Governance Readiness

7.1 Current State

| Requirement | Status |
| --- | --- |
| Audit trail (who did what, when) | Partial — Langfuse traces LLM calls. No structured agent action log. |
| Data residency controls | Not applicable — single deployment. Architecture supports VPC mode. |
| Access control (RBAC/ABAC) | Not implemented. Single-tenant, single-user. |
| Data retention policies | Not implemented. Knowledge persists indefinitely. |
| Incident response | Not documented. |

7.2 Path to Compliance

| Milestone | Effort | When needed |
| --- | --- | --- |
| SOC 2 Type I readiness | High | Before enterprise clients |
| GDPR data subject rights | Medium | Before EU clients |
| ISO 27001 certification | High | Market differentiation |
| Formal incident response playbook | Low | Before multi-tenant production |