Threat Model & Security Assessment
Research for Project Kaze
1. System Boundaries
Kaze operates across multiple trust boundaries depending on deployment mode:
TRUST BOUNDARY: Internet
┌───────────────────────────────────────────────────────────┐
│                                                           │
│  TRUST BOUNDARY: Speedrun Infrastructure                  │
│  ┌─────────────────────────────────────────────────────┐  │
│  │  Agency SaaS (multi-tenant)                         │  │
│  │                                                     │  │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐           │  │
│  │  │ Tenant A │  │ Tenant B │  │ Tenant C │           │  │
│  │  │  (cell)  │  │  (cell)  │  │  (cell)  │           │  │
│  │  └──────────┘  └──────────┘  └──────────┘           │  │
│  │                                                     │  │
│  │  Shared: LLM Gateway, Agent Runtime Pool            │  │
│  └─────────────────────────────────────────────────────┘  │
│                                                           │
│  TRUST BOUNDARY: Customer VPC                             │
│  ┌─────────────────────────────────────────────────────┐  │
│  │  Full Kaze stack (single-tenant cell)               │  │
│  │  Client data never leaves this boundary             │  │
│  │  Speedrun access only via VPN (on-demand)           │  │
│  └─────────────────────────────────────────────────────┘  │
│                                                           │
│  TRUST BOUNDARY: LLM Providers                            │
│  ┌─────────────────────────────────────────────────────┐  │
│  │  Anthropic · OpenAI · Google · Ollama (local)       │  │
│  │  Data sent for inference — no opt-out of processing │  │
│  └─────────────────────────────────────────────────────┘  │
│                                                           │
│  TRUST BOUNDARY: External Tools                           │
│  ┌─────────────────────────────────────────────────────┐  │
│  │  SEMrush · GitHub · Google Calendar · Toddle DB     │  │
│  │  Client data may flow to these services             │  │
│  └─────────────────────────────────────────────────────┘  │
│                                                           │
│  TRUST BOUNDARY: OpenClaw                                 │
│  ┌─────────────────────────────────────────────────────┐  │
│  │  Channels: Slack · WhatsApp · Telegram · Discord    │  │
│  │  User messages flow through channel providers       │  │
│  └─────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────┘
2. Threat Actors
| Actor | Capability | Motivation | Likelihood |
|---|---|---|---|
| External attacker | Network access, phishing, credential theft | Data theft, ransom, disruption | Medium |
| Malicious tenant (agency model) | Authenticated access to own cell, standard API access | Access other tenants' data, exceed resource quotas, abuse shared infrastructure | Medium |
| Compromised LLM provider | Access to all prompts/responses sent for inference | Data harvesting, model poisoning | Low |
| Compromised external tool | Access to data sent via API integrations | Credential theft, data exfiltration | Low-Medium |
| Malicious or buggy agent | Tool access, knowledge write access, LLM calls | Data exfiltration, knowledge poisoning, resource exhaustion | Medium |
| Insider (Speedrun operator) | Infrastructure access, Vault access, deployment privileges | Data theft, unauthorized access | Low |
| Supply chain attacker | Compromised dependency, container image, or plugin | Backdoor, data theft | Low-Medium |
3. Attack Surfaces
3.1 Prompt Injection
Threat: Agents process user input, knowledge graph content, and tool outputs — all potential injection vectors. A crafted input could manipulate agent behavior.
Attack paths:
- User sends malicious message via channel → agent follows injected instructions
- Poisoned knowledge entry retrieved during agent reasoning → agent acts on false knowledge
- External tool returns adversarial content → agent processes as trusted data
Current state: Not addressed.
Mitigations needed:
- Input sanitization layer before agent processing
- Separate "trusted" (system prompt, skill definitions) from "untrusted" (user messages, tool outputs, knowledge) in context
- Instruction hierarchy: system prompt > skill definition > knowledge > user input
- Output validation for sensitive operations (financial, data deletion)
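One way to sketch the trusted/untrusted separation and instruction hierarchy (names and delimiter format are hypothetical, not Kaze's actual context builder): wrap every untrusted source in explicit provenance tags before it enters the context, so the system prompt can instruct the model to treat tagged spans as data, never as instructions. Delimiting alone does not defeat prompt injection; it is one layer alongside output validation and least privilege.

```python
def wrap_untrusted(source: str, content: str) -> str:
    """Tag untrusted content with its provenance before adding to context."""
    # Neutralize delimiter-lookalikes an attacker may have embedded.
    sanitized = content.replace("<untrusted", "&lt;untrusted").replace(
        "</untrusted", "&lt;/untrusted")
    return f'<untrusted source="{source}">\n{sanitized}\n</untrusted>'

def build_context(system_prompt: str, skill: str,
                  knowledge: list[tuple[str, str]],
                  user_message: str) -> str:
    """Assemble context in hierarchy order: system > skill > knowledge > user."""
    parts = [system_prompt, skill]
    parts += [wrap_untrusted(f"knowledge:{key}", text) for key, text in knowledge]
    parts.append(wrap_untrusted("user", user_message))
    return "\n\n".join(parts)
```

The ordering mirrors the instruction hierarchy above: trusted material first, untrusted material last and always tagged.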
3.2 Tenant Isolation (Agency Model)
Threat: In shared-cell deployments, one tenant's agent accesses another tenant's data, keys, or knowledge.
Attack paths:
- Namespace escape in shared K8s cluster
- Agent manipulates knowledge query to bypass tenant scoping
- Shared LLM Gateway leaks context between requests
- Shared Agent Runtime pool bleeds state between tenant tasks
Current state: Partially addressed — namespace isolation documented, stateful components isolated per-tenant.
Mitigations needed:
- Tenant ID enforced at database query layer (every query includes tenant_id filter, not just application logic)
- LLM Gateway must flush all state between requests from different tenants
- Memory isolation: agent private state wiped between tenant context switches in shared runtime
- Network policies per namespace verified and tested
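The query-layer enforcement could take the shape below (a minimal sketch with hypothetical names, not Kaze's actual data access layer): every query object is constructed with a tenant ID, the filter is injected automatically, and callers cannot remove or override it, so a bug in application logic or a manipulated agent cannot issue an unscoped query.

```python
class TenantScopeError(Exception):
    pass

class ScopedQuery:
    """Query builder bound to one tenant; the tenant filter cannot be removed."""

    def __init__(self, table: str, tenant_id: str):
        if not tenant_id:
            raise TenantScopeError("tenant_id is required for every query")
        self.table = table
        self.filters = {"tenant_id": tenant_id}

    def where(self, **conditions) -> "ScopedQuery":
        # Refuse any attempt to redefine the tenant scope after construction.
        if "tenant_id" in conditions:
            raise TenantScopeError("tenant_id filter cannot be overridden")
        self.filters.update(conditions)
        return self

    def to_sql(self) -> tuple[str, list]:
        clauses = " AND ".join(f"{k} = ?" for k in self.filters)
        return (f"SELECT * FROM {self.table} WHERE {clauses}",
                list(self.filters.values()))
```

Row-level security in the database itself would add a second, independent enforcement point below this layer.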
3.3 Knowledge Poisoning
Threat: An agent writes false or manipulated knowledge into the shared knowledge graph, which propagates to other agents and tenants.
Attack paths:
- Agent hallucinates confidently → writes to shared tier → quality gate passes (LLM-as-judge fooled)
- Adversarial user provides false information → agent stores it → becomes "fact" in knowledge system
- Compromised agent deliberately seeds misinformation
Current state: Quality gate mentioned (LLM-as-judge) but no defense-in-depth.
Mitigations needed:
- Multi-signal quality gate (not LLM-only): source verification, cross-reference with existing knowledge, confidence scoring
- Rate limiting on shared tier writes (flag agents that write excessively)
- Provenance chain: every shared knowledge entry traces back to originating observation, not just agent
- Quarantine period for shared knowledge (visible after N hours or human review, not immediately)
- Rollback capability: if poisoned knowledge is detected, trace and revert all downstream effects
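A multi-signal gate plus quarantine might be wired together as below. This is a sketch under stated assumptions: field names, the 0.8 judge threshold, the 3-of-4 signal requirement, and the 24-hour window are all placeholders, not decided values.

```python
QUARANTINE_SECONDS = 24 * 3600  # placeholder for the "N hours" window

def gate_decision(entry: dict, now: float) -> str:
    """Return 'promote', 'hold', or 'reject' for a shared-tier candidate."""
    signals = [
        entry["judge_score"] >= 0.8,        # LLM-as-judge: one signal, not the gate
        entry["source_verified"],           # provenance traces to an observation
        entry["cross_refs"] >= 2,           # corroborated by existing knowledge
        not entry["author_rate_flagged"],   # author not writing excessively
    ]
    if sum(signals) < 3:
        return "reject"
    if now - entry["created_at"] < QUARANTINE_SECONDS:
        return "hold"  # visible only after quarantine or human review
    return "promote"
```

The point of the structure is that fooling the LLM judge alone is not enough: an entry still needs provenance and corroboration, and even then it sits in quarantine before other agents can see it.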
3.4 LLM Data Exposure
Threat: Sensitive client data sent to LLM providers for inference becomes accessible to the provider or is used for training.
Attack paths:
- Agent sends client PII/financial data in prompts
- Provider uses data for model training (opt-out policies vary)
- Provider breach exposes inference logs
- Prompt caching across tenants at provider level
Current state: Data classification exists for VPC observability but not for LLM call content.
Mitigations needed:
- Data classification tags on knowledge entries: "safe for LLM" vs "internal only"
- PII detection before LLM calls — strip or redact sensitive fields
- Provider selection policy: sensitive data → only providers with zero-retention agreements
- Local model option (Ollama) for highest-sensitivity operations
- Audit trail of what data was sent to which provider
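The shape of a pre-call redaction filter could look like this. Regex-only detection is known to be insufficient for PII at scale (open question S3); this sketch shows the interface (redact, then return an audit trail of what was removed), with illustrative patterns only.

```python
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "IBAN":  re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def redact_for_llm(text: str) -> tuple[str, list[str]]:
    """Redact PII before an LLM call; return the text plus an audit trail."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            hits.append(f"{label}:{count}")
    return text, hits
```

The audit trail (which classes were redacted, how many times) feeds the "what was sent to which provider" log without storing the sensitive values themselves.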
3.5 Credential Theft & Key Compromise
Threat: LLM API keys, tool credentials, or Vault access tokens are stolen.
Attack paths:
- Vault compromise → all keys exposed
- Agent logs credentials in observation logger (accidental)
- LLM provider key in prompts/responses (accidental leakage)
- CI/CD pipeline exposes secrets
- Client key leaked → attacker uses it via Kaze or directly
Current state: Vault documented, tenant-scoped access policies mentioned.
Mitigations needed:
- Key rotation policy: automated rotation on schedule, immediate rotation on suspected compromise
- Secret scanning in observation logs (detect and redact credentials before storage)
- Short-lived credentials where possible (OAuth tokens vs long-lived API keys)
- Key usage anomaly detection (sudden spike in usage from a key → alert + auto-freeze)
- Blast radius containment: if one client key is compromised, only that client's agents are affected
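Secret scanning in the observation logger could combine known key formats with a generic entropy check, roughly as below. The patterns shown are a small illustrative subset; a real deployment would maintain a larger list and tune the entropy threshold against false positives.

```python
import math
import re

SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),   # OpenAI-style API keys
    re.compile(r"ghp_[A-Za-z0-9]{36}"),     # GitHub personal access tokens
    re.compile(r"AKIA[0-9A-Z]{16}"),        # AWS access key IDs
]

def shannon_entropy(s: str) -> float:
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

def scrub_observation(text: str) -> str:
    """Redact known key formats and long high-entropy tokens from a log line."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    # Generic fallback: long tokens with near-random character distribution.
    def maybe_redact(match: re.Match) -> str:
        token = match.group(0)
        return "[REDACTED]" if shannon_entropy(token) > 4.5 else token
    return re.sub(r"[A-Za-z0-9+/=_-]{32,}", maybe_redact, text)
```

Running this before observations are persisted means a leaked credential never reaches storage, which is cheaper than purging it afterwards.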
3.6 Agent Privilege Escalation
Threat: An agent accesses tools, knowledge, or actions beyond its intended scope.
Attack paths:
- Agent discovers tools outside its vertical via registry
- Agent manipulates its own supervision level (writes to supervision_ramp_stats)
- Agent spawns subagents with elevated privileges
- Agent writes procedural knowledge that changes other agents' behavior
Current state: Tool filtering by vertical mentioned, supervision ramp documented.
Mitigations needed:
- Agent capability manifest (whitelist of tools + knowledge domains per agent, enforced at runtime)
- Supervision state is read-only to agents — only the platform can promote/demote
- Subagent privilege inheritance: child agents cannot exceed parent's capability set
- Knowledge write scopes enforced at system level, not agent self-declaration
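Runtime enforcement of the capability manifest might look like the sketch below (class and tool names are hypothetical): the dispatcher, not the agent, decides whether a tool call is allowed, and a child manifest is always the intersection with the parent's, so subagents can only narrow their privileges.

```python
class CapabilityError(Exception):
    pass

class CapabilityManifest:
    """Whitelist of tools and knowledge domains, enforced outside the agent."""

    def __init__(self, tools: set[str], knowledge_domains: set[str]):
        self.tools = frozenset(tools)
        self.knowledge_domains = frozenset(knowledge_domains)

    def check_tool(self, tool: str) -> None:
        # Called by the dispatcher before every tool invocation.
        if tool not in self.tools:
            raise CapabilityError(f"tool not in manifest: {tool}")

    def spawn_child(self, tools: set[str],
                    knowledge_domains: set[str]) -> "CapabilityManifest":
        """Child manifests are intersected with the parent: no escalation."""
        return CapabilityManifest(self.tools & tools,
                                  self.knowledge_domains & knowledge_domains)
```

Because `spawn_child` intersects rather than replaces, a compromised agent requesting extra tools for a subagent silently gets only what it already had.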
3.7 Resource Exhaustion
Threat: Agent or tenant consumes disproportionate resources, affecting other tenants or platform stability.
Attack paths:
- Agent enters infinite tool-calling loop → burns token budget
- Agent spawns unlimited subagents → exhausts compute
- Knowledge system flooded with writes → storage exhaustion
- LLM Gateway request queue saturated by one tenant
Current state: Budget enforcement documented (hard stops), task timeouts mentioned.
Mitigations needed:
- Per-agent resource quotas: max concurrent tasks, max tool calls per task, max knowledge writes per hour
- Per-tenant compute quotas: max agents, max total tokens/day
- Circuit breaker on tool call loops (detect repeated identical calls, break after N)
- Queue fairness: per-tenant request queuing in LLM Gateway (no single tenant can starve others)
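The tool-call circuit breaker could be as small as the sketch below. The exact-match definition of "identical call" and the default of three consecutive repeats are assumptions; a production version might also compare near-identical arguments or look at windows rather than strict runs.

```python
class CircuitOpen(Exception):
    pass

class ToolLoopBreaker:
    """Trip when the same tool is called with identical args N times in a row."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self.last_call = None
        self.repeat_count = 0

    def record(self, tool: str, args: dict) -> None:
        call = (tool, tuple(sorted(args.items())))
        if call == self.last_call:
            self.repeat_count += 1
        else:
            self.last_call, self.repeat_count = call, 1
        if self.repeat_count >= self.max_repeats:
            raise CircuitOpen(f"{tool} repeated {self.repeat_count} times")
```

Tripping the breaker raises into the task runner, which can halt the task well before the token budget's hard stop is reached.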
3.8 Supply Chain Attacks
Threat: Compromised dependency, container image, or OpenClaw plugin introduces backdoor.
Attack paths:
- Malicious npm package in dependency tree
- Compromised base image in container build
- OpenClaw plugin with malicious hook (40+ plugins, varying provenance)
- GitOps pipeline compromised → malicious update pushed to customer VPC
Current state: Signed images and reproducible builds mentioned for customer VPC.
Mitigations needed:
- Dependency scanning in CI (Snyk, Trivy, or similar)
- Container image scanning before deployment
- Pin all dependency versions, review updates manually
- OpenClaw plugin audit: only use vetted plugins, lock versions
- GitOps: signed commits required, approval gate before deployment to any environment
3.9 Data Exfiltration via Agents
Threat: An agent (compromised or manipulated) sends client data to unauthorized external endpoints.
Attack paths:
- Agent calls external tool with client data embedded in parameters
- Agent generates output containing client data → sent via channel to unauthorized recipient
- Agent writes client data to shared knowledge tier → visible to other tenants
Current state: Not addressed beyond "client knowledge never leaves cell."
Mitigations needed:
- Egress filtering: agents can only reach whitelisted external endpoints (per tenant, per vertical)
- Tool output scanning: detect client data in tool call parameters before sending
- Channel output review: for supervised agents, outbound messages go through approval
- Network policies enforcing egress restrictions at K8s level
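At the application layer, the egress check could be a straightforward allowlist lookup keyed by tenant and vertical, as in this sketch (the allowlist shape and entries are illustrative). This is the first line of defense only; the K8s network policies above must enforce the same restriction independently, since a compromised runtime could bypass an in-process check.

```python
from urllib.parse import urlparse

# Hypothetical allowlist shape: (tenant, vertical) -> permitted hosts.
EGRESS_ALLOWLIST = {
    ("tenant-a", "seo"): {"api.semrush.com"},
    ("tenant-a", "dev"): {"api.github.com"},
}

def egress_allowed(tenant: str, vertical: str, url: str) -> bool:
    """Default-deny: unknown (tenant, vertical) pairs get an empty set."""
    host = urlparse(url).hostname or ""
    return host in EGRESS_ALLOWLIST.get((tenant, vertical), set())
```

Default-deny matters here: a tenant or vertical with no allowlist entry can reach nothing, rather than everything.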
3.10 Insider Threat
Threat: Speedrun team member with infrastructure access abuses their position.
Attack paths:
- Direct database access to read client data
- Vault access to read client API keys
- VPN into customer VPC for unauthorized data access
- Modify agent behavior to exfiltrate data
Current state: "Speedrun operators have no access to client secrets in customer VPC" is stated, but no enforcement mechanism is documented.
Mitigations needed:
- Principle of least privilege for all Speedrun team access
- VPN sessions logged, time-limited (4hr max), require ticket justification
- Vault audit logging: every secret read is logged with operator identity
- Database access via bastion host with session recording
- Separation of duties: no single person can deploy + access production secrets
4. Security Properties Required
4.1 Confidentiality
| Property | Scope | Priority |
|---|---|---|
| Client data isolated between tenants | Agency model | Critical |
| Client data never leaves customer VPC | VPC deployments | Critical |
| LLM API keys not exposed to agents | All | Critical |
| PII not sent to LLM providers unless consented | All | High |
| Operator access to client data is audited and justified | All | High |
| Shared knowledge contains no client-specific data | Agency model | High |
4.2 Integrity
| Property | Scope | Priority |
|---|---|---|
| Knowledge graph entries are provenance-tracked and tamper-evident | All | Critical |
| Agent behavior cannot be self-modified without canary + rollback | All | Critical |
| Supervision levels cannot be escalated by agents themselves | All | Critical |
| Audit logs are immutable (append-only, no modifications) | All | High |
| Deployment artifacts are signed and verified | Customer VPC | High |
4.3 Availability
| Property | Scope | Priority |
|---|---|---|
| Single tenant failure does not affect other tenants | Agency model | Critical |
| Budget exhaustion stops the agent, not the platform | All | Critical |
| External tool failure is isolated (retry, fallback, not cascade) | All | High |
| LLM provider outage triggers fallback, not system failure | All | High |
4.4 Non-Repudiation
| Property | Scope | Priority |
|---|---|---|
| Every agent action is attributed to a specific agent + tenant + task | All | Critical |
| Every knowledge write has author identity and timestamp | All | Critical |
| Every LLM call is logged with provider, model, tokens, cost | All | High |
| Every supervision decision (approve/reject/modify) is recorded | All | High |
5. MVP Security Scope
What must be in place for MVP vs what can be deferred:
MVP (must have)
- Tenant ID enforcement at database query layer
- LLM API keys in Vault, never in agent code or logs
- Per-agent, per-tenant token budget with hard stops
- Agent capability manifest (whitelist of tools + knowledge per agent)
- Basic secret scanning in observation logs
- Task timeouts and tool call loop detection
- Audit trail for all agent actions (observation logger)
Phase 2 (important but not blocking)
- PII detection before LLM calls
- Egress filtering per tenant
- Key rotation policy
- OpenClaw plugin security audit
- Dependency scanning in CI
- Network policies per namespace (K8s)
- Data classification tags on knowledge entries
Phase 3+ (compliance & scale)
- SOC 2 / ISO 27001 readiness
- Formal incident response procedure
- Penetration testing program
- Signed GitOps deployments
- Session recording for infrastructure access
- Customer-facing security documentation
6. Open Questions
| # | Question | Impact |
|---|---|---|
| S1 | What authentication system for users/tenants? (OAuth2/OIDC, API keys, SSO?) | High — foundational |
| S2 | What's the minimum LLM provider data retention agreement acceptable? | High — client trust |
| S3 | How do we handle PII detection at scale? (regex patterns, dedicated model, third-party?) | Medium — compliance |
| S4 | Should shared knowledge have a quarantine period before becoming visible? | Medium — knowledge integrity |
| S5 | What's the incident response playbook if a cell is compromised? | Medium — operational readiness |