Threat Model & Security Assessment
Research for Project Kaze
1. System Boundaries
Kaze operates across multiple trust boundaries depending on deployment mode:
TRUST BOUNDARY: Internet
┌───────────────────────────────────────────────────────────┐
│                                                           │
│  TRUST BOUNDARY: Speedrun Infrastructure                  │
│  ┌─────────────────────────────────────────────────────┐  │
│  │  Agency SaaS (multi-tenant)                         │  │
│  │                                                     │  │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐           │  │
│  │  │ Tenant A │  │ Tenant B │  │ Tenant C │           │  │
│  │  │  (cell)  │  │  (cell)  │  │  (cell)  │           │  │
│  │  └──────────┘  └──────────┘  └──────────┘           │  │
│  │                                                     │  │
│  │  Shared: LLM Gateway, Agent Runtime Pool            │  │
│  └─────────────────────────────────────────────────────┘  │
│                                                           │
│  TRUST BOUNDARY: Customer VPC                             │
│  ┌─────────────────────────────────────────────────────┐  │
│  │  Full Kaze stack (single-tenant cell)               │  │
│  │  Client data never leaves this boundary             │  │
│  │  Speedrun access only via VPN (on-demand)           │  │
│  └─────────────────────────────────────────────────────┘  │
│                                                           │
│  TRUST BOUNDARY: LLM Providers                            │
│  ┌─────────────────────────────────────────────────────┐  │
│  │  Anthropic · OpenAI · Google · Ollama (local)       │  │
│  │  Data sent for inference — no opt-out of processing │  │
│  └─────────────────────────────────────────────────────┘  │
│                                                           │
│  TRUST BOUNDARY: External Tools                           │
│  ┌─────────────────────────────────────────────────────┐  │
│  │  SEMrush · GitHub · Google Calendar · Toddle DB     │  │
│  │  Client data may flow to these services             │  │
│  └─────────────────────────────────────────────────────┘  │
│                                                           │
│  TRUST BOUNDARY: OpenClaw                                 │
│  ┌─────────────────────────────────────────────────────┐  │
│  │  Channels: Slack · WhatsApp · Telegram · Discord    │  │
│  │  User messages flow through channel providers       │  │
│  └─────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────┘
2. Threat Actors
| Actor | Capability | Motivation | Likelihood |
|---|---|---|---|
| External attacker | Network access, phishing, credential theft | Data theft, ransom, disruption | Medium |
| Malicious tenant (agency model) | Authenticated access to own cell, standard API access | Access other tenants' data, exceed resource quotas, abuse shared infrastructure | Medium |
| Compromised LLM provider | Access to all prompts/responses sent for inference | Data harvesting, model poisoning | Low |
| Compromised external tool | Access to data sent via API integrations | Credential theft, data exfiltration | Low-Medium |
| Malicious or buggy agent | Tool access, knowledge write access, LLM calls | Data exfiltration, knowledge poisoning, resource exhaustion | Medium |
| Insider (Speedrun operator) | Infrastructure access, Vault access, deployment privileges | Data theft, unauthorized access | Low |
| Supply chain attacker | Compromised dependency, container image, or plugin | Backdoor, data theft | Low-Medium |
3. Attack Surfaces
3.1 Prompt Injection
Threat: Agents process user input, knowledge graph content, and tool outputs — all potential injection vectors. A crafted input could manipulate agent behavior.
Attack paths:
- User sends malicious message via channel → agent follows injected instructions
- Poisoned knowledge entry retrieved during agent reasoning → agent acts on false knowledge
- External tool returns adversarial content → agent processes as trusted data
Current state: Not addressed.
Mitigations needed:
- Input sanitization layer before agent processing
- Separate "trusted" (system prompt, skill definitions) from "untrusted" (user messages, tool outputs, knowledge) in context
- Instruction hierarchy: system prompt > skill definition > knowledge > user input
- Output validation for sensitive operations (financial, data deletion)
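One way to sketch the trusted/untrusted separation and instruction hierarchy (names and delimiter format are hypothetical, not Kaze's actual context builder): wrap every untrusted source in explicit provenance tags before it enters the context, so the system prompt can instruct the model to treat tagged spans as data, never as instructions. Delimiting alone does not defeat prompt injection; it is one layer alongside output validation and least privilege.

```python
def wrap_untrusted(source: str, content: str) -> str:
    """Tag untrusted content with its provenance before adding to context."""
    # Neutralize delimiter-lookalikes an attacker may have embedded.
    sanitized = content.replace("<untrusted", "&lt;untrusted").replace(
        "</untrusted", "&lt;/untrusted")
    return f'<untrusted source="{source}">\n{sanitized}\n</untrusted>'

def build_context(system_prompt: str, skill: str,
                  knowledge: list[tuple[str, str]],
                  user_message: str) -> str:
    """Assemble context in hierarchy order: system > skill > knowledge > user."""
    parts = [system_prompt, skill]
    parts += [wrap_untrusted(f"knowledge:{key}", text) for key, text in knowledge]
    parts.append(wrap_untrusted("user", user_message))
    return "\n\n".join(parts)
```

The ordering mirrors the instruction hierarchy above: trusted material first, untrusted material last and always tagged.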
3.2 Tenant Isolation (Agency Model)
Threat: In shared-cell deployments, one tenant's agent accesses another tenant's data, keys, or knowledge.
Attack paths:
- Namespace escape in shared K8s cluster
- Agent manipulates knowledge query to bypass tenant scoping
- Shared LLM Gateway leaks context between requests
- Shared Agent Runtime pool bleeds state between tenant tasks
Current state: Partially addressed — namespace isolation documented, stateful components isolated per-tenant.
Mitigations needed:
- Tenant ID enforced at database query layer (every query includes tenant_id filter, not just application logic)
- LLM Gateway must flush all state between requests from different tenants
- Memory isolation: agent private state wiped between tenant context switches in shared runtime
- Network policies per namespace verified and tested
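The query-layer enforcement could take the shape below (a minimal sketch with hypothetical names, not Kaze's actual data access layer): every query object is constructed with a tenant ID, the filter is injected automatically, and callers cannot remove or override it, so a bug in application logic or a manipulated agent cannot issue an unscoped query.

```python
class TenantScopeError(Exception):
    pass

class ScopedQuery:
    """Query builder bound to one tenant; the tenant filter cannot be removed."""

    def __init__(self, table: str, tenant_id: str):
        if not tenant_id:
            raise TenantScopeError("tenant_id is required for every query")
        self.table = table
        self.filters = {"tenant_id": tenant_id}

    def where(self, **conditions) -> "ScopedQuery":
        # Refuse any attempt to redefine the tenant scope after construction.
        if "tenant_id" in conditions:
            raise TenantScopeError("tenant_id filter cannot be overridden")
        self.filters.update(conditions)
        return self

    def to_sql(self) -> tuple[str, list]:
        clauses = " AND ".join(f"{k} = ?" for k in self.filters)
        return (f"SELECT * FROM {self.table} WHERE {clauses}",
                list(self.filters.values()))
```

Row-level security in the database itself would add a second, independent enforcement point below this layer.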
3.3 Knowledge Poisoning
Threat: An agent writes false or manipulated knowledge into the shared knowledge graph, which propagates to other agents and tenants.
Attack paths:
- Agent hallucinates confidently → writes to shared tier → quality gate passes (LLM-as-judge fooled)
- Adversarial user provides false information → agent stores it → becomes "fact" in knowledge system
- Compromised agent deliberately seeds misinformation
Current state: Quality gate mentioned (LLM-as-judge) but no defense-in-depth.
Mitigations needed:
- Multi-signal quality gate (not LLM-only): source verification, cross-reference with existing knowledge, confidence scoring
- Rate limiting on shared tier writes (flag agents that write excessively)
- Provenance chain: every shared knowledge entry traces back to originating observation, not just agent
- Quarantine period for shared knowledge (visible after N hours or human review, not immediately)
- Rollback capability: if poisoned knowledge is detected, trace and revert all downstream effects
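A multi-signal gate plus quarantine might be wired together as below. This is a sketch under stated assumptions: field names, the 0.8 judge threshold, the 3-of-4 signal requirement, and the 24-hour window are all placeholders, not decided values.

```python
QUARANTINE_SECONDS = 24 * 3600  # placeholder for the "N hours" window

def gate_decision(entry: dict, now: float) -> str:
    """Return 'promote', 'hold', or 'reject' for a shared-tier candidate."""
    signals = [
        entry["judge_score"] >= 0.8,        # LLM-as-judge: one signal, not the gate
        entry["source_verified"],           # provenance traces to an observation
        entry["cross_refs"] >= 2,           # corroborated by existing knowledge
        not entry["author_rate_flagged"],   # author not writing excessively
    ]
    if sum(signals) < 3:
        return "reject"
    if now - entry["created_at"] < QUARANTINE_SECONDS:
        return "hold"  # visible only after quarantine or human review
    return "promote"
```

The point of the structure is that fooling the LLM judge alone is not enough: an entry still needs provenance and corroboration, and even then it sits in quarantine before other agents can see it.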
3.4 LLM Data Exposure
Threat: Sensitive client data sent to LLM providers for inference becomes accessible to the provider or is used for training.
Attack paths:
- Agent sends client PII/financial data in prompts
- Provider uses data for model training (opt-out policies vary)
- Provider breach exposes inference logs
- Prompt caching across tenants at provider level
Current state: Data classification exists for VPC observability but not for LLM call content.
Mitigations needed:
- Data classification tags on knowledge entries: "safe for LLM" vs "internal only"
- PII detection before LLM calls — strip or redact sensitive fields
- Provider selection policy: sensitive data → only providers with zero-retention agreements
- Local model option (Ollama) for highest-sensitivity operations
- Audit trail of what data was sent to which provider
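The shape of a pre-call redaction filter could look like this. Regex-only detection is known to be insufficient for PII at scale (open question S3); this sketch shows the interface (redact, then return an audit trail of what was removed), with illustrative patterns only.

```python
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "IBAN":  re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def redact_for_llm(text: str) -> tuple[str, list[str]]:
    """Redact PII before an LLM call; return the text plus an audit trail."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            hits.append(f"{label}:{count}")
    return text, hits
```

The audit trail (which classes were redacted, how many times) feeds the "what was sent to which provider" log without storing the sensitive values themselves.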
3.5 Credential Theft & Key Compromise
Threat: LLM API keys, tool credentials, or Vault access tokens are stolen.
Attack paths:
- Vault compromise → all keys exposed
- Agent logs credentials in observation logger (accidental)
- LLM provider key in prompts/responses (accidental leakage)
- CI/CD pipeline exposes secrets
- Client key leaked → attacker uses it via Kaze or directly
Current state: Vault documented, tenant-scoped access policies mentioned.
Mitigations needed:
- Key rotation policy: automated rotation on schedule, immediate rotation on suspected compromise
- Secret scanning in observation logs (detect and redact credentials before storage)
- Short-lived credentials where possible (OAuth tokens vs long-lived API keys)
- Key usage anomaly detection (sudden spike in usage from a key → alert + auto-freeze)
- Blast radius containment: if one client key is compromised, only that client's agents are affected
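Secret scanning in the observation logger could combine known key formats with a generic entropy check, roughly as below. The patterns shown are a small illustrative subset; a real deployment would maintain a larger list and tune the entropy threshold against false positives.

```python
import math
import re

SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),   # OpenAI-style API keys
    re.compile(r"ghp_[A-Za-z0-9]{36}"),     # GitHub personal access tokens
    re.compile(r"AKIA[0-9A-Z]{16}"),        # AWS access key IDs
]

def shannon_entropy(s: str) -> float:
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

def scrub_observation(text: str) -> str:
    """Redact known key formats and long high-entropy tokens from a log line."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    # Generic fallback: long tokens with near-random character distribution.
    def maybe_redact(match: re.Match) -> str:
        token = match.group(0)
        return "[REDACTED]" if shannon_entropy(token) > 4.5 else token
    return re.sub(r"[A-Za-z0-9+/=_-]{32,}", maybe_redact, text)
```

Running this before observations are persisted means a leaked credential never reaches storage, which is cheaper than purging it afterwards.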
3.6 Agent Privilege Escalation
Threat: An agent accesses tools, knowledge, or actions beyond its intended scope.
Attack paths:
- Agent discovers tools outside its vertical via registry
- Agent manipulates its own supervision level (writes to supervision_ramp_stats)
- Agent spawns subagents with elevated privileges
- Agent writes procedural knowledge that changes other agents' behavior
Current state: Tool filtering by vertical mentioned, supervision ramp documented.
Mitigations needed:
- Agent capability manifest (whitelist of tools + knowledge domains per agent, enforced at runtime)
- Supervision state is read-only to agents — only the platform can promote/demote
- Subagent privilege inheritance: child agents cannot exceed parent's capability set
- Knowledge write scopes enforced at system level, not agent self-declaration
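Runtime enforcement of the capability manifest might look like the sketch below (class and tool names are hypothetical): the dispatcher, not the agent, decides whether a tool call is allowed, and a child manifest is always the intersection with the parent's, so subagents can only narrow their privileges.

```python
class CapabilityError(Exception):
    pass

class CapabilityManifest:
    """Whitelist of tools and knowledge domains, enforced outside the agent."""

    def __init__(self, tools: set[str], knowledge_domains: set[str]):
        self.tools = frozenset(tools)
        self.knowledge_domains = frozenset(knowledge_domains)

    def check_tool(self, tool: str) -> None:
        # Called by the dispatcher before every tool invocation.
        if tool not in self.tools:
            raise CapabilityError(f"tool not in manifest: {tool}")

    def spawn_child(self, tools: set[str],
                    knowledge_domains: set[str]) -> "CapabilityManifest":
        """Child manifests are intersected with the parent: no escalation."""
        return CapabilityManifest(self.tools & tools,
                                  self.knowledge_domains & knowledge_domains)
```

Because `spawn_child` intersects rather than replaces, a compromised agent requesting extra tools for a subagent silently gets only what it already had.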
3.7 Resource Exhaustion
Threat: Agent or tenant consumes disproportionate resources, affecting other tenants or platform stability.
Attack paths:
- Agent enters infinite tool-calling loop → burns token budget
- Agent spawns unlimited subagents → exhausts compute
- Knowledge system flooded with writes → storage exhaustion
- LLM Gateway request queue saturated by one tenant
Current state: Budget enforcement documented (hard stops), task timeouts mentioned.
Mitigations needed:
- Per-agent resource quotas: max concurrent tasks, max tool calls per task, max knowledge writes per hour
- Per-tenant compute quotas: max agents, max total tokens/day
- Circuit breaker on tool call loops (detect repeated identical calls, break after N)
- Queue fairness: per-tenant request queuing in LLM Gateway (no single tenant can starve others)
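The tool-call circuit breaker could be as small as the sketch below. The exact-match definition of "identical call" and the default of three consecutive repeats are assumptions; a production version might also compare near-identical arguments or look at windows rather than strict runs.

```python
class CircuitOpen(Exception):
    pass

class ToolLoopBreaker:
    """Trip when the same tool is called with identical args N times in a row."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self.last_call = None
        self.repeat_count = 0

    def record(self, tool: str, args: dict) -> None:
        call = (tool, tuple(sorted(args.items())))
        if call == self.last_call:
            self.repeat_count += 1
        else:
            self.last_call, self.repeat_count = call, 1
        if self.repeat_count >= self.max_repeats:
            raise CircuitOpen(f"{tool} repeated {self.repeat_count} times")
```

Tripping the breaker raises into the task runner, which can halt the task well before the token budget's hard stop is reached.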
3.8 Supply Chain Attacks
Threat: Compromised dependency, container image, or OpenClaw plugin introduces backdoor.
Attack paths:
- Malicious npm package in dependency tree
- Compromised base image in container build
- OpenClaw plugin with malicious hook (40+ plugins, varying provenance)
- GitOps pipeline compromised → malicious update pushed to customer VPC
Current state: Signed images and reproducible builds mentioned for customer VPC.
Mitigations needed:
- Dependency scanning in CI (Snyk, Trivy, or similar)
- Container image scanning before deployment
- Pin all dependency versions, review updates manually
- OpenClaw plugin audit: only use vetted plugins, lock versions
- GitOps: signed commits required, approval gate before deployment to any environment
3.9 Data Exfiltration via Agents
Threat: An agent (compromised or manipulated) sends client data to unauthorized external endpoints.
Attack paths:
- Agent calls external tool with client data embedded in parameters
- Agent generates output containing client data → sent via channel to unauthorized recipient
- Agent writes client data to shared knowledge tier → visible to other tenants
Current state: Not addressed beyond "client knowledge never leaves cell."
Mitigations needed:
- Egress filtering: agents can only reach whitelisted external endpoints (per tenant, per vertical)
- Tool output scanning: detect client data in tool call parameters before sending
- Channel output review: for supervised agents, outbound messages go through approval
- Network policies enforcing egress restrictions at K8s level
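At the application layer, the egress check could be a straightforward allowlist lookup keyed by tenant and vertical, as in this sketch (the allowlist shape and entries are illustrative). This is the first line of defense only; the K8s network policies above must enforce the same restriction independently, since a compromised runtime could bypass an in-process check.

```python
from urllib.parse import urlparse

# Hypothetical allowlist shape: (tenant, vertical) -> permitted hosts.
EGRESS_ALLOWLIST = {
    ("tenant-a", "seo"): {"api.semrush.com"},
    ("tenant-a", "dev"): {"api.github.com"},
}

def egress_allowed(tenant: str, vertical: str, url: str) -> bool:
    """Default-deny: unknown (tenant, vertical) pairs get an empty set."""
    host = urlparse(url).hostname or ""
    return host in EGRESS_ALLOWLIST.get((tenant, vertical), set())
```

Default-deny matters here: a tenant or vertical with no allowlist entry can reach nothing, rather than everything.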
3.10 Insider Threat
Threat: Speedrun team member with infrastructure access abuses their position.
Attack paths:
- Direct database access to read client data
- Vault access to read client API keys
- VPN into customer VPC for unauthorized data access
- Modify agent behavior to exfiltrate data
Current state: "Speedrun operators have no access to client secrets in customer VPC" is stated, but no enforcement mechanism is documented.
Mitigations needed:
- Principle of least privilege for all Speedrun team access
- VPN sessions logged, time-limited (4hr max), require ticket justification
- Vault audit logging: every secret read is logged with operator identity
- Database access via bastion host with session recording
- Separation of duties: no single person can deploy + access production secrets
4. Security Properties Required
4.1 Confidentiality
| Property | Scope | Priority |
|---|---|---|
| Client data isolated between tenants | Agency model | Critical |
| Client data never leaves customer VPC | VPC deployments | Critical |
| LLM API keys not exposed to agents | All | Critical |
| PII not sent to LLM providers unless consented | All | High |
| Operator access to client data is audited and justified | All | High |
| Shared knowledge contains no client-specific data | Agency model | High |
4.2 Integrity
| Property | Scope | Priority |
|---|---|---|
| Knowledge graph entries are provenance-tracked and tamper-evident | All | Critical |
| Agent behavior cannot be self-modified without canary + rollback | All | Critical |
| Supervision levels cannot be escalated by agents themselves | All | Critical |
| Audit logs are immutable (append-only, no modifications) | All | High |
| Deployment artifacts are signed and verified | Customer VPC | High |
4.3 Availability
| Property | Scope | Priority |
|---|---|---|
| Single tenant failure does not affect other tenants | Agency model | Critical |
| Budget exhaustion stops the agent, not the platform | All | Critical |
| External tool failure is isolated (retry, fallback, not cascade) | All | High |
| LLM provider outage triggers fallback, not system failure | All | High |
4.4 Non-Repudiation
| Property | Scope | Priority |
|---|---|---|
| Every agent action is attributed to a specific agent + tenant + task | All | Critical |
| Every knowledge write has author identity and timestamp | All | Critical |
| Every LLM call is logged with provider, model, tokens, cost | All | High |
| Every supervision decision (approve/reject/modify) is recorded | All | High |
5. MVP Security Scope
What must be in place for MVP vs what can be deferred:
MVP (must have)
- Tenant ID enforcement at database query layer
- LLM API keys in Vault, never in agent code or logs
- Per-agent, per-tenant token budget with hard stops
- Agent capability manifest (whitelist of tools + knowledge per agent)
- Basic secret scanning in observation logs
- Task timeouts and tool call loop detection
- Audit trail for all agent actions (observation logger)
Phase 2 (important but not blocking)
- PII detection before LLM calls
- Egress filtering per tenant
- Key rotation policy
- OpenClaw plugin security audit
- Dependency scanning in CI
- Network policies per namespace (K8s)
- Data classification tags on knowledge entries
Phase 3+ (compliance & scale)
- SOC 2 / ISO 27001 readiness
- Formal incident response procedure
- Penetration testing program
- Signed GitOps deployments
- Session recording for infrastructure access
- Customer-facing security documentation
6. Open Questions
| # | Question | Impact |
|---|---|---|
| S1 | What authentication system for users/tenants? (OAuth2/OIDC, API keys, SSO?) | High — foundational |
| S2 | What's the minimum LLM provider data retention agreement acceptable? | High — client trust |
| S3 | How do we handle PII detection at scale? (regex patterns, dedicated model, third-party?) | Medium — compliance |
| S4 | Should shared knowledge have a quarantine period before becoming visible? | Medium — knowledge integrity |
| S5 | What's the incident response playbook if a cell is compromised? | Medium — operational readiness |