Cost Model & Unit Economics
Research for Project Kaze. Companion to scalability-model.md (performance scaling).
1. Cost Structure Overview
Where the money goes, ordered by magnitude:
┌────────────────────────────────────────────────────────┐
│ VARIABLE COSTS (scale with usage) │
│ │
│ ████████████████████████████████ LLM Tokens (60-80%) │
│ ████████ External APIs (10-15%)│
│ ████ Embedding Gen (3-5%) │
│ │
│ SEMI-FIXED COSTS (scale with tenants/cells) │
│ │
│ ██████████████ Compute / K8s (varies)│
│ ████████ Database (varies) │
│ ████ Storage (varies) │
│ │
│ FIXED COSTS (exist regardless) │
│ │
│ ██████ Control plane │
│ ████ CI/CD + Registry │
│ ████ Monitoring base │
│ ██ Vault │
└────────────────────────────────────────────────────────┘

Key insight: LLM token cost dominates everything. A 20% reduction in tokens per task saves more money than halving infrastructure costs. Cost optimization strategy should focus overwhelmingly on LLM efficiency.
2. LLM Token Cost Model
2.1 Provider Pricing (as of Feb 2026)
Prices per 1M tokens:
| Provider | Model | Tier | Input | Output | Best for |
|---|---|---|---|---|---|
| Anthropic | Claude Haiku 4.5 | fast | $1.00 | $5.00 | Classification, extraction, simple tasks |
| Anthropic | Claude Sonnet 4.5 | balanced | $3.00 | $15.00 | General reasoning, tool use |
| Anthropic | Claude Opus 4.6 | best | $5.00 | $25.00 | Complex reasoning, quality evaluation |
| OpenAI | GPT-4.1 | balanced | $2.00 | $8.00 | General tasks, good cost/quality ratio |
| OpenAI | GPT-4o | balanced | $2.50 | $10.00 | Multimodal, general tasks |
| Google | Gemini 2.5 Flash-Lite | fast | $0.10 | $0.40 | Cheapest option, bulk processing |
| Google | Gemini 2.5 Pro | balanced | $1.25 | $10.00 | Strong reasoning, long context |
| OpenAI | text-embedding-3-small | embed | $0.02 | — | Knowledge embeddings (cheap) |
| OpenAI | text-embedding-3-large | embed | $0.13 | — | Knowledge embeddings (better quality) |
| Local | Ollama (Llama 3, Mistral) | fast | $0* | $0* | Zero marginal cost, limited quality |
*Local model cost is compute-only (GPU hardware amortization), not per-token.
Cost reduction features:
- Prompt caching (Anthropic): 90% discount on repeated context → huge savings for agents with stable system prompts
- Batch API (Anthropic, OpenAI): 50% discount for non-urgent tasks → knowledge consolidation, quality evaluation
- Gemini free tier: 1,000 requests/day → useful for development and low-volume testing
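The discounts above compound. As a minimal sketch of how they combine, the function below blends cached and fresh input tokens and applies the batch discount; the 50% cached fraction in the example is an illustrative assumption, not a measured figure:

```python
# Sketch: effective per-1M-token input price after the discounts listed above.
# Discount rates (90% caching, 50% batch) are from this section; the cached
# fraction is an assumed workload mix.

def effective_input_price(base_price_per_1m: float,
                          cached_fraction: float = 0.0,
                          batch: bool = False) -> float:
    """Blend cached (10% of base) and fresh input tokens, then apply batch discount."""
    cached = base_price_per_1m * 0.10   # 90% discount on cached context
    fresh = base_price_per_1m
    blended = cached_fraction * cached + (1 - cached_fraction) * fresh
    return blended * (0.5 if batch else 1.0)  # Batch API: 50% off

# Sonnet input at $3.00/1M with half the context cached:
print(round(effective_input_price(3.00, cached_fraction=0.5), 3))  # 1.65
# Same workload routed through the Batch API as well:
print(round(effective_input_price(3.00, cached_fraction=0.5, batch=True), 3))  # 0.825
```

Stacking caching and batching cuts the effective Sonnet input rate from $3.00 to under $1.00 per 1M tokens for eligible background work.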
2.2 Tokens per Task (Estimates)
Estimated token usage per task type based on typical agent workflows:
| Task Type | Input tokens | Output tokens | LLM calls | Total tokens |
|---|---|---|---|---|
| Simple extraction (parse invoice, classify ticket) | ~1,500 | ~500 | 1 | ~2,000 |
| Keyword research (SEO) | ~4,000 | ~2,000 | 2-3 | ~10,000 |
| Content optimization (SEO) | ~6,000 | ~3,000 | 2-4 | ~15,000 |
| Data quality check (Toddle) | ~2,000 | ~1,000 | 1-2 | ~5,000 |
| Content enrichment (Toddle) | ~3,000 | ~2,000 | 2-3 | ~8,000 |
| Research synthesis (V0 Internal Ops) | ~8,000 | ~4,000 | 3-5 | ~20,000 |
| Project status update (V0) | ~3,000 | ~1,500 | 1-2 | ~6,000 |
| Technical audit (SEO) | ~10,000 | ~5,000 | 4-6 | ~25,000 |
| Quality evaluation (Layer 3) | ~4,000 | ~1,000 | 1 | ~5,000 |
These exclude system prompt tokens (typically ~1-2K, amortized via prompt caching).
2.3 Cost per Task by Model Tier
Combining tokens per task with provider pricing:
| Task Type | Fast (Haiku) | Balanced (Sonnet) | Best (Opus) | Cheapest (Gemini Flash-Lite) |
|---|---|---|---|---|
| Simple extraction | $0.004 | $0.011 | $0.018 | $0.0004 |
| Keyword research | $0.014 | $0.042 | $0.070 | $0.002 |
| Content optimization | $0.021 | $0.063 | $0.105 | $0.002 |
| Data quality check | $0.007 | $0.021 | $0.035 | $0.001 |
| Content enrichment | $0.013 | $0.039 | $0.065 | $0.001 |
| Research synthesis | $0.028 | $0.084 | $0.140 | $0.003 |
| Technical audit | $0.035 | $0.105 | $0.175 | $0.004 |
| Quality evaluation | $0.009 | $0.027 | $0.045 | $0.001 |
Observation: Even with the most expensive model (Opus), a complex task like a technical audit costs ~$0.18. Most tasks are under $0.10 on balanced models. The cheapest option (Gemini Flash-Lite) brings costs down to fractions of a cent.
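The table can be reproduced directly from the pricing in §2.1 and the token estimates in §2.2. A minimal sketch (figures land within rounding of the table, since the table also folds in multi-call overhead):

```python
# Sketch: cost per task from per-1M-token pricing (section 2.1) and token
# estimates (section 2.2).

PRICING = {                      # (input $/1M, output $/1M)
    "haiku": (1.00, 5.00),
    "sonnet": (3.00, 15.00),
    "opus": (5.00, 25.00),
    "flash-lite": (0.10, 0.40),
}

def task_cost(input_tokens: int, output_tokens: int, model: str) -> float:
    inp, out = PRICING[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Simple extraction (~1,500 in / ~500 out) on Haiku:
print(round(task_cost(1_500, 500, "haiku"), 4))   # 0.004
# Same task on Sonnet:
print(round(task_cost(1_500, 500, "sonnet"), 4))  # 0.012 (table shows ~$0.011 after rounding)
```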
2.4 Monthly Cost per Agent
Estimated monthly LLM cost per agent based on task frequency:
| Agent Type | Tasks/day | Avg tokens/task | Model tier | Monthly LLM cost |
|---|---|---|---|---|
| SEO Keyword Research | 5-10 | ~10,000 | balanced | $13-$26 |
| SEO Content Optimization | 3-5 | ~15,000 | balanced | $12-$20 |
| SEO Technical Audit | 1 (weekly) | ~25,000 | balanced | $3 |
| SEO Reporting | 1 (weekly) | ~15,000 | balanced | $2 |
| Toddle Content Enrichment | 20-50 | ~8,000 | fast | $5-$13 |
| Toddle Data Quality | 50-100 | ~5,000 | fast | $8-$15 |
| V0 Research Agent | 2-5 | ~20,000 | balanced | $10-$26 |
| V0 Project Management | 10-20 | ~6,000 | balanced | $8-$16 |
| Quality Monitor (L3) | 20-50 | ~5,000 | best | $13-$34 |
Typical agent: ~$10-30/month in LLM costs on balanced models. High-volume agents (data quality, content enrichment) on fast models stay under $15/month.
2.5 Monthly Cost per Tenant
Based on typical agent deployments per vertical:
| Vertical | Agents per tenant | Monthly LLM cost per tenant |
|---|---|---|
| SEO (full suite) | 4 (keyword + content + audit + reporting) | $30-$50 |
| Toddle | 3 (enrichment + quality + recommendations) | $20-$40 |
| Internal Ops (V0) | 5 (research + PM + scheduling + docs + issues) | $40-$80 |
Note: These are Speedrun's LLM costs. Under the dual-key model (D7), clients with their own API keys offload these costs. BYOK clients have near-zero LLM cost to Speedrun.
2.6 Embedding Costs
Knowledge system embedding costs (using text-embedding-3-small at $0.02/1M tokens):
| Operation | Tokens | Cost |
|---|---|---|
| Embed one knowledge entry (~500 tokens) | 500 | $0.00001 |
| Embed one query (~100 tokens) | 100 | $0.000002 |
| 1,000 knowledge writes/day | 500K | $0.01/day |
| 10,000 queries/day | 1M | $0.02/day |
Verdict: Embedding costs are negligible — under $1/month even at Stage 2. Not a cost concern.
3. Infrastructure Cost per Cell Type
Based on AWS pricing (us-east-1, on-demand). Other clouds are comparable within ~20%.
3.1 Shared Cell (Multi-Tenant)
A shared cell hosts 5-20 tenants with namespace isolation. Infrastructure is amortized.
| Component | Instance/Resource | Monthly cost |
|---|---|---|
| K8s control plane (EKS) | Managed | $73 |
| K8s worker nodes (3×) | t3.large (2 vCPU, 8GB) | $183 |
| PostgreSQL | db.t3.medium (RDS) or self-hosted | $50-$100 |
| PgBouncer | Runs on worker node | $0 (included) |
| Vault | Self-hosted on worker node | $0 (included) |
| Monitoring (Prometheus + Grafana + Loki) | Self-hosted on worker node | $0 (included) |
| Object storage (MinIO or S3) | 100GB | $2-$5 |
| Network (ALB, data transfer) | Moderate | $30-$50 |
| Total shared cell | | ~$340-$410/month |
| Per tenant (10 tenants) | | ~$34-$41/month |
| Per tenant (20 tenants) | | ~$17-$21/month |
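The per-tenant figures are straight amortization of the cell total over tenant count. A minimal sketch reproducing the rows above:

```python
# Sketch: per-tenant infrastructure cost as shared-cell density grows,
# amortizing the cell's monthly total over its tenants.

def per_tenant(cell_monthly_low: float, cell_monthly_high: float, tenants: int):
    """Return the (low, high) per-tenant cost range."""
    return (cell_monthly_low / tenants, cell_monthly_high / tenants)

for n in (5, 10, 20):
    low, high = per_tenant(340, 410, n)
    print(f"{n} tenants: ${low:g}-${high:g} per tenant")
```

Densifying a cell from 5 to 20 tenants cuts per-tenant infra cost by 4x, which is why shared-cell packing matters more than instance-type tuning.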
3.2 Dedicated Cell (Single-Tenant)
Full stack for one client. Same components, no sharing.
| Component | Instance/Resource | Monthly cost |
|---|---|---|
| K8s control plane | Managed | $73 |
| K8s worker nodes (2×) | t3.large | $122 |
| PostgreSQL | db.t3.medium | $50-$100 |
| Vault + Monitoring | Self-hosted | $0 (included) |
| Object storage | 50GB | $1-$3 |
| Network | Low | $20-$30 |
| Total dedicated cell | | ~$270-$330/month |
3.3 Customer VPC Cell
Same as dedicated, but deployed in client's cloud. Client pays infrastructure; Speedrun pays ops labor.
| Cost to client | Monthly |
|---|---|
| Infrastructure (same as dedicated) | ~$270-$330 |
| Cost to Speedrun | Monthly |
|---|---|
| Ops overhead (monitoring, updates, incident response) | ~$50-$100 (amortized labor) |
| VPN + health beacon infrastructure | ~$10 |
| Total Speedrun cost per VPC client | ~$60-$110/month |
3.4 Fixed Infrastructure (Exists Regardless of Tenants)
| Component | Monthly cost |
|---|---|
| GitHub (organization plan, CI/CD minutes) | $50-$100 |
| Container registry (GHCR, storage) | $10-$20 |
| DNS + domain | $5-$10 |
| Development/staging environment | $150-$250 |
| Total fixed | ~$215-$380/month |
4. External Service Costs
4.1 SEMrush API
| Tier | Monthly cost | Included API units | Notes |
|---|---|---|---|
| Business plan (required for API) | $500/month | 10,000 requests | Shared across all SEO tenants |
| Additional units | Varies | Contact sales | Scale as client count grows |
At 10 SEO clients, each running 5-10 keyword research tasks/day × 2-3 API calls each = ~3,000-9,000 API calls/month total. Within the 10,000 included, though the upper end approaches the limit.
Scaling trigger: 20+ active SEO clients likely exceeds the included 10K units → negotiate enterprise API plan or buy additional units.
4.2 Other External APIs
| Service | Cost | Notes |
|---|---|---|
| Google Search Console API | Free | Google rate limits apply (25K queries/day) |
| GitHub API | Free tier: 5,000 requests/hr | Sufficient for V0 agents |
| Google Calendar API | Free tier: 1M requests/day | More than sufficient |
| Toddle DB | Internal — no API cost | Direct database access |
4.3 Total External API Costs
| Stage | Monthly external API cost |
|---|---|
| Stage 0 (MVP) | ~$500 (SEMrush only) |
| Stage 1 (5-10 clients) | ~$500-$700 |
| Stage 2 (20-50 clients) | ~$800-$1,500 (SEMrush scale-up) |
| Stage 3 (100+ clients) | ~$2,000-$5,000 (enterprise API plans) |
5. Unit Economics
5.1 Cost per Task (Fully Loaded)
Including LLM tokens, embedding, compute slice, and tool calls:
| Task Type | LLM cost (balanced) | Embedding | Compute slice* | Tool API | Total cost |
|---|---|---|---|---|---|
| Simple extraction | $0.011 | $0.00001 | $0.002 | $0 | ~$0.013 |
| Keyword research | $0.042 | $0.00002 | $0.005 | $0.05 | ~$0.10 |
| Content optimization | $0.063 | $0.00003 | $0.005 | $0.02 | ~$0.09 |
| Data quality check | $0.021 | $0.00001 | $0.002 | $0 | ~$0.023 |
| Research synthesis | $0.084 | $0.00005 | $0.008 | $0 | ~$0.09 |
| Technical audit | $0.105 | $0.00005 | $0.010 | $0.05 | ~$0.17 |
*Compute slice: estimated amortized infrastructure cost per task (total monthly infra / total monthly tasks).
Takeaway: Most tasks cost $0.02-$0.17 fully loaded. Even the most expensive task (technical audit) is under $0.20.
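The compute-slice footnote can be made concrete. A minimal sketch, where the tenant count and task volume are illustrative assumptions (a mid-range shared cell at moderate density):

```python
# Sketch: deriving the "compute slice" — amortize a shared cell's monthly
# infrastructure over its monthly task volume. Tenant and task counts are
# assumed for illustration.

def compute_slice(cell_monthly: float, tenants: int, tasks_per_tenant_per_day: int) -> float:
    monthly_tasks = tenants * tasks_per_tenant_per_day * 30
    return cell_monthly / monthly_tasks

# A ~$375/month shared cell, 20 tenants, 100 tasks/tenant/day:
print(round(compute_slice(375, 20, 100), 5))  # ~ $0.006/task, within the $0.002-$0.010 range above
```

Note the slice shrinks as cells densify, so the fully loaded figures above are conservative for well-packed cells.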
5.2 Cost per Agent per Month
| Agent category | LLM cost | Infra share | Tool APIs | Total/month |
|---|---|---|---|---|
| Low-frequency (weekly tasks) | $2-$5 | $3-$5 | $0-$5 | $5-$15 |
| Medium-frequency (5-10 tasks/day) | $10-$25 | $3-$5 | $5-$15 | $18-$45 |
| High-frequency (50+ tasks/day) | $8-$15 (fast model) | $3-$5 | $0-$5 | $11-$25 |
5.3 Cost per Tenant per Month
| Tenant type | Agents | LLM | Infra (shared cell) | Tools | Total/month |
|---|---|---|---|---|---|
| Small (3 agents, basic vertical) | 3 | $30-$60 | $20-$35 | $0-$20 | $50-$115 |
| Medium (5 agents, full vertical) | 5 | $50-$120 | $25-$40 | $20-$50 | $95-$210 |
| Large (8+ agents, multi-vertical) | 8+ | $100-$250 | $270-$330 (dedicated) | $50-$100 | $420-$680 |
| BYOK client (brings own LLM keys) | 5 | $0* | $25-$40 | $20-$50 | $45-$90 |
*BYOK clients pay their own LLM costs directly to providers. Speedrun's cost for these clients is infra + tools only.
5.4 Gross Margin Model
What pricing covers the costs:
| Pricing tier | Revenue/month | Cost/month | Gross margin |
|---|---|---|---|
| Small client @ $300/month | $300 | $50-$115 | 62-83% |
| Medium client @ $600/month | $600 | $95-$210 | 65-84% |
| Large client @ $1,500/month | $1,500 | $420-$680 | 55-72% |
| BYOK medium @ $400/month | $400 | $45-$90 | 78-89% |
Target gross margin: 65-80%. Achievable at all tiers except potentially large clients on dedicated cells with heavy LLM usage (55% is tight). Solutions: push BYOK for large clients, or adjust dedicated cell pricing.
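The margin columns follow from the standard gross-margin formula. A minimal sketch reproducing the small-client row:

```python
# Sketch: gross margin = (revenue - cost) / revenue, applied to the
# pricing-tier table above.

def gross_margin(revenue: float, cost: float) -> float:
    return (revenue - cost) / revenue

# Small client at $300/month against the $50-$115 cost range:
print(f"{gross_margin(300, 115):.0%}-{gross_margin(300, 50):.0%}")  # 62%-83%
```

Worst-case margin uses the top of the cost range, so the low end of each printed range is the number to watch.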
6. Scale Cost Curves
6.1 Total Monthly Platform Cost by Stage
| Cost category | Stage 0 | Stage 1 (10) | Stage 2 (50) | Stage 3 (200) |
|---|---|---|---|---|
| Fixed infra | $300 | $300 | $400 | $600 |
| Cell infra | $350 (1 shared) | $700 (2 shared) | $2,500 (5 shared + 2 dedicated) | $12,000 (10 shared + 10 dedicated + VPCs) |
| LLM tokens | $200 (V0 only) | $500-$1,000 | $3,000-$8,000 | $15,000-$40,000 |
| External APIs | $500 | $600 | $1,200 | $4,000 |
| Embedding | $1 | $5 | $20 | $80 |
| Total | ~$1,350 | ~$2,100-$2,600 | ~$7,100-$12,100 | ~$31,700-$56,700 |
| Per tenant | — | $210-$260 | $142-$242 | $159-$284 |
6.2 Fixed vs Variable Split
| Stage | Fixed costs | Variable costs | Fixed % |
|---|---|---|---|
| Stage 0 | $650 (infra) | $700 (LLM + APIs) | 48% |
| Stage 1 | $1,000 | $1,100-$1,600 | 38-48% |
| Stage 2 | $2,900 | $4,200-$9,200 | 24-41% |
| Stage 3 | $12,600 | $19,100-$44,100 | 22-40% |
Pattern: As scale increases, LLM tokens dominate and the cost structure shifts heavily toward variable costs. This is good — costs scale with revenue (usage-based), not ahead of it.
6.3 Economies of Scale
| What gets cheaper per unit | Why |
|---|---|
| Infrastructure amortization | More tenants per shared cell = lower per-tenant infra cost |
| Fixed costs per tenant | CI/CD, registry, dev environment amortized over more tenants |
| SEMrush and tool APIs | Enterprise plans offer better unit rates at higher volumes |
| Monitoring overhead | Shared monitoring stack scales sublinearly |
| What stays proportional | Why |
|---|---|
| LLM tokens per task | Same task = same tokens, regardless of scale |
| Embedding costs per write | Same write = same embedding, regardless of scale |
| Tool API calls per task | Same task = same API calls |
| What gets cheaper with optimization (not scale) | Why |
|---|---|
| LLM cost per task | Model selection optimization, prompt caching, prompt shortening |
| Knowledge query cost | Query caching, better retrieval (fewer irrelevant results) |
7. Cost Optimization Levers
Ordered by impact:
7.1 Model Selection Optimization (High Impact)
Route each task to the cheapest model that meets the quality bar:
| Strategy | Savings estimate | Implementation |
|---|---|---|
| Use fast for extraction/classification | 60-80% vs balanced | LLM Gateway model hint → tenant config mapping |
| Use balanced for reasoning, fast for everything else | 40-50% overall | Quality monitor evaluates if fast model is sufficient per skill |
| Use Gemini Flash-Lite for bulk processing | 90%+ vs Haiku | Batch API + cheapest model for background tasks |
Example: An SEO agent running keyword research:
- Old: all calls on Sonnet → $0.042/task
- Optimized: tool parsing on Haiku, reasoning on Sonnet → ~$0.025/task (40% savings)
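The arithmetic behind that example can be sketched as below; the 60/40 split of the task's tokens between parsing and reasoning calls is an illustrative assumption chosen to match the figures above:

```python
# Sketch: the keyword-research routing example — split the task's token
# budget between Haiku (tool parsing) and Sonnet (reasoning). The 60/40
# split is an assumed workload mix.

PRICES = {"haiku": (1.00, 5.00), "sonnet": (3.00, 15.00)}  # $/1M (input, output)

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# All calls on Sonnet: ~4K input / ~2K output per task.
baseline = call_cost("sonnet", 4_000, 2_000)
# Routed: ~60% of tokens (tool parsing) on Haiku, the rest (reasoning) on Sonnet.
routed = call_cost("haiku", 2_400, 1_200) + call_cost("sonnet", 1_600, 800)
print(round(baseline, 3), round(routed, 3))     # 0.042 0.025
print(f"savings: {1 - routed / baseline:.0%}")  # savings: 40%
```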
7.2 Prompt Caching (High Impact)
Anthropic prompt caching: 90% discount on repeated context (system prompt, skill definitions, loaded knowledge).
| Agent type | Cacheable context | Savings per task |
|---|---|---|
| Any agent with stable system prompt | ~1-2K tokens (system + skill) | ~$0.003-$0.006 saved per task |
| Agent with pre-loaded knowledge | ~3-5K tokens (knowledge context) | ~$0.009-$0.015 saved per task |
At 1,000 tasks/day, prompt caching saves ~$90-$180/month.
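That monthly figure is just per-task savings times volume. A minimal sketch reproducing it:

```python
# Sketch: the monthly-savings arithmetic above — per-task caching savings
# times daily task volume times ~30 days.

def monthly_caching_savings(savings_per_task: float, tasks_per_day: int, days: int = 30) -> float:
    return savings_per_task * tasks_per_day * days

print(round(monthly_caching_savings(0.003, 1_000)))  # 90
print(round(monthly_caching_savings(0.006, 1_000)))  # 180
```

Worth noting: caching savings scale linearly with task volume, so this lever grows automatically as the platform scales.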
7.3 Client BYOK (High Impact on Margins)
Clients who bring their own LLM keys eliminate Speedrun's largest variable cost:
| Scenario | Speedrun's LLM cost | Gross margin impact |
|---|---|---|
| Speedrun pays all LLM | $50-$120/tenant/month | 65-80% margin |
| Client BYOK | $0/tenant | 78-89% margin |
| Hybrid (client key preferred, Speedrun fallback) | $10-$30/tenant/month | 75-85% margin |
Strategy: Default to client BYOK. Speedrun keys as fallback only. Pricing reflects this — BYOK clients get a lower base price, Speedrun-key clients pay a premium.
7.4 Batch API for Background Tasks (Medium Impact)
50% discount for non-urgent tasks:
| Task category | Eligible for batch | Monthly savings (50 agents) |
|---|---|---|
| Knowledge consolidation | Yes | ~$20-$40 |
| Quality evaluation | Yes | ~$15-$30 |
| Scheduled reports | Yes | ~$5-$10 |
| Conversation responses | No (latency-sensitive) | — |
7.5 Local Model Overflow (Medium Impact)
Self-hosted models via Ollama/vLLM for non-critical tasks:
| Component | Hardware cost | What it handles | API costs replaced |
|---|---|---|---|
| 1× GPU instance (A10G) | ~$300-$500/month | Embeddings, classification, extraction | ~$50-$100/month at current volumes |
Break-even: local GPU pays for itself when API embedding + classification costs exceed ~$300-$500/month (roughly Stage 2).
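The break-even rule above, as a minimal sketch (the dollar figures in the example are the document's own ranges, picked at illustrative points):

```python
# Sketch: a local GPU pays off once the API spend it replaces exceeds its
# own monthly cost. Negative = net loss, positive = net savings.

def local_gpu_net_savings(gpu_monthly: float, replaced_api_monthly: float) -> float:
    return replaced_api_monthly - gpu_monthly

# Stage 1 volumes (~$100/month of replaceable API spend): net loss.
print(local_gpu_net_savings(400, 100))  # -300
# Stage 2 volumes (~$600/month replaceable): the GPU pays for itself.
print(local_gpu_net_savings(400, 600))  # 200
```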
7.6 Spot/Preemptible Instances (Low-Medium Impact)
Agent runtime pods are stateless and restartable → good candidates for spot instances:
| Instance type | On-demand | Spot | Savings |
|---|---|---|---|
| t3.large (agent nodes) | $61/month | ~$18-$25/month | 60-70% |
| m5.large (agent nodes) | $70/month | ~$21-$30/month | 57-70% |
Risk: Spot instances can be reclaimed. Agent tasks must be designed for graceful interruption (they already are — actor model with task retry).
7.7 Prompt Optimization (Low-Medium Impact, Ongoing)
Shorter prompts = fewer tokens = lower cost:
| Optimization | Token reduction | Monthly savings (50 agents) |
|---|---|---|
| Remove verbose instructions | 10-20% of system prompt | ~$10-$30 |
| Compress knowledge context | 20-30% of retrieved knowledge | ~$15-$40 |
| Use structured output formats | 10-15% of output tokens | ~$5-$15 |
8. Pricing Implications
8.1 Minimum Viable Price
To be margin-positive (>65% gross margin):
| Tenant type | Cost floor | Minimum price (65% margin) | Suggested pricing |
|---|---|---|---|
| Small (3 agents, BYOK) | ~$50 | ~$143 | $150-$200/month |
| Small (3 agents, Speedrun keys) | ~$115 | ~$329 | $300-$400/month |
| Medium (5 agents, BYOK) | ~$90 | ~$257 | $250-$350/month |
| Medium (5 agents, Speedrun keys) | ~$210 | ~$600 | $500-$700/month |
| Large (8+ agents, dedicated) | ~$680 | ~$1,943 | $1,500-$2,500/month |
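The minimum-price column follows from rearranging the gross-margin formula: price = cost / (1 - margin). A minimal sketch reproducing two rows of the table:

```python
# Sketch: minimum viable price from the cost floor and target margin,
# price = cost / (1 - margin), as used in the table above.

def min_price(cost_floor: float, target_margin: float = 0.65) -> float:
    return cost_floor / (1 - target_margin)

# Small tenant on Speedrun keys (~$115 cost floor):
print(round(min_price(115)))  # 329
# Large tenant on a dedicated cell (~$680 cost floor):
print(round(min_price(680)))  # 1943
```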
8.2 Pricing Model Options
| Model | Pros | Cons | Fit for Kaze |
|---|---|---|---|
| Per-agent subscription ($X/agent/month) | Predictable revenue, simple to understand | Discourages adding agents, doesn't reflect usage | Moderate |
| Per-task pricing ($X/task) | Aligns cost with value, scales with usage | Unpredictable bills, complex metering | Low (too complex for SMEs) |
| Tiered subscription (plan tiers with agent/task limits) | Predictable for both sides, upgrade path clear | May not fit all usage patterns | High — recommended |
| Subscription + usage overage (base fee + per-task over limit) | Predictable base + flexible scaling | Complexity of two billing dimensions | Medium |
Recommendation: Tiered subscription with BYOK discount. Three tiers aligned to cell density:
| Tier | Agents | Cell type | BYOK price | Speedrun-key price |
|---|---|---|---|---|
| Starter | Up to 3 | Shared | $200/month | $400/month |
| Growth | Up to 8 | Shared | $500/month | $900/month |
| Enterprise | Unlimited | Dedicated/VPC | Custom | Custom |
8.3 BYOK Impact
BYOK fundamentally changes unit economics:
- Client pays LLM costs directly → Speedrun's variable cost drops 60-80%
- Speedrun's cost becomes primarily infrastructure → more predictable, better margins
- Clients with provider credits/discounts get better rates than Speedrun could offer
- Risk: Speedrun loses visibility into token spend (mitigated by LLM Gateway tracking regardless of key owner)
Strategy: Default pricing assumes BYOK. Speedrun-key pricing is a premium add-on for clients who don't want to manage their own API keys.
9. Cost Monitoring & Alerts
Metrics to track for financial health:
| Metric | Alert threshold | Action |
|---|---|---|
| LLM cost per task (by skill) | >2× baseline for that skill | Investigate — model routing may be wrong, or agent is looping |
| LLM cost per tenant per day | >daily budget (tenant-specific) | Hard stop (existing budget enforcement) |
| Infrastructure cost per tenant | >120% of tier allocation | Review tenant's agent count and task frequency |
| External API cost per month | >budget | Review API plan, negotiate enterprise tier |
| Gross margin per tenant | <50% | Flag for pricing review or optimization push |
| BYOK vs Speedrun-key ratio | <30% BYOK | Push BYOK adoption — margins are too thin on Speedrun keys |
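The first alert in the table above (cost per task exceeding 2x the skill's baseline) can be sketched as a simple threshold check; the skill names and dict shape here are illustrative, not a real schema:

```python
# Sketch: the per-skill cost alert — flag skills whose observed cost per
# task exceeds 2x their baseline. Skill names and data shape are assumed.

def cost_alerts(baselines: dict, observed: dict, factor: float = 2.0) -> list:
    """Return skills whose observed cost/task exceeds factor x baseline."""
    return [skill for skill, cost in observed.items()
            if cost > factor * baselines.get(skill, float("inf"))]

baselines = {"keyword_research": 0.042, "data_quality": 0.021}
observed = {"keyword_research": 0.095, "data_quality": 0.020}  # first skill may be looping
print(cost_alerts(baselines, observed))  # ['keyword_research']
```

Skills missing a baseline never alert here (`float("inf")` threshold); a production check would likely alert on unknown skills instead.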
10. Key Takeaways
- LLM tokens are 60-80% of variable cost. Every optimization dollar should go here first.
- Most tasks cost $0.02-$0.17 fully loaded. This is cheap enough for high-volume automation.
- BYOK clients are dramatically more profitable. Push BYOK as default, Speedrun keys as premium.
- Shared cells are cost-effective up to ~20 tenants. Infrastructure cost per tenant drops below $25.
- Model selection optimization is the single highest-impact lever. Routing tasks to the cheapest adequate model can cut LLM costs 40-50%.
- Embedding costs are negligible. Don't optimize here.
- Target gross margin of 65-80% is achievable at all tiers with BYOK. Speedrun-key clients need higher pricing.
- Cost scales with usage, not ahead of it. The variable-heavy cost structure means costs grow proportionally with revenue — no cliff edges.