
Cost Model & Unit Economics

Research for Project Kaze. Companion to scalability-model.md (performance scaling).


1. Cost Structure Overview

Where the money goes, ordered by magnitude:

┌────────────────────────────────────────────────────────┐
│  VARIABLE COSTS (scale with usage)                      │
│                                                         │
│  ████████████████████████████████  LLM Tokens (60-80%)  │
│  ████████                         External APIs (10-15%)│
│  ████                             Embedding Gen (3-5%)  │
│                                                         │
│  SEMI-FIXED COSTS (scale with tenants/cells)            │
│                                                         │
│  ██████████████                   Compute / K8s (varies)│
│  ████████                         Database (varies)     │
│  ████                             Storage (varies)      │
│                                                         │
│  FIXED COSTS (exist regardless)                         │
│                                                         │
│  ██████                           Control plane         │
│  ████                             CI/CD + Registry      │
│  ████                             Monitoring base       │
│  ██                               Vault                 │
└────────────────────────────────────────────────────────┘

Key insight: LLM token cost dominates everything. A 20% reduction in tokens per task saves more money than halving infrastructure costs. Cost optimization strategy should focus overwhelmingly on LLM efficiency.


2. LLM Token Cost Model

2.1 Provider Pricing (as of Feb 2026)

Prices per 1M tokens:

| Provider  | Model                      | Tier     | Input | Output | Best for                                 |
|-----------|----------------------------|----------|-------|--------|------------------------------------------|
| Anthropic | Claude Haiku 4.5           | fast     | $1.00 | $5.00  | Classification, extraction, simple tasks |
| Anthropic | Claude Sonnet 4.5          | balanced | $3.00 | $15.00 | General reasoning, tool use              |
| Anthropic | Claude Opus 4.6            | best     | $5.00 | $25.00 | Complex reasoning, quality evaluation    |
| OpenAI    | GPT-4.1                    | balanced | $2.00 | $8.00  | General tasks, good cost/quality ratio   |
| OpenAI    | GPT-4o                     | balanced | $2.50 | $10.00 | Multimodal, general tasks                |
| Google    | Gemini 2.5 Flash-Lite      | fast     | $0.10 | $0.40  | Cheapest option, bulk processing         |
| Google    | Gemini 2.5 Pro             | balanced | $1.25 | $10.00 | Strong reasoning, long context           |
| OpenAI    | text-embedding-3-small     | embed    | $0.02 | —      | Knowledge embeddings (cheap)             |
| OpenAI    | text-embedding-3-large     | embed    | $0.13 | —      | Knowledge embeddings (better quality)    |
| Local     | Ollama (Llama 3, Mistral)  | fast     | $0*   | $0*    | Zero marginal cost, limited quality      |

*Local model cost is compute-only (GPU hardware amortization), not per-token.

Cost reduction features:

  • Prompt caching (Anthropic): 90% discount on repeated context → huge savings for agents with stable system prompts
  • Batch API (Anthropic, OpenAI): 50% discount for non-urgent tasks → knowledge consolidation, quality evaluation
  • Gemini free tier: 1,000 requests/day → useful for development and low-volume testing

2.2 Tokens per Task (Estimates)

Estimated token usage per task type based on typical agent workflows:

| Task Type                                        | Input tokens | Output tokens | LLM calls | Total tokens |
|--------------------------------------------------|--------------|---------------|-----------|--------------|
| Simple extraction (parse invoice, classify ticket) | ~1,500     | ~500          | 1         | ~2,000       |
| Keyword research (SEO)                           | ~4,000       | ~2,000        | 2-3       | ~10,000      |
| Content optimization (SEO)                       | ~6,000       | ~3,000        | 2-4       | ~15,000      |
| Data quality check (Toddle)                      | ~2,000       | ~1,000        | 1-2       | ~5,000       |
| Content enrichment (Toddle)                      | ~3,000       | ~2,000        | 2-3       | ~8,000       |
| Research synthesis (V0 Internal Ops)             | ~8,000       | ~4,000        | 3-5       | ~20,000      |
| Project status update (V0)                       | ~3,000       | ~1,500        | 1-2       | ~6,000       |
| Technical audit (SEO)                            | ~10,000      | ~5,000        | 4-6       | ~25,000      |
| Quality evaluation (Layer 3)                     | ~4,000       | ~1,000        | 1         | ~5,000       |

These exclude system prompt tokens (typically ~1-2K, amortized via prompt caching).

2.3 Cost per Task by Model Tier

Combining tokens per task with provider pricing:

| Task Type            | Fast (Haiku) | Balanced (Sonnet) | Best (Opus) | Cheapest (Gemini Flash-Lite) |
|----------------------|--------------|-------------------|-------------|------------------------------|
| Simple extraction    | $0.004       | $0.011            | $0.018      | $0.0004                      |
| Keyword research     | $0.014       | $0.042            | $0.070      | $0.002                       |
| Content optimization | $0.021       | $0.063            | $0.105      | $0.002                       |
| Data quality check   | $0.007       | $0.021            | $0.035      | $0.001                       |
| Content enrichment   | $0.013       | $0.039            | $0.065      | $0.001                       |
| Research synthesis   | $0.028       | $0.084            | $0.140      | $0.003                       |
| Technical audit      | $0.035       | $0.105            | $0.175      | $0.004                       |
| Quality evaluation   | $0.009       | $0.027            | $0.045      | $0.001                       |

Observation: Even with the most expensive model (Opus), a complex task like a technical audit costs ~$0.18. Most tasks are under $0.10 on balanced models. The cheapest option (Gemini Flash-Lite) brings costs down to fractions of a cent.
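The per-task figures above follow directly from the section 2.1 pricing. A minimal sketch (the tier-to-model mapping is noted in comments; small deviations from the table are rounding in the source):

```python
# Per-1M-token pricing by tier: (input $, output $), from section 2.1.
PRICING = {
    "fast": (1.00, 5.00),       # Claude Haiku 4.5
    "balanced": (3.00, 15.00),  # Claude Sonnet 4.5
    "best": (5.00, 25.00),      # Claude Opus 4.6
    "cheapest": (0.10, 0.40),   # Gemini 2.5 Flash-Lite
}

def cost_per_task(input_tokens: int, output_tokens: int, tier: str) -> float:
    """USD cost of one task's LLM calls at the given model tier."""
    in_price, out_price = PRICING[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Keyword research (~4,000 in / ~2,000 out) on balanced:
# cost_per_task(4000, 2000, "balanced") -> 0.042, matching the table.
```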

2.4 Monthly Cost per Agent

Estimated monthly LLM cost per agent based on task frequency:

| Agent Type                | Tasks/day  | Avg tokens/task | Model tier | Monthly LLM cost |
|---------------------------|------------|-----------------|------------|------------------|
| SEO Keyword Research      | 5-10       | ~10,000         | balanced   | $13-$26          |
| SEO Content Optimization  | 3-5        | ~15,000         | balanced   | $12-$20          |
| SEO Technical Audit       | 1 (weekly) | ~25,000         | balanced   | $3               |
| SEO Reporting             | 1 (weekly) | ~15,000         | balanced   | $2               |
| Toddle Content Enrichment | 20-50      | ~8,000          | fast       | $5-$13           |
| Toddle Data Quality       | 50-100     | ~5,000          | fast       | $8-$15           |
| V0 Research Agent         | 2-5        | ~20,000         | balanced   | $10-$26          |
| V0 Project Management     | 10-20      | ~6,000          | balanced   | $8-$16           |
| Quality Monitor (L3)      | 20-50      | ~5,000          | best       | $13-$34          |

Typical agent: ~$10-30/month in LLM costs on balanced models. High-volume agents (data quality, content enrichment) on fast models stay under $15/month.

2.5 Monthly Cost per Tenant

Based on typical agent deployments per vertical:

| Vertical          | Agents per tenant                               | Monthly LLM cost per tenant |
|-------------------|-------------------------------------------------|-----------------------------|
| SEO (full suite)  | 4 (keyword + content + audit + reporting)       | $30-$50                     |
| Toddle            | 3 (enrichment + quality + recommendations)      | $20-$40                     |
| Internal Ops (V0) | 5 (research + PM + scheduling + docs + issues)  | $40-$80                     |

Note: These are Speedrun's LLM costs. Under the dual-key model (D7), clients with their own API keys offload these costs. BYOK clients have near-zero LLM cost to Speedrun.

2.6 Embedding Costs

Knowledge system embedding costs (using text-embedding-3-small at $0.02/1M tokens):

| Operation                                | Tokens | Cost       |
|------------------------------------------|--------|------------|
| Embed one knowledge entry (~500 tokens)  | 500    | $0.00001   |
| Embed one query (~100 tokens)            | 100    | $0.000002  |
| 1,000 knowledge writes/day               | 500K   | $0.01/day  |
| 10,000 queries/day                       | 1M     | $0.02/day  |

Verdict: Embedding costs are negligible — under $1/month even at Stage 2. Not a cost concern.


3. Infrastructure Cost per Cell Type

Based on AWS pricing (us-east-1, on-demand). Other clouds are comparable within ~20%.

3.1 Shared Cell (Multi-Tenant)

A shared cell hosts 5-20 tenants with namespace isolation. Infrastructure is amortized.

| Component                                 | Instance/Resource                  | Monthly cost      |
|-------------------------------------------|------------------------------------|-------------------|
| K8s control plane (EKS)                   | Managed                            | $73               |
| K8s worker nodes (3×)                     | t3.large (2 vCPU, 8GB)             | $183              |
| PostgreSQL                                | db.t3.medium (RDS) or self-hosted  | $50-$100          |
| PgBouncer                                 | Runs on worker node                | $0 (included)     |
| Vault                                     | Self-hosted on worker node         | $0 (included)     |
| Monitoring (Prometheus + Grafana + Loki)  | Self-hosted on worker node         | $0 (included)     |
| Object storage (MinIO or S3)              | 100GB                              | $2-$5             |
| Network (ALB, data transfer)              | Moderate                           | $30-$50           |
| Total shared cell                         |                                    | ~$340-$410/month  |
| Per tenant (10 tenants)                   |                                    | ~$34-$41/month    |
| Per tenant (20 tenants)                   |                                    | ~$17-$21/month    |
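The totals follow from summing the component ranges and dividing by tenant count. A sketch (dict keys are shorthand for the component rows above):

```python
# Shared-cell monthly cost (low, high) in USD; $0-marginal components omitted.
SHARED_CELL = {
    "eks_control_plane": (73, 73),
    "worker_nodes_3x_t3_large": (183, 183),
    "postgresql": (50, 100),
    "object_storage_100gb": (2, 5),
    "network": (30, 50),
}

def cell_range():
    """Total shared-cell cost range: (338, 411), rounded to ~$340-$410 above."""
    lo = sum(low for low, _ in SHARED_CELL.values())
    hi = sum(high for _, high in SHARED_CELL.values())
    return lo, hi

def per_tenant(tenants: int):
    """Amortized per-tenant infrastructure cost at a given cell density."""
    lo, hi = cell_range()
    return lo / tenants, hi / tenants
```

At 10 tenants this gives ~$34-$41 per tenant and at 20 tenants ~$17-$21, which is why shared-cell density is the main infrastructure lever.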

3.2 Dedicated Cell (Single-Tenant)

Full stack for one client. Same components, no sharing.

| Component            | Instance/Resource | Monthly cost      |
|----------------------|-------------------|-------------------|
| K8s control plane    | Managed           | $73               |
| K8s worker nodes (2×)| t3.large          | $122              |
| PostgreSQL           | db.t3.medium      | $50-$100          |
| Vault + Monitoring   | Self-hosted       | $0 (included)     |
| Object storage       | 50GB              | $1-$3             |
| Network              | Low               | $20-$30           |
| Total dedicated cell |                   | ~$270-$330/month  |

3.3 Customer VPC Cell

Same as dedicated, but deployed in client's cloud. Client pays infrastructure; Speedrun pays ops labor.

| Cost to client                     | Monthly    |
|------------------------------------|------------|
| Infrastructure (same as dedicated) | ~$270-$330 |

| Cost to Speedrun                                      | Monthly                     |
|-------------------------------------------------------|-----------------------------|
| Ops overhead (monitoring, updates, incident response) | ~$50-$100 (amortized labor) |
| VPN + health beacon infrastructure                    | ~$10                        |
| Total Speedrun cost per VPC client                    | ~$60-$110/month             |

3.4 Fixed Infrastructure (Exists Regardless of Tenants)

| Component                                 | Monthly cost    |
|-------------------------------------------|-----------------|
| GitHub (organization plan, CI/CD minutes) | $50-$100        |
| Container registry (GHCR, storage)        | $10-$20         |
| DNS + domain                              | $5-$10          |
| Development/staging environment           | $150-$250       |
| Total fixed                               | ~$215-$380/month|

4. External Service Costs

4.1 SEMrush API

| Tier                             | Monthly cost            | Included API units | Notes                          |
|----------------------------------|-------------------------|--------------------|--------------------------------|
| Business plan (required for API) | $500/month              | 10,000 requests    | Shared across all SEO tenants  |
| Additional units                 | Varies (contact sales)  | —                  | Scale as client count grows    |

At 10 SEO clients, each running 5-10 keyword research tasks/day at 2-3 API calls per task, usage is roughly 3,000-9,000 API calls/month total. That fits within the 10,000 included, though the upper bound leaves little headroom.

Scaling trigger: 20+ active SEO clients likely exceeds the included 10K units → negotiate enterprise API plan or buy additional units.

4.2 Other External APIs

| Service                   | Cost                        | Notes                                      |
|---------------------------|-----------------------------|--------------------------------------------|
| Google Search Console API | Free                        | Google rate limits apply (25K queries/day) |
| GitHub API                | Free tier: 5,000 requests/hr| Sufficient for V0 agents                   |
| Google Calendar API       | Free tier: 1M requests/day  | More than sufficient                       |
| Toddle DB                 | Internal — no API cost      | Direct database access                     |

4.3 Total External API Costs

| Stage                   | Monthly external API cost               |
|-------------------------|-----------------------------------------|
| Stage 0 (MVP)           | ~$500 (SEMrush only)                    |
| Stage 1 (5-10 clients)  | ~$500-$700                              |
| Stage 2 (20-50 clients) | ~$800-$1,500 (SEMrush scale-up)         |
| Stage 3 (100+ clients)  | ~$2,000-$5,000 (enterprise API plans)   |

5. Unit Economics

5.1 Cost per Task (Fully Loaded)

Including LLM tokens, embedding, compute slice, and tool calls:

| Task Type            | LLM cost (balanced) | Embedding | Compute slice* | Tool API | Total cost |
|----------------------|---------------------|-----------|----------------|----------|------------|
| Simple extraction    | $0.011              | $0.00001  | $0.002         | $0       | ~$0.013    |
| Keyword research     | $0.042              | $0.00002  | $0.005         | $0.05    | ~$0.10     |
| Content optimization | $0.063              | $0.00003  | $0.005         | $0.02    | ~$0.09     |
| Data quality check   | $0.021              | $0.00001  | $0.002         | $0       | ~$0.023    |
| Research synthesis   | $0.084              | $0.00005  | $0.008         | $0       | ~$0.09     |
| Technical audit      | $0.105              | $0.00005  | $0.010         | $0.05    | ~$0.17     |

*Compute slice: estimated amortized infrastructure cost per task (total infra / total tasks).

Takeaway: Most tasks cost $0.02-$0.17 fully loaded. Even the most expensive task (technical audit) is under $0.20.

5.2 Cost per Agent per Month

| Agent category                    | LLM cost               | Infra share | Tool APIs | Total/month |
|-----------------------------------|------------------------|-------------|-----------|-------------|
| Low-frequency (weekly tasks)      | $2-$5                  | $3-$5       | $0-$5     | $5-$15      |
| Medium-frequency (5-10 tasks/day) | $10-$25                | $3-$5       | $5-$15    | $18-$45     |
| High-frequency (50+ tasks/day)    | $8-$15 (fast model)    | $3-$5       | $0-$5     | $11-$25     |

5.3 Cost per Tenant per Month

| Tenant type                       | Agents | LLM       | Infra (shared cell)    | Tools    | Total/month |
|-----------------------------------|--------|-----------|------------------------|----------|-------------|
| Small (3 agents, basic vertical)  | 3      | $30-$60   | $20-$35                | $0-$20   | $50-$115    |
| Medium (5 agents, full vertical)  | 5      | $50-$120  | $25-$40                | $20-$50  | $95-$210    |
| Large (8+ agents, multi-vertical) | 8+     | $100-$250 | $270-$330 (dedicated)  | $50-$100 | $420-$680   |
| BYOK client (brings own LLM keys) | 5      | $0*       | $25-$40                | $20-$50  | $45-$90     |

BYOK clients pay their own LLM costs directly to providers. Speedrun's cost for these clients is infra + tools only.

5.4 Gross Margin Model

What pricing covers the costs:

| Pricing tier                | Revenue/month | Cost/month | Gross margin |
|-----------------------------|---------------|------------|--------------|
| Small client @ $300/month   | $300          | $50-$115   | 62-83%       |
| Medium client @ $600/month  | $600          | $95-$210   | 65-84%       |
| Large client @ $1,500/month | $1,500        | $420-$680  | 55-72%       |
| BYOK medium @ $400/month    | $400          | $45-$90    | 78-89%       |

Target gross margin: 65-80%. Achievable at all tiers except potentially large clients on dedicated cells with heavy LLM usage (55% is tight). Solutions: push BYOK for large clients, or adjust dedicated cell pricing.
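The margin column is the standard gross-margin formula applied to each cost range. A sketch using the small-client tier:

```python
def gross_margin(revenue: float, cost: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (revenue - cost) / revenue

# Small client at $300/month against the $50-$115 cost range:
best = gross_margin(300, 50)    # ~0.83 (best case)
worst = gross_margin(300, 115)  # ~0.62 (worst case)
```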


6. Scale Cost Curves

6.1 Total Monthly Platform Cost by Stage

| Cost category | Stage 0         | Stage 1 (10)     | Stage 2 (50)                        | Stage 3 (200)                              |
|---------------|-----------------|------------------|-------------------------------------|--------------------------------------------|
| Fixed infra   | $300            | $300             | $400                                | $600                                       |
| Cell infra    | $350 (1 shared) | $700 (2 shared)  | $2,500 (5 shared + 2 dedicated)     | $12,000 (10 shared + 10 dedicated + VPCs)  |
| LLM tokens    | $200 (V0 only)  | $500-$1,000      | $3,000-$8,000                       | $15,000-$40,000                            |
| External APIs | $500            | $600             | $1,200                              | $4,000                                     |
| Embedding     | $1              | $5               | $20                                 | $80                                        |
| Total         | ~$1,350         | ~$2,100-$2,600   | ~$7,100-$12,100                     | ~$31,700-$56,700                           |
| Per tenant    | —               | $210-$260        | $142-$242                           | $159-$284                                  |
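The totals and per-tenant figures are straight sums of the category ranges. A sketch for Stage 1 (values taken from the table above):

```python
# Stage 1 (10 tenants): (low, high) monthly USD per cost category.
STAGE_1 = {
    "fixed_infra": (300, 300),
    "cell_infra": (700, 700),       # 2 shared cells
    "llm_tokens": (500, 1000),
    "external_apis": (600, 600),
    "embedding": (5, 5),
}
TENANTS = 10

low = sum(lo for lo, _ in STAGE_1.values())   # 2105 -> ~$2,100
high = sum(hi for _, hi in STAGE_1.values())  # 2605 -> ~$2,600
per_tenant = (low / TENANTS, high / TENANTS)  # (210.5, 260.5) -> $210-$260
```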

6.2 Fixed vs Variable Split

| Stage   | Fixed costs   | Variable costs        | Fixed % |
|---------|---------------|-----------------------|---------|
| Stage 0 | $650 (infra)  | $700 (LLM + APIs)     | 48%     |
| Stage 1 | $1,000        | $1,100-$1,600         | 38-48%  |
| Stage 2 | $2,900        | $4,200-$9,200         | 24-41%  |
| Stage 3 | $12,600       | $19,100-$44,100       | 22-40%  |

Pattern: As scale increases, LLM tokens dominate and the cost structure shifts heavily toward variable costs. This is good — costs scale with revenue (usage-based), not ahead of it.

6.3 Economies of Scale

| What gets cheaper per unit  | Why                                                           |
|-----------------------------|---------------------------------------------------------------|
| Infrastructure amortization | More tenants per shared cell = lower per-tenant infra cost    |
| Fixed costs per tenant      | CI/CD, registry, dev environment amortized over more tenants  |
| SEMrush and tool APIs       | Enterprise plans offer better unit rates at higher volumes    |
| Monitoring overhead         | Shared monitoring stack scales sublinearly                    |

| What stays proportional   | Why                                               |
|---------------------------|---------------------------------------------------|
| LLM tokens per task       | Same task = same tokens, regardless of scale      |
| Embedding costs per write | Same write = same embedding, regardless of scale  |
| Tool API calls per task   | Same task = same API calls                        |

| What gets cheaper with optimization (not scale) | Why                                                             |
|-------------------------------------------------|-----------------------------------------------------------------|
| LLM cost per task                               | Model selection optimization, prompt caching, prompt shortening |
| Knowledge query cost                            | Query caching, better retrieval (fewer irrelevant results)      |

7. Cost Optimization Levers

Ordered by impact:

7.1 Model Selection Optimization (High Impact)

Route each task to the cheapest model that meets the quality bar:

| Strategy                                              | Savings estimate   | Implementation                                          |
|-------------------------------------------------------|--------------------|---------------------------------------------------------|
| Use fast for extraction/classification                | 60-80% vs balanced | LLM Gateway model hint → tenant config mapping          |
| Use balanced for reasoning, fast for everything else  | 40-50% overall     | Quality monitor evaluates if fast model is sufficient per skill |
| Use Gemini Flash-Lite for bulk processing             | 90%+ vs Haiku      | Batch API + cheapest model for background tasks         |

Example: An SEO agent running keyword research:

  • Old: all calls on Sonnet → $0.042/task
  • Optimized: tool parsing on Haiku, reasoning on Sonnet → ~$0.025/task (40% savings)
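The routing itself can be as simple as a per-skill tier lookup. A sketch; the skill names and config shape are illustrative assumptions, not the actual LLM Gateway or tenant-config schema:

```python
# Illustrative per-skill tier mapping, per the routing strategy above.
TIER_BY_SKILL = {
    "tool_output_parsing": "fast",     # extraction -> cheapest adequate model
    "ticket_classification": "fast",
    "keyword_research": "balanced",    # multi-step reasoning
    "quality_evaluation": "best",      # Layer 3 judgments
}

def route_tier(skill: str, default: str = "balanced") -> str:
    """Return the model tier for a skill, falling back to the default."""
    return TIER_BY_SKILL.get(skill, default)
```

Routing tool parsing to the fast tier while keeping reasoning on balanced is what produces the ~40% saving in the keyword-research example above.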

7.2 Prompt Caching (High Impact)

Anthropic prompt caching: 90% discount on repeated context (system prompt, skill definitions, loaded knowledge).

| Agent type                           | Cacheable context              | Savings per task  |
|--------------------------------------|--------------------------------|-------------------|
| Any agent with stable system prompt  | ~1-2K tokens (system + skill)  | ~$0.003-$0.006    |
| Agent with pre-loaded knowledge      | ~3-5K tokens (knowledge context) | ~$0.009-$0.015  |

At 1,000 tasks/day, prompt caching saves ~$90-$180/month.
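The per-task savings are cached tokens times the input price times the discount. A sketch assuming the Sonnet input rate from section 2.1 (note the table rounds the results):

```python
def caching_savings_per_task(cached_tokens: int,
                             input_price_per_m: float = 3.00,  # Sonnet input
                             discount: float = 0.90) -> float:
    """USD saved per task by serving `cached_tokens` from the prompt cache."""
    return cached_tokens * input_price_per_m / 1_000_000 * discount

# ~1K cached tokens -> ~$0.0027/task; ~2K -> ~$0.0054/task
# (rounded to ~$0.003-$0.006 above). At 1,000 tasks/day x 30 days that is
# roughly $80-$160/month, in line with the ~$90-$180 estimate.
```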

7.3 Client BYOK (High Impact on Margins)

Clients who bring their own LLM keys eliminate Speedrun's largest variable cost:

| Scenario                                          | Speedrun's LLM cost     | Gross margin impact |
|---------------------------------------------------|-------------------------|---------------------|
| Speedrun pays all LLM                             | $50-$120/tenant/month   | 65-80% margin       |
| Client BYOK                                       | $0/tenant               | 78-89% margin       |
| Hybrid (client key preferred, Speedrun fallback)  | $10-$30/tenant/month    | 75-85% margin       |

Strategy: Default to client BYOK. Speedrun keys as fallback only. Pricing reflects this — BYOK clients get a lower base price, Speedrun-key clients pay a premium.

7.4 Batch API for Background Tasks (Medium Impact)

50% discount for non-urgent tasks:

| Task category           | Eligible for batch      | Monthly savings (50 agents) |
|-------------------------|-------------------------|-----------------------------|
| Knowledge consolidation | Yes                     | ~$20-$40                    |
| Quality evaluation      | Yes                     | ~$15-$30                    |
| Scheduled reports       | Yes                     | ~$5-$10                     |
| Conversation responses  | No (latency-sensitive)  | —                           |

7.5 Local Model Overflow (Medium Impact)

Self-hosted models via Ollama/vLLM for non-critical tasks:

| Component               | Hardware cost    | What it handles                        | Monthly savings         |
|-------------------------|------------------|----------------------------------------|-------------------------|
| 1× GPU instance (A10G)  | ~$300-$500/month | Embeddings, classification, extraction | ~$50-$100 in API costs  |

Break-even: local GPU pays for itself when API embedding + classification costs exceed ~$300-$500/month (roughly Stage 2).

7.6 Spot/Preemptible Instances (Low-Medium Impact)

Agent runtime pods are stateless and restartable → good candidates for spot instances:

| Instance type           | On-demand | Spot           | Savings |
|-------------------------|-----------|----------------|---------|
| t3.large (agent nodes)  | $61/month | ~$18-$25/month | 60-70%  |
| m5.large (agent nodes)  | $70/month | ~$21-$30/month | 57-70%  |

Risk: Spot instances can be reclaimed. Agent tasks must be designed for graceful interruption (they already are — actor model with task retry).

7.7 Prompt Optimization (Low-Medium Impact, Ongoing)

Shorter prompts = fewer tokens = lower cost:

| Optimization                   | Token reduction                 | Monthly savings (50 agents) |
|--------------------------------|---------------------------------|-----------------------------|
| Remove verbose instructions    | 10-20% of system prompt         | ~$10-$30                    |
| Compress knowledge context     | 20-30% of retrieved knowledge   | ~$15-$40                    |
| Use structured output formats  | 10-15% of output tokens         | ~$5-$15                     |

8. Pricing Implications

8.1 Minimum Viable Price

To be margin-positive (>65% gross margin):

| Tenant type                       | Cost floor | Minimum price (65% margin) | Suggested pricing   |
|-----------------------------------|------------|----------------------------|---------------------|
| Small (3 agents, BYOK)            | ~$50       | ~$143                      | $150-$200/month     |
| Small (3 agents, Speedrun keys)   | ~$115      | ~$329                      | $300-$400/month     |
| Medium (5 agents, BYOK)           | ~$90       | ~$257                      | $250-$350/month     |
| Medium (5 agents, Speedrun keys)  | ~$210      | ~$600                      | $500-$700/month     |
| Large (8+ agents, dedicated)      | ~$680      | ~$1,943                    | $1,500-$2,500/month |
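The minimum-price column is the cost floor divided by one minus the target margin:

```python
def minimum_price(cost_floor: float, target_margin: float = 0.65) -> float:
    """Lowest monthly price that keeps gross margin at target_margin."""
    return cost_floor / (1 - target_margin)

# minimum_price(115) -> ~$329 (small client on Speedrun keys)
# minimum_price(680) -> ~$1,943 (large client on a dedicated cell)
```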

8.2 Pricing Model Options

| Model                                                      | Pros                                        | Cons                                           | Fit for Kaze               |
|------------------------------------------------------------|---------------------------------------------|------------------------------------------------|----------------------------|
| Per-agent subscription ($X/agent/month)                    | Predictable revenue, simple to understand   | Discourages adding agents, doesn't reflect usage | Moderate                 |
| Per-task pricing ($X/task)                                 | Aligns cost with value, scales with usage   | Unpredictable bills, complex metering          | Low (too complex for SMEs) |
| Tiered subscription (plan tiers with agent/task limits)    | Predictable for both sides, upgrade path clear | May not fit all usage patterns              | High — recommended         |
| Subscription + usage overage (base fee + per-task over limit) | Predictable base + flexible scaling      | Complexity of two billing dimensions           | Medium                     |

Recommendation: Tiered subscription with BYOK discount. Three tiers aligned to cell density:

| Tier       | Agents    | Cell type     | BYOK price  | Speedrun-key price |
|------------|-----------|---------------|-------------|--------------------|
| Starter    | Up to 3   | Shared        | $200/month  | $400/month         |
| Growth     | Up to 8   | Shared        | $500/month  | $900/month         |
| Enterprise | Unlimited | Dedicated/VPC | Custom      | Custom             |

8.3 BYOK Impact

BYOK fundamentally changes unit economics:

  • Client pays LLM costs directly → Speedrun's variable cost drops 60-80%
  • Speedrun's cost becomes primarily infrastructure → more predictable, better margins
  • Clients with provider credits/discounts get better rates than Speedrun could offer
  • Risk: Speedrun loses visibility into token spend (mitigated by LLM Gateway tracking regardless of key owner)

Strategy: Default pricing assumes BYOK. Speedrun-key pricing is a premium add-on for clients who don't want to manage their own API keys.


9. Cost Monitoring & Alerts

Metrics to track for financial health:

| Metric                          | Alert threshold                  | Action                                                       |
|---------------------------------|----------------------------------|--------------------------------------------------------------|
| LLM cost per task (by skill)    | >2× baseline for that skill      | Investigate — model routing may be wrong, or agent is looping |
| LLM cost per tenant per day     | >daily budget (tenant-specific)  | Hard stop (existing budget enforcement)                      |
| Infrastructure cost per tenant  | >120% of tier allocation         | Review tenant's agent count and task frequency               |
| External API cost per month     | >budget                          | Review API plan, negotiate enterprise tier                   |
| Gross margin per tenant         | <50%                             | Flag for pricing review or optimization push                 |
| BYOK vs Speedrun-key ratio      | <30% BYOK                        | Push BYOK adoption — margins are too thin on Speedrun keys   |
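The first alert reduces to a simple per-skill baseline comparison. A sketch; the baseline store and names are illustrative, not an existing monitoring API:

```python
from typing import Optional

# Illustrative per-skill baselines (USD per task, from section 2.3).
BASELINE_COST = {"keyword_research": 0.042, "simple_extraction": 0.011}

def llm_cost_alert(skill: str, observed: float,
                   factor: float = 2.0) -> Optional[str]:
    """Return an alert message if observed cost exceeds factor x baseline."""
    baseline = BASELINE_COST[skill]
    if observed > factor * baseline:
        return (f"{skill}: ${observed:.3f}/task exceeds {factor:.0f}x "
                f"baseline ${baseline:.3f} -- check model routing / looping")
    return None
```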

10. Key Takeaways

  1. LLM tokens are 60-80% of variable cost. Every optimization dollar should go here first.
  2. Most tasks cost $0.02-$0.17 fully loaded. This is cheap enough for high-volume automation.
  3. BYOK clients are dramatically more profitable. Push BYOK as default, Speedrun keys as premium.
  4. Shared cells are cost-effective up to ~20 tenants. Infrastructure cost per tenant drops below $25.
  5. Model selection optimization is the single highest-impact lever. Routing tasks to the cheapest adequate model can cut LLM costs 40-50%.
  6. Embedding costs are negligible. Don't optimize here.
  7. Target gross margin of 65-80% is achievable at all tiers with BYOK. Speedrun-key clients need higher pricing.
  8. Cost scales with usage, not ahead of it. The variable-heavy cost structure means costs grow proportionally with revenue — no cliff edges.