Cost Model & Unit Economics
Research for Project Kaze. Companion to scalability-model.md (performance scaling).
1. Cost Structure Overview
Where the money goes, ordered by magnitude:
┌────────────────────────────────────────────────────────┐
│ VARIABLE COSTS (scale with usage) │
│ │
│ ████████████████████████████████ LLM Tokens (60-80%) │
│ ████████ External APIs (10-15%)│
│ ████ Embedding Gen (3-5%) │
│ │
│ SEMI-FIXED COSTS (scale with tenants/cells) │
│ │
│ ██████████████ Compute / K8s (varies)│
│ ████████ Database (varies) │
│ ████ Storage (varies) │
│ │
│ FIXED COSTS (exist regardless) │
│ │
│ ██████ Control plane │
│ ████ CI/CD + Registry │
│ ████ Monitoring base │
│ ██ Vault │
└────────────────────────────────────────────────────────┘

Key insight: LLM token cost dominates everything. A 20% reduction in tokens per task saves more money than halving infrastructure costs. Cost optimization strategy should focus overwhelmingly on LLM efficiency.
2. LLM Token Cost Model
2.1 Provider Pricing (as of Feb 2026)
Prices per 1M tokens:
| Provider | Model | Tier | Input | Output | Best for |
|---|---|---|---|---|---|
| Anthropic | Claude Haiku 4.5 | fast | $1.00 | $5.00 | Classification, extraction, simple tasks |
| Anthropic | Claude Sonnet 4.5 | balanced | $3.00 | $15.00 | General reasoning, tool use |
| Anthropic | Claude Opus 4.6 | best | $5.00 | $25.00 | Complex reasoning, quality evaluation |
| OpenAI | GPT-4.1 | balanced | $2.00 | $8.00 | General tasks, good cost/quality ratio |
| OpenAI | GPT-4o | balanced | $2.50 | $10.00 | Multimodal, general tasks |
| Google | Gemini 2.5 Flash-Lite | fast | $0.10 | $0.40 | Cheapest option, bulk processing |
| Google | Gemini 2.5 Pro | balanced | $1.25 | $10.00 | Strong reasoning, long context |
| OpenAI | text-embedding-3-small | embed | $0.02 | — | Knowledge embeddings (cheap) |
| OpenAI | text-embedding-3-large | embed | $0.13 | — | Knowledge embeddings (better quality) |
| Local | Ollama (Llama 3, Mistral) | fast | $0* | $0* | Zero marginal cost, limited quality |
*Local model cost is compute-only (GPU hardware amortization), not per-token.
Cost reduction features:
- Prompt caching (Anthropic): 90% discount on repeated context → huge savings for agents with stable system prompts
- Batch API (Anthropic, OpenAI): 50% discount for non-urgent tasks → knowledge consolidation, quality evaluation
- Gemini free tier: 1,000 requests/day → useful for development and low-volume testing
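The discounts above compound. As a minimal sketch of how they combine, the function below blends cached and fresh input tokens and applies the batch discount; the 50% cached fraction in the example is an illustrative assumption, not a measured figure:

```python
# Sketch: effective per-1M-token input price after the discounts listed above.
# Discount rates (90% caching, 50% batch) are from this section; the cached
# fraction is an assumed workload mix.

def effective_input_price(base_price_per_1m: float,
                          cached_fraction: float = 0.0,
                          batch: bool = False) -> float:
    """Blend cached (10% of base) and fresh input tokens, then apply batch discount."""
    cached = base_price_per_1m * 0.10   # 90% discount on cached context
    fresh = base_price_per_1m
    blended = cached_fraction * cached + (1 - cached_fraction) * fresh
    return blended * (0.5 if batch else 1.0)  # Batch API: 50% off

# Sonnet input at $3.00/1M with half the context cached:
print(round(effective_input_price(3.00, cached_fraction=0.5), 3))  # 1.65
# Same workload routed through the Batch API as well:
print(round(effective_input_price(3.00, cached_fraction=0.5, batch=True), 3))  # 0.825
```

Stacking caching and batching cuts the effective Sonnet input rate from $3.00 to under $1.00 per 1M tokens for eligible background work.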
2.2 Tokens per Task (Estimates)
Estimated token usage per task type based on typical agent workflows:
| Task Type | Input tokens | Output tokens | LLM calls | Total tokens |
|---|---|---|---|---|
| Simple extraction (parse invoice, classify ticket) | ~1,500 | ~500 | 1 | ~2,000 |
| Keyword research (SEO) | ~4,000 | ~2,000 | 2-3 | ~10,000 |
| Content optimization (SEO) | ~6,000 | ~3,000 | 2-4 | ~15,000 |
| Data quality check (Toddle) | ~2,000 | ~1,000 | 1-2 | ~5,000 |
| Content enrichment (Toddle) | ~3,000 | ~2,000 | 2-3 | ~8,000 |
| Research synthesis (V0 Internal Ops) | ~8,000 | ~4,000 | 3-5 | ~20,000 |
| Project status update (V0) | ~3,000 | ~1,500 | 1-2 | ~6,000 |
| Technical audit (SEO) | ~10,000 | ~5,000 | 4-6 | ~25,000 |
| Quality evaluation (Layer 3) | ~4,000 | ~1,000 | 1 | ~5,000 |
These exclude system prompt tokens (typically ~1-2K, amortized via prompt caching).
2.3 Cost per Task by Model Tier
Combining tokens per task with provider pricing:
| Task Type | Fast (Haiku) | Balanced (Sonnet) | Best (Opus) | Cheapest (Gemini Flash-Lite) |
|---|---|---|---|---|
| Simple extraction | $0.004 | $0.011 | $0.018 | $0.0004 |
| Keyword research | $0.014 | $0.042 | $0.070 | $0.002 |
| Content optimization | $0.021 | $0.063 | $0.105 | $0.002 |
| Data quality check | $0.007 | $0.021 | $0.035 | $0.001 |
| Content enrichment | $0.013 | $0.039 | $0.065 | $0.001 |
| Research synthesis | $0.028 | $0.084 | $0.140 | $0.003 |
| Technical audit | $0.035 | $0.105 | $0.175 | $0.004 |
| Quality evaluation | $0.009 | $0.027 | $0.045 | $0.001 |
Observation: Even with the most expensive model (Opus), a complex task like a technical audit costs ~$0.18. Most tasks are under $0.10 on balanced models. The cheapest option (Gemini Flash-Lite) brings costs down to fractions of a cent.
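The table can be reproduced directly from the pricing in §2.1 and the token estimates in §2.2. A minimal sketch (figures land within rounding of the table, since the table also folds in multi-call overhead):

```python
# Sketch: cost per task from per-1M-token pricing (section 2.1) and token
# estimates (section 2.2).

PRICING = {                      # (input $/1M, output $/1M)
    "haiku": (1.00, 5.00),
    "sonnet": (3.00, 15.00),
    "opus": (5.00, 25.00),
    "flash-lite": (0.10, 0.40),
}

def task_cost(input_tokens: int, output_tokens: int, model: str) -> float:
    inp, out = PRICING[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Simple extraction (~1,500 in / ~500 out) on Haiku:
print(round(task_cost(1_500, 500, "haiku"), 4))   # 0.004
# Same task on Sonnet:
print(round(task_cost(1_500, 500, "sonnet"), 4))  # 0.012 (table shows ~$0.011 after rounding)
```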
2.4 Monthly Cost per Agent
Estimated monthly LLM cost per agent based on task frequency:
| Agent Type | Tasks/day | Avg tokens/task | Model tier | Monthly LLM cost |
|---|---|---|---|---|
| SEO Keyword Research | 5-10 | ~10,000 | balanced | $13-$26 |
| SEO Content Optimization | 3-5 | ~15,000 | balanced | $12-$20 |
| SEO Technical Audit | 1 (weekly) | ~25,000 | balanced | $3 |
| SEO Reporting | 1 (weekly) | ~15,000 | balanced | $2 |
| Toddle Content Enrichment | 20-50 | ~8,000 | fast | $5-$13 |
| Toddle Data Quality | 50-100 | ~5,000 | fast | $8-$15 |
| V0 Research Agent | 2-5 | ~20,000 | balanced | $10-$26 |
| V0 Project Management | 10-20 | ~6,000 | balanced | $8-$16 |
| Quality Monitor (L3) | 20-50 | ~5,000 | best | $13-$34 |
Typical agent: ~$10-30/month in LLM costs on balanced models. High-volume agents (data quality, content enrichment) on fast models stay under $15/month.
2.5 Monthly Cost per Tenant
Based on typical agent deployments per vertical:
| Vertical | Agents per tenant | Monthly LLM cost per tenant |
|---|---|---|
| SEO (full suite) | 4 (keyword + content + audit + reporting) | $30-$50 |
| Toddle | 3 (enrichment + quality + recommendations) | $20-$40 |
| Internal Ops (V0) | 5 (research + PM + scheduling + docs + issues) | $40-$80 |
Note: These are Speedrun's LLM costs. Under the dual-key model (D7), clients with their own API keys offload these costs. BYOK clients have near-zero LLM cost to Speedrun.
2.6 Embedding Costs
Knowledge system embedding costs (using text-embedding-3-small at $0.02/1M tokens):
| Operation | Tokens | Cost |
|---|---|---|
| Embed one knowledge entry (~500 tokens) | 500 | $0.00001 |
| Embed one query (~100 tokens) | 100 | $0.000002 |
| 1,000 knowledge writes/day | 500K | $0.01/day |
| 10,000 queries/day | 1M | $0.02/day |
Verdict: Embedding costs are negligible — under $1/month even at Stage 2. Not a cost concern.
3. Infrastructure Cost per Cell Type
Based on AWS pricing (us-east-1, on-demand). Other clouds are comparable within ~20%.
3.1 Shared Cell (Multi-Tenant)
A shared cell hosts 5-20 tenants with namespace isolation. Infrastructure is amortized.
| Component | Instance/Resource | Monthly cost |
|---|---|---|
| K8s control plane (EKS) | Managed | $73 |
| K8s worker nodes (3×) | t3.large (2 vCPU, 8GB) | $183 |
| PostgreSQL | db.t3.medium (RDS) or self-hosted | $50-$100 |
| PgBouncer | Runs on worker node | $0 (included) |
| Vault | Self-hosted on worker node | $0 (included) |
| Monitoring (Prometheus + Grafana + Loki) | Self-hosted on worker node | $0 (included) |
| Object storage (MinIO or S3) | 100GB | $2-$5 |
| Network (ALB, data transfer) | Moderate | $30-$50 |
| Total shared cell | | ~$340-$410/month |
| Per tenant (10 tenants) | | ~$34-$41/month |
| Per tenant (20 tenants) | | ~$17-$21/month |
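The per-tenant figures are straight amortization of the cell total over tenant count. A minimal sketch reproducing the rows above:

```python
# Sketch: per-tenant infrastructure cost as shared-cell density grows,
# amortizing the cell's monthly total over its tenants.

def per_tenant(cell_monthly_low: float, cell_monthly_high: float, tenants: int):
    """Return the (low, high) per-tenant cost range."""
    return (cell_monthly_low / tenants, cell_monthly_high / tenants)

for n in (5, 10, 20):
    low, high = per_tenant(340, 410, n)
    print(f"{n} tenants: ${low:g}-${high:g} per tenant")
```

Densifying a cell from 5 to 20 tenants cuts per-tenant infra cost by 4x, which is why shared-cell packing matters more than instance-type tuning.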
3.2 Dedicated Cell (Single-Tenant)
Full stack for one client. Same components, no sharing.
| Component | Instance/Resource | Monthly cost |
|---|---|---|
| K8s control plane | Managed | $73 |
| K8s worker nodes (2×) | t3.large | $122 |
| PostgreSQL | db.t3.medium | $50-$100 |
| Vault + Monitoring | Self-hosted | $0 (included) |
| Object storage | 50GB | $1-$3 |
| Network | Low | $20-$30 |
| Total dedicated cell | | ~$270-$330/month |
3.3 Customer VPC Cell
Same as dedicated, but deployed in client's cloud. Client pays infrastructure; Speedrun pays ops labor.
| Cost to client | Monthly |
|---|---|
| Infrastructure (same as dedicated) | ~$270-$330 |
| Cost to Speedrun | Monthly |
|---|---|
| Ops overhead (monitoring, updates, incident response) | ~$50-$100 (amortized labor) |
| VPN + health beacon infrastructure | ~$10 |
| Total Speedrun cost per VPC client | ~$60-$110/month |
3.4 Fixed Infrastructure (Exists Regardless of Tenants)
| Component | Monthly cost |
|---|---|
| GitHub (organization plan, CI/CD minutes) | $50-$100 |
| Container registry (GHCR, storage) | $10-$20 |
| DNS + domain | $5-$10 |
| Development/staging environment | $150-$250 |
| Total fixed | ~$215-$380/month |
4. External Service Costs
4.1 SEMrush API
| Tier | Monthly cost | Included API units | Notes |
|---|---|---|---|
| Business plan (required for API) | $500/month | 10,000 requests | Shared across all SEO tenants |
| Additional units | Varies | Contact sales | Scale as client count grows |
At 10 SEO clients, each running 5-10 keyword research tasks/day × 2-3 API calls each = ~3,000-9,000 API calls/month total. Within the 10,000 included, though the upper end approaches the limit.
Scaling trigger: 20+ active SEO clients likely exceeds the included 10K units → negotiate enterprise API plan or buy additional units.
4.2 Other External APIs
| Service | Cost | Notes |
|---|---|---|
| Google Search Console API | Free | Google rate limits apply (25K queries/day) |
| GitHub API | Free tier: 5,000 requests/hr | Sufficient for V0 agents |
| Google Calendar API | Free tier: 1M requests/day | More than sufficient |
| Toddle DB | Internal — no API cost | Direct database access |
4.3 Total External API Costs
| Stage | Monthly external API cost |
|---|---|
| Stage 0 (MVP) | ~$500 (SEMrush only) |
| Stage 1 (5-10 clients) | ~$500-$700 |
| Stage 2 (20-50 clients) | ~$800-$1,500 (SEMrush scale-up) |
| Stage 3 (100+ clients) | ~$2,000-$5,000 (enterprise API plans) |
5. Unit Economics
5.1 Cost per Task (Fully Loaded)
Including LLM tokens, embedding, compute slice, and tool calls:
| Task Type | LLM cost (balanced) | Embedding | Compute slice* | Tool API | Total cost |
|---|---|---|---|---|---|
| Simple extraction | $0.011 | $0.00001 | $0.002 | $0 | ~$0.013 |
| Keyword research | $0.042 | $0.00002 | $0.005 | $0.05 | ~$0.10 |
| Content optimization | $0.063 | $0.00003 | $0.005 | $0.02 | ~$0.09 |
| Data quality check | $0.021 | $0.00001 | $0.002 | $0 | ~$0.023 |
| Research synthesis | $0.084 | $0.00005 | $0.008 | $0 | ~$0.09 |
| Technical audit | $0.105 | $0.00005 | $0.010 | $0.05 | ~$0.17 |
*Compute slice: estimated amortized infrastructure cost per task (total monthly infra / total monthly tasks).
Takeaway: Most tasks cost $0.02-$0.17 fully loaded. Even the most expensive task (technical audit) is under $0.20.
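The compute-slice footnote can be made concrete. A minimal sketch, where the tenant count and task volume are illustrative assumptions (a mid-range shared cell at moderate density):

```python
# Sketch: deriving the "compute slice" — amortize a shared cell's monthly
# infrastructure over its monthly task volume. Tenant and task counts are
# assumed for illustration.

def compute_slice(cell_monthly: float, tenants: int, tasks_per_tenant_per_day: int) -> float:
    monthly_tasks = tenants * tasks_per_tenant_per_day * 30
    return cell_monthly / monthly_tasks

# A ~$375/month shared cell, 20 tenants, 100 tasks/tenant/day:
print(round(compute_slice(375, 20, 100), 5))  # ~ $0.006/task, within the $0.002-$0.010 range above
```

Note the slice shrinks as cells densify, so the fully loaded figures above are conservative for well-packed cells.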
5.2 Cost per Agent per Month
| Agent category | LLM cost | Infra share | Tool APIs | Total/month |
|---|---|---|---|---|
| Low-frequency (weekly tasks) | $2-$5 | $3-$5 | $0-$5 | $5-$15 |
| Medium-frequency (5-10 tasks/day) | $10-$25 | $3-$5 | $5-$15 | $18-$45 |
| High-frequency (50+ tasks/day) | $8-$15 (fast model) | $3-$5 | $0-$5 | $11-$25 |
5.3 Cost per Tenant per Month
| Tenant type | Agents | LLM | Infra (shared cell) | Tools | Total/month |
|---|---|---|---|---|---|
| Small (3 agents, basic vertical) | 3 | $30-$60 | $20-$35 | $0-$20 | $50-$115 |
| Medium (5 agents, full vertical) | 5 | $50-$120 | $25-$40 | $20-$50 | $95-$210 |
| Large (8+ agents, multi-vertical) | 8+ | $100-$250 | $270-$330 (dedicated) | $50-$100 | $420-$680 |
| BYOK client (brings own LLM keys) | 5 | $0* | $25-$40 | $20-$50 | $45-$90 |
*BYOK clients pay their own LLM costs directly to providers. Speedrun's cost for these clients is infra + tools only.
5.4 Gross Margin Model
What pricing covers the costs:
| Pricing tier | Revenue/month | Cost/month | Gross margin |
|---|---|---|---|
| Small client @ $300/month | $300 | $50-$115 | 62-83% |
| Medium client @ $600/month | $600 | $95-$210 | 65-84% |
| Large client @ $1,500/month | $1,500 | $420-$680 | 55-72% |
| BYOK medium @ $400/month | $400 | $45-$90 | 78-89% |
Target gross margin: 65-80%. Achievable at all tiers except potentially large clients on dedicated cells with heavy LLM usage (55% is tight). Solutions: push BYOK for large clients, or adjust dedicated cell pricing.
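The margin columns follow from the standard gross-margin formula. A minimal sketch reproducing the small-client row:

```python
# Sketch: gross margin = (revenue - cost) / revenue, applied to the
# pricing-tier table above.

def gross_margin(revenue: float, cost: float) -> float:
    return (revenue - cost) / revenue

# Small client at $300/month against the $50-$115 cost range:
print(f"{gross_margin(300, 115):.0%}-{gross_margin(300, 50):.0%}")  # 62%-83%
```

Worst-case margin uses the top of the cost range, so the low end of each printed range is the number to watch.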
6. Scale Cost Curves
6.1 Total Monthly Platform Cost by Stage
| Cost category | Stage 0 | Stage 1 (10) | Stage 2 (50) | Stage 3 (200) |
|---|---|---|---|---|
| Fixed infra | $300 | $300 | $400 | $600 |
| Cell infra | $350 (1 shared) | $700 (2 shared) | $2,500 (5 shared + 2 dedicated) | $12,000 (10 shared + 10 dedicated + VPCs) |
| LLM tokens | $200 (V0 only) | $500-$1,000 | $3,000-$8,000 | $15,000-$40,000 |
| External APIs | $500 | $600 | $1,200 | $4,000 |
| Embedding | $1 | $5 | $20 | $80 |
| Total | ~$1,350 | ~$2,100-$2,600 | ~$7,100-$12,100 | ~$31,700-$56,700 |
| Per tenant | — | $210-$260 | $142-$242 | $159-$284 |
6.2 Fixed vs Variable Split
| Stage | Fixed costs | Variable costs | Fixed % |
|---|---|---|---|
| Stage 0 | $650 (infra) | $700 (LLM + APIs) | 48% |
| Stage 1 | $1,000 | $1,100-$1,600 | 38-48% |
| Stage 2 | $2,900 | $4,200-$9,200 | 24-41% |
| Stage 3 | $12,600 | $19,100-$44,100 | 22-40% |
Pattern: As scale increases, LLM tokens dominate and the cost structure shifts heavily toward variable costs. This is good — costs scale with revenue (usage-based), not ahead of it.
6.3 Economies of Scale
| What gets cheaper per unit | Why |
|---|---|
| Infrastructure amortization | More tenants per shared cell = lower per-tenant infra cost |
| Fixed costs per tenant | CI/CD, registry, dev environment amortized over more tenants |
| SEMrush and tool APIs | Enterprise plans offer better unit rates at higher volumes |
| Monitoring overhead | Shared monitoring stack scales sublinearly |
| What stays proportional | Why |
|---|---|
| LLM tokens per task | Same task = same tokens, regardless of scale |
| Embedding costs per write | Same write = same embedding, regardless of scale |
| Tool API calls per task | Same task = same API calls |
| What gets cheaper with optimization (not scale) | Why |
|---|---|
| LLM cost per task | Model selection optimization, prompt caching, prompt shortening |
| Knowledge query cost | Query caching, better retrieval (fewer irrelevant results) |
7. Cost Optimization Levers
Ordered by impact:
7.1 Model Selection Optimization (High Impact)
Route each task to the cheapest model that meets the quality bar:
| Strategy | Savings estimate | Implementation |
|---|---|---|
| Use fast for extraction/classification | 60-80% vs balanced | LLM Gateway model hint → tenant config mapping |
| Use balanced for reasoning, fast for everything else | 40-50% overall | Quality monitor evaluates if fast model is sufficient per skill |
| Use Gemini Flash-Lite for bulk processing | 90%+ vs Haiku | Batch API + cheapest model for background tasks |
Example: An SEO agent running keyword research:
- Old: all calls on Sonnet → $0.042/task
- Optimized: tool parsing on Haiku, reasoning on Sonnet → ~$0.025/task (40% savings)
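The arithmetic behind that example can be sketched as below; the 60/40 split of the task's tokens between parsing and reasoning calls is an illustrative assumption chosen to match the figures above:

```python
# Sketch: the keyword-research routing example — split the task's token
# budget between Haiku (tool parsing) and Sonnet (reasoning). The 60/40
# split is an assumed workload mix.

PRICES = {"haiku": (1.00, 5.00), "sonnet": (3.00, 15.00)}  # $/1M (input, output)

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# All calls on Sonnet: ~4K input / ~2K output per task.
baseline = call_cost("sonnet", 4_000, 2_000)
# Routed: ~60% of tokens (tool parsing) on Haiku, the rest (reasoning) on Sonnet.
routed = call_cost("haiku", 2_400, 1_200) + call_cost("sonnet", 1_600, 800)
print(round(baseline, 3), round(routed, 3))     # 0.042 0.025
print(f"savings: {1 - routed / baseline:.0%}")  # savings: 40%
```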
7.2 Prompt Caching (High Impact)
Anthropic prompt caching: 90% discount on repeated context (system prompt, skill definitions, loaded knowledge).
| Agent type | Cacheable context | Savings per task |
|---|---|---|
| Any agent with stable system prompt | ~1-2K tokens (system + skill) | ~$0.003-$0.006 saved per task |
| Agent with pre-loaded knowledge | ~3-5K tokens (knowledge context) | ~$0.009-$0.015 saved per task |
At 1,000 tasks/day, prompt caching saves ~$90-$180/month.
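That monthly figure is just per-task savings times volume. A minimal sketch reproducing it:

```python
# Sketch: the monthly-savings arithmetic above — per-task caching savings
# times daily task volume times ~30 days.

def monthly_caching_savings(savings_per_task: float, tasks_per_day: int, days: int = 30) -> float:
    return savings_per_task * tasks_per_day * days

print(round(monthly_caching_savings(0.003, 1_000)))  # 90
print(round(monthly_caching_savings(0.006, 1_000)))  # 180
```

Worth noting: caching savings scale linearly with task volume, so this lever grows automatically as the platform scales.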
7.3 Client BYOK (High Impact on Margins)
Clients who bring their own LLM keys eliminate Speedrun's largest variable cost:
| Scenario | Speedrun's LLM cost | Gross margin impact |
|---|---|---|
| Speedrun pays all LLM | $50-$120/tenant/month | 65-80% margin |
| Client BYOK | $0/tenant | 78-89% margin |
| Hybrid (client key preferred, Speedrun fallback) | $10-$30/tenant/month | 75-85% margin |
Strategy: Default to client BYOK. Speedrun keys as fallback only. Pricing reflects this — BYOK clients get a lower base price, Speedrun-key clients pay a premium.
7.4 Batch API for Background Tasks (Medium Impact)
50% discount for non-urgent tasks:
| Task category | Eligible for batch | Monthly savings (50 agents) |
|---|---|---|
| Knowledge consolidation | Yes | ~$20-$40 |
| Quality evaluation | Yes | ~$15-$30 |
| Scheduled reports | Yes | ~$5-$10 |
| Conversation responses | No (latency-sensitive) | — |
7.5 Local Model Overflow (Medium Impact)
Self-hosted models via Ollama/vLLM for non-critical tasks:
| Component | Hardware cost | What it handles | API costs replaced |
|---|---|---|---|
| 1× GPU instance (A10G) | ~$300-$500/month | Embeddings, classification, extraction | ~$50-$100/month at current volumes |
Break-even: local GPU pays for itself when API embedding + classification costs exceed ~$300-$500/month (roughly Stage 2).
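The break-even rule above, as a minimal sketch (the dollar figures in the example are the document's own ranges, picked at illustrative points):

```python
# Sketch: a local GPU pays off once the API spend it replaces exceeds its
# own monthly cost. Negative = net loss, positive = net savings.

def local_gpu_net_savings(gpu_monthly: float, replaced_api_monthly: float) -> float:
    return replaced_api_monthly - gpu_monthly

# Stage 1 volumes (~$100/month of replaceable API spend): net loss.
print(local_gpu_net_savings(400, 100))  # -300
# Stage 2 volumes (~$600/month replaceable): the GPU pays for itself.
print(local_gpu_net_savings(400, 600))  # 200
```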
7.6 Spot/Preemptible Instances (Low-Medium Impact)
Agent runtime pods are stateless and restartable → good candidates for spot instances:
| Instance type | On-demand | Spot | Savings |
|---|---|---|---|
| t3.large (agent nodes) | $61/month | ~$18-$25/month | 60-70% |
| m5.large (agent nodes) | $70/month | ~$21-$30/month | 57-70% |
Risk: Spot instances can be reclaimed. Agent tasks must be designed for graceful interruption (they already are — actor model with task retry).
7.7 Prompt Optimization (Low-Medium Impact, Ongoing)
Shorter prompts = fewer tokens = lower cost:
| Optimization | Token reduction | Monthly savings (50 agents) |
|---|---|---|
| Remove verbose instructions | 10-20% of system prompt | ~$10-$30 |
| Compress knowledge context | 20-30% of retrieved knowledge | ~$15-$40 |
| Use structured output formats | 10-15% of output tokens | ~$5-$15 |
8. Pricing Implications
8.1 Minimum Viable Price
To be margin-positive (>65% gross margin):
| Tenant type | Cost floor | Minimum price (65% margin) | Suggested pricing |
|---|---|---|---|
| Small (3 agents, BYOK) | ~$50 | ~$143 | $150-$200/month |
| Small (3 agents, Speedrun keys) | ~$115 | ~$329 | $300-$400/month |
| Medium (5 agents, BYOK) | ~$90 | ~$257 | $250-$350/month |
| Medium (5 agents, Speedrun keys) | ~$210 | ~$600 | $500-$700/month |
| Large (8+ agents, dedicated) | ~$680 | ~$1,943 | $1,500-$2,500/month |
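The minimum-price column follows from rearranging the gross-margin formula: price = cost / (1 - margin). A minimal sketch reproducing two rows of the table:

```python
# Sketch: minimum viable price from the cost floor and target margin,
# price = cost / (1 - margin), as used in the table above.

def min_price(cost_floor: float, target_margin: float = 0.65) -> float:
    return cost_floor / (1 - target_margin)

# Small tenant on Speedrun keys (~$115 cost floor):
print(round(min_price(115)))  # 329
# Large tenant on a dedicated cell (~$680 cost floor):
print(round(min_price(680)))  # 1943
```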
8.2 Pricing Model Options
| Model | Pros | Cons | Fit for Kaze |
|---|---|---|---|
| Per-agent subscription ($X/agent/month) | Predictable revenue, simple to understand | Discourages adding agents, doesn't reflect usage | Moderate |
| Per-task pricing ($X/task) | Aligns cost with value, scales with usage | Unpredictable bills, complex metering | Low (too complex for SMEs) |
| Tiered subscription (plan tiers with agent/task limits) | Predictable for both sides, upgrade path clear | May not fit all usage patterns | High — recommended |
| Subscription + usage overage (base fee + per-task over limit) | Predictable base + flexible scaling | Complexity of two billing dimensions | Medium |
Recommendation: Tiered subscription with BYOK discount. Three tiers aligned to cell density:
| Tier | Agents | Cell type | BYOK price | Speedrun-key price |
|---|---|---|---|---|
| Starter | Up to 3 | Shared | $200/month | $400/month |
| Growth | Up to 8 | Shared | $500/month | $900/month |
| Enterprise | Unlimited | Dedicated/VPC | Custom | Custom |
8.3 BYOK Impact
BYOK fundamentally changes unit economics:
- Client pays LLM costs directly → Speedrun's variable cost drops 60-80%
- Speedrun's cost becomes primarily infrastructure → more predictable, better margins
- Clients with provider credits/discounts get better rates than Speedrun could offer
- Risk: Speedrun loses visibility into token spend (mitigated by LLM Gateway tracking regardless of key owner)
Strategy: Default pricing assumes BYOK. Speedrun-key pricing is a premium add-on for clients who don't want to manage their own API keys.
9. Cost Monitoring & Alerts
Metrics to track for financial health:
| Metric | Alert threshold | Action |
|---|---|---|
| LLM cost per task (by skill) | >2× baseline for that skill | Investigate — model routing may be wrong, or agent is looping |
| LLM cost per tenant per day | >daily budget (tenant-specific) | Hard stop (existing budget enforcement) |
| Infrastructure cost per tenant | >120% of tier allocation | Review tenant's agent count and task frequency |
| External API cost per month | >budget | Review API plan, negotiate enterprise tier |
| Gross margin per tenant | <50% | Flag for pricing review or optimization push |
| BYOK vs Speedrun-key ratio | <30% BYOK | Push BYOK adoption — margins are too thin on Speedrun keys |
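The first alert in the table above (cost per task exceeding 2x the skill's baseline) can be sketched as a simple threshold check; the skill names and dict shape here are illustrative, not a real schema:

```python
# Sketch: the per-skill cost alert — flag skills whose observed cost per
# task exceeds 2x their baseline. Skill names and data shape are assumed.

def cost_alerts(baselines: dict, observed: dict, factor: float = 2.0) -> list:
    """Return skills whose observed cost/task exceeds factor x baseline."""
    return [skill for skill, cost in observed.items()
            if cost > factor * baselines.get(skill, float("inf"))]

baselines = {"keyword_research": 0.042, "data_quality": 0.021}
observed = {"keyword_research": 0.095, "data_quality": 0.020}  # first skill may be looping
print(cost_alerts(baselines, observed))  # ['keyword_research']
```

Skills missing a baseline never alert here (`float("inf")` threshold); a production check would likely alert on unknown skills instead.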
10. Key Takeaways
- LLM tokens are 60-80% of variable cost. Every optimization dollar should go here first.
- Most tasks cost $0.02-$0.17 fully loaded. This is cheap enough for high-volume automation.
- BYOK clients are dramatically more profitable. Push BYOK as default, Speedrun keys as premium.
- Shared cells are cost-effective up to ~20 tenants. Infrastructure cost per tenant drops below $25.
- Model selection optimization is the single highest-impact lever. Routing tasks to the cheapest adequate model can cut LLM costs 40-50%.
- Embedding costs are negligible. Don't optimize here.
- Target gross margin of 65-80% is achievable at all tiers with BYOK. Speedrun-key clients need higher pricing.
- Cost scales with usage, not ahead of it. The variable-heavy cost structure means costs grow proportionally with revenue — no cliff edges.