Kaze Dashboard — UI Research & Proposal
Internal ops dashboard for Speedrun team to monitor, supervise, and configure the Kaze agent platform. Not a client-facing UI — SME clients interact via Slack/WhatsApp/Email (D15).
1. Users & Access Model
| Role | Access | Primary Use |
|---|---|---|
| Platform Admin | Full access | System config, agent definitions, credential status, deployment |
| Ops Lead | All verticals | Supervision queue, quality monitoring, budget oversight |
| Vertical Operator | Scoped to vertical(s) | Task monitoring, supervision review, knowledge curation |
Access control follows the existing ABAC model. The dashboard authenticates via the same identity provider used for Speedrun internal tools.
2. Information Architecture
Dashboard
├── Overview (home)
├── Agents
│ ├── Fleet Status
│ └── Agent Detail → tasks, skills, supervision state, logs
├── Supervision
│ ├── Review Queue
│ └── Ramp Management
├── Tasks
│ ├── Active / Recent
│ └── Task Detail → execution trace, tool calls, LLM interactions
├── Knowledge
│ ├── Shared Tier (quarantine, verified)
│ ├── Per-Agent Memory
│ └── Ingestion Status
├── Budget & Cost
│ ├── Token Usage
│ ├── Per-Tenant Breakdown
│ └── Alerts
├── Logs & Traces
│ ├── Observation Explorer (18 event types)
│ └── Langfuse Deep Link
└── Settings
├── Vertical Definitions
├── Skill Config
├── Tool Integrations
    └── Model Routing (D35 hints)
3. Key Views
3.1 Overview (Home)
Single-screen operational pulse. Answers: "Is everything healthy right now?"
Layout: Top metric cards + two-column body (left: activity feed, right: attention items).
Metric cards (top row):
- Active agents (by state: idle / executing / waiting / error)
- Tasks last 24h (completed / failed / escalated)
- Supervision queue depth (items awaiting review)
- Token spend today vs. budget cap
- Mean task latency (p50, p95)
Activity feed (left):
- Real-time stream of significant events (task completions, escalations, errors, supervision decisions, budget warnings).
- Filterable by vertical, severity, event type.
- Each entry links to its detail view.
Attention items (right):
- Tasks stuck or failed (> threshold duration or retry count)
- Supervision items waiting > SLA
- Budget utilization > 80%
- Agent errors or unhealthy states
- Knowledge quarantine items pending review
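The attention-item rules above can be sketched as a pure function. This is a minimal illustration only; the `TaskSnapshot` shape, field names, and threshold values are assumptions, not the runtime's actual schema:

```typescript
// Illustrative attention-item rules for the Overview right column.
// All shapes and thresholds here are assumed, not the real runtime API.
interface TaskSnapshot {
  id: string;
  status: "executing" | "failed" | "completed";
  ageMinutes: number;
  retries: number;
}

interface AttentionThresholds {
  stuckAfterMinutes: number; // "stuck" threshold duration
  maxRetries: number;        // retry-count threshold
  budgetWarnRatio: number;   // e.g. 0.8 for the 80% rule
}

function attentionItems(
  tasks: TaskSnapshot[],
  budgetUsed: number,
  budgetCap: number,
  t: AttentionThresholds,
): string[] {
  const items: string[] = [];
  for (const task of tasks) {
    const stuck =
      task.status === "executing" && task.ageMinutes > t.stuckAfterMinutes;
    const flaky = task.retries > t.maxRetries;
    if (task.status === "failed" || stuck || flaky) {
      items.push(`task:${task.id}`);
    }
  }
  if (budgetCap > 0 && budgetUsed / budgetCap > t.budgetWarnRatio) {
    items.push("budget:warning");
  }
  return items;
}
```

The same pattern would extend to supervision-SLA breaches and quarantine items once those queues expose ages.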
3.2 Agent Fleet
List view: Table of running VerticalAgent instances.
| Column | Source |
|---|---|
| Vertical | vertical.yaml id |
| Tenant | tenantId |
| State | idle / executing / waiting / error |
| Current Task | taskId or "—" |
| Skills | count, with supervision level badges |
| Uptime | since spawn |
| Tasks (24h) | completed / failed |
Agent detail view:
- Identity: vertical, tenant, spawn time, model hint
- Skill cards: each skill shows supervision level (supervised/sampling/autonomous), success rate, promotion threshold progress bar
- Active task: real-time execution trace (LLM calls, tool calls, knowledge queries)
- Recent tasks: paginated history with status, duration, token cost
- Memory: per-agent episodic memory entries (from Mem0), searchable
3.3 Supervision Queue (addresses Q6)
The most critical operational view: this is where human-in-the-loop review happens.
Queue list:
- Sorted by: age (oldest first), priority, vertical
- Each item shows: task summary, agent, skill, vertical, client, timestamp, risk level
- Batch actions: approve all (filtered), reject selected
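The queue ordering above (age oldest first, then priority, then vertical) can be expressed as a single comparator. The `QueueItem` shape and the lower-is-more-urgent priority convention are assumptions for illustration:

```typescript
// Illustrative multi-key comparator for the supervision queue sort order.
// The QueueItem shape is assumed, not the actual runtime type.
interface QueueItem {
  taskSummary: string;
  vertical: string;
  priority: number;   // lower = more urgent (assumed convention)
  enqueuedAt: number; // epoch ms
}

function compareQueueItems(a: QueueItem, b: QueueItem): number {
  if (a.enqueuedAt !== b.enqueuedAt) return a.enqueuedAt - b.enqueuedAt; // oldest first
  if (a.priority !== b.priority) return a.priority - b.priority;
  return a.vertical.localeCompare(b.vertical);
}
```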
Review panel (split-pane or slide-over):
- Left: Agent's proposed output (markdown rendered)
- Right: Context panel
  - Task input (what was requested)
  - Execution trace (collapsed, expandable)
  - Knowledge sources cited
  - Similar past approvals/rejections
- Action bar:
  - Approve (deliver to client)
  - Approve with edit (inline edit before delivery)
  - Reject with feedback (text input — feeds back into skill improvement)
  - Escalate (assign to another team member)
- Keyboard shortcuts: `a` approve, `e` edit, `r` reject, `j` / `k` next/prev — ops will use this heavily
Supervision ramp management:
- Per-skill promotion/demotion controls
- Current level + threshold progress
- Manual override (promote/demote with reason, logged)
- Promotion criteria: configurable success rate % over N recent tasks
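The promotion criterion ("configurable success rate % over N recent tasks") reduces to a small windowed check. This is a sketch under assumed names; the real ramp logic lives in the runtime:

```typescript
// Sketch of the supervision-ramp promotion check. Level names match the
// document (supervised/sampling/autonomous); the windowing is an assumption.
type SupervisionLevel = "supervised" | "sampling" | "autonomous";

const nextLevel: Record<SupervisionLevel, SupervisionLevel> = {
  supervised: "sampling",
  sampling: "autonomous",
  autonomous: "autonomous", // already at the top of the ramp
};

function shouldPromote(
  recentOutcomes: boolean[], // true = task succeeded, newest last
  windowSize: number,        // N
  requiredRate: number,      // e.g. 0.95
): boolean {
  if (recentOutcomes.length < windowSize) return false; // not enough history
  const window = recentOutcomes.slice(-windowSize);
  const successes = window.filter(Boolean).length;
  return successes / windowSize >= requiredRate;
}
```

Manual overrides would bypass `shouldPromote` entirely but still write `nextLevel`-style transitions to the audit log.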
3.4 Task Explorer
List view: Filterable by vertical, agent, status, date range, skill.
Task detail:
- Status timeline: dispatched → executing → (escalated?) → completed/failed
- Full execution trace (observation events):
  - LLM calls: prompt (truncated), response, model, tokens, latency, cost
  - Tool calls: tool name, args, result, duration
  - Knowledge operations: search queries, results, commits
  - Supervision events: review decision, reviewer, feedback
- Token breakdown: per-step cost, cumulative
- Deep link to Langfuse trace (D48)
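The "per-step cost, cumulative" breakdown is a running sum over trace steps. A minimal sketch, assuming a simplified `TraceStep` shape rather than the actual observation-event schema:

```typescript
// Illustrative cumulative-cost column for the task detail view.
// TraceStep is an assumed shape, not the real observation-event type.
interface TraceStep {
  kind: "llm" | "tool" | "knowledge";
  costUsd: number;
}

function cumulativeCost(steps: TraceStep[]): number[] {
  const out: number[] = [];
  let running = 0;
  for (const s of steps) {
    running += s.costUsd;
    out.push(Number(running.toFixed(6))); // round away float noise for display
  }
  return out;
}
```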
3.5 Knowledge Management
Shared knowledge tier:
- Entries in quarantine (D40): source, content preview, quality signals (LLM-as-judge score, cross-reference count, source verification status)
- Verify / reject actions
- Verified entries: searchable, filterable by domain, provenance class (D43)
Ingestion status:
- Seed skill runs: repos processed, files ingested, errors
- Last ingestion time per repo
- Trigger manual re-ingestion
Per-agent memory:
- Browse episodic memory entries per agent
- Search across all agent memories
- Delete/archive stale entries
3.6 Budget & Cost
Token usage dashboard:
- Time-series chart: daily token usage by provider/model
- Per-tenant breakdown: table with usage, cost, budget cap, utilization %
- Per-vertical breakdown: which verticals consume most
- Per-skill breakdown: which skills are most expensive
- Model selection distribution: pie chart of fast/balanced/best usage
Alerts:
- Budget threshold alerts (configurable: 50%, 80%, 90%, 100%)
- Anomaly detection: unusual spike in usage for a tenant/agent
- Cost per task trending up (possible prompt regression)
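The configurable budget thresholds (50%, 80%, 90%, 100%) amount to a utilization check per tenant. A hedged sketch with illustrative names:

```typescript
// Sketch of the budget-threshold alert rule: returns the configured
// thresholds a tenant has crossed. Function and parameter names are assumed.
function crossedThresholds(
  usedTokens: number,
  budgetCap: number,
  thresholds: number[] = [0.5, 0.8, 0.9, 1.0],
): number[] {
  if (budgetCap <= 0) return []; // no cap configured, nothing to alert on
  const utilization = usedTokens / budgetCap;
  return thresholds.filter((t) => utilization >= t);
}
```

An alerting job would diff this against previously fired thresholds so each level alerts once per budget period.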
3.7 Logs & Traces
Observation explorer:
- Full-text search across all 18 observation event types
- Filter by: event type, vertical, agent, tenant, time range, severity
- Structured view with expandable JSON payloads
- Link to related task detail
Langfuse integration:
- Embed or deep-link to Langfuse trace views
- Use Langfuse for prompt management and A/B testing visualization
3.8 Settings
Vertical & skill config:
- View loaded vertical definitions (from YAML, read-only display)
- Skill catalog: all skills across verticals, their schemas, supervision defaults
- Tool integration status: which tools are registered in gateway, credential health (valid/expired/missing — no secrets shown)
Model routing (D35):
- Current model mapping per hint level (fast/balanced/best)
- Per-tenant overrides
- Provider fallback chain status
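The D35 routing described above (hint level, per-tenant override, provider fallback chain) can be sketched as one resolution function. Model names and config shapes here are invented for illustration:

```typescript
// Sketch of D35-style model routing: per-tenant override falls back to the
// default chain for the hint, then the first healthy provider wins.
// All names and model identifiers are assumptions.
type ModelHint = "fast" | "balanced" | "best";

interface RoutingConfig {
  defaults: Record<ModelHint, string[]>; // fallback chain per hint level
  tenantOverrides: Record<string, Partial<Record<ModelHint, string[]>>>;
}

function resolveModel(
  hint: ModelHint,
  tenantId: string,
  cfg: RoutingConfig,
  isHealthy: (model: string) => boolean,
): string | undefined {
  const chain = cfg.tenantOverrides[tenantId]?.[hint] ?? cfg.defaults[hint];
  return chain.find(isHealthy); // walk the fallback chain in order
}
```

The dashboard's Settings view would render `defaults`, `tenantOverrides`, and live `isHealthy` status side by side.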
4. Tech Stack Recommendation
Option A: Next.js (App Router) — Recommended
| Layer | Choice | Rationale |
|---|---|---|
| Framework | Next.js 15 (App Router) | SSR for initial load, RSC for data-heavy views, API routes for BFF |
| UI Library | shadcn/ui + Tailwind | Composable, accessible, matches Kaze brand (dark theme, Inter/JetBrains Mono) |
| State | TanStack Query | Server state caching, real-time refetch, optimistic updates |
| Real-time | SSE from runtime/gateway | Lightweight, no WebSocket infra needed for MVP |
| Charts | Recharts or Tremor | Token usage, cost trends, quality metrics |
| Tables | TanStack Table | Sortable, filterable, paginated — core of fleet/task/queue views |
| Auth | NextAuth.js or Lucia | Session-based, supports OAuth (GitHub, Google) |
| Deploy | Container (same K8s cluster) | Co-located with runtime/gateway, internal network only |
Why Next.js over alternatives:
- TypeScript-native (consistent with runtime/gateway)
- RSC reduces client bundle for data-heavy dashboard
- API routes serve as a BFF, aggregating calls to the runtime (4100), gateway (4200), and knowledge (4300) services
- Vercel-optional: runs as standalone Node.js container
Option B: Vite + React SPA
Simpler setup, faster dev iteration. No SSR benefits. Would need a separate BFF or hit services directly. Better if the dashboard stays very simple and internal-only.
Option C: Vue 3 + Nuxt
Viable if team prefers Vue. Similar capabilities to Next.js. Less ecosystem momentum for dashboard component libraries.
Recommendation: Option A for the full dashboard vision. Option B if we want to ship a minimal ops console fast and iterate.
5. Data Flow
The dashboard does not have its own database. It reads from existing services:
┌─────────────────────┐
│   Kaze Dashboard    │
│   (Next.js BFF)     │
└──┬──────┬───────┬───┘
   │      │       │
   ▼      ▼       ▼
Runtime Gateway Knowledge
 :4100   :4200    :4300

New API surface needed on existing services:
| Service | New Endpoints | Purpose |
|---|---|---|
| Runtime | GET /agents (exists) | Fleet status |
| Runtime | GET /tasks (new) | Task history, paginated |
| Runtime | GET /tasks/:id/trace (new) | Full observation trace for a task |
| Runtime | GET /supervision/queue (new) | Pending review items |
| Runtime | POST /supervision/review (new) | Submit review decision |
| Runtime | GET /supervision/ramp (new) | Current supervision levels |
| Runtime | POST /supervision/ramp (new) | Manual promote/demote |
| Gateway | GET /usage (new) | Token usage aggregates |
| Gateway | GET /tools/catalog (exists) | Tool health |
| Knowledge | GET /entries (new) | Browse knowledge entries |
| Knowledge | GET /quarantine (new) | Quarantine queue |
| Knowledge | POST /quarantine/:id/verify (new) | Approve quarantine entry |
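The BFF's main job against this API surface is fan-out and merge. A hedged sketch, assuming internal service hostnames and the response shapes implied by the table above (none of this is the actual API contract):

```typescript
// Illustrative BFF aggregation for the Overview page: fan out to the three
// services in parallel and merge into one payload. URLs and response shapes
// are assumptions, injected fetchJson makes the sketch testable.
async function overviewData(fetchJson: (url: string) => Promise<any>) {
  const [agents, usage, quarantine] = await Promise.all([
    fetchJson("http://runtime:4100/agents"),
    fetchJson("http://gateway:4200/usage"),
    fetchJson("http://knowledge:4300/quarantine"),
  ]);
  return {
    activeAgents: agents.length,
    tokensToday: usage.today,
    quarantineDepth: quarantine.length,
  };
}
```

In Next.js this would live in a route handler or React Server Component, so the browser only ever talks to the dashboard origin.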
Real-time updates:
- Runtime exposes SSE endpoint `GET /events/stream` — pushes observation events as they fire
- Dashboard subscribes on connect, filters client-side by vertical/agent
- Reconnect with `Last-Event-ID` for gap recovery
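The client-side filtering step above is a pure predicate over incoming events. Event and filter shapes here are assumptions for illustration, not the runtime's observation schema:

```typescript
// Sketch of client-side stream filtering: the dashboard receives every
// observation event and keeps only those matching the active view.
// ObservationEvent and StreamFilter shapes are assumed.
interface ObservationEvent {
  type: string;
  vertical: string;
  agentId: string;
}

interface StreamFilter {
  verticals?: string[]; // undefined = no filter on this dimension
  agentIds?: string[];
}

function matchesFilter(e: ObservationEvent, f: StreamFilter): boolean {
  if (f.verticals && !f.verticals.includes(e.vertical)) return false;
  if (f.agentIds && !f.agentIds.includes(e.agentId)) return false;
  return true;
}
```

Filtering client-side keeps the SSE endpoint simple (one firehose stream) at the cost of extra bytes; a per-vertical query parameter could move the filter server-side later if volume demands it.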
6. Design System Alignment
Per design/brand.md:
| Element | Spec |
|---|---|
| Background | Charcoal #171717 (dark-first) |
| Primary accent | Pale Sage #A7F3D0 |
| Secondary accent | Mint #6EE7B7 |
| Text primary | #F5F5F5 |
| Text secondary | #A3A3A3 |
| Headings | Bold sans-serif (system or custom) |
| Body text | Inter |
| Code / data | JetBrains Mono |
| Borders | #262626 subtle |
| Cards | #1C1C1C with #262626 border |
| Status: healthy | #A7F3D0 (sage) |
| Status: warning | #FDE68A (amber) |
| Status: error | #FCA5A5 (rose) |
| Status: idle | #A3A3A3 (neutral) |
Visual tone: Clean, data-dense, minimal chrome. "Zen-Infrastructure" — precision meets calm. No gratuitous animations. Information density over decoration.
7. MVP Scope (Phase 1 Dashboard)
Ship a useful ops tool fast, expand later.
Phase 1 — Ops Console (4-6 weeks)
Build with Option A (Next.js) or Option B (Vite SPA) depending on speed priority.
In scope:
- Overview page (metric cards + activity feed)
- Agent fleet status (list + basic detail)
- Task explorer (list + trace detail, Langfuse deep link)
- Supervision queue (list + review panel with approve/reject)
- Basic token usage view (per-tenant, per-day)
Out of scope (Phase 2+):
- Knowledge management UI (use direct API / Langfuse for now)
- Budget alerts and anomaly detection
- Settings / config management (keep using YAML files)
- Supervision ramp management UI (use API directly)
- Real-time SSE (use polling with TanStack Query refetch intervals)
- Advanced ABAC (single admin role for MVP)
Phase 2 — Full Dashboard
- Knowledge quarantine review
- Supervision ramp management
- Budget alerts and anomaly detection
- Real-time SSE streaming
- Role-based access (admin, ops lead, operator)
- Settings UI for vertical/skill/model config
- Multi-channel conversation context view (Q5)
Phase 3 — Governance Layer
- Layer 3 agent dashboards (Health Monitor, Cost Monitor, Quality Monitor)
- Self-improvement loop visualization
- Cross-vertical analytics
- Client portal (if/when needed)
8. Open Questions for Dashboard
| # | Question | Impact | Notes |
|---|---|---|---|
| DQ1 | Where does the dashboard live as a repo? New repo (kaze-dashboard) or inside kaze-runtime? | Architecture | Recommend: separate repo, keeps runtime lean |
| DQ2 | Auth provider? GitHub OAuth (team already uses GH) vs. Google OAuth vs. custom? | Security | GitHub OAuth is simplest for internal tool |
| DQ3 | Do we need offline/degraded mode if runtime is down? | Reliability | Probably not for internal tool — show error state |
| DQ4 | Should observation events be persisted in a queryable store (Postgres) or only in Langfuse? | Data arch | Langfuse for MVP; own Postgres table for Phase 2 if query patterns diverge |
| DQ5 | Mobile responsiveness needed? | Design | Probably not — ops team uses desktop. Tablet-friendly is nice-to-have |
| DQ6 | Should the BFF aggregate cross-service data or let the client call services directly? | Architecture | BFF preferred — single auth boundary, aggregation, caching |
9. Alternatives Considered
Grafana + Custom Panels
- Pros: Fast setup, good for metrics/logs, already used in many ops teams
- Cons: Poor fit for supervision queue (interactive workflow), limited custom UI, separate auth, not branded
- Verdict: Use Langfuse for trace visualization, build custom for interactive workflows
Retool / Internal Tool Builders
- Pros: Very fast to prototype, drag-and-drop
- Cons: Vendor lock-in, limited customization, recurring cost, can't embed in K8s easily, not branded
- Verdict: Skip — we need custom supervision UX and want to own the experience
Admin.js / React-Admin
- Pros: Pre-built CRUD patterns, fast for data management
- Cons: Designed for database-backed CRUD, awkward fit for service-aggregated data, limited real-time
- Verdict: Could use for Settings section only, but not worth the dependency for one page
10. Summary
The Kaze Dashboard is an internal ops console focused on three core workflows:
- Monitor — Are agents healthy? Are tasks succeeding? Is spend on track?
- Supervise — Review agent outputs, approve/reject/edit, manage trust ramp.
- Investigate — When something goes wrong, trace the full execution path.
It reads from existing services (runtime, gateway, knowledge) via a BFF layer, requires minimal new API surface, and aligns with the Kaze brand system. MVP ships the monitoring and supervision workflows; knowledge management and governance views come later.