Kaze Dashboard — UI Research & Proposal
Internal ops dashboard for Speedrun team to monitor, supervise, and configure the Kaze agent platform. Not a client-facing UI — SME clients interact via Slack/WhatsApp/Email (D15).
1. Users & Access Model
| Role | Access | Primary Use |
|---|---|---|
| Platform Admin | Full access | System config, agent definitions, credential status, deployment |
| Ops Lead | All verticals | Supervision queue, quality monitoring, budget oversight |
| Vertical Operator | Scoped to vertical(s) | Task monitoring, supervision review, knowledge curation |
Access control follows the existing ABAC model. The dashboard authenticates via the same identity provider used for Speedrun internal tools.
2. Information Architecture
Dashboard
├── Overview (home)
├── Agents
│ ├── Fleet Status
│ └── Agent Detail → tasks, skills, supervision state, logs
├── Supervision
│ ├── Review Queue
│ └── Ramp Management
├── Tasks
│ ├── Active / Recent
│ └── Task Detail → execution trace, tool calls, LLM interactions
├── Knowledge
│ ├── Shared Tier (quarantine, verified)
│ ├── Per-Agent Memory
│ └── Ingestion Status
├── Budget & Cost
│ ├── Token Usage
│ ├── Per-Tenant Breakdown
│ └── Alerts
├── Logs & Traces
│ ├── Observation Explorer (18 event types)
│ └── Langfuse Deep Link
└── Settings
├── Vertical Definitions
├── Skill Config
├── Tool Integrations
    └── Model Routing (D35 hints)
3. Key Views
3.1 Overview (Home)
Single-screen operational pulse. Answers: "Is everything healthy right now?"
Layout: Top metric cards + two-column body (left: activity feed, right: attention items).
Metric cards (top row):
- Active agents (by state: idle / executing / waiting / error)
- Tasks last 24h (completed / failed / escalated)
- Supervision queue depth (items awaiting review)
- Token spend today vs. budget cap
- Mean task latency (p50, p95)
Activity feed (left):
- Real-time stream of significant events (task completions, escalations, errors, supervision decisions, budget warnings).
- Filterable by vertical, severity, event type.
- Each entry links to its detail view.
Attention items (right):
- Tasks stuck or failed (> threshold duration or retry count)
- Supervision items waiting > SLA
- Budget utilization > 80%
- Agent errors or unhealthy states
- Knowledge quarantine items pending review
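The attention-item rules above can be sketched as a pure function. This is a minimal illustration only; the `TaskSnapshot` shape, field names, and threshold values are assumptions, not the runtime's actual schema:

```typescript
// Illustrative attention-item rules for the Overview right column.
// All shapes and thresholds here are assumed, not the real runtime API.
interface TaskSnapshot {
  id: string;
  status: "executing" | "failed" | "completed";
  ageMinutes: number;
  retries: number;
}

interface AttentionThresholds {
  stuckAfterMinutes: number; // "stuck" threshold duration
  maxRetries: number;        // retry-count threshold
  budgetWarnRatio: number;   // e.g. 0.8 for the 80% rule
}

function attentionItems(
  tasks: TaskSnapshot[],
  budgetUsed: number,
  budgetCap: number,
  t: AttentionThresholds,
): string[] {
  const items: string[] = [];
  for (const task of tasks) {
    const stuck =
      task.status === "executing" && task.ageMinutes > t.stuckAfterMinutes;
    const flaky = task.retries > t.maxRetries;
    if (task.status === "failed" || stuck || flaky) {
      items.push(`task:${task.id}`);
    }
  }
  if (budgetCap > 0 && budgetUsed / budgetCap > t.budgetWarnRatio) {
    items.push("budget:warning");
  }
  return items;
}
```

The same pattern would extend to supervision-SLA breaches and quarantine items once those queues expose ages.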
3.2 Agent Fleet
List view: Table of running VerticalAgent instances.
| Column | Source |
|---|---|
| Vertical | vertical.yaml id |
| Tenant | tenantId |
| State | idle / executing / waiting / error |
| Current Task | taskId or "—" |
| Skills | count, with supervision level badges |
| Uptime | since spawn |
| Tasks (24h) | completed / failed |
Agent detail view:
- Identity: vertical, tenant, spawn time, model hint
- Skill cards: each skill shows supervision level (supervised/sampling/autonomous), success rate, promotion threshold progress bar
- Active task: real-time execution trace (LLM calls, tool calls, knowledge queries)
- Recent tasks: paginated history with status, duration, token cost
- Memory: per-agent episodic memory entries (from Mem0), searchable
3.3 Supervision Queue (addresses Q6)
The most critical operational view: this is where human-in-the-loop review happens.
Queue list:
- Sorted by: age (oldest first), priority, vertical
- Each item shows: task summary, agent, skill, vertical, client, timestamp, risk level
- Batch actions: approve all (filtered), reject selected
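The queue ordering above (age oldest first, then priority, then vertical) can be expressed as a single comparator. The `QueueItem` shape and the lower-is-more-urgent priority convention are assumptions for illustration:

```typescript
// Illustrative multi-key comparator for the supervision queue sort order.
// The QueueItem shape is assumed, not the actual runtime type.
interface QueueItem {
  taskSummary: string;
  vertical: string;
  priority: number;   // lower = more urgent (assumed convention)
  enqueuedAt: number; // epoch ms
}

function compareQueueItems(a: QueueItem, b: QueueItem): number {
  if (a.enqueuedAt !== b.enqueuedAt) return a.enqueuedAt - b.enqueuedAt; // oldest first
  if (a.priority !== b.priority) return a.priority - b.priority;
  return a.vertical.localeCompare(b.vertical);
}
```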
Review panel (split-pane or slide-over):
- Left: Agent's proposed output (markdown rendered)
- Right: Context panel
  - Task input (what was requested)
  - Execution trace (collapsed, expandable)
  - Knowledge sources cited
  - Similar past approvals/rejections
- Action bar:
  - Approve (deliver to client)
  - Approve with edit (inline edit before delivery)
  - Reject with feedback (text input — feeds back into skill improvement)
  - Escalate (assign to another team member)
- Keyboard shortcuts: `a` approve, `e` edit, `r` reject, `j` / `k` next/prev — ops will use this heavily
Supervision ramp management:
- Per-skill promotion/demotion controls
- Current level + threshold progress
- Manual override (promote/demote with reason, logged)
- Promotion criteria: configurable success rate % over N recent tasks
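The promotion criterion ("configurable success rate % over N recent tasks") reduces to a small windowed check. This is a sketch under assumed names; the real ramp logic lives in the runtime:

```typescript
// Sketch of the supervision-ramp promotion check. Level names match the
// document (supervised/sampling/autonomous); the windowing is an assumption.
type SupervisionLevel = "supervised" | "sampling" | "autonomous";

const nextLevel: Record<SupervisionLevel, SupervisionLevel> = {
  supervised: "sampling",
  sampling: "autonomous",
  autonomous: "autonomous", // already at the top of the ramp
};

function shouldPromote(
  recentOutcomes: boolean[], // true = task succeeded, newest last
  windowSize: number,        // N
  requiredRate: number,      // e.g. 0.95
): boolean {
  if (recentOutcomes.length < windowSize) return false; // not enough history
  const window = recentOutcomes.slice(-windowSize);
  const successes = window.filter(Boolean).length;
  return successes / windowSize >= requiredRate;
}
```

Manual overrides would bypass `shouldPromote` entirely but still write `nextLevel`-style transitions to the audit log.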
3.4 Task Explorer
List view: Filterable by vertical, agent, status, date range, skill.
Task detail:
- Status timeline: dispatched → executing → (escalated?) → completed/failed
- Full execution trace (observation events):
  - LLM calls: prompt (truncated), response, model, tokens, latency, cost
  - Tool calls: tool name, args, result, duration
  - Knowledge operations: search queries, results, commits
  - Supervision events: review decision, reviewer, feedback
- Token breakdown: per-step cost, cumulative
- Deep link to Langfuse trace (D48)
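The "per-step cost, cumulative" breakdown is a running sum over trace steps. A minimal sketch, assuming a simplified `TraceStep` shape rather than the actual observation-event schema:

```typescript
// Illustrative cumulative-cost column for the task detail view.
// TraceStep is an assumed shape, not the real observation-event type.
interface TraceStep {
  kind: "llm" | "tool" | "knowledge";
  costUsd: number;
}

function cumulativeCost(steps: TraceStep[]): number[] {
  const out: number[] = [];
  let running = 0;
  for (const s of steps) {
    running += s.costUsd;
    out.push(Number(running.toFixed(6))); // round away float noise for display
  }
  return out;
}
```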
3.5 Knowledge Management
Shared knowledge tier:
- Entries in quarantine (D40): source, content preview, quality signals (LLM-as-judge score, cross-reference count, source verification status)
- Verify / reject actions
- Verified entries: searchable, filterable by domain, provenance class (D43)
Ingestion status:
- Seed skill runs: repos processed, files ingested, errors
- Last ingestion time per repo
- Trigger manual re-ingestion
Per-agent memory:
- Browse episodic memory entries per agent
- Search across all agent memories
- Delete/archive stale entries
3.6 Budget & Cost
Token usage dashboard:
- Time-series chart: daily token usage by provider/model
- Per-tenant breakdown: table with usage, cost, budget cap, utilization %
- Per-vertical breakdown: which verticals consume most
- Per-skill breakdown: which skills are most expensive
- Model selection distribution: pie chart of fast/balanced/best usage
Alerts:
- Budget threshold alerts (configurable: 50%, 80%, 90%, 100%)
- Anomaly detection: unusual spike in usage for a tenant/agent
- Cost per task trending up (possible prompt regression)
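The configurable budget thresholds (50%, 80%, 90%, 100%) amount to a utilization check per tenant. A hedged sketch with illustrative names:

```typescript
// Sketch of the budget-threshold alert rule: returns the configured
// thresholds a tenant has crossed. Function and parameter names are assumed.
function crossedThresholds(
  usedTokens: number,
  budgetCap: number,
  thresholds: number[] = [0.5, 0.8, 0.9, 1.0],
): number[] {
  if (budgetCap <= 0) return []; // no cap configured, nothing to alert on
  const utilization = usedTokens / budgetCap;
  return thresholds.filter((t) => utilization >= t);
}
```

An alerting job would diff this against previously fired thresholds so each level alerts once per budget period.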
3.7 Logs & Traces
Observation explorer:
- Full-text search across all 18 observation event types
- Filter by: event type, vertical, agent, tenant, time range, severity
- Structured view with expandable JSON payloads
- Link to related task detail
Langfuse integration:
- Embed or deep-link to Langfuse trace views
- Use Langfuse for prompt management and A/B testing visualization
3.8 Settings
Vertical & skill config:
- View loaded vertical definitions (from YAML, read-only display)
- Skill catalog: all skills across verticals, their schemas, supervision defaults
- Tool integration status: which tools are registered in gateway, credential health (valid/expired/missing — no secrets shown)
Model routing (D35):
- Current model mapping per hint level (fast/balanced/best)
- Per-tenant overrides
- Provider fallback chain status
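The D35 routing described above (hint level, per-tenant override, provider fallback chain) can be sketched as one resolution function. Model names and config shapes here are invented for illustration:

```typescript
// Sketch of D35-style model routing: per-tenant override falls back to the
// default chain for the hint, then the first healthy provider wins.
// All names and model identifiers are assumptions.
type ModelHint = "fast" | "balanced" | "best";

interface RoutingConfig {
  defaults: Record<ModelHint, string[]>; // fallback chain per hint level
  tenantOverrides: Record<string, Partial<Record<ModelHint, string[]>>>;
}

function resolveModel(
  hint: ModelHint,
  tenantId: string,
  cfg: RoutingConfig,
  isHealthy: (model: string) => boolean,
): string | undefined {
  const chain = cfg.tenantOverrides[tenantId]?.[hint] ?? cfg.defaults[hint];
  return chain.find(isHealthy); // walk the fallback chain in order
}
```

The dashboard's Settings view would render `defaults`, `tenantOverrides`, and live `isHealthy` status side by side.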
4. Tech Stack Recommendation
Option A: Next.js (App Router) — Recommended
| Layer | Choice | Rationale |
|---|---|---|
| Framework | Next.js 15 (App Router) | SSR for initial load, RSC for data-heavy views, API routes for BFF |
| UI Library | shadcn/ui + Tailwind | Composable, accessible, matches Kaze brand (dark theme, Inter/JetBrains Mono) |
| State | TanStack Query | Server state caching, real-time refetch, optimistic updates |
| Real-time | SSE from runtime/gateway | Lightweight, no WebSocket infra needed for MVP |
| Charts | Recharts or Tremor | Token usage, cost trends, quality metrics |
| Tables | TanStack Table | Sortable, filterable, paginated — core of fleet/task/queue views |
| Auth | NextAuth.js or Lucia | Session-based, supports OAuth (GitHub, Google) |
| Deploy | Container (same K8s cluster) | Co-located with runtime/gateway, internal network only |
Why Next.js over alternatives:
- TypeScript-native (consistent with runtime/gateway)
- RSC reduces client bundle for data-heavy dashboard
- API routes serve as a BFF, aggregating calls to the runtime (4100), gateway (4200), and knowledge (4300) services
- Vercel-optional: runs as standalone Node.js container
Option B: Vite + React SPA
Simpler setup, faster dev iteration. No SSR benefits. Would need a separate BFF or hit services directly. Better if the dashboard stays very simple and internal-only.
Option C: Vue 3 + Nuxt
Viable if team prefers Vue. Similar capabilities to Next.js. Less ecosystem momentum for dashboard component libraries.
Recommendation: Option A for the full dashboard vision. Option B if we want to ship a minimal ops console fast and iterate.
5. Data Flow
The dashboard does not have its own database. It reads from existing services:
┌─────────────────────┐
│   Kaze Dashboard    │
│   (Next.js BFF)     │
└──┬──────┬───────┬───┘
   │      │       │
   ▼      ▼       ▼
Runtime Gateway Knowledge
 :4100   :4200    :4300

New API surface needed on existing services:
| Service | New Endpoints | Purpose |
|---|---|---|
| Runtime | GET /agents (exists) | Fleet status |
| Runtime | GET /tasks (new) | Task history, paginated |
| Runtime | GET /tasks/:id/trace (new) | Full observation trace for a task |
| Runtime | GET /supervision/queue (new) | Pending review items |
| Runtime | POST /supervision/review (new) | Submit review decision |
| Runtime | GET /supervision/ramp (new) | Current supervision levels |
| Runtime | POST /supervision/ramp (new) | Manual promote/demote |
| Gateway | GET /usage (new) | Token usage aggregates |
| Gateway | GET /tools/catalog (exists) | Tool health |
| Knowledge | GET /entries (new) | Browse knowledge entries |
| Knowledge | GET /quarantine (new) | Quarantine queue |
| Knowledge | POST /quarantine/:id/verify (new) | Approve quarantine entry |
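The BFF's main job against this API surface is fan-out and merge. A hedged sketch, assuming internal service hostnames and the response shapes implied by the table above (none of this is the actual API contract):

```typescript
// Illustrative BFF aggregation for the Overview page: fan out to the three
// services in parallel and merge into one payload. URLs and response shapes
// are assumptions, injected fetchJson makes the sketch testable.
async function overviewData(fetchJson: (url: string) => Promise<any>) {
  const [agents, usage, quarantine] = await Promise.all([
    fetchJson("http://runtime:4100/agents"),
    fetchJson("http://gateway:4200/usage"),
    fetchJson("http://knowledge:4300/quarantine"),
  ]);
  return {
    activeAgents: agents.length,
    tokensToday: usage.today,
    quarantineDepth: quarantine.length,
  };
}
```

In Next.js this would live in a route handler or React Server Component, so the browser only ever talks to the dashboard origin.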
Real-time updates:
- Runtime exposes SSE endpoint `GET /events/stream` — pushes observation events as they fire
- Dashboard subscribes on connect, filters client-side by vertical/agent
- Reconnect with `Last-Event-ID` for gap recovery
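The client-side filtering step above is a pure predicate over incoming events. Event and filter shapes here are assumptions for illustration, not the runtime's observation schema:

```typescript
// Sketch of client-side stream filtering: the dashboard receives every
// observation event and keeps only those matching the active view.
// ObservationEvent and StreamFilter shapes are assumed.
interface ObservationEvent {
  type: string;
  vertical: string;
  agentId: string;
}

interface StreamFilter {
  verticals?: string[]; // undefined = no filter on this dimension
  agentIds?: string[];
}

function matchesFilter(e: ObservationEvent, f: StreamFilter): boolean {
  if (f.verticals && !f.verticals.includes(e.vertical)) return false;
  if (f.agentIds && !f.agentIds.includes(e.agentId)) return false;
  return true;
}
```

Filtering client-side keeps the SSE endpoint simple (one firehose stream) at the cost of extra bytes; a per-vertical query parameter could move the filter server-side later if volume demands it.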
6. Design System Alignment
Per design/brand.md:
| Element | Spec |
|---|---|
| Background | Charcoal #171717 (dark-first) |
| Primary accent | Pale Sage #A7F3D0 |
| Secondary accent | Mint #6EE7B7 |
| Text primary | #F5F5F5 |
| Text secondary | #A3A3A3 |
| Headings | Bold sans-serif (system or custom) |
| Body text | Inter |
| Code / data | JetBrains Mono |
| Borders | #262626 subtle |
| Cards | #1C1C1C with #262626 border |
| Status: healthy | #A7F3D0 (sage) |
| Status: warning | #FDE68A (amber) |
| Status: error | #FCA5A5 (rose) |
| Status: idle | #A3A3A3 (neutral) |
Visual tone: Clean, data-dense, minimal chrome. "Zen-Infrastructure" — precision meets calm. No gratuitous animations. Information density over decoration.
7. MVP Scope (Phase 1 Dashboard)
Ship a useful ops tool fast, expand later.
Phase 1 — Ops Console (4-6 weeks)
Build with Option A (Next.js) or Option B (Vite SPA) depending on speed priority.
In scope:
- Overview page (metric cards + activity feed)
- Agent fleet status (list + basic detail)
- Task explorer (list + trace detail, Langfuse deep link)
- Supervision queue (list + review panel with approve/reject)
- Basic token usage view (per-tenant, per-day)
Out of scope (Phase 2+):
- Knowledge management UI (use direct API / Langfuse for now)
- Budget alerts and anomaly detection
- Settings / config management (keep using YAML files)
- Supervision ramp management UI (use API directly)
- Real-time SSE (use polling with TanStack Query refetch intervals)
- Advanced ABAC (single admin role for MVP)
Phase 2 — Full Dashboard
- Knowledge quarantine review
- Supervision ramp management
- Budget alerts and anomaly detection
- Real-time SSE streaming
- Role-based access (admin, ops lead, operator)
- Settings UI for vertical/skill/model config
- Multi-channel conversation context view (Q5)
Phase 3 — Governance Layer
- Layer 3 agent dashboards (Health Monitor, Cost Monitor, Quality Monitor)
- Self-improvement loop visualization
- Cross-vertical analytics
- Client portal (if/when needed)
8. Open Questions for Dashboard
| # | Question | Impact | Notes |
|---|---|---|---|
| DQ1 | Where does the dashboard live as a repo? New repo (kaze-dashboard) or inside kaze-runtime? | Architecture | Recommend: separate repo, keeps runtime lean |
| DQ2 | Auth provider? GitHub OAuth (team already uses GH) vs. Google OAuth vs. custom? | Security | GitHub OAuth is simplest for internal tool |
| DQ3 | Do we need offline/degraded mode if runtime is down? | Reliability | Probably not for internal tool — show error state |
| DQ4 | Should observation events be persisted in a queryable store (Postgres) or only in Langfuse? | Data arch | Langfuse for MVP; own Postgres table for Phase 2 if query patterns diverge |
| DQ5 | Mobile responsiveness needed? | Design | Probably not — ops team uses desktop. Tablet-friendly is nice-to-have |
| DQ6 | Should the BFF aggregate cross-service data or let the client call services directly? | Architecture | BFF preferred — single auth boundary, aggregation, caching |
9. Alternatives Considered
Grafana + Custom Panels
- Pros: Fast setup, good for metrics/logs, already used in many ops teams
- Cons: Poor fit for supervision queue (interactive workflow), limited custom UI, separate auth, not branded
- Verdict: Use Langfuse for trace visualization, build custom for interactive workflows
Retool / Internal Tool Builders
- Pros: Very fast to prototype, drag-and-drop
- Cons: Vendor lock-in, limited customization, recurring cost, can't embed in K8s easily, not branded
- Verdict: Skip — we need custom supervision UX and want to own the experience
Admin.js / React-Admin
- Pros: Pre-built CRUD patterns, fast for data management
- Cons: Designed for database-backed CRUD, awkward fit for service-aggregated data, limited real-time
- Verdict: Could use for Settings section only, but not worth the dependency for one page
10. Summary
The Kaze Dashboard is an internal ops console focused on three core workflows:
- Monitor — Are agents healthy? Are tasks succeeding? Is spend on track?
- Supervise — Review agent outputs, approve/reject/edit, manage trust ramp.
- Investigate — When something goes wrong, trace the full execution path.
It reads from existing services (runtime, gateway, knowledge) via a BFF layer, requires minimal new API surface, and aligns with the Kaze brand system. MVP ships the monitoring and supervision workflows; knowledge management and governance views come later.