Kaze Dashboard — UI Research & Proposal

Internal ops dashboard for Speedrun team to monitor, supervise, and configure the Kaze agent platform. Not a client-facing UI — SME clients interact via Slack/WhatsApp/Email (D15).


1. Users & Access Model

| Role | Access | Primary Use |
| --- | --- | --- |
| Platform Admin | Full access | System config, agent definitions, credential status, deployment |
| Ops Lead | All verticals | Supervision queue, quality monitoring, budget oversight |
| Vertical Operator | Scoped to vertical(s) | Task monitoring, supervision review, knowledge curation |

Access control follows existing ABAC model. Dashboard authenticates via the same identity provider used for Speedrun internal tools.


2. Information Architecture

Dashboard
├── Overview (home)
├── Agents
│   ├── Fleet Status
│   └── Agent Detail → tasks, skills, supervision state, logs
├── Supervision
│   ├── Review Queue
│   └── Ramp Management
├── Tasks
│   ├── Active / Recent
│   └── Task Detail → execution trace, tool calls, LLM interactions
├── Knowledge
│   ├── Shared Tier (quarantine, verified)
│   ├── Per-Agent Memory
│   └── Ingestion Status
├── Budget & Cost
│   ├── Token Usage
│   ├── Per-Tenant Breakdown
│   └── Alerts
├── Logs & Traces
│   ├── Observation Explorer (18 event types)
│   └── Langfuse Deep Link
└── Settings
    ├── Vertical Definitions
    ├── Skill Config
    ├── Tool Integrations
    └── Model Routing (D35 hints)

3. Key Views

3.1 Overview (Home)

Single-screen operational pulse. Answers: "Is everything healthy right now?"

Layout: Top metric cards + two-column body (left: activity feed, right: attention items).

Metric cards (top row):

  • Active agents (by state: idle / executing / waiting / error)
  • Tasks last 24h (completed / failed / escalated)
  • Supervision queue depth (items awaiting review)
  • Token spend today vs. budget cap
  • Mean task latency (p50, p95)
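
The latency card can be computed client-side from recent task durations. A minimal sketch using the nearest-rank method (the helper name and input shape are assumptions, not an existing API):

```typescript
// Nearest-rank percentile over task durations in ms.
// Hypothetical helper for the "Mean task latency (p50, p95)" card.
function percentileMs(durations: number[], p: number): number {
  if (durations.length === 0) return 0;
  const sorted = [...durations].sort((a, b) => a - b);
  // Nearest-rank: smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```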

Activity feed (left):

  • Real-time stream of significant events (task completions, escalations, errors, supervision decisions, budget warnings).
  • Filterable by vertical, severity, event type.
  • Each entry links to its detail view.

Attention items (right):

  • Tasks stuck or failed (> threshold duration or retry count)
  • Supervision items waiting > SLA
  • Budget utilization > 80%
  • Agent errors or unhealthy states
  • Knowledge quarantine items pending review
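
These attention rules reduce to simple threshold predicates that the BFF can evaluate per poll. A sketch with placeholder threshold values (all names and numbers are assumptions; real values would come from config):

```typescript
interface AttentionInput {
  taskRuntimeMs: number;      // how long the current task has been running
  retries: number;            // retry count so far
  supervisionWaitMs: number;  // time the oldest review item has waited
  budgetUtilization: number;  // 0..1 fraction of the budget cap consumed
}

// Assumed thresholds for illustration only.
const THRESHOLDS = {
  stuckTaskMs: 10 * 60_000,
  maxRetries: 3,
  supervisionSlaMs: 30 * 60_000,
  budgetWarn: 0.8,
};

// Returns the list of reasons an item should appear in the attention column.
function attentionReasons(i: AttentionInput): string[] {
  const reasons: string[] = [];
  if (i.taskRuntimeMs > THRESHOLDS.stuckTaskMs || i.retries > THRESHOLDS.maxRetries)
    reasons.push("task-stuck");
  if (i.supervisionWaitMs > THRESHOLDS.supervisionSlaMs) reasons.push("supervision-sla");
  if (i.budgetUtilization > THRESHOLDS.budgetWarn) reasons.push("budget");
  return reasons;
}
```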

3.2 Agent Fleet

List view: Table of running VerticalAgent instances.

| Column | Source |
| --- | --- |
| Vertical | vertical.yaml id |
| Tenant | tenantId |
| State | idle / executing / waiting / error |
| Current Task | taskId or "—" |
| Skills | count, with supervision level badges |
| Uptime | since spawn |
| Tasks (24h) | completed / failed |

Agent detail view:

  • Identity: vertical, tenant, spawn time, model hint
  • Skill cards: each skill shows supervision level (supervised/sampling/autonomous), success rate, promotion threshold progress bar
  • Active task: real-time execution trace (LLM calls, tool calls, knowledge queries)
  • Recent tasks: paginated history with status, duration, token cost
  • Memory: per-agent episodic memory entries (from Mem0), searchable

3.3 Supervision Queue (addresses Q6)

The most critical operational view: this is where human-in-the-loop review happens.

Queue list:

  • Sorted by: age (oldest first), priority, vertical
  • Each item shows: task summary, agent, skill, vertical, client, timestamp, risk level
  • Batch actions: approve all (filtered), reject selected
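
The default sort above can be expressed as a single comparator (the item shape and the priority convention are assumptions for illustration):

```typescript
interface QueueItem {
  createdAt: number; // epoch ms
  priority: number;  // lower = more urgent (assumed convention)
  vertical: string;
}

// Default sort as listed above: oldest first, then priority, then vertical.
function compareQueueItems(a: QueueItem, b: QueueItem): number {
  return (
    a.createdAt - b.createdAt ||
    a.priority - b.priority ||
    a.vertical.localeCompare(b.vertical)
  );
}
```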

Review panel (split-pane or slide-over):

  • Left: Agent's proposed output (markdown rendered)
  • Right: Context panel
    • Task input (what was requested)
    • Execution trace (collapsed, expandable)
    • Knowledge sources cited
    • Similar past approvals/rejections
  • Action bar:
    • Approve (deliver to client)
    • Approve with edit (inline edit before delivery)
    • Reject with feedback (text input — feeds back into skill improvement)
    • Escalate (assign to another team member)
  • Keyboard shortcuts: a approve, e edit, r reject, j/k next/prev — ops will use this heavily

Supervision ramp management:

  • Per-skill promotion/demotion controls
  • Current level + threshold progress
  • Manual override (promote/demote with reason, logged)
  • Promotion criteria: configurable success rate % over N recent tasks
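
The promotion criterion above reduces to a window check over recent task outcomes. A sketch (the demotion rule is an assumption, since only promotion is specified; all names are hypothetical):

```typescript
interface RampConfig {
  windowSize: number;      // N most recent tasks to consider
  promoteAtRate: number;   // e.g. 0.95 success rate to promote
  demoteBelowRate: number; // e.g. 0.5 — assumed; demotion rule is not specified above
}

type Decision = "promote" | "demote" | "hold";

// outcomes: true = task succeeded, most recent last.
function rampDecision(outcomes: boolean[], cfg: RampConfig): Decision {
  const window = outcomes.slice(-cfg.windowSize);
  if (window.length < cfg.windowSize) return "hold"; // not enough evidence yet
  const rate = window.filter(Boolean).length / window.length;
  if (rate >= cfg.promoteAtRate) return "promote";
  if (rate < cfg.demoteBelowRate) return "demote";
  return "hold";
}
```

Manual overrides would bypass this check entirely and simply log the operator's reason, as described above.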

3.4 Task Explorer

List view: Filterable by vertical, agent, status, date range, skill.

Task detail:

  • Status timeline: dispatched → executing → (escalated?) → completed/failed
  • Full execution trace (observation events):
    • LLM calls: prompt (truncated), response, model, tokens, latency, cost
    • Tool calls: tool name, args, result, duration
    • Knowledge operations: search queries, results, commits
    • Supervision events: review decision, reviewer, feedback
  • Token breakdown: per-step cost, cumulative
  • Deep link to Langfuse trace (D48)

3.5 Knowledge Management

Shared knowledge tier:

  • Entries in quarantine (D40): source, content preview, quality signals (LLM-as-judge score, cross-reference count, source verification status)
  • Verify / reject actions
  • Verified entries: searchable, filterable by domain, provenance class (D43)

Ingestion status:

  • Seed skill runs: repos processed, files ingested, errors
  • Last ingestion time per repo
  • Trigger manual re-ingestion

Per-agent memory:

  • Browse episodic memory entries per agent
  • Search across all agent memories
  • Delete/archive stale entries

3.6 Budget & Cost

Token usage dashboard:

  • Time-series chart: daily token usage by provider/model
  • Per-tenant breakdown: table with usage, cost, budget cap, utilization %
  • Per-vertical breakdown: which verticals consume most
  • Per-skill breakdown: which skills are most expensive
  • Model selection distribution: pie chart of fast/balanced/best usage

Alerts:

  • Budget threshold alerts (configurable: 50%, 80%, 90%, 100%)
  • Anomaly detection: unusual spike in usage for a tenant/agent
  • Cost per task trending up (possible prompt regression)
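
Threshold alerts should fire once when a boundary is crossed, not on every poll. One way to sketch this, comparing the previous and current utilization (names hypothetical, thresholds from the list above):

```typescript
const ALERT_THRESHOLDS = [0.5, 0.8, 0.9, 1.0]; // configurable in the real system

// Returns thresholds newly crossed between the previous and current utilization,
// so each alert fires exactly once per threshold per budget period.
function crossedThresholds(prevUtil: number, currUtil: number): number[] {
  return ALERT_THRESHOLDS.filter((t) => prevUtil < t && currUtil >= t);
}
```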

3.7 Logs & Traces

Observation explorer:

  • Full-text search across all 18 observation event types
  • Filter by: event type, vertical, agent, tenant, time range, severity
  • Structured view with expandable JSON payloads
  • Link to related task detail

Langfuse integration:

  • Embed or deep-link to Langfuse trace views
  • Use Langfuse for prompt management and A/B testing visualization

3.8 Settings

Vertical & skill config:

  • View loaded vertical definitions (from YAML, read-only display)
  • Skill catalog: all skills across verticals, their schemas, supervision defaults
  • Tool integration status: which tools are registered in gateway, credential health (valid/expired/missing — no secrets shown)

Model routing (D35):

  • Current model mapping per hint level (fast/balanced/best)
  • Per-tenant overrides
  • Provider fallback chain status

4. Tech Stack Recommendation

| Layer | Choice | Rationale |
| --- | --- | --- |
| Framework | Next.js 15 (App Router) | SSR for initial load, RSC for data-heavy views, API routes for BFF |
| UI Library | shadcn/ui + Tailwind | Composable, accessible, matches Kaze brand (dark theme, Inter/JetBrains Mono) |
| State | TanStack Query | Server state caching, real-time refetch, optimistic updates |
| Real-time | SSE from runtime/gateway | Lightweight, no WebSocket infra needed for MVP |
| Charts | Recharts or Tremor | Token usage, cost trends, quality metrics |
| Tables | TanStack Table | Sortable, filterable, paginated; core of fleet/task/queue views |
| Auth | NextAuth.js or Lucia | Session-based, supports OAuth (GitHub, Google) |
| Deploy | Container (same K8s cluster) | Co-located with runtime/gateway, internal network only |

Why Next.js over alternatives:

  • TypeScript-native (consistent with runtime/gateway)
  • RSC reduces client bundle for data-heavy dashboard
  • API routes serve as BFF — aggregates calls to runtime (4100), gateway (4200), knowledge (4300)
  • Vercel-optional: runs as standalone Node.js container
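
The BFF aggregation point can be sketched as a fan-out that degrades gracefully when one service is down (the data shapes and fetcher-injection style are assumptions; the port assignments are from the data-flow section):

```typescript
type Fetcher<T> = () => Promise<T>;

interface OverviewData {
  agents: unknown | null;     // from runtime (:4100)
  usage: unknown | null;      // from gateway (:4200)
  quarantine: unknown | null; // from knowledge (:4300)
  errors: string[];           // which upstream calls failed
}

// Fan out to all three services in parallel; a failed service yields null
// plus an error note instead of failing the whole overview page.
async function aggregateOverview(
  fetchAgents: Fetcher<unknown>,
  fetchUsage: Fetcher<unknown>,
  fetchQuarantine: Fetcher<unknown>,
): Promise<OverviewData> {
  const results = await Promise.allSettled([fetchAgents(), fetchUsage(), fetchQuarantine()]);
  const val = (r: PromiseSettledResult<unknown>) =>
    r.status === "fulfilled" ? r.value : null;
  const errors = results
    .filter((r): r is PromiseRejectedResult => r.status === "rejected")
    .map((r) => String(r.reason));
  return { agents: val(results[0]), usage: val(results[1]), quarantine: val(results[2]), errors };
}
```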

Option B: Vite + React SPA

Simpler setup, faster dev iteration. No SSR benefits. Would need a separate BFF or hit services directly. Better if the dashboard stays very simple and internal-only.

Option C: Vue 3 + Nuxt

Viable if team prefers Vue. Similar capabilities to Next.js. Less ecosystem momentum for dashboard component libraries.

Recommendation: Option A for the full dashboard vision. Option B if we want to ship a minimal ops console fast and iterate.


5. Data Flow

The dashboard does not have its own database. It reads from existing services:

┌─────────────────────┐
│   Kaze Dashboard    │
│   (Next.js BFF)     │
└──┬──────┬───────┬───┘
   │      │       │
   ▼      ▼       ▼
Runtime  Gateway  Knowledge
:4100    :4200    :4300

New API surface needed on existing services:

| Service | Endpoint | Purpose |
| --- | --- | --- |
| Runtime | GET /agents (exists) | Fleet status |
| Runtime | GET /tasks (new) | Task history, paginated |
| Runtime | GET /tasks/:id/trace (new) | Full observation trace for a task |
| Runtime | GET /supervision/queue (new) | Pending review items |
| Runtime | POST /supervision/review (new) | Submit review decision |
| Runtime | GET /supervision/ramp (new) | Current supervision levels |
| Runtime | POST /supervision/ramp (new) | Manual promote/demote |
| Gateway | GET /usage (new) | Token usage aggregates |
| Gateway | GET /tools/catalog (exists) | Tool health |
| Knowledge | GET /entries (new) | Browse knowledge entries |
| Knowledge | GET /quarantine (new) | Quarantine queue |
| Knowledge | POST /quarantine/:id/verify (new) | Approve quarantine entry |

Real-time updates:

  • Runtime exposes SSE endpoint GET /events/stream — pushes observation events as they fire
  • Dashboard subscribes on connect, filters client-side by vertical/agent
  • Reconnect with Last-Event-ID for gap recovery
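
The gap-recovery mechanism hinges on the server stamping each event with a monotonically increasing `id:` field in the SSE wire format. A sketch of the server-side serialization and replay logic (event shape and function names are assumptions):

```typescript
interface ObservationEvent {
  id: number;      // monotonically increasing; becomes the SSE event id
  type: string;    // one of the 18 observation event types
  payload: unknown;
}

// Serialize an event to SSE wire format. Setting `id:` lets a reconnecting
// client send Last-Event-ID so the server can replay what it missed.
function toSseFrame(e: ObservationEvent): string {
  return `id: ${e.id}\nevent: ${e.type}\ndata: ${JSON.stringify(e.payload)}\n\n`;
}

// On reconnect, replay buffered events newer than the client's Last-Event-ID.
function replayAfter(buffer: ObservationEvent[], lastEventId: number): ObservationEvent[] {
  return buffer.filter((e) => e.id > lastEventId);
}
```

This implies the runtime keeps a short in-memory ring buffer of recent events; gaps older than the buffer would fall back to a full refetch.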

6. Design System Alignment

Per design/brand.md:

| Element | Spec |
| --- | --- |
| Background | Charcoal #171717 (dark-first) |
| Primary accent | Pale Sage #A7F3D0 |
| Secondary accent | Mint #6EE7B7 |
| Text primary | #F5F5F5 |
| Text secondary | #A3A3A3 |
| Headings | Bold sans-serif (system or custom) |
| Body text | Inter |
| Code / data | JetBrains Mono |
| Borders | #262626 subtle |
| Cards | #1C1C1C with #262626 border |
| Status: healthy | #A7F3D0 (sage) |
| Status: warning | #FDE68A (amber) |
| Status: error | #FCA5A5 (rose) |
| Status: idle | #A3A3A3 (neutral) |

Visual tone: Clean, data-dense, minimal chrome. "Zen-Infrastructure" — precision meets calm. No gratuitous animations. Information density over decoration.
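
The spec above maps naturally to a shared token object that a Tailwind theme extension can consume. A sketch (token names are illustrative; hex values are from the table):

```typescript
// Brand tokens from the spec above; names are illustrative, values from brand.md.
const kazeColors = {
  background: "#171717",
  card: "#1C1C1C",
  border: "#262626",
  textPrimary: "#F5F5F5",
  textSecondary: "#A3A3A3",
  accent: "#A7F3D0",          // pale sage
  accentSecondary: "#6EE7B7", // mint
  status: {
    healthy: "#A7F3D0",
    warning: "#FDE68A",
    error: "#FCA5A5",
    idle: "#A3A3A3",
  },
} as const;

// Usage in tailwind.config.ts (sketch):
//   export default { theme: { extend: { colors: kazeColors } } };
```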


7. MVP Scope (Phase 1 Dashboard)

Ship a useful ops tool fast, expand later.

Phase 1 — Ops Console (4-6 weeks)

Build with Option A (Next.js) or Option B (Vite SPA) depending on speed priority.

In scope:

  • Overview page (metric cards + activity feed)
  • Agent fleet status (list + basic detail)
  • Task explorer (list + trace detail, Langfuse deep link)
  • Supervision queue (list + review panel with approve/reject)
  • Basic token usage view (per-tenant, per-day)

Out of scope (Phase 2+):

  • Knowledge management UI (use direct API / Langfuse for now)
  • Budget alerts and anomaly detection
  • Settings / config management (keep using YAML files)
  • Supervision ramp management UI (use API directly)
  • Real-time SSE (use polling with TanStack Query refetch intervals)
  • Advanced ABAC (single admin role for MVP)

Phase 2 — Full Dashboard

  • Knowledge quarantine review
  • Supervision ramp management
  • Budget alerts and anomaly detection
  • Real-time SSE streaming
  • Role-based access (admin, ops lead, operator)
  • Settings UI for vertical/skill/model config
  • Multi-channel conversation context view (Q5)

Phase 3 — Governance Layer

  • Layer 3 agent dashboards (Health Monitor, Cost Monitor, Quality Monitor)
  • Self-improvement loop visualization
  • Cross-vertical analytics
  • Client portal (if/when needed)

8. Open Questions for Dashboard

| # | Question | Impact | Notes |
| --- | --- | --- | --- |
| DQ1 | Where does the dashboard live as a repo? New repo (kaze-dashboard) or inside kaze-runtime? | Architecture | Recommend: separate repo, keeps runtime lean |
| DQ2 | Auth provider? GitHub OAuth (team already uses GH) vs. Google OAuth vs. custom? | Security | GitHub OAuth is simplest for an internal tool |
| DQ3 | Do we need offline/degraded mode if runtime is down? | Reliability | Probably not for an internal tool; show an error state |
| DQ4 | Should observation events be persisted in a queryable store (Postgres) or only in Langfuse? | Data arch | Langfuse for MVP; own Postgres table in Phase 2 if query patterns diverge |
| DQ5 | Mobile responsiveness needed? | Design | Probably not; ops team uses desktop. Tablet-friendly is nice-to-have |
| DQ6 | Should the BFF aggregate cross-service data or let the client call services directly? | Architecture | BFF preferred: single auth boundary, aggregation, caching |

9. Alternatives Considered

Grafana + Custom Panels

  • Pros: Fast setup, good for metrics/logs, already used in many ops teams
  • Cons: Poor fit for supervision queue (interactive workflow), limited custom UI, separate auth, not branded
  • Verdict: Use Langfuse for trace visualization, build custom for interactive workflows

Retool / Internal Tool Builders

  • Pros: Very fast to prototype, drag-and-drop
  • Cons: Vendor lock-in, limited customization, recurring cost, can't embed in K8s easily, not branded
  • Verdict: Skip — we need custom supervision UX and want to own the experience

Admin.js / React-Admin

  • Pros: Pre-built CRUD patterns, fast for data management
  • Cons: Designed for database-backed CRUD, awkward fit for service-aggregated data, limited real-time
  • Verdict: Could use for Settings section only, but not worth the dependency for one page

10. Summary

The Kaze Dashboard is an internal ops console focused on three core workflows:

  1. Monitor — Are agents healthy? Are tasks succeeding? Is spend on track?
  2. Supervise — Review agent outputs, approve/reject/edit, manage trust ramp.
  3. Investigate — When something goes wrong, trace the full execution path.

It reads from existing services (runtime, gateway, knowledge) via a BFF layer, requires minimal new API surface, and aligns with the Kaze brand system. MVP ships the monitoring and supervision workflows; knowledge management and governance views come later.