Software Development

AI Observability & Monitoring Dashboard Architect

Design production-grade monitoring systems to track AI performance, costs, and reliability in real time.

#ai-observability #llm-ops #monitoring #dashboard design #devops
Created by PromptLib Team
Published February 11, 2026
1,256 copies
3.8 rating
You are an expert AI Observability Architect with deep expertise in production monitoring, LLMOps, and distributed systems. Your task is to design a comprehensive monitoring dashboard specification.

**CONTEXT PARAMETERS:**
- AI System Type: [AI_SYSTEM_TYPE] (e.g., LLM API wrapper, custom ML model, RAG pipeline, multi-agent system)
- Primary Monitoring Goals: [MONITORING_GOALS] (e.g., cost control, latency optimization, safety/guardrails, model drift detection)
- Current Tech Stack: [TECH_STACK] (e.g., OpenAI, LangChain, AWS SageMaker, custom Python services)
- Scale/Traffic Volume: [SCALE] (e.g., 10K requests/day, enterprise-scale)
- Compliance Requirements: [COMPLIANCE_REQUIREMENTS] (e.g., GDPR, SOC2, HIPAA, none)
- Team Structure: [TEAM_STRUCTURE] (e.g., solo developer, 5-person startup team, enterprise with separate DevOps)

**DESIGN REQUIREMENTS:**
1. **Dashboard Architecture**: Design 4-6 logical sections (e.g., Performance, Cost Management, Quality/Safety, Business Metrics, Infrastructure Health)
2. **Metric Specifications**: For each metric provide:
   - Exact calculation method or query logic
   - Aggregation windows (1min, 5min, 1hr)
   - Visualization recommendation (time-series, heatmap, gauge, log panel)
   - Alert thresholds (warning/critical) with rationale
3. **Implementation Roadmap**:
   - Recommended monitoring stack (e.g., Grafana + Prometheus, Datadog, New Relic, Langfuse, custom)
   - Integration code snippets for [TECH_STACK]
   - Data retention and sampling strategies
4. **Stakeholder Views**: Create 3 tailored dashboard views (Engineering Debug View, Executive Summary, Business Operations)
5. **Incident Response**: Define 'Red Alert' scenarios with automated response playbooks

**OUTPUT FORMAT:**
- Executive Summary (3-4 bullets on business value)
- Technical Architecture Diagram (described in text/markdown)
- Detailed Metric Dictionary (table format)
- Implementation Checklist (phased: MVP → Production → Advanced)
- Cost Projection (estimated monitoring infrastructure costs)
- Risk Assessment (what this monitoring might miss)

**CONSTRAINTS:**
Prioritize actionable metrics over vanity metrics. Ensure all data handling recommendations comply with [COMPLIANCE_REQUIREMENTS]. Account for how [SCALE] affects sampling rates and storage costs.
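
To make the "Metric Specifications" requirement concrete, here is a minimal sketch of what one metric dictionary entry could look like once implemented: a p95 latency metric with an aggregation window and warning/critical thresholds. It is not part of the prompt itself. The names `MetricSpec`, `p95`, and `evaluate`, the metric name, and the threshold values are all hypothetical and chosen only for illustration; a real deployment would normally express this as a query and alert rule in whichever monitoring stack is selected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSpec:
    """One illustrative metric dictionary entry (all values are assumptions)."""
    name: str
    window_seconds: int   # aggregation window, e.g. 300 for 5 min
    warning: float        # warning threshold, same unit as the samples
    critical: float       # critical threshold, same unit as the samples


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("no samples in window")
    ordered = sorted(values)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def evaluate(spec, samples, now):
    """Aggregate (timestamp_seconds, value) samples over the window ending at `now`.

    Returns (aggregated_value, status) where status is one of
    "ok", "warning", "critical", or "no_data".
    """
    window = [v for t, v in samples if now - spec.window_seconds < t <= now]
    if not window:
        return None, "no_data"
    value = p95(window)
    if value >= spec.critical:
        return value, "critical"
    if value >= spec.warning:
        return value, "warning"
    return value, "ok"


# Hypothetical spec: 5-minute p95 latency in milliseconds.
latency_spec = MetricSpec("llm_latency_p95_ms", 300, warning=2000.0, critical=5000.0)
```

Calling `evaluate(latency_spec, samples, now)` on a list of `(timestamp, latency_ms)` pairs gives the aggregated value and its alert status. Treating an empty window as `"no_data"` rather than `"ok"` is a deliberate choice in this sketch, since a silent pipeline is often itself worth alerting on.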

Best Use Cases

- Monitoring OpenAI/Anthropic API usage to prevent unexpected billing spikes and track per-user costs in multi-tenant SaaS applications
- Setting up drift detection dashboards for custom ML models to alert when input data distributions shift from training baselines
- Creating safety guardrail monitoring for customer-facing chatbots to detect toxic outputs, PII leaks, or jailbreak attempts in real time
- Building executive dashboards that translate technical metrics (latency, tokens) into business KPIs (cost per conversation, CSAT correlation)
- Implementing distributed tracing across complex AI pipelines (retrieval → generation → post-processing) to identify bottlenecks
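
As an illustration of the drift detection use case, the sketch below computes a Population Stability Index (PSI) between a training baseline and a recent sample of one numeric input feature. PSI is one common choice among several drift statistics, not something this prompt prescribes. The function name `psi`, the equal-width binning, the `eps` smoothing value, and the 0.1 / 0.25 alert levels (a widely quoted rule of thumb) are assumptions to tune for your own data.

```python
import math


def psi(baseline, current, bins=10, eps=1e-6):
    """Population Stability Index of `current` against `baseline`.

    Bin edges are equal-width over the baseline's range; values outside that
    range are clamped into the first or last bin. Empty bins are floored at
    `eps` so the logarithm stays defined.
    """
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(baseline), max(baseline)
    if hi == lo:
        raise ValueError("baseline has no spread; cannot build bins")
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    expected = proportions(baseline)
    actual = proportions(current)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


def drift_status(score, warning=0.1, critical=0.25):
    """Map a PSI score to an alert level using rule-of-thumb thresholds."""
    if score >= critical:
        return "critical"
    if score >= warning:
        return "warning"
    return "ok"
```

A dashboard panel could plot `psi(...)` per feature on a daily or hourly window and colour it by `drift_status`. Because PSI depends on bin count and sample size, the thresholds are best calibrated against historical periods known to be stable.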