AI Performance Test Script Generator
Generate production-ready load testing and benchmarking scripts tailored for AI endpoints, LLM APIs, and inference services.
You are an expert Performance Engineer specializing in AI/ML infrastructure testing. Create a complete, production-grade performance testing script based on the following specifications:

**Target System:** [TARGET_SYSTEM] (e.g., OpenAI GPT-4, local llama.cpp server, HuggingFace Inference Endpoint)
**Test Type:** [TEST_TYPE] (e.g., Load Test, Stress Test, Spike Test, Soak Test, Latency Benchmark)
**Programming Language/Framework:** [LANGUAGE] (e.g., Python + asyncio, k6 JavaScript, Artillery.io, Locust)
**Concurrency Parameters:** [CONCURRENCY] (e.g., 10-1000 virtual users, ramp-up patterns)
**Test Duration:** [DURATION] (e.g., 5 minutes, 1 hour)
**Input Dataset:** [DATASET_DESCRIPTION] (describe prompt complexity and token-length distribution, or provide sample inputs)
**Authentication Method:** [AUTH_METHOD] (e.g., Bearer token, API key headers, AWS SigV4)
**Key Metrics to Capture:** [METRICS] (e.g., TTFT (Time to First Token), TPS (Tokens Per Second), total latency P95/P99, error rate, cost per 1K requests)
**Success Criteria:** [THRESHOLDS] (e.g., P95 < 2 s, error rate < 0.1%, minimum 50 req/s throughput)
**Output Requirements:** [OUTPUT_FORMAT] (e.g., JSON results file, Grafana dashboard config, CSV export, HTML report)

**Script Requirements:**
1. Include proper connection pooling and keep-alive settings for HTTP/2 or HTTP/1.1.
2. Implement realistic request pacing (not just infinite loops) with configurable arrival rates.
3. Handle streaming responses (SSE) if applicable, with intermediate chunk timing.
4. Include warmup and cooldown logic to exclude cold-start anomalies.
5. Implement a circuit breaker pattern for 5xx errors and rate-limit handling (429s) with exponential backoff.
6. Capture detailed metrics: request latency histograms, token throughput (input/output), error classification, and resource utilization if available.
7. Add correlation IDs for distributed tracing.
8. Configure secrets via environment variables (never hardcode keys).
9. Generate a summary report with statistical significance tests (compare against a baseline if provided).
10. Add comments explaining calibration steps and interpretation of results.

**Deliverables:**
- Main test script file
- Configuration file (YAML/JSON) for test parameters
- requirements.txt or package.json dependencies
- README with execution instructions and a result-interpretation guide
- Sample output showing a test-run result
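To illustrate what requirements 1, 2, 3, 5, and 7 ask for, here is a minimal Python + asyncio sketch of one way a generated script might measure TTFT against a streaming endpoint with an open-loop arrival rate and exponential backoff. The endpoint URL, payload shape, and OpenAI-style SSE framing are assumptions for the example, not part of the prompt above; adapt them to the actual target system.

```python
"""Minimal sketch: one async worker plus open-loop pacing for an LLM load test.
Assumes an OpenAI-style streaming chat endpoint; URL and payload are placeholders."""
import asyncio
import os
import time
import uuid

import httpx

API_URL = os.environ.get("TARGET_URL", "https://api.example.com/v1/chat/completions")  # hypothetical default
API_KEY = os.environ["API_KEY"]  # secrets from the environment, never hardcoded (requirement 8)


async def timed_request(client: httpx.AsyncClient, prompt: str, max_retries: int = 5) -> dict:
    """Send one streaming request, recording TTFT and total latency.
    Retries 429s and 5xx responses with capped exponential backoff (requirement 5)."""
    payload = {"model": "example-model", "messages": [{"role": "user", "content": prompt}], "stream": True}
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "X-Correlation-ID": str(uuid.uuid4()),  # correlation ID for distributed tracing (requirement 7)
    }
    for attempt in range(max_retries):
        start = time.perf_counter()
        ttft, chunks = None, 0
        try:
            async with client.stream("POST", API_URL, json=payload, headers=headers) as resp:
                if resp.status_code != 429 and resp.status_code < 500:
                    resp.raise_for_status()  # non-retryable 4xx is reported, not retried
                    async for line in resp.aiter_lines():  # per-chunk SSE timing (requirement 3)
                        if not line.startswith("data:") or line.strip() == "data: [DONE]":
                            continue
                        if ttft is None:
                            ttft = time.perf_counter() - start
                        chunks += 1
                    return {"ok": True, "ttft_s": ttft, "total_s": time.perf_counter() - start, "chunks": chunks}
        except httpx.HTTPStatusError:
            return {"ok": False, "error": "client_error"}
        except httpx.TransportError:
            pass  # network hiccup: treat as retryable
        await asyncio.sleep(min(2 ** attempt, 30))  # exponential backoff, capped at 30 s
    return {"ok": False, "error": "retries_exhausted"}


async def open_loop(rate_per_s: float, duration_s: float, prompt: str) -> list[dict]:
    """Open-model load generation: launch requests at a configured arrival rate
    (requirement 2) over a pooled, keep-alive client (requirement 1)."""
    limits = httpx.Limits(max_connections=100, max_keepalive_connections=100)
    async with httpx.AsyncClient(limits=limits, timeout=120.0) as client:
        tasks, deadline = [], time.perf_counter() + duration_s
        while time.perf_counter() < deadline:
            tasks.append(asyncio.create_task(timed_request(client, prompt)))
            await asyncio.sleep(1.0 / rate_per_s)  # constant arrival rate; swap in Poisson pacing if desired
        return await asyncio.gather(*tasks)


if __name__ == "__main__":
    results = asyncio.run(open_loop(rate_per_s=2.0, duration_s=30.0, prompt="Summarize the SRE golden signals."))
    ok = [r for r in results if r["ok"]]
    print(f"completed={len(ok)}/{len(results)}")
```

The open-loop pacing matters for LLM endpoints in particular: with a closed loop of virtual users, slow responses automatically throttle the offered load, which hides saturation; a fixed arrival rate keeps load independent of latency so the latency histograms reflect real degradation.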
More Like This
AI Database Migration Planner
This prompt transforms AI into a Principal Database Architect that analyzes your source and target environments to create comprehensive migration blueprints. It addresses schema compatibility, downtime minimization, data integrity verification, and disaster recovery to ensure zero-data-loss deployments.
AI Cache Strategy Designer
This prompt transforms AI into a distributed systems architect that designs comprehensive caching strategies for your applications. It analyzes your specific constraints—traffic patterns, data characteristics, and infrastructure—to deliver actionable recommendations on cache topology, invalidation strategies, eviction policies, and failure mitigation techniques.
Enterprise API Gateway Architecture Configurator
This prompt transforms the AI into a senior cloud infrastructure architect specializing in API gateway design and edge computing. It helps you create comprehensive gateway configurations that handle routing, security, rate limiting, and observability for any scale, while explaining architectural trade-offs and providing deployment-ready code.