Software Development

AI Performance Test Script Generator

Generate production-ready load testing and benchmarking scripts tailored for AI endpoints, LLM APIs, and inference services.

#performance-testing #load-testing #llm-benchmarking #api-testing #infrastructure
Created by PromptLib Team
Published February 11, 2026
3,281 copies
4.5 rating
You are an expert Performance Engineer specializing in AI/ML infrastructure testing. Create a complete, production-grade performance testing script based on the following specifications:

**Target System:** [TARGET_SYSTEM] (e.g., OpenAI GPT-4, Local Llama.cpp server, HuggingFace Inference Endpoint)
**Test Type:** [TEST_TYPE] (e.g., Load Test, Stress Test, Spike Test, Soak Test, Latency Benchmark)
**Programming Language/Framework:** [LANGUAGE] (e.g., Python+asyncio, k6 JavaScript, Artillery.io, Locust)
**Concurrency Parameters:** [CONCURRENCY] (e.g., 10-1000 virtual users, ramp-up patterns)
**Test Duration:** [DURATION] (e.g., 5 minutes, 1 hour)
**Input Dataset:** [DATASET_DESCRIPTION] (describe prompt complexity, token length distribution, or provide sample inputs)
**Authentication Method:** [AUTH_METHOD] (e.g., Bearer token, API Key headers, AWS SigV4)
**Key Metrics to Capture:** [METRICS] (e.g., TTFT (Time to First Token), TPS (Tokens Per Second), Total Latency P95/P99, Error Rate, Cost per 1K requests)
**Success Criteria:** [THRESHOLDS] (e.g., P95 < 2s, Error rate < 0.1%, Min 50 req/s throughput)
**Output Requirements:** [OUTPUT_FORMAT] (e.g., JSON results file, Grafana dashboard config, CSV export, HTML report)
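
As an illustration of how the placeholders above map to a concrete test, a filled-in YAML configuration might look like the following. All values, including the endpoint, model name, and limits, are hypothetical:

```yaml
# Illustrative values only -- substitute your own endpoint and limits.
target:
  name: "local-llama-cpp"
  base_url: "http://localhost:8080/v1/chat/completions"
  auth: "bearer_env"          # token read from an env var, never hardcoded
test:
  type: "load"
  duration_s: 300
  warmup_s: 30
  virtual_users: { start: 10, peak: 200, ramp_up_s: 60 }
thresholds:
  p95_latency_s: 2.0
  error_rate: 0.001
  min_throughput_rps: 50
```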

**Script Requirements:**
1. Include proper connection pooling and keep-alive settings for HTTP/2 or HTTP/1.1
2. Implement realistic request pacing (not just infinite loops) with configurable arrival rates
3. Handle streaming responses (SSE) if applicable, with intermediate chunk timing
4. Include warmup period and cooldown logic to exclude cold-start anomalies
5. Implement circuit breaker pattern for 5xx errors and rate limit handling (429s) with exponential backoff
6. Capture detailed metrics: request latency histograms, token throughput (input/output), error classification, and resource utilization if available
7. Add correlation IDs for distributed tracing
8. Include environment variable configuration for secrets (never hardcode keys)
9. Generate a summary report with statistical significance tests (compare against baseline if provided)
10. Add comments explaining calibration steps and interpretation of results
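
Requirements 3 and 5 above can be sketched in Python+asyncio. This is a minimal illustration, not a complete client: the stream is simulated, and `backoff_delay` and `consume_stream` are hypothetical helper names, not part of any library.

```python
import asyncio
import random
import time

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter for 429/5xx retries (requirement 5):
    delay grows as base * 2^attempt, capped, then jittered over [0, delay]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))

async def consume_stream(chunks):
    """Consume a token stream, recording time-to-first-token (TTFT) and
    per-chunk timings (requirement 3). `chunks` is any async iterator of text."""
    start = time.perf_counter()
    ttft = None
    timings, tokens = [], []
    async for chunk in chunks:
        now = time.perf_counter()
        if ttft is None:
            ttft = now - start          # first chunk arrival = TTFT
        timings.append(now - start)     # cumulative time at each chunk
        tokens.append(chunk)
    return ttft, timings, tokens

async def fake_stream():
    # Stand-in for an SSE response; a real test would parse event chunks.
    for token in ["Hello", ",", " world"]:
        await asyncio.sleep(0.01)
        yield token
```

In a real script, the per-chunk timings would feed the latency histograms of requirement 6.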

**Deliverables:**
- Main test script file
- Configuration file (YAML/JSON) for test parameters
- requirements.txt or package.json dependencies
- README with execution instructions and result interpretation guide
- Sample output showing a test run result
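
For the summary report deliverable, P95/P99 thresholds are typically checked with a nearest-rank percentile. A minimal sketch (the `percentile` helper is illustrative, not from any specific library):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest value with at least p% of
    samples at or below it. Used to check thresholds like 'P95 < 2s'."""
    ranked = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[k]

# Hypothetical per-request latencies (seconds) from one test run.
latencies_s = [0.8, 1.1, 0.9, 2.4, 1.0, 1.3, 0.7, 1.9, 1.2, 1.5]
p95 = percentile(latencies_s, 95)
passed = p95 < 2.0  # compare against the success criterion
```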
Best Use Cases
Benchmarking LLM API providers (OpenAI, Anthropic, Azure) to compare latency and throughput before production deployment
Stress-testing self-hosted models (Llama, Mistral) to determine optimal batch sizes and concurrent user limits for GPU allocation
Regression testing model updates to ensure new versions meet SLA requirements for response time under identical load conditions
Validating autoscaling policies for AI inference endpoints by simulating traffic spikes and measuring cold-start recovery times
Cost optimization analysis by measuring tokens-per-second efficiency across different model sizes and quantization levels