AI Performance Test Script Generator

Generate production-ready load testing and benchmarking scripts tailored for AI endpoints, LLM APIs, and inference services.

#performance-testing #load-testing #llm-benchmarking #api-testing #infrastructure

Created by PromptLib Team

February 11, 2026

3,281 Total Copies · 4.5 Average Rating
You are an expert Performance Engineer specializing in AI/ML infrastructure testing. Create a complete, production-grade performance testing script based on the following specifications:

**Target System:** [TARGET_SYSTEM] (e.g., OpenAI GPT-4, local llama.cpp server, HuggingFace Inference Endpoint)
**Test Type:** [TEST_TYPE] (e.g., Load Test, Stress Test, Spike Test, Soak Test, Latency Benchmark)
**Programming Language/Framework:** [LANGUAGE] (e.g., Python+asyncio, k6 JavaScript, Artillery.io, Locust)
**Concurrency Parameters:** [CONCURRENCY] (e.g., 10-1000 virtual users, ramp-up patterns)
**Test Duration:** [DURATION] (e.g., 5 minutes, 1 hour)
**Input Dataset:** [DATASET_DESCRIPTION] (describe prompt complexity and token length distribution, or provide sample inputs)
**Authentication Method:** [AUTH_METHOD] (e.g., Bearer token, API key headers, AWS SigV4)
**Key Metrics to Capture:** [METRICS] (e.g., TTFT (time to first token), TPS (tokens per second), total latency P95/P99, error rate, cost per 1K requests)
**Success Criteria:** [THRESHOLDS] (e.g., P95 < 2 s, error rate < 0.1%, minimum 50 req/s throughput)
**Output Requirements:** [OUTPUT_FORMAT] (e.g., JSON results file, Grafana dashboard config, CSV export, HTML report)

**Script Requirements:**

1. Include proper connection pooling and keep-alive settings for HTTP/2 or HTTP/1.1
2. Implement realistic request pacing (not just infinite loops) with configurable arrival rates
3. Handle streaming responses (SSE) if applicable, with intermediate chunk timing
4. Include warmup and cooldown logic to exclude cold-start anomalies
5. Implement a circuit breaker pattern for 5xx errors and rate-limit handling (429s) with exponential backoff
6. Capture detailed metrics: request latency histograms, token throughput (input/output), error classification, and resource utilization if available
7. Add correlation IDs for distributed tracing
8. Configure secrets via environment variables (never hardcode keys)
9. Generate a summary report with statistical significance tests (compare against a baseline if provided)
10. Add comments explaining calibration steps and how to interpret results

**Deliverables:**

- Main test script file
- Configuration file (YAML/JSON) for test parameters
- requirements.txt or package.json dependencies
- README with execution instructions and a result-interpretation guide
- Sample output showing a test run result
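The latency and throughput metrics named in the template (P95/P99, tokens per second, requests per second) can be derived offline from raw per-request samples. A minimal Python sketch using nearest-rank percentiles; the function names are illustrative, not part of any generated script:

```python
import statistics

def percentile(samples, p):
    """Nearest-rank percentile over a list of latency samples (seconds)."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

def summarize(latencies, output_tokens, wall_seconds):
    """Aggregate per-request latencies and token counts into report metrics."""
    return {
        "p50_s": percentile(latencies, 50),
        "p95_s": percentile(latencies, 95),
        "p99_s": percentile(latencies, 99),
        "mean_s": statistics.mean(latencies),
        "tokens_per_second": sum(output_tokens) / wall_seconds,
        "requests_per_second": len(latencies) / wall_seconds,
    }
```

Note that warmup-period samples should be dropped before calling `summarize`, per requirement 4, so cold-start outliers do not inflate the tail percentiles.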

Best Use Cases

Benchmarking LLM API providers (OpenAI, Anthropic, Azure) to compare latency and throughput before production deployment

Stress-testing self-hosted models (Llama, Mistral) to determine optimal batch sizes and concurrent user limits for GPU allocation

Regression testing model updates to ensure new versions meet SLA requirements for response time under identical load conditions

Validating autoscaling policies for AI inference endpoints by simulating traffic spikes and measuring cold-start recovery times

Cost optimization analysis by measuring tokens-per-second efficiency across different model sizes and quantization levels
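For the cost-optimization use case, comparing providers or model sizes reduces to simple arithmetic over per-token prices. A minimal sketch; the price arguments are placeholders, so substitute your provider's actual per-million-token rates:

```python
def cost_per_1k_requests(prompt_tokens, completion_tokens,
                         price_in_per_1m, price_out_per_1m):
    """Estimated cost of 1,000 requests given average token counts per request
    and input/output prices quoted per 1M tokens (values are assumptions)."""
    per_request = (prompt_tokens * price_in_per_1m
                   + completion_tokens * price_out_per_1m) / 1_000_000
    return per_request * 1000
```

Pairing this with measured tokens-per-second from the load test gives a cost-versus-throughput curve per model or quantization level.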

Frequently Asked Questions

Should I use this for testing local models or cloud APIs?

Both. The script adapts to either scenario. For local models, it can include GPU utilization monitoring (nvidia-smi integration). For cloud APIs, it focuses on network latency, rate limiting behavior, and cost-per-request analysis.
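The nvidia-smi integration mentioned above can be as simple as polling the CLI's CSV query mode from the test harness. A minimal sketch, assuming `nvidia-smi` is on the PATH; it returns `None` gracefully when the tool is absent (e.g., when testing a cloud API):

```python
import subprocess

def parse_gpu_stats(csv_line):
    """Parse one line of `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`."""
    util, mem = (field.strip() for field in csv_line.split(","))
    return {"gpu_util_pct": int(util), "memory_used_mib": int(mem)}

def sample_gpu():
    """Poll nvidia-smi once; returns a list of per-GPU stats, or None if unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return [parse_gpu_stats(line) for line in out.strip().splitlines()]
```

Sampling this on a background timer alongside request metrics lets the report correlate latency spikes with GPU saturation.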

How do I handle rate limiting (429 errors) during the test?

The generated scripts include exponential backoff with jitter and circuit breaker patterns. You can configure the maximum retry attempts and backoff multiplier in the configuration file. The final report will distinguish between application errors and throttling events.
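The backoff policy described above can be expressed as two small pure functions, which makes it easy to unit-test before wiring into the request loop. This is an illustrative full-jitter variant under assumed defaults, not the exact logic of any generated script:

```python
import random

def backoff_delay(attempt, base=0.5, multiplier=2.0, cap=30.0, rng=random.random):
    """Full-jitter exponential backoff: a random delay in
    [0, min(cap, base * multiplier**attempt)] seconds."""
    ceiling = min(cap, base * multiplier ** attempt)
    return rng() * ceiling

def should_retry(status, attempt, max_attempts=5):
    """Retry throttling (429) and server errors (5xx) up to max_attempts;
    client errors like 400/401 fail fast."""
    return attempt < max_attempts and (status == 429 or 500 <= status < 600)
```

Counting 429-triggered retries separately from 5xx retries is what lets the final report distinguish throttling events from application errors.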

Can this generate k6 scripts instead of Python?

Yes. Specify 'k6 JavaScript' or 'Artillery.io YAML' in the [LANGUAGE] variable. The prompt will adapt the output to use k6's WebSocket handling for streaming responses or HTTP/2 multiplexing as appropriate for the target system.

Get this Prompt

Free
Estimated time: 5 min
Verified by 88 experts

More Like This

AI Database Migration Planner

Generate production-ready database migration strategies with risk assessment, rollback protocols, and step-by-step execution plans.

#database #migration +3
1,418 Total Uses · 3.7 Average Rating

AI Cache Strategy Designer

Architect high-performance, scalable caching layers tailored to your specific infrastructure and consistency requirements.

#caching #distributed-systems +3
2,586 Total Uses · 4.4 Average Rating

Enterprise API Gateway Architecture Configurator

Generate production-ready, secure, and scalable API gateway configurations with infrastructure-as-code templates and best practices.

#api-gateway #infrastructure +3
1,461 Total Uses · 4.1 Average Rating