AI Performance Test Script Generator

Generate production-ready load testing and benchmarking scripts tailored for AI endpoints, LLM APIs, and inference services.

#performance-testing #load-testing #llm-benchmarking #api-testing #infrastructure

Created by PromptLib Team

February 11, 2026

3,281 Total Copies · 4.5 Average Rating
You are an expert Performance Engineer specializing in AI/ML infrastructure testing. Create a complete, production-grade performance testing script based on the following specifications:

**Target System:** [TARGET_SYSTEM] (e.g., OpenAI GPT-4, local llama.cpp server, HuggingFace Inference Endpoint)
**Test Type:** [TEST_TYPE] (e.g., Load Test, Stress Test, Spike Test, Soak Test, Latency Benchmark)
**Programming Language/Framework:** [LANGUAGE] (e.g., Python+asyncio, k6 JavaScript, Artillery.io, Locust)
**Concurrency Parameters:** [CONCURRENCY] (e.g., 10-1000 virtual users, ramp-up patterns)
**Test Duration:** [DURATION] (e.g., 5 minutes, 1 hour)
**Input Dataset:** [DATASET_DESCRIPTION] (describe prompt complexity and token length distribution, or provide sample inputs)
**Authentication Method:** [AUTH_METHOD] (e.g., Bearer token, API key headers, AWS SigV4)
**Key Metrics to Capture:** [METRICS] (e.g., TTFT (time to first token), TPS (tokens per second), total latency P95/P99, error rate, cost per 1K requests)
**Success Criteria:** [THRESHOLDS] (e.g., P95 < 2 s, error rate < 0.1%, minimum 50 req/s throughput)
**Output Requirements:** [OUTPUT_FORMAT] (e.g., JSON results file, Grafana dashboard config, CSV export, HTML report)

**Script Requirements:**

1. Include proper connection pooling and keep-alive settings for HTTP/2 or HTTP/1.1
2. Implement realistic request pacing (not just infinite loops) with configurable arrival rates
3. Handle streaming responses (SSE) if applicable, with intermediate chunk timing
4. Include warmup and cooldown logic to exclude cold-start anomalies
5. Implement a circuit breaker pattern for 5xx errors and rate-limit handling (429s) with exponential backoff
6. Capture detailed metrics: request latency histograms, token throughput (input/output), error classification, and resource utilization if available
7. Add correlation IDs for distributed tracing
8. Configure secrets via environment variables (never hardcode keys)
9. Generate a summary report with statistical significance tests (compare against a baseline if provided)
10. Add comments explaining calibration steps and how to interpret results

**Deliverables:**

- Main test script file
- Configuration file (YAML/JSON) for test parameters
- requirements.txt or package.json dependencies
- README with execution instructions and a result-interpretation guide
- Sample output showing a test run result
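The latency and throughput metrics named in the template (P95/P99, tokens per second, requests per second) can be derived offline from raw per-request samples. A minimal Python sketch using nearest-rank percentiles; the function names are illustrative, not part of any generated script:

```python
import statistics

def percentile(samples, p):
    """Nearest-rank percentile over a list of latency samples (seconds)."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

def summarize(latencies, output_tokens, wall_seconds):
    """Aggregate per-request latencies and token counts into report metrics."""
    return {
        "p50_s": percentile(latencies, 50),
        "p95_s": percentile(latencies, 95),
        "p99_s": percentile(latencies, 99),
        "mean_s": statistics.mean(latencies),
        "tokens_per_second": sum(output_tokens) / wall_seconds,
        "requests_per_second": len(latencies) / wall_seconds,
    }
```

Note that warmup-period samples should be dropped before calling `summarize`, per requirement 4, so cold-start outliers do not inflate the tail percentiles.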

Best Use Cases

Benchmarking LLM API providers (OpenAI, Anthropic, Azure) to compare latency and throughput before production deployment

Stress-testing self-hosted models (Llama, Mistral) to determine optimal batch sizes and concurrent user limits for GPU allocation

Regression testing model updates to ensure new versions meet SLA requirements for response time under identical load conditions

Validating autoscaling policies for AI inference endpoints by simulating traffic spikes and measuring cold-start recovery times

Cost optimization analysis by measuring tokens-per-second efficiency across different model sizes and quantization levels
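For the cost-optimization use case, comparing providers or model sizes reduces to simple arithmetic over per-token prices. A minimal sketch; the price arguments are placeholders, so substitute your provider's actual per-million-token rates:

```python
def cost_per_1k_requests(prompt_tokens, completion_tokens,
                         price_in_per_1m, price_out_per_1m):
    """Estimated cost of 1,000 requests given average token counts per request
    and input/output prices quoted per 1M tokens (values are assumptions)."""
    per_request = (prompt_tokens * price_in_per_1m
                   + completion_tokens * price_out_per_1m) / 1_000_000
    return per_request * 1000
```

Pairing this with measured tokens-per-second from the load test gives a cost-versus-throughput curve per model or quantization level.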

Frequently Asked Questions

Should I use this for testing local models or cloud APIs?

Both. The script adapts to either scenario. For local models, it can include GPU utilization monitoring (nvidia-smi integration). For cloud APIs, it focuses on network latency, rate limiting behavior, and cost-per-request analysis.
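The nvidia-smi integration mentioned above can be as simple as polling the CLI's CSV query mode from the test harness. A minimal sketch, assuming `nvidia-smi` is on the PATH; it returns `None` gracefully when the tool is absent (e.g., when testing a cloud API):

```python
import subprocess

def parse_gpu_stats(csv_line):
    """Parse one line of `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`."""
    util, mem = (field.strip() for field in csv_line.split(","))
    return {"gpu_util_pct": int(util), "memory_used_mib": int(mem)}

def sample_gpu():
    """Poll nvidia-smi once; returns a list of per-GPU stats, or None if unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return [parse_gpu_stats(line) for line in out.strip().splitlines()]
```

Sampling this on a background timer alongside request metrics lets the report correlate latency spikes with GPU saturation.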

How do I handle rate limiting (429 errors) during the test?

The generated scripts include exponential backoff with jitter and circuit breaker patterns. You can configure the maximum retry attempts and backoff multiplier in the configuration file. The final report will distinguish between application errors and throttling events.
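The backoff policy described above can be expressed as two small pure functions, which makes it easy to unit-test before wiring into the request loop. This is an illustrative full-jitter variant under assumed defaults, not the exact logic of any generated script:

```python
import random

def backoff_delay(attempt, base=0.5, multiplier=2.0, cap=30.0, rng=random.random):
    """Full-jitter exponential backoff: a random delay in
    [0, min(cap, base * multiplier**attempt)] seconds."""
    ceiling = min(cap, base * multiplier ** attempt)
    return rng() * ceiling

def should_retry(status, attempt, max_attempts=5):
    """Retry throttling (429) and server errors (5xx) up to max_attempts;
    client errors like 400/401 fail fast."""
    return attempt < max_attempts and (status == 429 or 500 <= status < 600)
```

Counting 429-triggered retries separately from 5xx retries is what lets the final report distinguish throttling events from application errors.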

Can this generate k6 scripts instead of Python?

Yes. Specify 'k6 JavaScript' or 'Artillery.io YAML' in the [LANGUAGE] variable. The prompt will adapt the output to use k6's WebSocket handling for streaming responses or HTTP/2 multiplexing as appropriate for the target system.

Get this Prompt

Free
Estimated time: 5 min
Verified by 88 experts

More Like This

AI Database Migration Planner

Generate production-ready database migration strategies with risk assessment, rollback protocols, and step-by-step execution plans.

#database #migration +3
1,418 Total Uses · 3.7 Average Rating

AI Cache Strategy Designer

Architect high-performance, scalable caching layers tailored to your specific infrastructure and consistency requirements.

#caching #distributed-systems +3
2,586 Total Uses · 4.4 Average Rating

Enterprise API Gateway Architecture Configurator

Generate production-ready, secure, and scalable API gateway configurations with infrastructure-as-code templates and best practices.

#api-gateway #infrastructure +3
1,461 Total Uses · 4.1 Average Rating