MCP App Testing Framework

Attach your local or remote MCP server to run E2E tests, visual regression tests, and multi-model evals, or run live browser tests against the real ChatGPT and Claude.

Add it to any MCP server in any language. No paid accounts, no AI credits.

npx sunpeak test init --server URL

What is the sunpeak testing framework?


MCP Apps run inside AI hosts like ChatGPT and Claude, not in a browser you control. Every code change means deploying, opening the host, triggering the tool, and checking the result manually. The sunpeak testing framework replicates those host runtimes locally so you can test automatically.

The sunpeak testing framework is open source (MIT) and provides automated testing for MCP Apps, ChatGPT Apps, and Claude Connectors. It works with any MCP server in any language.

E2E Tests

Playwright tests against simulated ChatGPT and Claude hosts

Visual Regression

Screenshot comparison against saved baselines

Multi-Model Evals

Test tool calling across GPT-4o, Claude, Gemini, and other LLMs

Live Host Tests

Playwright tests against real ChatGPT and Claude

Manual Testing

Inspect MCP Apps in the local sunpeak inspector

Test CLI

| Command | What it runs | Runtime |
| --- | --- | --- |
| pnpm test | E2E tests | Playwright + inspector |
| pnpm test:visual | E2E + visual regression | Playwright + inspector + screenshots |
| pnpm test:live | Live tests against real ChatGPT | Playwright + real host |
| pnpm test:eval | Evals against multiple LLM models | Vitest + Vercel AI SDK |
| npx sunpeak test init | Scaffold test infrastructure | Adds Playwright config, tests, and evals |

How It Works

1. Scaffold Tests

Run npx sunpeak test init in your project. It detects your project type (JS/TS, Python, Go, Rust) and creates Playwright config, test files, and eval scaffolding. For non-JS projects, it creates a self-contained tests/sunpeak/ directory.

2. Define Simulations

Create JSON fixtures in tests/simulations/ that define tool input, tool result, and server tool mocks. Each simulation is a reproducible state your resource can render. The inspector loads them automatically.
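For illustration, a fixture for the show-albums tool used in the test examples on this page might look like the following. The field names are assumptions based on the description above (tool input, tool result, server tool mocks), not the framework's confirmed schema:

```json
{
  "tool": "show-albums",
  "input": {},
  "result": {
    "content": [{ "type": "text", "text": "Found 2 albums" }],
    "structuredContent": { "albums": ["Summer Slice", "Winter Waves"] }
  },
  "serverToolMocks": {
    "get-album-art": { "result": { "content": [] } }
  }
}
```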

3. Write Tests

Import test and expect from 'sunpeak/test'. Use the inspector fixture to render tools, set themes and display modes, and assert against the rendered resource with Playwright locators and MCP-specific matchers.

4. Run in CI/CD

Add pnpm test to your pipeline. It starts the dev server, runs E2E tests against both ChatGPT and Claude runtimes, and shuts down when complete. No accounts, keys, or credits on your CI runners.

import { test, expect } from 'sunpeak/test';

test('albums render in light mode', async ({ inspector }) => {
  const result = await inspector.renderTool('show-albums', undefined, { theme: 'light' });
  const app = result.app();
  await expect(app.locator('button:has-text("Summer Slice")')).toBeVisible();
});

test('albums render in fullscreen', async ({ inspector }) => {
  const result = await inspector.renderTool('show-albums', undefined, { displayMode: 'fullscreen' });
  const app = result.app();
  await expect(app.locator('button:has-text("Summer Slice")')).toBeVisible();
  // Compare against saved baseline (only runs with --visual flag)
  await result.screenshot('albums-fullscreen');
});

What You Can Test

  • Multi-Host Rendering

    Tests run against both ChatGPT and Claude runtimes automatically via Playwright projects. One test file covers both hosts.

  • Themes & Display Modes

    Test light/dark themes and inline/fullscreen/pip display modes. Use setTheme() and setDisplayMode() or pass options to callTool().

  • Visual Regression

    Capture screenshots with result.screenshot() and compare against baselines. Configure thresholds and max diff pixel ratios in defineConfig().

  • Backend Tool Mocking

    Simulation files can mock callServerTool responses with simple or conditional matching. Test interactive flows without a real backend.

  • MCP-Specific Assertions

    Custom matchers: toHaveTextContent(), toHaveStructuredContent(), toBeError() alongside standard Playwright locators.

  • Multi-Model Evals

    Send prompts to GPT-4o, Claude, Gemini, and other models. Assert each model calls the right tools with the right arguments. Each eval runs N times per model to measure reliability.

  • Any MCP Server, Any Language

    Use npx sunpeak test init with any MCP server. Configure the server via HTTP URL or startup command. Python, Go, TypeScript, Rust, anything.
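The MCP-specific matchers above are essentially structural checks on tool results. A minimal sketch in plain TypeScript of what a toHaveStructuredContent-style comparison might do, assuming tool results follow the MCP shape with a content array and an optional structuredContent object (hasStructuredContent and the sample data are illustrative, not framework code):

```typescript
// Minimal MCP-style tool result shape (per the MCP spec's content /
// structuredContent / isError fields).
interface ToolResult {
  content: { type: 'text'; text: string }[];
  structuredContent?: Record<string, unknown>;
  isError?: boolean;
}

// Check that every expected key deep-equals the corresponding key in the
// result's structuredContent (JSON stringify as a simple deep comparison).
function hasStructuredContent(
  result: ToolResult,
  expected: Record<string, unknown>
): boolean {
  const actual = result.structuredContent ?? {};
  return Object.entries(expected).every(
    ([key, value]) => JSON.stringify(actual[key]) === JSON.stringify(value)
  );
}

const result: ToolResult = {
  content: [{ type: 'text', text: 'Found 2 albums' }],
  structuredContent: { albums: ['Summer Slice', 'Winter Waves'] },
};

console.log(hasStructuredContent(result, { albums: ['Summer Slice', 'Winter Waves'] })); // true
```

A real matcher would additionally report a readable diff on failure, which is what makes it nicer than a raw equality assertion in test output.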

Who It's For

MCP App Developers

Stop manually refreshing ChatGPT and Claude after every code change. Write tests once, run them against both hosts automatically. Catch regressions before they ship.

MCP Server Authors

Test MCP servers written in any language. Run npx sunpeak test init --server URL to add test infrastructure to Python, Go, TypeScript, or Rust servers.

Coding Agents

Agents like Claude Code, Codex, and Cursor can run pnpm test to validate MCP Apps without manual testing in a real host. Automated testing in the agent loop.

Getting Started

Add sunpeak testing to any MCP project:

npx sunpeak test init --server URL

Then run pnpm test to execute E2E tests. See the testing documentation for the full guide.


Frequently Asked Questions

Do I need a sunpeak project to use the testing framework?

No. Run "npx sunpeak test init" in any JavaScript, TypeScript, Python, Go, or Rust project. It scaffolds Playwright config and a starter test file. For non-JS projects, it creates a self-contained tests/sunpeak/ directory with everything included.

What test runners does sunpeak use?

E2E tests use Playwright against the sunpeak inspector (replicated ChatGPT and Claude runtimes). Live tests use Playwright against real ChatGPT. You write standard Playwright assertions plus MCP-specific matchers like toHaveTextContent and toHaveStructuredContent.

How do simulation files work?

Simulation files are JSON fixtures in tests/simulations/ that define a tool call scenario: tool input, tool result, and optional server tool mocks. The inspector loads them to render your MCP App in a specific state. Each simulation is a reproducible test scenario you can assert against.

Can I test across ChatGPT and Claude automatically?

Yes. The sunpeak test runner uses Playwright projects to run each test against both ChatGPT and Claude host runtimes automatically. One test file, both hosts. Configure which hosts to test in defineConfig().
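For illustration, host selection and the visual-regression thresholds mentioned elsewhere on this page might be configured together in one place. The option names below are assumptions based on the features described here, not confirmed sunpeak API:

```typescript
import { defineConfig } from 'sunpeak/test';

export default defineConfig({
  // Which replicated host runtimes each test runs against (hypothetical keys).
  hosts: ['chatgpt', 'claude'],
  // Visual regression tuning: comparison threshold and max diff pixel ratio.
  visual: {
    threshold: 0.2,
    maxDiffPixelRatio: 0.01,
  },
});
```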

What is visual regression testing?

Run "pnpm test:visual" to capture screenshots of your MCP App and compare them against saved baselines. If the UI changes unexpectedly, the test fails with a diff image. Run "pnpm test:visual --update" to update baselines after intentional changes.

How do live tests differ from E2E tests?

E2E tests run against the local inspector with simulation fixtures. They are fast, deterministic, and free. Live tests run against real ChatGPT using Playwright. sunpeak handles auth, message sending, and iframe access. You only write assertions against the rendered app.

Does sunpeak testing work in CI/CD?

Yes. Add "pnpm test" to your CI pipeline. It starts the dev server automatically, runs E2E and visual regression tests, and shuts down when complete. No paid host accounts, API keys, or AI credits needed on CI runners.

What are evals in sunpeak?

Evals test whether different LLMs call your tools correctly. They connect to your MCP server, discover tools via MCP protocol, send prompts to multiple models (GPT-4o, Claude, Gemini, etc.), and assert that each model calls the right tools with the right arguments. Each eval runs N times per model to measure reliability. Run them with "pnpm test:eval".
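The run-N-times reliability measurement can be sketched in a few lines of TypeScript. passRate and the simulated flaky model below are illustrative stand-ins, not the framework's API (the real runner uses Vitest and the Vercel AI SDK):

```typescript
// One eval run: resolves true if the model called the right tool
// with the right arguments, false otherwise.
type EvalRun = () => Promise<boolean>;

// Run the eval n times and return the fraction of passing runs.
async function passRate(run: EvalRun, n: number): Promise<number> {
  let passed = 0;
  for (let i = 0; i < n; i++) {
    if (await run()) passed++;
  }
  return passed / n;
}

// Simulated model that calls the right tool 4 out of every 5 attempts.
let call = 0;
const flaky: EvalRun = async () => ++call % 5 !== 0;

passRate(flaky, 10).then((rate) => console.log(rate)); // 0.8
```

Reporting a rate rather than a single pass/fail is the point: tool calling is stochastic, so a model that succeeds 8 of 10 times is meaningfully different from one that succeeds 10 of 10.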

Is sunpeak testing free?

Yes. sunpeak is MIT licensed and open source. The testing framework, inspector, CLI, and all tooling are free to use. Evals require API keys for the LLM providers you want to test against.

Open Source & MIT Licensed

sunpeak is free to use, modify, and distribute.

Want to inspect MCP Apps interactively? See the Inspector page. Building MCP Apps? See the MCP App Framework page.