The Complete Guide to Testing ChatGPT Apps and MCP Apps
[Updated 2026-02-05] Testing ChatGPT Apps presents unique challenges. Your UI runs inside an AI host’s runtime, responds to tool invocations, and adapts to multiple display modes and themes. Without proper testing infrastructure, you’re deploying blind.
TL;DR: Use sunpeak’s built-in testing with Vitest for unit tests (pnpm test) and Playwright for e2e tests (pnpm test:e2e). Define states in simulation files, test across display modes with createSimulatorUrl, and run everything in CI.
This guide covers everything you need to test ChatGPT Apps with confidence.
Why Testing ChatGPT Apps is Different
ChatGPT Apps run in a specialized runtime environment. Your React components don’t just render in a browser—they render inside the ChatGPT App runtime with:
- Host frontend state - Inline, picture-in-picture, and fullscreen display modes; light or dark theme; and so on
- Tool invocations - The AI host calls your app’s tools with specific inputs
- Backend state - Various possible states for users and sessions in your database
- App state - Persistent state that survives across invocations
Testing each combination manually isn’t feasible because the number of combinations multiplies quickly. You need automated testing that covers all these scenarios.
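To make the combinatorics concrete, here is a minimal sketch that enumerates host configurations. It assumes the display mode, theme, and device type values described later in this guide; `HostConfig` and `allHostConfigs` are hypothetical names for illustration, not part of sunpeak.

```typescript
// Value lists assumed from the simulator options described in this guide.
const displayModes = ['inline', 'pip', 'fullscreen'] as const;
const themes = ['light', 'dark'] as const;
const deviceTypes = ['mobile', 'tablet', 'desktop', 'unknown'] as const;

// Hypothetical type describing one host configuration.
type HostConfig = {
  displayMode: (typeof displayModes)[number];
  theme: (typeof themes)[number];
  deviceType: (typeof deviceTypes)[number];
};

// Build every combination of display mode, theme, and device type.
function allHostConfigs(): HostConfig[] {
  const configs: HostConfig[] = [];
  for (const displayMode of displayModes) {
    for (const theme of themes) {
      for (const deviceType of deviceTypes) {
        configs.push({ displayMode, theme, deviceType });
      }
    }
  }
  return configs;
}

const hostConfigs = allHostConfigs();
console.log(hostConfigs.length); // 3 * 2 * 4 = 24 configurations
```

That is 24 host configurations before multiplying by tool inputs, backend states, and app states, which is why each of those dimensions is best expressed as data that a test loop can iterate over.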
Setting Up Your Testing Environment
If you’re using the sunpeak ChatGPT App framework, testing is pre-configured. Start with:
pnpm add -g sunpeak && sunpeak new
cd my-app
Your project includes:
- Vitest configured with jsdom, React Testing Library, and jest-dom matchers
- Playwright configured to test against the ChatGPT App simulator
- Simulation files in tests/simulations/ for deterministic states
Unit Testing with Vitest
Unit tests validate individual components in isolation. Run them with:
pnpm test
Create tests alongside your components in src/resources with the .test.tsx extension:
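// Explicit Vitest imports work whether or not globals are enabled.
import { describe, it, expect } from 'vitest';
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
// Adjust this path to match where your test file lives.
import { Counter } from '../src/resources/counter-resource';

describe('Counter', () => {
  it('renders the initial count', () => {
    render(<Counter />);
    expect(screen.getByText('0')).toBeInTheDocument();
  });

  it('increments when button is clicked', async () => {
    render(<Counter />);
    // Simulate a real user click, then check the updated count.
    await userEvent.click(screen.getByRole('button', { name: /increment/i }));
    expect(screen.getByText('1')).toBeInTheDocument();
  });
});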
Unit tests run fast and catch component-level bugs early. They’re ideal for testing:
- Component rendering logic
- User interactions within a component
- Props and state handling
End-to-End Testing with Playwright
E2E tests validate your ChatGPT App running in the simulator. Run them with:
pnpm test:e2e
Create tests in tests/e2e/ with the .spec.ts extension:
import { test, expect } from '@playwright/test';
import { createSimulatorUrl } from 'sunpeak';

test('counter increments in fullscreen mode', async ({ page }) => {
  await page.goto(createSimulatorUrl({
    simulation: 'counter-show',
    displayMode: 'fullscreen',
    theme: 'dark',
  }));
  await page.getByRole('button', { name: /increment/i }).click();
  await expect(page.getByText('1')).toBeVisible();
});
The createSimulatorUrl utility generates URLs with your test configuration:
- simulation - Your simulation file name (sets initial state)
- displayMode - inline, pip, or fullscreen (tests display adaptation)
- theme - light or dark (tests theme handling)
- deviceType - mobile, tablet, desktop, or unknown (tests responsive behavior)
- touch / hover - Enable or disable touch/hover capabilities
- safeAreaTop, safeAreaBottom, etc. - Simulate device notches and insets
Creating Simulation Files
Simulation files define deterministic states for testing. Create them in tests/simulations/{resource-name}/:
{
  "userMessage": "Show me a counter starting at 5",
  "tool": {
    "name": "show_counter",
    "description": "Displays an interactive counter",
    "inputSchema": {
      "type": "object",
      "properties": {
        "initialCount": { "type": "number" }
      }
    }
  },
  "toolInput": {
    "arguments": { "initialCount": 5 }
  },
  "toolResult": {
    "content": [{ "type": "text", "text": "Counter displayed" }],
    "structuredContent": {
      "count": 5
    }
  }
}
This simulation:
- Shows userMessage in the simulator chat interface
- Defines the tool with its name and input schema
- Sets toolInput with mock input accessible via useToolData()
- Provides toolResult with mock output data passed to your component via useToolData()
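Because a malformed simulation file can make an e2e test fail for reasons unrelated to your component, it can help to sanity-check the JSON shape first. The sketch below is one way to do that. The `Simulation` interface is inferred only from the example file above, and `checkSimulation` and `parseSimulation` are hypothetical helpers, not sunpeak APIs.

```typescript
// Assumed shape, inferred from the example simulation file in this guide.
interface Simulation {
  userMessage: string;
  tool: { name: string; description?: string; inputSchema?: Record<string, unknown> };
  toolInput: { arguments: Record<string, unknown> };
  toolResult: { content: unknown[]; structuredContent?: Record<string, unknown> };
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Returns a list of problems; an empty list means the shape looks right.
function checkSimulation(data: unknown): string[] {
  if (!isRecord(data)) return ['simulation must be a JSON object'];
  const problems: string[] = [];
  const { userMessage, tool, toolInput, toolResult } = data;
  if (typeof userMessage !== 'string') problems.push('userMessage must be a string');
  if (!isRecord(tool) || typeof tool.name !== 'string') {
    problems.push('tool.name must be a string');
  }
  if (!isRecord(toolInput) || !isRecord(toolInput.arguments)) {
    problems.push('toolInput.arguments must be an object');
  }
  if (!isRecord(toolResult) || !Array.isArray(toolResult.content)) {
    problems.push('toolResult.content must be an array');
  }
  return problems;
}

// Parse a JSON string and fail loudly if the shape is off.
function parseSimulation(json: string): Simulation {
  const data: unknown = JSON.parse(json);
  const problems = checkSimulation(data);
  if (problems.length > 0) throw new Error(problems.join('; '));
  return data as Simulation;
}

// A file that is missing toolInput and toolResult reports two problems.
const incomplete = checkSimulation({
  userMessage: 'Show me a counter',
  tool: { name: 'show_counter' },
});
console.log(incomplete);
```

Returning a list of problems rather than throwing on the first one means a single run reports everything wrong with a file.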
Use simulations to test specific states without manual setup:
// Test the counter with toolResult.structuredContent.count = 5
await page.goto(createSimulatorUrl({ simulation: 'counter-show' }));
await expect(page.getByText('5')).toBeVisible();
// Test a different initial state
await page.goto(createSimulatorUrl({ simulation: 'counter-initial' }));
await expect(page.getByText('0')).toBeVisible();
Testing Across Display Modes
ChatGPT Apps appear in three display modes. Test all of them:
const displayModes = ['inline', 'pip', 'fullscreen'] as const;

for (const displayMode of displayModes) {
  test(`renders correctly in ${displayMode} mode`, async ({ page }) => {
    await page.goto(createSimulatorUrl({
      simulation: 'counter-show',
      displayMode,
    }));
    await expect(page.getByRole('button')).toBeVisible();
  });
}
Each mode has different constraints:
- Inline - Limited height, embedded in chat
- Picture-in-picture - Floating window, can be repositioned
- Fullscreen - Maximum space, modal overlay
Your app should adapt gracefully to each.
Testing Theme Adaptation
Test both light and dark themes:
test('adapts to dark theme', async ({ page }) => {
  await page.goto(createSimulatorUrl({
    simulation: 'counter-show',
    theme: 'dark',
  }));
  // Verify dark theme styles are applied
  const button = page.getByRole('button');
  await expect(button).toHaveCSS('background-color', 'rgb(255, 184, 0)');
});
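Playwright's toHaveCSS compares against the element's computed style, and browsers typically report opaque colors in rgb(r, g, b) form rather than the hex values found in design tokens. The small helper below converts a hex color into that form so expected values stay readable. `hexToRgbString` is a hypothetical helper, and the sketch assumes 6-digit hex colors with no alpha channel.

```typescript
// Convert a 6-digit hex color such as '#FFB800' into the 'rgb(r, g, b)'
// string form that browsers typically report for computed colors.
function hexToRgbString(hex: string): string {
  const match = /^#?([0-9a-f]{6})$/i.exec(hex.trim());
  if (!match) {
    throw new Error(`Expected a 6-digit hex color, got "${hex}"`);
  }
  const value = parseInt(match[1], 16);
  const r = (value >> 16) & 0xff; // top byte
  const g = (value >> 8) & 0xff; // middle byte
  const b = value & 0xff; // low byte
  return `rgb(${r}, ${g}, ${b})`;
}

const expectedBackground = hexToRgbString('#FFB800');
console.log(expectedBackground); // rgb(255, 184, 0)
```

In a theme test, the result could be passed as the expected value, for example toHaveCSS('background-color', hexToRgbString('#FFB800')).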
Running Tests in CI/CD
Add testing to your GitHub Actions workflow:
name: Test
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v2
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'pnpm'
      - run: pnpm install
      - run: pnpm test
      - run: pnpm exec playwright install --with-deps
      - run: pnpm test:e2e
Playwright tests automatically:
- Start the sunpeak dev server
- Wait for it to be ready
- Run tests against the ChatGPT App simulator
- Shut down when complete
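This start, wait, and shut down behavior is what Playwright's webServer option provides. The sketch below shows roughly what such a configuration looks like. It is not necessarily the file sunpeak generates: the `pnpm dev` command is an assumption, and the port is assumed from the localhost:3000 simulator address mentioned in the FAQ.

```typescript
// playwright.config.ts (illustrative sketch; your generated config may differ)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './tests/e2e',
  webServer: {
    // Assumed script name; use whatever command starts your dev server.
    command: 'pnpm dev',
    // Playwright polls this URL and starts the tests once it responds.
    url: 'http://localhost:3000',
    // Reuse an already-running server locally, but always start fresh in CI.
    reuseExistingServer: !process.env.CI,
    // Give the server up to two minutes to come up.
    timeout: 120_000,
  },
});
```

Playwright stops the server it started once the test run finishes, so the CI job needs no separate teardown step.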
Debugging Failing Tests
When tests fail, use these debugging techniques:
Playwright Debug Mode
pnpm test:e2e --ui
Opens a visual debugger where you can:
- Step through tests
- Inspect the DOM at each step
- See screenshots and traces
Vitest Verbose Output
pnpm test --reporter=verbose
Shows detailed output including:
- Individual assertion results
- Component render output
- Error stack traces
Screenshot on Failure
Playwright automatically captures screenshots on failure. Find them in test-results/.
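Screenshot capture is controlled by the `use.screenshot` option in the Playwright configuration, so if nothing appears in test-results/ it is worth checking that setting. The following is a minimal sketch of the relevant options, not a copy of the config your project ships with.

```typescript
// playwright.config.ts (illustrative sketch of failure-artifact settings)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Directory for screenshots, traces, and videos (Playwright's default name).
  outputDir: 'test-results',
  use: {
    // Save a screenshot only when a test fails.
    screenshot: 'only-on-failure',
    // Record a trace the first time a failed test is retried.
    trace: 'on-first-retry',
  },
});
```

Uploading the test-results/ directory as a CI artifact makes these files available for failures that only happen in the pipeline.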
Testing Best Practices
One assertion per test. Keep tests focused and easy to debug:
// Good: focused test
test('increment button is visible', async ({ page }) => {
  await page.goto(createSimulatorUrl({ simulation: 'counter-show' }));
  await expect(page.getByRole('button', { name: /increment/i })).toBeVisible();
});

// Avoid: multiple unrelated assertions
test('counter works', async ({ page }) => {
  // Too many things being tested at once
});
Test behavior, not implementation. Focus on what users see:
// Good: tests user-visible behavior
await expect(page.getByText('5')).toBeVisible();
// Avoid: tests implementation details
await expect(component.state.count).toBe(5);
Use descriptive test names. Make failures self-explanatory:
// Good: clear failure message
test('displays error message when API call fails', ...)
// Avoid: vague description
test('handles error', ...)
Clean up between tests. Reset state to avoid test pollution:
afterEach(async () => {
  // Reset any global state
});
Next Steps
Testing is essential for shipping reliable ChatGPT Apps. With sunpeak’s ChatGPT App testing infrastructure, you can:
- Run unit tests with Vitest for fast feedback
- Run e2e tests with Playwright for full integration coverage
- Test across display modes, themes, and device types
- Integrate testing into your CI/CD pipeline
Get started with sunpeak:
pnpm add -g sunpeak && sunpeak new
- Learn about the ChatGPT App framework
- Read the testing documentation
- Try the interactive simulator
- Star sunpeak on GitHub
Frequently Asked Questions
How do I test a ChatGPT App locally without a paid ChatGPT account?
Use sunpeak, the ChatGPT App framework. Run "sunpeak dev" to start a local ChatGPT App simulator at localhost:3000. You can test all display modes, themes, and tool invocations without any ChatGPT subscription.
What testing frameworks work with ChatGPT Apps?
Sunpeak includes pre-configured support for Vitest (unit testing) and Playwright (end-to-end testing). Run "pnpm test" for unit tests and "pnpm test:e2e" for end-to-end tests. Both frameworks integrate with the sunpeak ChatGPT App simulator for deterministic UI testing.
How do I run ChatGPT App tests in CI/CD pipelines?
Sunpeak projects include testing infrastructure ready for CI/CD. Add "pnpm test" and "pnpm test:e2e" to your pipeline. Playwright tests automatically start the dev server and run against the sunpeak ChatGPT App simulator in headless mode.
What are simulation files in ChatGPT App testing?
Simulation files are JSON files in tests/simulations/{resource}/ that define deterministic UI states for testing. They specify a tool definition, toolInput (mock input), toolResult (mock output), and a userMessage. The sunpeak framework auto-discovers files matching *-simulation.json.
Can I test different ChatGPT App display modes with sunpeak?
Yes. Use the createSimulatorUrl utility to test inline, picture-in-picture, and fullscreen display modes. Pass displayMode as a parameter along with theme (light/dark) and device type to validate your ChatGPT App (built as an MCP App) across all configurations.
How do I test ChatGPT Apps and MCP Apps together?
ChatGPT Apps and MCP Apps share the same testing infrastructure in sunpeak. ChatGPT Apps are built as MCP Apps — write your tests once with sunpeak and they work across all MCP-compatible AI hosts.
What is the difference between unit tests and e2e tests for ChatGPT Apps?
Unit tests (Vitest) test individual React components in isolation using jsdom. E2E tests (Playwright) test the full ChatGPT App running in the sunpeak simulator, including user interactions, tool calls, and display mode transitions.
How do I debug failing ChatGPT App tests?
Run "pnpm test:e2e --ui" to open Playwright in debug mode with a visual interface. You can step through tests, inspect the DOM, and see screenshots at each step. For unit tests, use "pnpm test --reporter=verbose" for detailed output. Sunpeak's testing infrastructure makes debugging straightforward.