Integration Testing for MCP Apps, ChatGPT Apps, and Claude Connectors
Integration testing MCP App tool handlers and resource components together.
Unit tests mock everything. E2E tests render everything. Integration tests sit in between: they exercise your real tool handlers through the MCP protocol without spinning up a browser or rendering iframes. For MCP Apps, ChatGPT Apps, and Claude Connectors, this middle layer is where most production bugs live, because it’s where your code meets the protocol.
TL;DR: Use the mcp fixture from sunpeak/test to call your tools through the running MCP server, verify response shapes, test tool registration, and validate multi-tool workflows. No browser, no iframe, no paid accounts. Run with pnpm test:e2e.
Why Integration Tests Matter for MCP Apps
MCP Apps have a layered architecture. Your tool handler runs server-side, returns structured content through the MCP protocol, and a resource component renders that content client-side in an iframe. Unit tests check each piece in isolation. E2E tests check the whole stack. But neither catches the bugs that happen at the seams.
Here are the kinds of bugs integration tests catch that other test types miss:
- Your tool handler returns { items: [...] } but your resource component reads output.results because someone renamed the field in the handler but not the component
- Your Zod schema rejects valid input because a field was typed as z.number() when the host sends it as a string
- Your tool annotations are missing readOnlyHint, which blocks Claude Connector Directory submission
- Your tool handler throws an unhandled error that the MCP server wraps in a format your error boundary doesn’t expect
- A multi-tool workflow breaks because tool A’s output shape changed and tool B still expects the old format
Unit tests miss these because they bypass the protocol layer. E2E tests catch some of them, but they’re slow, and when they fail, it’s hard to tell whether the bug is in the handler, the protocol, or the rendering.
The mcp Fixture
The mcp fixture from sunpeak/test gives you protocol-level access to your running MCP server. It starts the dev server, connects via the MCP protocol, and exposes methods for calling tools, listing resources, and reading resource content.
import { test, expect } from 'sunpeak/test';
test('search tool returns results', async ({ mcp }) => {
const result = await mcp.callTool('search', { query: 'react hooks' });
expect(result.isError).toBeFalsy();
expect(result.structuredContent.results).toBeInstanceOf(Array);
expect(result.structuredContent.results.length).toBeGreaterThan(0);
});
This test calls your actual tool handler through the full MCP stack. The JSON-RPC request is serialized, your handler runs, and the response comes back through the same path it would in production. If your handler throws, result.isError will be truthy. If the response shape is wrong, your assertions catch it.
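To make "through the full MCP stack" concrete, here is a sketch of the JSON-RPC 2.0 message that a callTool invocation sends under the hood. The method and params field names follow the MCP tools/call specification; the request id and transport framing shown here are illustrative, since the fixture manages those for you.

```typescript
// Approximate wire format of mcp.callTool('search', { query: 'react hooks' }).
// Field names per the MCP tools/call spec; id and framing are handled by the fixture.
const request = {
  jsonrpc: "2.0" as const,
  id: 1,
  method: "tools/call",
  params: {
    name: "search",
    arguments: { query: "react hooks" },
  },
};

console.log(JSON.stringify(request, null, 2));
```

Your handler never sees this envelope directly; the server deserializes it, validates the arguments against your input schema, and invokes the handler, which is exactly why protocol-level bugs surface in these tests.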
The mcp fixture exposes four methods:
| Method | What it does |
|---|---|
| listTools() | Returns all registered tools with schemas and annotations |
| callTool(name, input?) | Calls a tool handler through the MCP protocol |
| listResources() | Returns all registered resources |
| readResource(uri) | Reads a resource by URI |
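The result object that callTool resolves with roughly follows the MCP CallToolResult shape. A sketch of that shape, assuming standard MCP fields (your server, and sunpeak itself, may expose more):

```typescript
// Rough shape of a tools/call result, per the MCP spec.
// structuredContent and isError are optional on the wire.
interface CallToolResult {
  content: Array<{ type: "text"; text: string }>;
  structuredContent?: Record<string, unknown>;
  isError?: boolean;
}

// A successful search response might look like this:
const ok: CallToolResult = {
  content: [{ type: "text", text: "Found 2 results" }],
  structuredContent: { results: [{ id: "1" }, { id: "2" }] },
  isError: false,
};
```

The assertions in the examples below all target these three fields: content for the text the model reads, structuredContent for the data your resource component renders, and isError for the failure flag.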
Testing Tool Registration
Before testing what your tools do, test that they exist. This sounds obvious, but a typo in a file name or a missing export can silently deregister a tool, and you won’t know until a user reports that it stopped working.
import { test, expect } from 'sunpeak/test';
test('all expected tools are registered', async ({ mcp }) => {
const tools = await mcp.listTools();
const toolNames = tools.map(t => t.name);
expect(toolNames).toContain('show-dashboard');
expect(toolNames).toContain('search-products');
expect(toolNames).toContain('create-order');
});
You can also verify schemas and annotations:
test('create-order has correct schema and annotations', async ({ mcp }) => {
const tools = await mcp.listTools();
const createOrder = tools.find(t => t.name === 'create-order');
// Schema has required fields
expect(createOrder.inputSchema.properties).toHaveProperty('productId');
expect(createOrder.inputSchema.properties).toHaveProperty('quantity');
expect(createOrder.inputSchema.required).toContain('productId');
// Annotations mark it as destructive
expect(createOrder.annotations?.destructiveHint).toBe(true);
expect(createOrder.annotations?.readOnlyHint).toBeFalsy();
});
This test costs almost nothing to run and catches real problems. If you’re building a Claude Connector, the Connectors Directory requires every tool to have either readOnlyHint: true or destructiveHint: true. A test like this saves you from a rejected submission.
Contract Testing Between Tool Handlers and Resource Components
The most common integration bug in MCP Apps is a shape mismatch between what the tool handler returns and what the resource component expects. The tool handler returns structuredContent with a certain shape, and the resource component reads that shape via useToolData(). If the shapes drift apart, the UI breaks.
Integration tests verify this contract by calling the tool and asserting on the exact fields your resource component uses:
test('show-dashboard returns the shape the resource component expects', async ({ mcp }) => {
const result = await mcp.callTool('show-dashboard', {
quarter: 'Q1',
year: 2026,
});
const data = result.structuredContent;
// These are the exact fields DashboardResource reads from useToolData().output
expect(data).toHaveProperty('revenue');
expect(data).toHaveProperty('orders');
expect(data).toHaveProperty('topProducts');
expect(typeof data.revenue).toBe('number');
expect(typeof data.orders).toBe('number');
expect(data.topProducts).toBeInstanceOf(Array);
// Each product has the fields the component maps over
const product = data.topProducts[0];
expect(product).toHaveProperty('name');
expect(product).toHaveProperty('unitsSold');
expect(product).toHaveProperty('revenue');
});
When your resource component reads output.topProducts.map(p => p.name) and someone renames the field to productName in the tool handler, this test breaks immediately. Without it, you’d find out from a blank screen in production.
Some teams formalize this by defining the contract as a TypeScript type and using it in both the tool handler and the resource component (see MCP App TypeScript types for the full pattern). Integration tests then verify the runtime output matches the compile-time type:
import type { DashboardOutput } from '../../src/resources/dashboard/types';
test('show-dashboard output satisfies DashboardOutput type', async ({ mcp }) => {
const result = await mcp.callTool('show-dashboard', {
quarter: 'Q1',
year: 2026,
});
// Runtime validation of the contract
const data = result.structuredContent as DashboardOutput;
expect(data.revenue).toBeDefined();
expect(data.orders).toBeDefined();
expect(data.topProducts).toBeDefined();
});
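Note that the `as DashboardOutput` cast above is compile-time only; it doesn't verify anything at runtime. If you want the contract actually checked, one option is a type guard (or a Zod schema) over the fields the component reads. A minimal sketch, where DashboardOutput mirrors the earlier example and is an assumption about your contract:

```typescript
// Hypothetical contract type, matching the dashboard example above.
interface DashboardOutput {
  revenue: number;
  orders: number;
  topProducts: Array<{ name: string; unitsSold: number; revenue: number }>;
}

// Runtime type guard over the exact fields the resource component reads.
function isDashboardOutput(value: unknown): value is DashboardOutput {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (typeof v.revenue !== "number" || typeof v.orders !== "number") return false;
  if (!Array.isArray(v.topProducts)) return false;
  return v.topProducts.every(
    (p: any) =>
      p &&
      typeof p.name === "string" &&
      typeof p.unitsSold === "number" &&
      typeof p.revenue === "number"
  );
}

// A well-formed payload passes; a renamed field fails.
const good = {
  revenue: 120000,
  orders: 340,
  topProducts: [{ name: "Headphones", unitsSold: 50, revenue: 4000 }],
};
const bad = {
  revenue: 120000,
  orders: 340,
  topProducts: [{ productName: "Headphones", unitsSold: 50, revenue: 4000 }],
};
console.log(isDashboardOutput(good), isDashboardOutput(bad)); // true false
```

In a test you would replace the cast with `expect(isDashboardOutput(result.structuredContent)).toBe(true)`, which fails the moment the handler's output drifts from the contract.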
Testing Error Handling at the Protocol Level
When a tool handler throws, the MCP protocol wraps the error and returns it to the host. Your resource component should handle this gracefully (see MCP App error handling). Integration tests verify the error path end-to-end without rendering:
test('tool returns error for invalid input', async ({ mcp }) => {
const result = await mcp.callTool('show-dashboard', {
quarter: 'invalid',
year: -1,
});
expect(result.isError).toBeTruthy();
});
test('tool returns error when upstream API is down', async ({ mcp }) => {
// If your tool calls an external API, you can test
// what happens when it fails by passing input that
// triggers the error path in your handler
const result = await mcp.callTool('search-products', {
query: '',
});
// Depending on your handler logic, this might be an error
// or an empty result set. Either way, it shouldn't crash.
if (result.isError) {
expect(result.content[0].text).toBeTruthy();
} else {
expect(result.structuredContent.results).toEqual([]);
}
});
The goal is to verify that your tool handler never returns a raw exception stack trace or an empty response. Every error path should produce a usable error message that the host can display.
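One way to guarantee that on the handler side is a small wrapper that converts exceptions into structured error results. This is a hypothetical pattern, not a sunpeak API; safeHandler and the ToolResult shape here are illustrative:

```typescript
// Minimal MCP-style result shape (see the MCP CallToolResult type).
interface ToolResult {
  content: Array<{ type: "text"; text: string }>;
  isError?: boolean;
}

// Illustrative wrapper: catch failures and return a readable error result
// instead of letting a raw exception (and its stack trace) leak to the host.
async function safeHandler(fn: () => Promise<ToolResult>): Promise<ToolResult> {
  try {
    return await fn();
  } catch (err) {
    const message = err instanceof Error ? err.message : "Unknown error";
    return {
      content: [{ type: "text", text: `Request failed: ${message}` }],
      isError: true,
    };
  }
}

// The host sees a usable message, never a stack trace.
safeHandler(async () => {
  throw new Error("upstream API returned 503");
}).then((r) => console.log(r.isError, r.content[0].text));
// → true Request failed: upstream API returned 503
```

The integration tests above then assert on the message text, which keeps the error contract between handler and host under test.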
Testing Multi-Tool Workflows
Some MCP Apps use multiple tools that depend on each other. A search tool returns a list of items, a details tool shows one item, and an action tool modifies it. If the output shape of one tool changes, the downstream tools break.
Integration tests verify these workflows by chaining callTool calls:
test('search → detail → action workflow', async ({ mcp }) => {
// Step 1: Search returns a list with IDs
const searchResult = await mcp.callTool('search-products', {
query: 'wireless headphones',
});
expect(searchResult.isError).toBeFalsy();
const products = searchResult.structuredContent.results;
expect(products.length).toBeGreaterThan(0);
const firstProductId = products[0].id;
expect(firstProductId).toBeTruthy();
// Step 2: Detail accepts the ID from search
const detailResult = await mcp.callTool('product-detail', {
productId: firstProductId,
});
expect(detailResult.isError).toBeFalsy();
expect(detailResult.structuredContent.name).toBeTruthy();
expect(detailResult.structuredContent.price).toBeDefined();
// Step 3: Action uses the same ID
const orderResult = await mcp.callTool('create-order', {
productId: firstProductId,
quantity: 1,
});
expect(orderResult.isError).toBeFalsy();
expect(orderResult.structuredContent.orderId).toBeTruthy();
});
This test exercises the contract between three tools. If search-products renames its id field to productId, the detail and action steps fail, and you know exactly where the break is.
Handling External Dependencies
Integration tests run your real tool handler code. If your handler calls an external API or database, you need to decide how to handle that dependency.
There are two approaches:
Mock at the boundary. Replace your API client with a mock that returns controlled data. Your tool handler logic runs for real, but the external call is faked. This keeps tests fast and deterministic.
// In your test setup or vitest.config.ts
vi.mock('../../src/lib/api-client', () => ({
fetchProducts: vi.fn().mockResolvedValue([
{ id: '1', name: 'Wireless Headphones', price: 79.99 },
{ id: '2', name: 'Bluetooth Speaker', price: 49.99 },
]),
}));
Your tool handler imports fetchProducts and uses it normally, so the test exercises all the handler logic between the API call and the response.
Use a test database or API. For handlers that do complex database queries, running against a test database catches query bugs that mocks miss. Set up a test database in your CI environment and point your handler at it via environment variables.
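A common way to wire this up is to pick the connection string from environment variables, so the same handler code runs against a throwaway database in CI and the real one elsewhere. A sketch, with illustrative variable names:

```typescript
// Illustrative config helper: CI sets TEST_DATABASE_URL, staging/production
// set DATABASE_URL, and local development falls back to a default.
const databaseUrl =
  process.env.TEST_DATABASE_URL ??
  process.env.DATABASE_URL ??
  "postgres://localhost:5432/dev";

console.log(`handlers will connect to: ${databaseUrl}`);
```

Because the mcp fixture starts your real server process, the handler picks up whatever environment the test run provides, with no test-specific code paths in the handler itself.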
Most teams use mocked dependencies for CI and real dependencies for staging. The mcp fixture doesn’t care either way. It calls your tool through the protocol, and your handler does whatever it does internally.
Where Integration Tests Fit in Your Test Suite
Integration tests sit between unit and e2e tests in your testing pyramid:
- Unit tests (pnpm test:unit): Fast, isolated, mock everything. Test component rendering logic and utility functions. Run in milliseconds.
- Integration tests (pnpm test:e2e): Medium speed, real protocol. Test tool handlers through the MCP server, verify response shapes, test tool registration. Run in seconds.
- E2E tests (pnpm test:e2e): Full browser, real rendering. Test the complete ChatGPT App or Claude Connector in the inspector with theme, display mode, and host variations. Run in seconds to minutes.
- Visual regression tests (pnpm test:visual): Screenshot comparison on top of e2e. Catch CSS and layout bugs. Run in seconds to minutes.
- Live tests (pnpm test:live): Real host validation against ChatGPT or Claude. Expensive, slow. Reserve for pre-release. Run in minutes.
Integration tests and e2e tests both live in tests/e2e/ and both use pnpm test:e2e. The difference is which fixture you use. Tests with the mcp fixture are integration tests (protocol-level, no rendering). Tests with the inspector fixture are e2e tests (full rendering in a browser). You can organize them into subdirectories if you want to run them separately:
tests/
e2e/
integration/
tools.spec.ts # mcp fixture, protocol-level
workflows.spec.ts # mcp fixture, multi-tool chains
rendering/
dashboard.spec.ts # inspector fixture, UI rendering
themes.spec.ts # inspector fixture, theme testing
Run just integration tests:
pnpm test:e2e tests/e2e/integration/
A Complete Integration Test File
Here’s a full integration test file for a product catalog MCP App:
import { test, expect } from 'sunpeak/test';
test.describe('tool registration', () => {
test('all tools are registered with correct annotations', async ({ mcp }) => {
const tools = await mcp.listTools();
const toolMap = Object.fromEntries(tools.map(t => [t.name, t]));
// Read-only tools
expect(toolMap['search-products'].annotations?.readOnlyHint).toBe(true);
expect(toolMap['product-detail'].annotations?.readOnlyHint).toBe(true);
// Destructive tools
expect(toolMap['create-order'].annotations?.destructiveHint).toBe(true);
});
test('search-products has correct input schema', async ({ mcp }) => {
const tools = await mcp.listTools();
const search = tools.find(t => t.name === 'search-products');
expect(search.inputSchema.properties.query.type).toBe('string');
expect(search.inputSchema.properties.category?.type).toBe('string');
expect(search.inputSchema.required).toContain('query');
});
});
test.describe('tool handler contracts', () => {
test('search-products returns expected shape', async ({ mcp }) => {
const result = await mcp.callTool('search-products', {
query: 'headphones',
});
expect(result.isError).toBeFalsy();
const data = result.structuredContent;
expect(data.results).toBeInstanceOf(Array);
if (data.results.length > 0) {
const item = data.results[0];
expect(item).toHaveProperty('id');
expect(item).toHaveProperty('name');
expect(item).toHaveProperty('price');
}
});
test('product-detail returns expected shape', async ({ mcp }) => {
const result = await mcp.callTool('product-detail', {
productId: 'test-product-1',
});
expect(result.isError).toBeFalsy();
const data = result.structuredContent;
expect(data).toHaveProperty('name');
expect(data).toHaveProperty('description');
expect(data).toHaveProperty('price');
expect(data).toHaveProperty('images');
});
});
test.describe('error handling', () => {
test('search with empty query returns empty results', async ({ mcp }) => {
const result = await mcp.callTool('search-products', { query: '' });
if (result.isError) {
expect(result.content[0].text).toBeTruthy();
} else {
expect(result.structuredContent.results).toEqual([]);
}
});
test('detail with invalid ID returns error', async ({ mcp }) => {
const result = await mcp.callTool('product-detail', {
productId: 'nonexistent',
});
expect(result.isError).toBeTruthy();
});
});
test.describe('multi-tool workflows', () => {
test('search result IDs work with product-detail', async ({ mcp }) => {
const searchResult = await mcp.callTool('search-products', {
query: 'headphones',
});
const products = searchResult.structuredContent.results;
if (products.length === 0) return;
const detailResult = await mcp.callTool('product-detail', {
productId: products[0].id,
});
expect(detailResult.isError).toBeFalsy();
expect(detailResult.structuredContent.name).toBeTruthy();
});
});
This file covers tool registration, response contracts, error paths, and cross-tool workflows, all without opening a browser. Tests run in seconds and work in CI without any external dependencies.
Get Started
Integration testing for MCP Apps doesn’t require a new framework or a complex setup. If you’re already using sunpeak, the mcp fixture is available right now in sunpeak/test.
If you have an existing MCP server that isn’t built with sunpeak, you can still use the testing framework. Run npx sunpeak test init to scaffold test infrastructure, point the config at your server, and start writing tests with the mcp fixture. It works with any MCP server, whether it’s TypeScript, Python, or anything else that speaks MCP.
# Initialize testing for an existing MCP server
npx sunpeak test init
# Run integration tests
pnpm test:e2e
Check out the testing framework documentation for the full API reference, or read the complete testing guide for context on how integration tests fit into a full MCP App testing strategy.
Further Reading
- Complete guide to testing ChatGPT Apps and MCP Apps
- Mocking and stubbing in MCP App tests
- Snapshot testing MCP Apps
- How to test Claude Connectors
- MCP App CI/CD with GitHub Actions
- Visual regression testing for MCP Apps
- MCP App error handling
- MCP App framework
- ChatGPT App framework
- Claude Connector framework
- Testing framework
Frequently Asked Questions
What is integration testing for MCP Apps?
Integration testing for MCP Apps verifies that your tool handlers, MCP protocol layer, and resource components work together correctly. Unlike unit tests (which mock everything) or e2e tests (which test the full rendered UI in a browser), integration tests exercise the real tool handler code, call it through the MCP protocol, and verify the response shape and data without rendering in an iframe. This catches bugs at the boundaries between layers, like mismatched field names or broken serialization.
How do I integration test an MCP App tool handler?
Use the mcp fixture from sunpeak/test. Call mcp.callTool("tool-name", { args }) to invoke your tool handler through the full MCP protocol stack. Assert on the returned result object, checking structuredContent fields, content arrays, and isError. This tests your handler with the real MCP server running, not a mock.
What is the difference between unit testing and integration testing MCP Apps?
Unit tests import your tool handler function directly and call it with mock arguments, bypassing the MCP protocol entirely. Integration tests use the mcp fixture to call your tool through the running MCP server, exercising JSON-RPC serialization, argument validation, and the full request lifecycle. Unit tests are faster but miss protocol-level bugs. Integration tests are slower but catch real-world failures.
How do I test that my MCP App tools are registered correctly?
Use mcp.listTools() from the mcp fixture to get all registered tools. Assert that your expected tools exist, have the correct input schemas, descriptions, and annotations. This catches missing tool registrations, wrong schema types, and missing annotations like readOnlyHint before you deploy.
Can I integration test MCP Apps without a ChatGPT or Claude account?
Yes. Integration tests with the mcp fixture run against the local sunpeak dev server. No paid subscriptions, no API keys, and no AI credits required. The mcp fixture starts the server automatically and tears it down after tests complete. Tests run the same way locally and in CI/CD.
How do I test multi-tool MCP App workflows?
Call mcp.callTool() multiple times in sequence within a single test. Use the output from one tool call as the input to the next. This verifies that your tools produce data in the format other tools expect, which catches contract mismatches that unit tests with mocked data would miss.
Should I mock external APIs in MCP App integration tests?
It depends on what you are testing. If you want to verify the contract between your tool handler and the MCP protocol layer, mock external APIs so tests stay fast and deterministic. If you want to verify your tool works with the real API, skip the mock but accept that tests will be slower and can fail due to network issues. Most teams mock external APIs in CI and run unmocked integration tests in a staging environment.
How do I run only integration tests for my MCP App?
Put integration test files in tests/e2e/ and use the mcp fixture instead of the inspector fixture. Run them with pnpm test:e2e. If you want to separate integration tests from rendering tests, put them in a subdirectory like tests/e2e/integration/ and run "pnpm test:e2e tests/e2e/integration/" to target just those files.