How to Test Claude Connectors: Unit Tests, Local Inspector, and CI/CD (April 2026)
Testing Claude Connectors locally with the sunpeak inspector.
TL;DR: Test Claude Connectors locally without a Claude account using sunpeak’s inspector (pnpm dev). Run pnpm test to execute both unit and e2e tests, or use pnpm test:unit / pnpm test:e2e to run them separately. Add pnpm test:visual for visual regression tests, or pnpm test:eval for multi-model evals that test tool calling across GPT-4o, Claude, Gemini, and other LLMs. Use simulation files for deterministic edge-case coverage. Run the same tests in GitHub Actions CI/CD. Save live Claude testing for pre-release validation only.
Testing Claude Connectors by hand is slow and expensive. Every test cycle means opening Claude, typing a prompt, waiting for the model to respond, checking the result, and doing it again when something breaks. If your connector renders UI, you also need to verify the iframe loads, the data displays correctly, and the component handles edge cases. A Claude Pro subscription costs $20/month per team member, and every test burns AI credits.
There is a better way. This post covers how to test Claude Connectors at every stage of development: unit tests for tool handlers, local inspector tests for UI rendering, simulation files for edge cases, e2e tests with the inspector fixture, multi-model evals, and CI/CD for automated regression testing. All of it runs locally and in your pipeline without a Claude account.
The Claude Connector Testing Pyramid
Think of Claude Connector testing in four layers:
- Unit tests (fast, cheap, run in milliseconds). Test tool handlers, schemas, annotations, and utility functions in isolation with Vitest.
- Inspector tests (medium speed, full UI). Test your complete connector in sunpeak’s local inspector, which replicates the Claude runtime. Simulation files give you deterministic data. The inspector fixture automates these tests.
- Evals (medium speed, costs API credits). Test whether GPT-4o, Claude, Gemini, and other models call your tools correctly. Each eval case runs multiple times per model to measure reliability. Run with pnpm test:eval.
- Live tests (slow, requires accounts). Test against the real Claude for final validation before shipping. Reserve this for pre-release checks.
Most of your testing should happen in layers 1 and 2. Layer 3 catches tool description ambiguity across models. Layer 4 is a safety net, not a daily workflow.
Unit Testing Tool Handlers
Your tool handler is a function that takes arguments and returns content. Test it like any other function.
// tests/tools/search-tickets.test.ts
import { describe, it, expect, vi } from 'vitest';
import handler from '../../src/tools/search-tickets';
// Mock your data layer
vi.mock('../../src/lib/api', () => ({
searchTickets: vi.fn().mockResolvedValue([
{ id: 'TICK-1', title: 'Login broken', status: 'open', priority: 'high' },
{ id: 'TICK-2', title: 'Slow dashboard', status: 'in_progress', priority: 'medium' },
]),
}));
describe('search-tickets handler', () => {
it('returns structuredContent with matching tickets', async () => {
const result = await handler(
{ query: 'login', status: 'open' },
{} as any
);
expect(result.structuredContent).toBeDefined();
expect(result.structuredContent.tickets).toHaveLength(2);
expect(result.structuredContent.tickets[0].id).toBe('TICK-1');
});
it('handles empty results', async () => {
const { searchTickets } = await import('../../src/lib/api');
(searchTickets as any).mockResolvedValueOnce([]);
const result = await handler({ query: 'nonexistent' }, {} as any);
expect(result.structuredContent.tickets).toHaveLength(0);
});
});
Run with pnpm test:unit. These tests finish in under a second because there is no browser, no server, and no network.
What to Unit Test
Focus on the logic your handler contains:
- Return shape. Does the handler return structuredContent or content with the right fields? Your resource component will break silently if the data shape is wrong.
- Input handling. Does the handler use defaults for optional parameters? Does it validate inputs before making API calls?
- Error paths. What happens when your external API returns a 500? When the database query times out? When the user passes an ID that does not exist?
- Data transformation. If you are transforming API responses (and you should), test that the transformation produces the right shape.
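The input-handling and error-path bullets above can be covered with plain function tests, no mocking framework required. A minimal sketch, assuming a hypothetical validateSearchArgs helper that guards the handler's arguments before any API call:

```typescript
// Hypothetical input validator -- not part of sunpeak, just an illustration
// of the "validate inputs before making API calls" advice above.
type SearchArgs = { query: string; status?: string };

const KNOWN_STATUSES = ['open', 'in_progress', 'closed'];

// Returns an error message for bad input, or null when the args are valid
function validateSearchArgs(args: Partial<SearchArgs>): string | null {
  if (!args.query || args.query.trim() === '') {
    return 'query is required';
  }
  if (args.status !== undefined && !KNOWN_STATUSES.includes(args.status)) {
    return `unknown status: ${args.status}`;
  }
  return null;
}

console.log(validateSearchArgs({ query: '' }));                      // "query is required"
console.log(validateSearchArgs({ query: 'login', status: 'done' })); // "unknown status: done"
console.log(validateSearchArgs({ query: 'login', status: 'open' })); // null
```

Because the validator is a pure function, each case is a one-line assertion in a Vitest spec, and failures point at the exact bad input.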
Unit Testing Tool Configs and Annotations
Tool annotations are required for Connectors Directory submission, and missing annotations cause 30% of rejections. A quick unit test catches this before you deploy:
// tests/tools/annotations.test.ts
import { describe, it, expect } from 'vitest';
import { tool as searchTool } from '../../src/tools/search-tickets';
import { tool as updateTool } from '../../src/tools/update-ticket-status';
describe('tool annotations', () => {
it('search tool is marked read-only', () => {
expect(searchTool.annotations?.readOnlyHint).toBe(true);
});
it('update tool is marked destructive', () => {
expect(updateTool.annotations?.destructiveHint).toBe(true);
});
});
You can also test that every tool file in your src/tools/ directory has annotations:
// tests/tools/all-annotations.test.ts
import { describe, it, expect } from 'vitest';
import { readdirSync } from 'fs';
import { join } from 'path';
import { fileURLToPath } from 'url';

// __dirname is not defined when Vitest runs this file as an ES module,
// so resolve the tools directory relative to import.meta.url instead
const toolsDir = fileURLToPath(new URL('../../src/tools', import.meta.url));
const toolFiles = readdirSync(toolsDir).filter((f) => f.endsWith('.ts'));
describe('all tools have annotations', () => {
toolFiles.forEach((file) => {
it(`${file} has readOnlyHint or destructiveHint`, async () => {
const mod = await import(join(toolsDir, file));
const annotations = mod.tool?.annotations;
expect(annotations).toBeDefined();
const hasHint =
annotations?.readOnlyHint === true ||
annotations?.destructiveHint === true;
expect(hasHint).toBe(true);
});
});
});
This test auto-discovers tool files, so it catches new tools that ship without annotations. Add it once and forget about it.
Testing with the Local Inspector
Unit tests cover logic. The inspector covers rendering. Run pnpm dev and you get a local Claude replica at localhost:3000 that loads your connector’s tools and resources without any network calls or Claude account.
pnpm dev
Select Claude from the Host dropdown in the inspector sidebar. Your tools appear in the tool list. Click a tool, provide mock input, and see your resource component render with real data.
This is where simulation files come in.
Simulation Files
Simulation files are JSON files that define deterministic tool states. Each file specifies a title and the output data your resource component will receive:
// src/resources/ticket-list/simulations/open-tickets.json
{
"title": "Three open tickets",
"output": {
"tickets": [
{ "id": "TICK-1", "title": "Login page 500 error", "status": "open", "priority": "high" },
{ "id": "TICK-2", "title": "Dashboard load time", "status": "open", "priority": "medium" },
{ "id": "TICK-3", "title": "Email notifications delayed", "status": "open", "priority": "low" }
]
}
}
// src/resources/ticket-list/simulations/empty-results.json
{
"title": "No matching tickets",
"output": {
"tickets": []
}
}
// src/resources/ticket-list/simulations/long-list.json
{
"title": "20 tickets with pagination",
"output": {
"tickets": [
{ "id": "TICK-1", "title": "Issue one", "status": "open", "priority": "high" },
{ "id": "TICK-2", "title": "Issue two", "status": "closed", "priority": "low" }
],
"nextCursor": "abc123",
"totalCount": 20
}
}
The inspector auto-discovers these files and lets you switch between them in the sidebar. You see exactly what your users see when Claude returns each tool response.
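A malformed simulation file tends to fail silently at render time, so a cheap shape guard is worth having. This is a sketch with the check inlined; in a real test you would readdirSync your simulations directories and run each parsed file through it:

```typescript
// Shape guard for simulation files: each must have a string title and an
// object output (the structuredContent your resource component receives).
function isValidSimulation(sim: unknown): sim is { title: string; output: object } {
  if (typeof sim !== 'object' || sim === null) return false;
  const s = sim as Record<string, unknown>;
  return typeof s.title === 'string' && typeof s.output === 'object' && s.output !== null;
}

// Inline examples standing in for parsed JSON files
const openTickets = { title: 'Three open tickets', output: { tickets: [] } };
const broken = { title: 'Missing output key' };

console.log(isValidSimulation(openTickets)); // true
console.log(isValidSimulation(broken));      // false
```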
Edge Cases to Cover with Simulations
Create simulations for the states that break UIs:
- Empty data. Empty arrays, null fields, zero values. Does your component show a helpful empty state or crash?
- Long strings. Ticket titles with 200 characters, descriptions with paragraphs of text. Does your layout overflow or truncate?
- Missing optional fields. If assignee is optional in your tool schema, what happens when the tool result omits it?
- Single item vs many items. A list with one item and a list with 50 items look different. Test both.
- Error states. Tool results with an error field or unexpected shapes. Your component should fail gracefully.
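An error-state simulation for the last bullet might look like the following. The error field name here is an assumption; match whatever shape your tool actually returns on failure:

```json
// src/resources/ticket-list/simulations/error-state.json
{
  "title": "Upstream API failure",
  "output": {
    "tickets": [],
    "error": "Ticket service returned 503"
  }
}
```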
E2E Testing with the inspector Fixture
The inspector fixture from sunpeak/test automates what you do manually in the inspector. It starts the dev server, calls your tools, renders the result in a simulated host, and gives you a frame locator scoped to your resource component.
// tests/e2e/ticket-list.spec.ts
import { test, expect } from 'sunpeak/test';
test('ticket list renders open tickets', async ({ inspector }) => {
const result = await inspector.renderTool('search-tickets');
const app = result.app();
const tickets = app.locator('[data-testid="ticket-row"]');
await expect(tickets).toHaveCount(3);
await expect(tickets.first()).toContainText('Login page 500 error');
await expect(tickets.first()).toContainText('high');
});
test('ticket list shows empty state', async ({ inspector }) => {
  // Assumes the empty-results simulation backs this render; how a test
  // selects a specific simulation depends on sunpeak's inspector API.
  const result = await inspector.renderTool('search-tickets');
  const app = result.app();
  await expect(app.locator('text=No tickets found')).toBeVisible();
});
Run with:
pnpm test
Tests run against both ChatGPT and Claude hosts automatically via Playwright projects. No manual host looping required.
Testing Display Modes
Claude renders connector UI in different display modes: inline (embedded in the chat), fullscreen (expanded view), and pip (picture-in-picture). Your component should look right in all of them:
// tests/e2e/display-modes.spec.ts
import { test, expect } from 'sunpeak/test';
const modes = ['inline', 'fullscreen', 'pip'] as const;
for (const displayMode of modes) {
test(`dashboard renders in ${displayMode} mode`, async ({ inspector }) => {
const result = await inspector.renderTool('get-metrics', undefined, { displayMode });
const app = result.app();
await expect(app.locator('text=Page Views')).toBeVisible();
});
}
Running Tests in CI/CD
Add both unit tests and e2e tests to your GitHub Actions workflow:
# .github/workflows/test.yml
name: Test Claude Connector
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: pnpm/action-setup@v4
- uses: actions/setup-node@v4
with:
node-version: 20
cache: pnpm
- run: pnpm install
# Install Playwright browsers
- run: pnpm exec playwright install --with-deps chromium
# Unit + E2E tests
- run: pnpm test
Every push runs both unit and e2e tests. No Claude account, no API keys, no AI credits on your CI runners. If a tool handler breaks, a resource component crashes on empty data, or an annotation goes missing, the pipeline catches it.
For a deeper look at CI/CD configuration, see the MCP App GitHub Actions guide.
Live Testing Against Real Claude
After your connector passes local and CI tests, you can run a final round of tests against the real Claude runtime. This validates things the local inspector cannot fully replicate: actual LLM tool selection, real OAuth flows, and production-specific session handling.
The live testing guide covers this in detail. The short version: pnpm test:live opens a real Claude conversation, sends a message that should trigger your tool, and asserts on the result:
// tests/live/claude-live.spec.ts
import { test, expect } from 'sunpeak/test/live';
test('Claude calls search-tickets tool', async ({ live }) => {
const app = await live.invoke('Search for open support tickets');
await expect(app.locator('[data-testid="ticket-row"]')).toBeVisible({
timeout: 15_000,
});
});
This requires a Claude account and burns credits, so run it sparingly. A good workflow: local inspector tests on every commit, live tests on release branches or as a manual CI trigger.
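A manual trigger is straightforward to wire up in GitHub Actions with workflow_dispatch. A sketch; the secret name is hypothetical and depends on how your pnpm test:live run authenticates:

```yaml
# .github/workflows/live-test.yml
name: Live Claude Tests
on:
  workflow_dispatch:          # run manually from the Actions tab
  push:
    branches: ['release/**']  # and automatically on release branches
jobs:
  live:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm
      - run: pnpm install
      - run: pnpm exec playwright install --with-deps chromium
      - run: pnpm test:live
        env:
          # Hypothetical secret name -- use whatever your live setup expects
          CLAUDE_SESSION_TOKEN: ${{ secrets.CLAUDE_SESSION_TOKEN }}
```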
Testing Checklist
Before shipping your Claude Connector, make sure you have covered:
- Every tool handler has unit tests for happy path and error paths
- Every tool has readOnlyHint or destructiveHint annotations
- Resource components have simulation files for empty, single, and many-item states
- E2E tests with the inspector fixture load each simulation and check the UI
- Tests run on both Claude and ChatGPT hosts (automatic with pnpm test)
- Display modes (inline, fullscreen, pip) render without layout breakage
- CI/CD runs pnpm test on every push
- Token payload from structuredContent stays under 25,000 tokens (test with large simulations)
- Tool schemas reject invalid inputs (test with bad arguments in unit tests)
If you are submitting to the Connectors Directory, the annotation and token limit items are hard requirements. Better to catch them in tests than in a rejection email two weeks later.
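The token-limit item in the checklist can get an early-warning unit test. A sketch using the rough heuristic of about four characters per token; this is not a real tokenizer, so treat it as a smoke test for payloads that are clearly too large, and the payload shape below is illustrative:

```typescript
// Early-warning check for the 25,000-token structuredContent budget.
// ~4 characters per token is a rough heuristic, not an exact tokenizer.
const TOKEN_LIMIT = 25_000;

function approxTokens(payload: unknown): number {
  return Math.ceil(JSON.stringify(payload).length / 4);
}

// A large simulated payload standing in for a worst-case tool result
const bigPayload = {
  tickets: Array.from({ length: 500 }, (_, i) => ({
    id: `TICK-${i}`,
    title: `Ticket ${i}: a reasonably long title for sizing purposes`,
    status: 'open',
    priority: 'high',
  })),
};

// In a Vitest spec: expect(approxTokens(bigPayload)).toBeLessThan(TOKEN_LIMIT)
console.log(approxTokens(bigPayload) < TOKEN_LIMIT);
```

Run it against your largest simulation files so the payload you test is the payload Claude would actually receive.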
Get Started
Scaffold a new connector with sunpeak and try the testing workflow above:
npx sunpeak new
Further Reading
- Claude Connectors tutorial - build and deploy a connector from scratch
- Debugging Claude Connectors - fix common errors in development and production
- Claude Connector examples - 5 patterns with full code
- Live testing Claude Connectors and ChatGPT Apps
- MCP App CI/CD - run your tests in GitHub Actions
- Claude Connector Directory submission - requirements and how to pass review
- Complete guide to testing ChatGPT Apps and MCP Apps
- Claude Connector Framework - sunpeak overview
- sunpeak documentation - quickstart and API reference
Frequently Asked Questions
How do I test a Claude Connector without a Claude account?
Use sunpeak to run a local inspector that replicates the Claude runtime. Run pnpm dev to start the inspector at localhost:3000, select Claude from the Host dropdown, and test your connector tools and UI locally. No Claude subscription, no network calls, no AI credits burned. Simulation files provide deterministic mock data so your tests produce the same result every time.
What testing frameworks work with Claude Connectors?
sunpeak includes a built-in testing framework. Run pnpm test to execute both unit and e2e tests. Use pnpm test:unit or pnpm test:e2e to run them separately. E2E tests use the inspector fixture from sunpeak/test, which calls tools, renders them in simulated hosts, and gives you a frame locator for assertions. Add pnpm test:visual for visual regression tests. Both run locally and in CI/CD.
How do I unit test a Claude Connector tool handler?
Import your tool handler function directly and call it with mock arguments and an extras object. Assert on the returned content or structuredContent. For handlers that call external APIs, mock the fetch calls with vi.fn() or msw. Vitest runs these tests in milliseconds with no server or browser required.
What are simulation files in Claude Connector testing?
Simulation files are JSON files that define deterministic tool states for testing. Each file specifies a title and output (the structuredContent your resource component receives). Place them in your resource directory and the sunpeak inspector auto-discovers them. Use simulations for visual testing during development and as fixtures for e2e tests.
How do I run Claude Connector tests in GitHub Actions?
Add pnpm test to your GitHub Actions workflow. It runs both unit and e2e tests. Tests start the sunpeak dev server automatically, run against the local inspector (no Claude account needed), and shut down when complete. The full test suite runs in CI with zero external dependencies.
How do I test Claude Connector annotations like readOnlyHint and destructiveHint?
Import the tool config object from your tool file and assert that the annotations field contains the expected values. Every tool submitted to the Connectors Directory must have either readOnlyHint: true or destructiveHint: true. A unit test that checks annotations catches missing values before you deploy.
Can I test my Claude Connector against the real Claude?
Yes. After local testing, you can run live tests against the real Claude runtime with pnpm test:live. The sunpeak live testing setup uses Playwright to open a real Claude conversation, trigger your connector tools, and assert on the rendered UI. This requires a Claude account and costs AI credits, so reserve it for pre-release validation rather than everyday development.
How do I test Claude Connector error handling?
Create simulation files with edge-case data: empty arrays, null fields, very long strings, missing optional fields. Write e2e tests that load these simulations and verify your resource component handles them gracefully. For tool handler errors, unit test that your handler returns meaningful error messages when external APIs fail or input validation catches bad arguments.