Pre-Submission Testing for MCP Apps: Validate Before Publishing to ChatGPT and Claude (April 2026)

Abe Wheeler
MCP Apps MCP App Testing MCP App Framework ChatGPT Apps ChatGPT App Testing ChatGPT App Framework Claude Connectors Claude Connector Testing Claude Connector Framework Pre-Submission Testing App Store
Testing your MCP App before submitting to the ChatGPT App Store and Claude Connectors Directory.

You built an MCP App. It works in your local inspector. Now you want to get it into the ChatGPT App Store or the Claude Connectors Directory so users can install it with one click. Both platforms review submissions manually, and both reject apps for fixable issues that testing would have caught. This guide covers what to test before you submit.

TL;DR: Tool annotation errors are the #1 rejection reason on both platforms. Every tool needs readOnlyHint, destructiveHint, and openWorldHint set correctly. Beyond annotations, test your credentials, audit data collection, verify display modes, check error handling, validate CSP, and test on mobile. Use sunpeak to run your test suite against both ChatGPT and Claude hosts locally before submitting.

Why Pre-Submission Testing Matters

Both ChatGPT and Claude review submissions manually. ChatGPT typically responds within two business days, though longer waits happen during busy periods. Claude’s review takes roughly two weeks. If your app gets rejected, you fix the issue, resubmit, and wait again. Two rounds of rejection on Claude can cost you a month.

Most rejections come from a short list of fixable problems: wrong annotations, broken credentials, excessive data collection, and missing privacy policies. You can catch all of these with automated tests before you ever hit “Submit.”

What Both Platforms Check

ChatGPT and Claude have different submission forms, but their review criteria overlap:

  • Tool annotations match actual behavior (read-only tools are marked read-only, destructive tools are marked destructive)
  • Test credentials work without 2FA and include sample data
  • Privacy policy is published and covers what your app collects
  • Tool descriptions are accurate and not promotional
  • Error handling is clean with no crashes or hangs
  • App is complete, not a demo or trial

ChatGPT adds a few requirements Claude does not: openWorldHint annotations for tools that touch external systems, a ban on digital product sales and advertising, mandatory mobile testing, and stricter CSP review for apps using frameDomains. Claude requires Streamable HTTP transport, both claude.ai and claude.com OAuth callback URLs, tool results under 25,000 tokens, and tool handlers completing within five minutes.
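Claude's 25,000-token ceiling on tool results is easy to guard in tests with a rough size estimate. The four-characters-per-token heuristic below is an approximation, not Claude's actual tokenizer, so it keeps headroom:

```typescript
// Rough token estimate: ~4 characters per token is a common heuristic
// for English text. Treat the result as approximate, not exact.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Check a serialized tool result against Claude's 25,000-token ceiling,
// keeping 10% headroom so borderline results do not slip through review.
function fitsClaudeTokenBudget(result: unknown, limit = 25_000): boolean {
  return estimateTokens(JSON.stringify(result)) < limit * 0.9;
}
```

In an integration test, serialize each tool's largest realistic response and assert that fitsClaudeTokenBudget returns true.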

Tool Annotation Testing

Incorrect or missing tool annotations cause more rejections than any other issue. Both platforms require annotations that tell the host whether a tool reads data or changes something.

The three annotations that matter:

Annotation | When to use | Example tools
readOnlyHint: true | Tool only retrieves or lists data | get_order_status, search_products, list_users
destructiveHint: true | Tool creates, updates, deletes, or sends | delete_account, send_email, update_profile
openWorldHint: true | Tool interacts with external systems or creates public content | post_to_twitter, upload_to_s3, send_slack_message

A tool can have multiple annotations. A tool that deletes a user’s public post would set both destructiveHint: true and openWorldHint: true.

In a sunpeak project, annotations go in your tool config:

import type { AppToolConfig } from 'sunpeak/mcp';

export const tool: AppToolConfig = {
  resource: 'order-status',
  title: 'Get Order Status',
  description: 'Look up the current status of an order by order ID',
  annotations: {
    readOnlyHint: true,
    destructiveHint: false,
    openWorldHint: false,
  },
};
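The combined case described above looks the same, just with multiple hints set. This sketch uses a hypothetical delete-public-post tool: it both mutates state (destructiveHint) and affects a public, external surface (openWorldHint):

```typescript
import type { AppToolConfig } from 'sunpeak/mcp';

// Hypothetical tool: deleting a public post both changes state
// (destructiveHint) and touches a public surface (openWorldHint).
export const tool: AppToolConfig = {
  resource: 'delete-public-post',
  title: 'Delete Public Post',
  description: "Permanently delete one of the user's public posts by post ID",
  annotations: {
    readOnlyHint: false,
    destructiveHint: true,
    openWorldHint: true,
  },
};
```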

Automated Annotation Tests

Write integration tests that verify annotations are correct. Use the mcp fixture from sunpeak/test to list your tools and assert against their annotations:

import { test, expect } from 'sunpeak/test';

test('all tools have annotations', async ({ mcp }) => {
  const { tools } = await mcp.listTools();
  for (const tool of tools) {
    expect(tool.annotations, `${tool.name} missing annotations`).toBeDefined();
  }
});

test('read-only tools are annotated correctly', async ({ mcp }) => {
  const { tools } = await mcp.listTools();
  const readTools = tools.filter((t) =>
    t.name.startsWith('get_') || t.name.startsWith('search_') || t.name.startsWith('list_')
  );
  for (const tool of readTools) {
    expect(tool.annotations?.readOnlyHint, `${tool.name} should be read-only`).toBe(true);
  }
});

test('destructive tools are annotated correctly', async ({ mcp }) => {
  const { tools } = await mcp.listTools();
  const writeTools = tools.filter((t) =>
    t.name.startsWith('delete_') || t.name.startsWith('update_') || t.name.startsWith('send_')
  );
  for (const tool of writeTools) {
    expect(tool.annotations?.destructiveHint, `${tool.name} should be destructive`).toBe(true);
  }
});

These tests run in CI/CD so you catch annotation regressions automatically. If someone adds a new tool without annotations, the test fails before the code ships.

Test Credential Verification

Both platforms require working test credentials so reviewers can exercise your app. Broken credentials are the second most common rejection reason on ChatGPT.

What to check:

  • Demo account exists with a dedicated username and password
  • 2FA is disabled on the demo account (reviewers cannot pass 2FA challenges)
  • Sample data is populated covering all tool endpoints
  • Credentials are not expired (set a calendar reminder to rotate them before they expire)
  • Account works outside your network (no VPN, IP allowlist, or corporate SSO required)

Write a test that verifies your auth flow works end to end:

test('auth flow completes with test credentials', async ({ mcp }) => {
  // If your app uses OAuth, verify the flow completes
  const result = await mcp.callTool('get_profile', {});
  expect(result.isError).toBeFalsy();
  expect(result.content).toBeDefined();
});

If your app does not require authentication, you still need to verify that tools work without auth context and return useful results.

Privacy and Data Collection Testing

Both platforms reject apps that collect more data than they need. OpenAI’s submission guidelines are explicit: “Gather only the minimum data required to perform the tool’s function.”

Input Minimization

Audit every tool’s input schema. Each field should be directly necessary for the tool’s task. Red flags:

  • A tool that sends one email but asks for the user’s full contact list
  • Location data fields when the tool does not need location
  • Generic “context” or “additional_info” fields
  • Fields requesting full conversation history or raw transcripts

A test can flag tools whose input schemas demand suspiciously many required fields:

test('tools do not request excessive input fields', async ({ mcp }) => {
  const { tools } = await mcp.listTools();
  for (const tool of tools) {
    const schema = tool.inputSchema;
    if (schema?.properties) {
      const fields = Object.keys(schema.properties);
      // Flag tools with suspiciously many required fields
      const required = schema.required || [];
      expect(
        required.length,
        `${tool.name} requires ${required.length} fields, which may be excessive`
      ).toBeLessThan(10);
    }
  }
});

Response Minimization

Tool responses must not include diagnostic data, telemetry, internal identifiers, session IDs, trace IDs, request IDs, timestamps, or logging metadata unless the user explicitly asked for it. Both platforms check this during review.

test('tool responses do not leak internal metadata', async ({ mcp }) => {
  const result = await mcp.callTool('get_order', { orderId: 'test-123' });
  const responseText = JSON.stringify(result.content);
  expect(responseText).not.toMatch(/session[_-]?id/i);
  expect(responseText).not.toMatch(/trace[_-]?id/i);
  expect(responseText).not.toMatch(/request[_-]?id/i);
});

Restricted Data

Your app cannot collect these through tool inputs under any circumstances:

  • Payment card information (PCI DSS scope)
  • Protected health information (PHI)
  • Government identifiers (social security numbers, passport numbers)
  • Authentication secrets (API keys, passwords, MFA codes)
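You can also scan input schemas for field names that suggest restricted data. The pattern list below is illustrative, not exhaustive; extend it for your domain:

```typescript
// Field-name patterns that suggest restricted data in a tool's input
// schema. Illustrative only; extend the list for your domain.
const RESTRICTED_FIELD_PATTERNS: RegExp[] = [
  /card[_-]?number|cvv|cvc/i,           // payment card data (PCI)
  /ssn|social[_-]?security|passport/i,  // government identifiers
  /diagnosis|medical[_-]?record/i,      // protected health information
  /password|api[_-]?key|mfa[_-]?code/i, // authentication secrets
];

// Return the schema property names that match a restricted-data pattern.
function findRestrictedFields(schema: { properties?: Record<string, unknown> }): string[] {
  return Object.keys(schema.properties ?? {}).filter((field) =>
    RESTRICTED_FIELD_PATTERNS.some((pattern) => pattern.test(field))
  );
}
```

In an integration test, run this over every tool's inputSchema from mcp.listTools() and assert the result is empty.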

Display Mode and Rendering Testing

Your MCP App needs to render correctly in every display mode it supports. ChatGPT has three display modes (inline, picture-in-picture, and fullscreen), each with different viewport dimensions and host chrome. If your layout breaks in any mode, reviewers will catch it.

Write E2E tests that cover each display mode:

import { test, expect } from 'sunpeak/test';

const displayModes = ['inline', 'pip', 'fullscreen'] as const;

for (const mode of displayModes) {
  test(`renders correctly in ${mode} mode`, async ({ inspector }) => {
    const result = await inspector.renderTool('show-dashboard', undefined, {
      displayMode: mode,
    });
    const app = result.app();
    await expect(app.locator('[data-testid="dashboard"]')).toBeVisible();
    // Verify no horizontal overflow
    const overflow = await app.evaluate(() => {
      return document.documentElement.scrollWidth > document.documentElement.clientWidth;
    });
    expect(overflow, `horizontal overflow in ${mode} mode`).toBe(false);
  });
}

Theme Testing

Both ChatGPT and Claude support light and dark themes. Your app should work in both without broken contrast, invisible text, or missing borders. If you are using host CSS variables for theming, test that your components look right in each theme.

for (const theme of ['light', 'dark'] as const) {
  test(`renders in ${theme} theme`, async ({ inspector }) => {
    const result = await inspector.renderTool('show-dashboard', undefined, { theme });
    const app = result.app();
    await expect(app.locator('[data-testid="dashboard"]')).toBeVisible();
  });
}

Cross-Host Testing

An app that works in ChatGPT might break in Claude, or the other way around. The hosts have different padding, fonts, color tokens, and iframe sandboxing behavior. If you are submitting to both platforms, you need to test against both.

With sunpeak, E2E tests run against both ChatGPT and Claude hosts automatically. The defineConfig() from sunpeak/test/config creates separate Playwright projects for each host, so every test runs once per host without any extra code. When a test fails on Claude but passes on ChatGPT, the test report shows which host had the problem.

# Run all E2E tests against both hosts
pnpm test:e2e

If you are not using sunpeak, you need to manually verify your app in each host. Connect to ChatGPT via Developer Mode and to Claude by adding your server as a custom connector. Test every tool in both environments and compare the rendering.

Error Handling Testing

OpenAI’s guidelines require that “errors must be handled with clear messaging or fallback behaviors.” No crashes, hangs, or inconsistent behavior. Both platforms will reject apps that show broken states to users.

Test these error scenarios:

  • Network failures: What happens when your backend is unreachable?
  • Invalid input: What happens when a tool receives unexpected data?
  • Empty results: What happens when a search returns no results?
  • Timeouts: What happens when a tool handler takes too long? (Claude requires completion within five minutes)
  • Auth failures: What happens when the user’s session expires?

For example, verify that an empty result set renders a helpful empty state:

test('handles empty search results gracefully', async ({ inspector }) => {
  const result = await inspector.renderTool('search-products', {
    query: 'xyznonexistent12345',
  });
  const app = result.app();
  await expect(app.locator('text=No results found')).toBeVisible();
});
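For the timeout scenario, a deadline wrapper lets tests fail fast instead of waiting out Claude's five-minute ceiling. The 30-second default here is an assumption; tune it per tool:

```typescript
// Race a tool handler against a deadline. Rejects with a descriptive
// error if the work does not settle within `ms` milliseconds.
async function withDeadline<T>(work: Promise<T>, ms = 30_000): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`tool handler exceeded ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // avoid a dangling timer when work wins the race
  }
}
```

In a test, wrap the tool call, e.g. await withDeadline(mcp.callTool('search-products', { query: 'widgets' })), and assert it resolves.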

CSP Configuration Testing

Both platforms enforce Content Security Policy on MCP App resources. Your _meta.ui.csp configuration needs to include every external domain your app contacts. If your app makes API calls to external services, those domains must be listed in connectDomains. If you embed external content, it goes in resourceDomains or frameDomains.
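As a sketch, a CSP block might look like the following. The key names mirror the shape described above, but verify them against each host's current schema before relying on them; the domains are placeholders:

```typescript
// Sketch of a CSP configuration; domain values are placeholders.
const csp = {
  connectDomains: ['https://api.example.com'],    // fetch/XHR targets
  resourceDomains: ['https://cdn.example.com'],   // images, fonts, scripts
  // frameDomains: ['https://embed.example.com'], // third-party iframes:
  //   triggers extra manual review on ChatGPT, so avoid if you can
};
```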

ChatGPT applies extra scrutiny to apps using frameDomains (embedding third-party iframes). OpenAI strongly encourages building without this pattern and notes that “apps using iframes receive extra manual review” and are “often not approved for broad distribution.”

Test that your CSP is correctly configured by verifying that external requests succeed:

test('external API calls work with CSP', async ({ inspector }) => {
  const result = await inspector.renderTool('show-weather', { city: 'Seattle' });
  const app = result.app();
  // If CSP blocks the request, this will show an error state instead
  await expect(app.locator('[data-testid="temperature"]')).toBeVisible();
});

Mobile Testing

ChatGPT requires that “all test cases pass on both ChatGPT web and mobile apps.” This means your MCP App needs to work at mobile viewport widths with touch interactions.

Test at mobile sizes using visual regression tests or by resizing the viewport in E2E tests:

test('renders at mobile width', async ({ inspector, page }) => {
  await page.setViewportSize({ width: 375, height: 812 });
  const result = await inspector.renderTool('show-dashboard');
  const app = result.app();
  await expect(app.locator('[data-testid="dashboard"]')).toBeVisible();
  // Verify no horizontal scroll
  const overflow = await app.evaluate(() => {
    return document.documentElement.scrollWidth > document.documentElement.clientWidth;
  });
  expect(overflow).toBe(false);
});

Things that break on mobile:

  • Fixed-width layouts that overflow at 375px
  • Hover-dependent interactions with no touch alternative
  • Small tap targets (buttons under 44x44px)
  • Horizontal tables that do not scroll or stack
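Tap-target size is easy to assert from a bounding box. The 44x44px minimum follows common mobile guidelines; the helper below is a sketch you could feed with element bounding boxes:

```typescript
// Minimum tap target per common mobile guidelines (44x44 CSS px).
const MIN_TAP_TARGET = 44;

// True when an element's bounding box is comfortably tappable.
function isTouchFriendly(rect: { width: number; height: number } | null): boolean {
  return rect !== null && rect.width >= MIN_TAP_TARGET && rect.height >= MIN_TAP_TARGET;
}
```

In an E2E test, pass the result of a Playwright locator's boundingBox() (which can be null for detached elements) and assert isTouchFriendly returns true for each button.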

The Pre-Submission Checklist

Run through this list before you submit to either platform. Every item maps to a real rejection reason.

Tool annotations

  • Every tool has readOnlyHint set
  • Every tool has destructiveHint set
  • Every tool that touches external systems has openWorldHint set (ChatGPT)
  • Annotation values match actual tool behavior
  • Automated tests verify annotations in CI

Test credentials

  • Demo account exists with username and password documented
  • 2FA is disabled on the demo account
  • Sample data covers all tool endpoints
  • Credentials work from outside your network
  • Credentials are not expired

Privacy and data

  • Privacy policy is published at a public URL
  • Privacy policy covers data collection, usage, sharing, and user controls
  • Tool inputs request only data necessary for the task
  • Tool responses do not leak internal metadata
  • No restricted data collected (PCI, PHI, government IDs, auth secrets)

Tool quality

  • Tool names use verb-object format (get_order_status, not orderTool)
  • Tool descriptions are accurate and not promotional
  • Tool names are unique within the app
  • Every tool endpoint works and returns relevant results

Rendering

  • App renders in inline, pip, and fullscreen display modes
  • App renders in light and dark themes
  • App renders on mobile viewports (375px width)
  • No horizontal overflow in any mode
  • Interactive elements have touch-friendly tap targets

Cross-host (if submitting to both)

  • App renders correctly in ChatGPT host
  • App renders correctly in Claude host
  • OAuth callback URLs include both claude.ai and claude.com (Claude)

Error handling

  • Network failures show a clear error message
  • Invalid input does not crash the app
  • Empty results show a helpful empty state
  • Tool handlers complete within five minutes (Claude)
  • Tool results are under 25,000 tokens (Claude)

Infrastructure

  • MCP server is hosted on a publicly accessible domain
  • CSP configuration includes all external domains
  • Server uses Streamable HTTP transport (Claude)
  • No testing endpoints, staging URLs, or localhost references

Compliance

  • App does not sell digital products (ChatGPT)
  • App does not serve advertisements (ChatGPT)
  • App does not scrape third-party services without authorization
  • App does not disparage competitors in descriptions or system prompts

Run Your Tests

With sunpeak, you can validate most of this checklist automatically:

# Run unit tests, integration tests, and E2E tests against both hosts
pnpm test

# Run visual regression tests
pnpm test:visual

# Run live tests against real ChatGPT and Claude
pnpm test:live

The testing framework runs E2E tests against both ChatGPT and Claude hosts, checks tool annotations through the mcp fixture, and catches rendering issues across display modes and themes. Fix the failures, then submit with confidence.

Get Started

Documentation →
npx sunpeak new

Frequently Asked Questions

What are the most common reasons MCP Apps get rejected from the ChatGPT App Store?

The most common rejection reasons are incorrect or missing tool annotations (readOnlyHint, destructiveHint, openWorldHint), missing or expired test credentials, overly broad data collection in tool inputs, misleading tool names or descriptions, missing privacy policies, selling digital products, submitting incomplete apps, fair play violations, and unauthorized third-party integrations. Tool annotation errors are the single most common cause of rejection on both ChatGPT and Claude.

What tool annotations does my MCP App need before submission?

Every tool must include readOnlyHint, destructiveHint, and openWorldHint annotations. Set readOnlyHint to true for tools that only read data. Set destructiveHint to true for tools that create, update, delete, or send data. Set openWorldHint to true for tools that interact with external systems or public platforms, or that create publicly visible content. Both ChatGPT and Claude reject submissions with missing or incorrect annotations.

How do I test my MCP App before submitting to the ChatGPT App Store?

Validate tool annotations match actual behavior, create a demo account with disabled 2FA and sample data, audit tool inputs for unnecessary data collection, test all display modes (inline, pip, fullscreen), verify error handling with clear messages, check CSP configuration, test on mobile, and run your full test suite. Use sunpeak to test across both ChatGPT and Claude hosts locally without paid accounts.

Do I need to test my MCP App on mobile before submitting to ChatGPT?

Yes. OpenAI requires that all test cases pass on both ChatGPT web and mobile. Mobile rendering differences include smaller viewport widths, touch-based interactions instead of hover states, and different safe area constraints. Test your MCP App at mobile viewport sizes and verify that layouts, buttons, and interactive elements work with touch input.

What privacy requirements must my MCP App meet for submission?

Both ChatGPT and Claude require a published privacy policy covering data categories collected, purposes, recipient categories, and user controls. Tool inputs must request the minimum data necessary for the task. Tool responses must not include diagnostic data, telemetry, internal identifiers, or session metadata. You cannot collect payment card data, protected health information, government identifiers, or authentication secrets through tool inputs.

How do I test tool annotations for my MCP App?

Write integration tests that verify each tool has the correct annotations. Use the mcp fixture from sunpeak/test to call mcp.listTools() and assert that every tool includes readOnlyHint, destructiveHint, and openWorldHint with values matching the tool's actual behavior. A tool that deletes data must have destructiveHint set to true, not readOnlyHint.

What is the difference between ChatGPT App Store and Claude Connectors Directory submission requirements?

Both platforms require tool annotations, privacy policies, and working test credentials. ChatGPT additionally requires openWorldHint annotations, prohibits digital product sales and advertising, requires testing on web and mobile, and reviews CSP configuration including frameDomains. Claude requires Streamable HTTP transport, both claude.ai and claude.com OAuth callback URLs, tool results under 25,000 tokens, and tool handlers completing within 5 minutes. Both platforms reject apps with missing or incorrect tool annotations.

Can I test my MCP App for both ChatGPT and Claude submission at the same time?

Yes. Use sunpeak to run your test suite against both ChatGPT and Claude host replicas locally. The sunpeak testing framework runs every E2E test against both hosts automatically, so you catch platform-specific rendering bugs, annotation issues, and display mode problems in a single test run. This covers the cross-host testing both platforms expect before submission.