What is MCP?

The Model Context Protocol (MCP) is a standardized protocol that lets Large Language Models (LLMs) interact with external systems through a structured client-server architecture. It gives AI agents a consistent way to access tools, resources, and capabilities beyond their training data. Playwright MCP is a server implementation that exposes browser automation through MCP, so LLMs can interact with web pages using structured data rather than visual interpretation.

Client-Server Architecture

Playwright MCP follows the standard MCP architecture pattern:

MCP Client

The AI agent or IDE that sends requests to the server. Examples include Claude Desktop, VS Code, Cursor, and other MCP-compatible tools.

MCP Server

The Playwright MCP server that receives requests, executes browser automation, and returns structured responses.

Communication Flow
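
A single round trip between client and server can be sketched as a JSON-RPC 2.0 exchange, which is the message format MCP uses. This is only an illustration: the tool name `browser_navigate`, its `url` argument, the shape of the result, and the helper names are assumptions not stated on this page, and no server is contacted.

```typescript
// Illustrative sketch of one MCP request/response round trip.
// MCP messages are JSON-RPC 2.0; the tool name, arguments, and result
// payload below are assumptions for illustration, not taken from this page.

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
}

let nextId = 1;

// Build a "tools/call" request, the MCP method a client uses to invoke a tool.
function buildToolCall(toolName: string, toolArgs: Record<string, unknown>): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name: toolName, arguments: toolArgs },
  };
}

// A client pairs each response with its request by comparing ids.
function matchResponse(req: JsonRpcRequest, res: JsonRpcResponse): boolean {
  return res.jsonrpc === "2.0" && res.id === req.id;
}

const navRequest = buildToolCall("browser_navigate", { url: "https://example.com" });

// Hand-written stand-in for the server's reply; a real server would produce it.
const navResponse: JsonRpcResponse = {
  jsonrpc: "2.0",
  id: navRequest.id,
  result: { content: [{ type: "text", text: "(page snapshot would appear here)" }] },
};

console.log(JSON.stringify(navRequest));
console.log(matchResponse(navRequest, navResponse));
```

In practice an MCP client library handles this framing; the sketch only shows what travels between the two sides.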

Transport Protocols

Playwright MCP supports multiple transport mechanisms:

STDIO Transport (Default)

The most common mode, where the server communicates with the client through standard input/output streams:
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}

SSE Transport (HTTP Mode)

For remote deployments or when running the server separately, start the server on a fixed port:
npx @playwright/mcp@latest --port 8931
Then point the MCP client at the server's URL:
{
  "mcpServers": {
    "playwright": {
      "url": "http://localhost:8931/mcp"
    }
  }
}
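SSE (Server-Sent Events) transport is useful when running a headed browser on a system without a display, or when the IDE's worker process can't access the display: start the server from an environment that does have display access and connect to it over HTTP.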
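
The difference between the two transports shows up in the client configuration entry: STDIO needs a command to launch, HTTP needs only a URL. The sketch below builds both entries; the field names mirror the JSON snippets above, while the type and helper names are hypothetical.

```typescript
// Sketch: build the client-side "mcpServers" entry for either transport.
// Field names mirror the JSON snippets above; helper and type names are hypothetical.

type StdioEntry = { command: string; args: string[] };
type HttpEntry = { url: string };
type ServerEntry = StdioEntry | HttpEntry;

// STDIO: the client launches the server itself and talks over stdin/stdout.
function stdioEntry(extraArgs: string[] = []): StdioEntry {
  return { command: "npx", args: ["@playwright/mcp@latest", ...extraArgs] };
}

// HTTP: the server is already running (started with --port),
// so the client only needs its URL.
function httpEntry(port: number, host: string = "localhost"): HttpEntry {
  return { url: `http://${host}:${port}/mcp` };
}

function clientConfig(entry: ServerEntry): { mcpServers: { playwright: ServerEntry } } {
  return { mcpServers: { playwright: entry } };
}

console.log(JSON.stringify(clientConfig(stdioEntry()), null, 2));
console.log(JSON.stringify(clientConfig(httpEntry(8931)), null, 2));
```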

Tool Capabilities System

Playwright MCP uses a capability-based system to control which tools are available. This allows you to enable only the features you need:

Core Capabilities

Core (Default)
  • Page navigation and snapshots
  • Element interactions (click, type, hover)
  • Form handling
  • Dialog handling
  • Keyboard and mouse actions
PDF
  • PDF generation from web pages
  • Requires --caps=pdf flag
Vision
  • Coordinate-based interactions
  • XY position clicking and dragging
  • Requires --caps=vision flag
DevTools
  • Browser developer tools access
  • Advanced debugging capabilities
  • Requires --caps=devtools flag

Enabling Capabilities

Capabilities can be enabled via command-line arguments:
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest",
        "--caps=vision,pdf"
      ]
    }
  }
}
Or through the capabilities field of a configuration file:
{
  "capabilities": ["core", "pdf", "vision"]
}
By default, only core capabilities are enabled. This keeps the tool list minimal and reduces context window usage.
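
As a minimal sketch, the `--caps` argument can be derived from a list of wanted capabilities. The capability names come from the list above; the helper name is hypothetical, and dropping `core` from the flag is a choice made here because core is enabled by default, not documented behavior.

```typescript
// Sketch: turn a list of capabilities into the --caps argument.
// Capability names come from the list above; the helper name is hypothetical.

type Capability = "core" | "pdf" | "vision" | "devtools";

function capsArgs(caps: Capability[]): string[] {
  // Core is on by default, so this sketch leaves it out of the flag
  // and removes duplicates while preserving order.
  const optIn = caps.filter((c, i) => c !== "core" && caps.indexOf(c) === i);
  return optIn.length > 0 ? [`--caps=${optIn.join(",")}`] : [];
}

const serverArgs = ["@playwright/mcp@latest", ...capsArgs(["core", "vision", "pdf", "vision"])];
console.log(serverArgs); // [ '@playwright/mcp@latest', '--caps=vision,pdf' ]
```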

Configuration Options

Playwright MCP can be configured through:
  1. Command-line arguments - Quick configuration via --option=value
  2. Environment variables - System-level settings via PLAYWRIGHT_MCP_*
  3. Configuration file - Comprehensive JSON configuration via --config

Configuration File

For complex setups, use a JSON configuration file. Every field is optional; the schema below is written as a TypeScript type:
{
  browser?: {
    browserName?: 'chromium' | 'firefox' | 'webkit';
    isolated?: boolean;
    userDataDir?: string;
    launchOptions?: playwright.LaunchOptions;
    contextOptions?: playwright.BrowserContextOptions;
    cdpEndpoint?: string;
    initPage?: string[];
    initScript?: string[];
  };
  
  server?: {
    port?: number;
    host?: string;
    allowedHosts?: string[];
  };
  
  capabilities?: ToolCapability[];
  
  timeouts?: {
    action?: number;  // Default: 5000ms
    navigation?: number;  // Default: 60000ms
  };
  
  snapshot?: {
    mode?: 'incremental' | 'full' | 'none';
  };
}
Load your configuration:
npx @playwright/mcp@latest --config path/to/config.json
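
A concrete file may be easier to read than the schema. The sketch below types a sample configuration against a trimmed copy of the schema above and serializes it to JSON; the interface name and all values are illustrative assumptions, not recommended settings.

```typescript
// Sketch: a typed config object serialized to JSON for use with --config.
// The interface is a trimmed copy of the schema above; values are illustrative.

interface McpConfig {
  browser?: {
    browserName?: "chromium" | "firefox" | "webkit";
    isolated?: boolean;
  };
  server?: { port?: number; host?: string };
  capabilities?: string[];
  timeouts?: { action?: number; navigation?: number };
  snapshot?: { mode?: "incremental" | "full" | "none" };
}

const exampleConfig: McpConfig = {
  browser: { browserName: "firefox", isolated: true },
  server: { port: 8931, host: "localhost" },
  capabilities: ["core", "pdf"],
  timeouts: { action: 10000, navigation: 30000 }, // milliseconds
  snapshot: { mode: "full" },
};

// Every field is optional, so an empty object would also type-check.
const configJson = JSON.stringify(exampleConfig, null, 2);
console.log(configJson); // save this text as path/to/config.json
```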

Playwright MCP vs Playwright CLI

Playwright offers two approaches for AI-driven automation:

Playwright CLI + SKILLS

Best for: Coding agents working with large codebases
  • More token-efficient
  • Concise, purpose-built commands
  • Avoids loading large tool schemas
  • Better for high-throughput agents

Playwright MCP

Best for: Exploratory automation and persistent workflows
  • Rich page introspection
  • Persistent browser state
  • Structured accessibility data
  • Long-running autonomous workflows
Choose CLI+SKILLS for coding agents that need to balance browser automation with code editing. Choose MCP for agents focused primarily on web interaction and exploration.

Key Features

Accessibility-First Approach

Playwright MCP uses the browser’s accessibility tree instead of screenshots:
  • Faster - No image processing required
  • Smaller - Structured text data vs large image files
  • Deterministic - Precise element references vs coordinate guessing
  • LLM-friendly - No vision models needed
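
To illustrate why element references are more deterministic than coordinates, the sketch below looks up an element by role and label in a small text snapshot. The snapshot line format, the `ref` values, and the helper names are assumptions made up for this example; this page does not specify the real snapshot format.

```typescript
// Sketch: resolving an element by role and label instead of by pixel position.
// The snapshot text format here is an assumption for illustration only.

const snapshotText = [
  '- heading "Sign in" [ref=e1]',
  '- textbox "Email" [ref=e2]',
  '- button "Submit" [ref=e3]',
].join("\n");

interface SnapshotNode {
  role: string;
  label: string;
  ref: string;
}

// Parse lines of the assumed form: - role "label" [ref=id]
function parseSnapshot(text: string): SnapshotNode[] {
  const nodes: SnapshotNode[] = [];
  for (const line of text.split("\n")) {
    const m = /^- (\w+) "([^"]*)" \[ref=(\w+)\]$/.exec(line.trim());
    if (m) nodes.push({ role: m[1], label: m[2], ref: m[3] });
  }
  return nodes;
}

// Return the reference of the first node with the given role and label.
function findRef(nodes: SnapshotNode[], role: string, label: string): string | undefined {
  for (const n of nodes) {
    if (n.role === role && n.label === label) return n.ref;
  }
  return undefined;
}

const snapshotNodes = parseSnapshot(snapshotText);
console.log(findRef(snapshotNodes, "button", "Submit")); // e3
```

The lookup yields the same reference regardless of window size or layout, whereas a coordinate-based click would have to be re-derived from each screenshot.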

Session Management

The server maintains browser state across multiple tool calls:
  • Cookies and local storage persist
  • Login sessions remain active
  • Tab state preserved between operations
  • Optional session recording and trace saving

Security Boundaries

Playwright MCP is not a security boundary. Follow MCP Security Best Practices for production deployments.
The server does provide features that help limit exposure:
  • Origin allowlists and blocklists
  • File access restrictions
  • Service worker blocking
  • Secrets management

Next Steps

Browser Automation

Learn how Playwright manages browser contexts and sessions

Accessibility Snapshots

Understand the structured snapshot format