arielshemesh1999@gmail.com · Israel
← All articles

Perplexity MCP

The official Perplexity server for Claude. Real-time web search plus three Sonar models — sonar-pro for quick answers, sonar-deep-research for long reports, sonar-reasoning-pro for analytic work — all wired into the agent loop with cited results.

What it is

The Perplexity MCP server is Perplexity’s official Model Context Protocol implementation. It lives at perplexityai/modelcontextprotocol and ships on npm as @perplexity-ai/mcp-server. It exposes four tools that map onto Perplexity’s public API: perplexity_search (raw ranked Search API), perplexity_ask (sonar-pro conversation), perplexity_research (sonar-deep-research deep dive), and perplexity_reason (sonar-reasoning-pro analytic reasoning).

What you actually get inside Claude after install: a web that the model can search live, full source URLs returned with every answer (so you can verify or follow the trail), and three quality tiers of response — pick the cheapest model that will plausibly answer the question and escalate when it cannot. Perplexity ships official docs for the integration, and the docs index at https://docs.perplexity.ai/llms.txt is itself designed to be loaded by an LLM agent.

Architecture / How it works

The server is a thin TypeScript wrapper around the Perplexity HTTPS API. It speaks MCP over stdio by default (how Claude Desktop and Claude Code use it) and can also run as an HTTP service for shared/cloud deployments — same tool surface, just exposed at http://host:8080/mcp.

Every tool call hits api.perplexity.ai with the bearer key from PERPLEXITY_API_KEY. The four tools split the surface area by intent:

  • perplexity_search — calls the Search API directly. No LLM generation, just ranked results with title, URL, snippet. Cheapest and fastest. Use when you want the raw hits and intend to read them yourself.
  • perplexity_ask — routes through sonar-pro. Conversational answer with live search and citations. The default day-to-day tool. Good for “what is the latest version of X,” “who funded Y last week,” “summarise the docs for Z.”
  • perplexity_research — routes through sonar-deep-research. Multi-step retrieval, longer compute window, structured report output. Use for “produce a comparative analysis of the four leading vector DBs in 2026 with citations.”
  • perplexity_reason — routes through sonar-reasoning-pro. Designed for analytic chains, math/logic, and step-by-step problem solving. Returns <think>…</think> blocks by default.

Both perplexity_research and perplexity_reason accept an optional strip_thinking: true parameter that removes the <think>…</think> chain-of-thought from the response — the README calls it out specifically as the way to save context tokens when you only need the final answer.

Install

Get a key from the API portal first. Then one command for Claude Code:

claude mcp add perplexity --env PERPLEXITY_API_KEY="your_key_here" \
  -- npx -y @perplexity-ai/mcp-server

Plugin route (Claude Code marketplace):

export PERPLEXITY_API_KEY="your_key_here"
claude
# inside the REPL:
/plugin marketplace add perplexityai/modelcontextprotocol
/plugin install perplexity

Codex CLI works the same way:

codex mcp add perplexity --env PERPLEXITY_API_KEY="your_key_here" \
  -- npx -y @perplexity-ai/mcp-server

Configuration

For Claude Desktop, Cursor, Windsurf, Kiro — the same mcpServers block goes into claude_desktop_config.json (Claude Desktop), ~/.cursor/mcp.json (Cursor), ~/.codeium/windsurf/mcp_config.json (Windsurf), or .kiro/settings/mcp.json (Kiro):

{
  "mcpServers": {
    "perplexity": {
      "command": "npx",
      "args": ["-y", "@perplexity-ai/mcp-server"],
      "env": {
        "PERPLEXITY_API_KEY": "your_key_here"
      }
    }
  }
}

VS Code (.vscode/mcp.json) uses a different wrapper — servers at the top level plus an explicit type:

{
  "servers": {
    "perplexity": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@perplexity-ai/mcp-server"],
      "env": { "PERPLEXITY_API_KEY": "your_key_here" }
    }
  }
}

Useful optional env vars from the README:

  • PERPLEXITY_TIMEOUT_MS=600000 — raise from the 5-minute default for long perplexity_research calls.
  • PERPLEXITY_BASE_URL — point at a custom or enterprise endpoint (default https://api.perplexity.ai).
  • PERPLEXITY_LOG_LEVEL=DEBUG|INFO|WARN|ERROR — default is ERROR.
  • PERPLEXITY_PROXY=https://user:pass@host:8080 — corporate proxy support. Server checks PERPLEXITY_PROXYHTTPS_PROXYHTTP_PROXY in order.

Strict MCP clients sometimes choke on npx’s install chatter on stdout. The fix the README documents is to switch -y for -yq in the args array, which silences the noise.

HTTP / Docker mode for a shared deployment:

docker build -t perplexity-mcp-server .
docker run -p 8080:8080 -e PERPLEXITY_API_KEY=your_key_here perplexity-mcp-server
# server available at http://localhost:8080/mcp

Usage examples

1. Quick fact-check with sonar-pro. The model picks perplexity_ask automatically when I phrase the request as a question:

“What is the current Playwright MCP version on npm? Give me the release date and the changelog headline.”

Claude calls perplexity_ask with that prompt; the response comes back with the answer plus a list of source URLs (Perplexity always returns citations). Cost: one cheap Sonar call, no deep research overhead.

2. Deep research report with sonar-deep-research. When I need a real comparison and not a one-paragraph answer:

“Use perplexity_research: compare Pinecone, Weaviate, Qdrant and pgvector for production RAG in 2026 — ingest throughput, recall on MTEB, pricing at 10M vectors, operational pain points. Cite sources. Set strip_thinking to true.”

The call returns a multi-section report with inline citations. The strip_thinking: true parameter is the difference between a 12k-token response and a 25k-token response — the chain-of-thought traces are kept on Perplexity’s side and only the final report comes back.

3. Analytic reasoning with sonar-reasoning-pro. For problems where the value is in the reasoning, not the retrieval:

“Use perplexity_reason: a Hebrew-RTL landing page using vanilla HTML + GSAP is loading at 4.2s LCP on mobile 3G. Walk through the most likely causes in order, what to measure to confirm each, and the cheapest fix.”

The default response includes the reasoning trace. For a clean answer to paste into a ticket, add strip_thinking: true.

What’s new / version

The current model lineup is the three-tier Sonar stack: sonar-pro, sonar-deep-research, sonar-reasoning-pro. The MCP server itself is on a rolling release on npm under @perplexity-ai/mcp-server. The recent additions worth knowing:

  • strip_thinking parameter on reasoning + research tools. Drops the <think> blocks server-side, big context saving.
  • HTTP server mode with Docker support — lets you run one shared Perplexity bridge for a team behind a CORS allowlist (ALLOWED_ORIGINS).
  • First-class proxy support via PERPLEXITY_PROXY, including the https://user:pass@host:port form — the cleanest of the corporate-proxy options I have seen across MCP servers.
  • One-click install deeplinks for Cursor, VS Code and Kiro, plus a /plugin install perplexity route inside Claude Code.

Why it matters / where I use it

Without Perplexity MCP, “what is the latest version of X” or “what changed in Y this week” sends Claude into hallucination territory the moment the question crosses its training cutoff. With it, every research-flavoured prompt routes through a model that searches first and answers second, and the citations land in the same response — so the verification step is one click, not a separate workflow.

I run it on every project that needs current information: pulling fresh model pricing for cost-routing logic, checking which MCP servers a partner team is actually shipping, comparing accessibility-overlay vendors before a client pitch. The three-model split is the part I leaned on hardest — perplexity_ask handles 80% of calls cheaply, perplexity_research shows up when I need a report, and perplexity_reason with strip_thinking is my go-to for “think this through, give me the answer, no monologue.”

A subtle but important detail in the routing: perplexity_search does not generate text. It hits the Search API directly and comes back with title/URL/snippet triples. When I already know what I am looking for and just want pointers, perplexity_search is dramatically cheaper than asking perplexity_ask to do the same job — no LLM tokens are billed on the answer side, only the search itself. I treat the four tools as a cost ladder: searchaskreasonresearch, and I let Claude pick the lowest rung it can plausibly answer on, with explicit escalation only when the cheaper rung fails.

The HTTP/Docker deployment mode also matters more than it looks. One shared Perplexity bridge behind a CORS allowlist (ALLOWED_ORIGINS) means I can give every project in a workspace the same search surface without copying the API key into a dozen mcp.json files — the key lives once on the server, every agent connects over http://host:8080/mcp. For a small team that is the difference between “everyone runs Perplexity” and “everyone shares one budget and one rotation policy.”

Source

Repo: github.com/perplexityai/modelcontextprotocol. Package: npmjs.com/package/@perplexity-ai/mcp-server. Docs: docs.perplexity.ai → integrations → MCP server. API portal: perplexity.ai/account/api/group.