The Grok Search MCP Server is optimized for both speed and accuracy. Understanding performance characteristics helps you configure optimal search parameters.
Response Times
Basic Mode
Average: 3-8 seconds
Range: 2-15 seconds
Factors affecting speed:
- Query complexity
- Number of results requested
- Network latency
- API load
Comprehensive Mode
Average: 20-40 seconds
Range: 15-60 seconds
Factors affecting speed:
- All basic mode factors
- Analysis depth requirements
- Timeline generation
- Quote extraction
- Multiple perspective analysis
Use basic mode for quick results and comprehensive mode only when detailed analysis is needed.
Caching Strategy
The server implements intelligent caching for expensive operations.
Cache Configuration
class SearchCache {
  constructor(maxSize = 100, ttlMinutes = 30) {
    this.cache = new Map();
    this.maxSize = maxSize;
    this.ttl = ttlMinutes * 60 * 1000;
  }

  get(key) {
    const item = this.cache.get(key);
    if (!item) return null;
    if (Date.now() - item.timestamp > this.ttl) {
      this.cache.delete(key);
      return null;
    }
    // Re-insert to mark the entry as most recently used
    // (Map iterates in insertion order, so the first key is the LRU entry)
    this.cache.delete(key);
    this.cache.set(key, item);
    return item.data;
  }

  set(key, data) {
    if (this.cache.size >= this.maxSize) {
      // Evict the least recently used entry
      const firstKey = this.cache.keys().next().value;
      this.cache.delete(firstKey);
    }
    this.cache.set(key, {
      data,
      timestamp: Date.now()
    });
  }
}
Cache Settings:
- Size: 100 items maximum
- TTL: 30 minutes
- Eviction: Least Recently Used (LRU)
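For a runnable demonstration, here is a compact copy of the class (with get() re-inserting entries so eviction is least recently used, matching the documented LRU behavior) plus a small exercise:

```javascript
// Compact copy of the SearchCache pattern for a runnable demo.
class SearchCache {
  constructor(maxSize = 100, ttlMinutes = 30) {
    this.cache = new Map();
    this.maxSize = maxSize;
    this.ttl = ttlMinutes * 60 * 1000;
  }
  get(key) {
    const item = this.cache.get(key);
    if (!item) return null;
    if (Date.now() - item.timestamp > this.ttl) {
      this.cache.delete(key);
      return null;
    }
    this.cache.delete(key);   // refresh recency
    this.cache.set(key, item);
    return item.data;
  }
  set(key, data) {
    if (this.cache.size >= this.maxSize) {
      const firstKey = this.cache.keys().next().value;  // LRU entry
      this.cache.delete(firstKey);
    }
    this.cache.set(key, { data, timestamp: Date.now() });
  }
}

const cache = new SearchCache(2, 30);  // capacity 2, 30-minute TTL
cache.set("a", 1);
cache.set("b", 2);
cache.get("a");       // "a" is now the most recently used entry
cache.set("c", 3);    // evicts "b", the least recently used
console.log(cache.get("a"), cache.get("b"), cache.get("c"));  // 1 null 3
```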
What Gets Cached
Cached:
- Comprehensive analysis results only
- Complete query + parameters as cache key
Not Cached:
- Basic search results (always fresh)
- Health check responses
- Error responses
Cache Key Composition
const cacheKey = `${sanitizedQuery}:${searchType}:${maxResults}:${JSON.stringify(handles)}:${fromDate}:${toDate}:comprehensive`;
Cache key includes:
- Sanitized query text
- Search type (web/news/twitter)
- Max results parameter
- Twitter handles (if applicable)
- Date range (from_date and to_date)
- Analysis mode (only “comprehensive”)
Changing any parameter creates a new cache entry. Queries with different dates are cached separately.
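To see how parameter changes produce distinct entries, the key template above can be sketched as a standalone helper (buildCacheKey is a hypothetical name; the server builds the key inline):

```javascript
// Hypothetical helper mirroring the cache-key template above.
function buildCacheKey(sanitizedQuery, searchType, maxResults, handles, fromDate, toDate) {
  return `${sanitizedQuery}:${searchType}:${maxResults}:${JSON.stringify(handles)}:${fromDate}:${toDate}:comprehensive`;
}

const keyA = buildCacheKey("ai news", "web", 10, null, "2025-06-01", "2025-06-07");
const keyB = buildCacheKey("ai news", "web", 10, null, "2025-06-01", "2025-06-08");

console.log(keyA === keyB);  // false: a different to_date means a separate cache entry
```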
Cache Hit:
const cached = this.cache.get(cacheKey);
if (cached) {
  Logger.debug("Cache hit for comprehensive analysis", { query: sanitizedQuery });
  return cached;
}
Performance gain: several hundred times faster (under 100 ms vs 20-40 seconds)
Monitor cache:
{
  "tool": "health_check",
  "parameters": {}
}
Response includes:
{
  "api_details": {
    "cacheSize": 12  // Number of cached items
  }
}
Request Optimization
Query Optimization
Optimal Query Length
Recommended: 2-10 words
Maximum: 1000 characters
if (sanitizedQuery.length > 1000) {
  throw new Error("Search query too long (max 1000 characters)");
}
Good queries:
- “AI developments 2025”
- “climate change policy updates”
- “SpaceX Starship launch”
Poor queries:
- “a” (too short, vague)
- “tell me everything about…” (unnecessary words)
- 500-word detailed description (too long)
Query Sanitization
The server automatically sanitizes queries:
const sanitizedQuery = query.trim().replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g, '');
Removes:
- Leading/trailing whitespace
- Control characters
- Null bytes
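The sanitization step above can be demonstrated directly:

```javascript
// Demonstration of the sanitization rule above: trim whitespace, strip
// control characters and null bytes.
const raw = "  AI developments\u0000 2025\u001F  ";
const sanitized = raw.trim().replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g, '');
console.log(sanitized);  // "AI developments 2025"
```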
Parameter Optimization
max_results
Default: 10
Range: 1-20
Performance impact:
- 5 results: Fastest
- 10 results: Balanced (recommended)
- 20 results: Slower, more comprehensive
// Fast, focused search
{
  "query": "specific topic",
  "max_results": 5
}

// Comprehensive coverage
{
  "query": "broad topic",
  "max_results": 15
}
For most use cases, 5-10 results provide optimal balance between speed and coverage.
analysis_mode
Basic: Fast, simple results
Comprehensive: Slow, detailed analysis
Token allocation:
max_tokens: analysisMode === "comprehensive" ? 4000 : 2000
Performance comparison:
- Basic: 2000 tokens, 3-8 seconds
- Comprehensive: 4000 tokens, 20-40 seconds
Choose basic when:
- You need quick results
- Simple information is sufficient
- Making multiple searches
- Testing or exploration
Choose comprehensive when:
- You need deep analysis
- Timeline is important
- Multiple perspectives matter
- Direct quotes needed
- Historical context required
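The two checklists above can be condensed into a small client-side helper (pickAnalysisMode is a hypothetical name, not part of the server API):

```javascript
// Hypothetical helper applying the checklists above: any need for depth
// (timeline, quotes, perspectives, history) selects comprehensive mode.
function pickAnalysisMode({ needsTimeline = false, needsQuotes = false,
                            needsPerspectives = false, needsHistory = false } = {}) {
  const needsDepth = needsTimeline || needsQuotes || needsPerspectives || needsHistory;
  return needsDepth ? "comprehensive" : "basic";
}

console.log(pickAnalysisMode());                         // "basic"
console.log(pickAnalysisMode({ needsTimeline: true }));  // "comprehensive"
```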
Date Filtering
Performance impact: Minimal to moderate
Faster:
{
  "query": "recent news",
  "from_date": "2025-06-01",  // Narrow range
  "to_date": "2025-06-07"
}
Slower:
{
  "query": "historical data",
  "from_date": "2020-01-01",  // Broad range
  "to_date": "2025-06-04"
}
Narrower date ranges typically yield faster results since less content needs to be searched and analyzed.
Timeout Configuration
Configure timeouts based on your usage patterns:
this.requestTimeout = parseInt(process.env.GROK_TIMEOUT || '30000');
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), this.requestTimeout);
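Put together, the timeout wiring might look like the sketch below (fetchWithTimeout is a hypothetical wrapper name; it assumes Node 18+ where fetch is global):

```javascript
// Sketch: tying the configured timeout to a request via AbortController.
// fetchWithTimeout is a hypothetical name; assumes Node 18+ global fetch.
async function fetchWithTimeout(url, options = {}, timeoutMs = 30000) {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetch(url, { ...options, signal: controller.signal });
  } finally {
    clearTimeout(timeoutId);  // always clear so the timer cannot fire later
  }
}
```

The finally block matters: without clearTimeout, a fast response would leave a live timer that aborts an already-consumed controller.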
Recommended Timeouts
Basic mode only:
{"GROK_TIMEOUT": "30000"} // 30 seconds
Comprehensive mode only:
{"GROK_TIMEOUT": "60000"} // 60 seconds
Mixed usage (recommended):
{"GROK_TIMEOUT": "45000"} // 45 seconds
Development/testing:
{"GROK_TIMEOUT": "20000"} // 20 seconds (fail fast)
Setting timeout too low causes legitimate comprehensive requests to fail. Setting too high delays error detection.
The server implements exponential backoff for retries:
const backoffDelay = Math.min(1000 * Math.pow(2, retryCount), 10000);
Retry delays:
- Retry 1: 1 second (2^0 * 1000)
- Retry 2: 2 seconds (2^1 * 1000)
- Retry 3: 4 seconds (2^2 * 1000)
- Retry 4: 8 seconds (2^3 * 1000)
- Retry 5+: 10 seconds (capped)
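The delay schedule above follows directly from the backoff formula:

```javascript
// Computing the retry delays listed above from the backoff formula.
const delays = [0, 1, 2, 3, 4].map(
  retryCount => Math.min(1000 * Math.pow(2, retryCount), 10000)
);
console.log(delays);  // [1000, 2000, 4000, 8000, 10000]
```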
Total time with 3 retries (30-second timeout):
- Original request: 0-30 seconds
- Retry 1: +1 second + 0-30 seconds
- Retry 2: +2 seconds + 0-30 seconds
- Retry 3: +4 seconds + 0-30 seconds
- Maximum: ~127 seconds (four 30-second request windows plus 7 seconds of backoff)
Retry Configuration
Fast fail (development):
{
  "GROK_MAX_RETRIES": "1",
  "GROK_TIMEOUT": "20000"
}
Balanced (recommended):
{
  "GROK_MAX_RETRIES": "3",
  "GROK_TIMEOUT": "45000"
}
Aggressive (production, high reliability):
{
  "GROK_MAX_RETRIES": "5",
  "GROK_TIMEOUT": "60000"
}
Request Monitoring
The server logs request duration:
const startTime = Date.now();
// ... make request ...
const duration = Date.now() - startTime;
Logger.debug(`API request successful`, { duration, endpoint });
Parallel Requests
The server handles one request at a time per instance. To make parallel searches, use multiple MCP server instances in your configuration.
Multiple server configuration:
{
  "mcpServers": {
    "grok-search-1": {
      "command": "npx",
      "args": ["grok-search-mcp"],
      "env": { "XAI_API_KEY": "your-key" }
    },
    "grok-search-2": {
      "command": "npx",
      "args": ["grok-search-mcp"],
      "env": { "XAI_API_KEY": "your-key" }
    }
  }
}
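On the client side, queries routed to the two instances can then be dispatched concurrently. The sketch below assumes each instance is exposed as an async search function (searchServer1/searchServer2 are hypothetical names for whatever your MCP client provides):

```javascript
// Hypothetical sketch: fan two independent queries out to two server instances.
// searchServer1/searchServer2 stand in for client-provided async search functions.
async function parallelSearch(searchServer1, searchServer2, queryA, queryB) {
  // Promise.all starts both requests at once and resolves when both finish
  const [resultsA, resultsB] = await Promise.all([
    searchServer1({ query: queryA }),
    searchServer2({ query: queryB })
  ]);
  return { resultsA, resultsB };
}
```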
Memory Usage
Cache Memory
Per cached item: ~5-50 KB (depending on comprehensiveness)
Maximum cache: 100 items
Total cache memory: ~0.5-5 MB
Server Memory
Baseline: ~50-100 MB
With cache: ~55-105 MB
Per request: +10-20 MB (temporary)
Memory usage is minimal and suitable for long-running server instances.
Best Practices
1. Choose Appropriate Analysis Mode
// Fast exploration
const basicMode = {
  analysis_mode: "basic",
  max_results: 5
};

// Deep research
const comprehensiveMode = {
  analysis_mode: "comprehensive",
  max_results: 10
};
2. Use Cache Effectively
Leverage cache for repeated queries:
// First request: ~30 seconds
{
  "query": "AI developments",
  "analysis_mode": "comprehensive"
}

// Second request (within 30 min): instant
{
  "query": "AI developments",
  "analysis_mode": "comprehensive"
}
Cache invalidation: entries expire after the 30-minute TTL; to force a fresh result before then, change any parameter
3. Optimize Query Construction
Good:
{
"query": "SpaceX Starship test flight",
"search_type": "news",
"from_date": "2025-06-01"
}
Bad:
{
"query": "Can you tell me everything about all SpaceX activities including Starship, Falcon 9, Dragon, and any other programs they might have been working on recently or in the past?",
"search_type": "general"
}
4. Implement Progressive Enhancement
Start with fast basic search, then request comprehensive analysis if needed:
// Step 1: Fast initial search
const basicResults = await search({
  query: "topic",
  analysis_mode: "basic",
  max_results: 5
});

// Step 2: User wants more detail
if (userRequestsDetail) {
  const comprehensiveResults = await search({
    query: "topic",
    analysis_mode: "comprehensive",
    max_results: 10
  });
}
5. Use Appropriate Search Types
Performance ranking (fastest to slowest):
1. web - Single source type
2. news - Two source types (news + web)
3. twitter - Single source, but may need filtering
4. general - All sources (web + news + twitter)
getSearchSources(searchType, handles = null) {
  switch (searchType) {
    case "web":
      return [{ "type": "web" }];
    case "news":
      return [{ "type": "news" }, { "type": "web" }];
    case "twitter":
    case "x": {
      // Braces scope the const declaration to this case
      const xSource = { "type": "x" };
      if (handles && handles.length > 0) {
        xSource.x_handles = handles;
      }
      return [xSource];
    }
    case "general":
    default:
      return [{ "type": "web" }, { "type": "news" }, { "type": "x" }];
  }
}
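The same mapping can also be expressed as a lookup table, which makes the per-type fan-out easy to see at a glance (a sketch, not the server's actual implementation):

```javascript
// Sketch: the source fan-out per search type as a lookup table
// (hypothetical restructuring, not the server's actual code).
const SOURCE_MAP = {
  web: [{ type: "web" }],
  news: [{ type: "news" }, { type: "web" }],
  twitter: [{ type: "x" }],
  x: [{ type: "x" }],
  general: [{ type: "web" }, { type: "news" }, { type: "x" }]
};

function sourcesFor(searchType, handles = null) {
  const sources = (SOURCE_MAP[searchType] || SOURCE_MAP.general)
    .map(source => ({ ...source }));  // copy so callers cannot mutate the table
  if ((searchType === "twitter" || searchType === "x") && handles && handles.length > 0) {
    sources[0].x_handles = handles;
  }
  return sources;
}

console.log(sourcesFor("news").length);   // 2
console.log(sourcesFor("x", ["nasa"]));   // [ { type: 'x', x_handles: [ 'nasa' ] } ]
```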
6. Monitor Health Metrics
Regularly check health metrics:
{
  "tool": "health_check",
  "parameters": {}
}
Key metrics:
- success_rate: Should be >90%
- error_count: Monitor for spikes
- cacheSize: Indicates cache utilization
- uptime_ms: Server stability
7. Handle Timeouts Gracefully
try {
  const results = await search(params);
  return results;
} catch (error) {
  if (error.message && error.message.includes('timeout')) {
    // Fall back to a faster basic-mode search
    return await search({
      ...params,
      analysis_mode: 'basic',
      max_results: 5
    });
  }
  throw error;
}
Performance Benchmarks
Basic Web Search:
- Min: 2 seconds
- Typical: 5 seconds
- Max: 15 seconds
- Timeout: 30 seconds
Comprehensive Analysis:
- Min: 15 seconds
- Typical: 30 seconds
- Max: 50 seconds
- Timeout: 60 seconds (recommended)
Cache Hit:
- Response: Under 100ms (instant)
The health_check tool computes these metrics as follows:
handleHealthCheck() {
  const apiHealth = this.grokAPI.checkHealth();
  const uptime = Date.now() - this.startTime;
  const successRate = this.requestCount > 0
    ? (((this.requestCount - this.errorCount) / this.requestCount) * 100).toFixed(2) + "%"
    : "N/A";
  return {
    server_healthy: true,
    api_healthy: apiHealth.healthy,
    uptime_ms: uptime,
    total_requests: this.requestCount,
    error_count: this.errorCount,
    success_rate: successRate,
    api_details: {
      hasApiKey: apiHealth.hasApiKey,
      cacheSize: apiHealth.cacheSize,
      lastError: apiHealth.lastError
    }
  };
}
Interpret metrics:
- Success rate 95-100%: Excellent
- Success rate 85-95%: Good (some retries working)
- Success rate 70-85%: Fair (check configuration)
- Success rate under 70%: Poor (investigate errors)
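The bands above can be encoded as a small helper (rateSuccessRate is a hypothetical name, useful for alerting scripts):

```javascript
// Hypothetical helper mapping a success_rate percentage to the bands above.
function rateSuccessRate(percent) {
  if (percent >= 95) return "excellent";
  if (percent >= 85) return "good";
  if (percent >= 70) return "fair";
  return "poor";
}

console.log(rateSuccessRate(97.5));  // "excellent"
console.log(rateSuccessRate(75));    // "fair"
```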
Slow Responses
Diagnosis:
- Check analysis mode (comprehensive is slower)
- Review max_results (higher is slower)
- Check network latency
- Review date range (broader is slower)
Solution:
- Use basic mode when possible
- Reduce max_results to 5-10
- Use narrower date ranges
- Check network connectivity
High Error Rate
Diagnosis:
{
  "error_count": 25,
  "total_requests": 100,
  "success_rate": "75.00%"
}
Solution:
- Increase timeout: GROK_TIMEOUT: 60000
- Increase retries: GROK_MAX_RETRIES: 5
- Check API key validity
- Review error patterns in logs
Cache Not Helping
Diagnosis:
- cacheSize: 0 or very low
- Frequently changing parameters
Solution:
- Reuse same queries for comprehensive analysis
- Avoid randomizing parameters
- Use consistent date ranges
Note: caching applies only to comprehensive mode, so basic searches never populate the cache.
Advanced Optimization
JSON Parsing Strategies
The server uses multiple parsing strategies for resilience:
const jsonParsingStrategies = [
  // Strategy 1: Find complete JSON object
  () => {
    const jsonMatch = content.match(/\{[\s\S]*\}/);
    return jsonMatch ? JSON.parse(jsonMatch[0]) : null;
  },
  // Strategy 2: Find JSON between code blocks
  () => {
    const codeBlockMatch = content.match(/```json\s*([\s\S]*?)\s*```/);
    return codeBlockMatch ? JSON.parse(codeBlockMatch[1]) : null;
  },
  // Strategy 3: Find JSON after specific markers
  () => {
    const markerMatch = content.match(/(?:json|JSON|response):\s*(\{[\s\S]*\})/);
    return markerMatch ? JSON.parse(markerMatch[1]) : null;
  },
  // Strategy 4: Clean and try parsing the entire content
  () => {
    const cleaned = content.replace(/^[^{]*/, '').replace(/[^}]*$/, '');
    return cleaned.startsWith('{') ? JSON.parse(cleaned) : null;
  }
];

let parsedResults = null;
for (const strategy of jsonParsingStrategies) {
  try {
    parsedResults = strategy();
    if (parsedResults) break;
  } catch (error) {
    continue;  // Try the next strategy on parse failure
  }
}
Performance impact: Negligible (under 10ms total)
Reliability gain: Handles malformed responses gracefully
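Wrapped as a standalone function, the strategy chain behaves like this (parseModelJson is a hypothetical name; the fence string is built with repeat() only to keep this example self-contained):

```javascript
// Hypothetical standalone wrapper around the strategy chain above.
const FENCE = '`'.repeat(3);  // "```", written this way to keep the example fence-safe

function parseModelJson(content) {
  const strategies = [
    // 1: first complete JSON object anywhere in the text
    () => { const m = content.match(/\{[\s\S]*\}/); return m ? JSON.parse(m[0]) : null; },
    // 2: JSON inside a fenced code block
    () => {
      const m = content.match(new RegExp(FENCE + 'json\\s*([\\s\\S]*?)\\s*' + FENCE));
      return m ? JSON.parse(m[1]) : null;
    },
    // 3: JSON after a "json:"/"response:" marker
    () => {
      const m = content.match(/(?:json|JSON|response):\s*(\{[\s\S]*\})/);
      return m ? JSON.parse(m[1]) : null;
    },
    // 4: strip leading/trailing junk and parse what remains
    () => {
      const cleaned = content.replace(/^[^{]*/, '').replace(/[^}]*$/, '');
      return cleaned.startsWith('{') ? JSON.parse(cleaned) : null;
    }
  ];
  for (const strategy of strategies) {
    try {
      const parsed = strategy();
      if (parsed) return parsed;
    } catch {
      // fall through to the next strategy
    }
  }
  return null;
}

console.log(parseModelJson('Sure! Here is the data: {"results": []} Hope that helps.'));
// { results: [] }
console.log(parseModelJson('no json at all'));  // null
```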
Query sanitization is similarly cheap:
const sanitizedQuery = query.trim().replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g, '');
Performance impact: Under 1ms
Benefit: Prevents API errors from invalid characters
Summary
For optimal performance:
- Use basic mode for most searches
- Set max_results to 5-10
- Use narrow date ranges when possible
- Configure timeouts appropriately: 45-60 seconds
- Enable retries: 3-5 attempts
- Leverage caching for repeated comprehensive queries
- Monitor health metrics regularly
- Use specific search types instead of “general”
- Optimize queries: 2-10 words, clear and specific
- Implement progressive enhancement: basic → comprehensive