
Overview

Sampling is a powerful MCP feature that allows servers to request LLM completions through the client, enabling sophisticated agentic behaviors while maintaining security and privacy. The right sampling configuration can dramatically improve response quality and performance.

How sampling works in MCP

1. Server sends a sampling/createMessage request to the client
2. Client reviews the request (and can modify it)
3. Client samples from an LLM
4. Client reviews the completion
5. Client returns the result to the server
This human-in-the-loop design ensures users maintain control over what the LLM sees and generates.
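The flow above can be sketched as a client-side handler. This is an illustrative sketch, not MCP SDK code: the function and the three callbacks stand in for the client's own review UI and model backend.

```python
# Hypothetical sketch of the client side of the sampling flow.
# review_request, sample_llm, and review_completion stand in for the
# client's own UI hooks and model backend (steps 2-4 above).

def handle_create_message(request, review_request, sample_llm, review_completion):
    approved = review_request(request)        # step 2: user reviews / edits the request
    completion = sample_llm(approved)         # step 3: client samples from an LLM
    reviewed = review_completion(completion)  # step 4: user reviews the completion
    return reviewed                           # step 5: result goes back to the server
```

Because every step is a callback, a client can make review interactive (show a dialog), automatic (policy checks), or a pass-through.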

Sampling parameters

| Parameter | Description | Typical range |
| --- | --- | --- |
| temperature | Controls randomness in token selection | 0.0 – 1.0 |
| maxTokens | Maximum number of tokens to generate | Integer |
| stopSequences | Custom sequences that stop generation | Array of strings |
| metadata | Additional provider-specific parameters | JSON object |

Common extension parameters via metadata:

| Parameter | Description | Typical range |
| --- | --- | --- |
| top_p | Nucleus sampling: limits selection to the top cumulative probability mass | 0.0 – 1.0 |
| top_k | Limits token selection to the top K options | 1 – 100 |
| presence_penalty | Penalizes tokens already present in the text so far | -2.0 to 2.0 |
| frequency_penalty | Penalizes tokens based on how often they appear so far | -2.0 to 2.0 |
| seed | Fixed random seed for reproducible results | Integer |
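Putting the two tables together, a request carries the standard fields at the top level of params and the provider-specific extensions inside metadata. The sketch below uses arbitrary example values:

```python
# Illustrative sampling/createMessage params: standard fields at the top
# level, provider-specific extensions tucked into "metadata".
request = {
    "method": "sampling/createMessage",
    "params": {
        "messages": [
            {"role": "user", "content": {"type": "text", "text": "Summarize this repo"}}
        ],
        "maxTokens": 200,
        "temperature": 0.7,
        "stopSequences": ["\n\n"],
        "metadata": {
            "top_p": 0.9,    # nucleus sampling cutoff
            "top_k": 40,     # top-K cutoff
            "seed": 12345,   # reproducibility, where the provider supports it
        },
    },
}
```

Because metadata is provider-specific, a client may silently ignore extension keys the underlying model does not understand.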

Request and response format

{
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": {
          "type": "text",
          "text": "What files are in the current directory?"
        }
      }
    ],
    "systemPrompt": "You are a helpful file system assistant.",
    "includeContext": "thisServer",
    "maxTokens": 100,
    "temperature": 0.7
  }
}
The client returns:
{
  "model": "string",
  "stopReason": "endTurn",
  "role": "assistant",
  "content": {
    "type": "text",
    "text": "string"
  }
}
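A server receiving that result can unpack it defensively. The helper below is a sketch (the function name is ours, not SDK code) that assumes a text completion in the shape shown above:

```python
def extract_completion(response: dict) -> tuple[str, str]:
    """Pull the generated text and stop reason out of a sampling result.

    Raises ValueError for content types this sketch does not handle.
    """
    content = response.get("content", {})
    if content.get("type") != "text":
        raise ValueError(f"unsupported content type: {content.get('type')!r}")
    return content["text"], response.get("stopReason", "unknown")
```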

Configuring sampling parameters

public class SamplingExample
{
    public async Task RunWithSamplingAsync()
    {
        var client = new McpClient("https://mcp-server-url.com");

        var request = new McpRequest
        {
            Prompt = "Generate creative ideas for a mobile app",
            SamplingParameters = new SamplingParameters
            {
                Temperature      = 0.8f,   // Higher = more creative
                TopP             = 0.95f,  // Nucleus sampling
                TopK             = 40,     // Limit token selection
                FrequencyPenalty = 0.5f,   // Reduce repetition
                PresencePenalty  = 0.2f    // Encourage diversity
            },
            AllowedTools = new[] { "ideaGenerator", "marketAnalyzer" }
        };

        var response = await client.SendRequestAsync(request);
        Console.WriteLine(response.GeneratedText);
    }
}

Deterministic sampling

Use a fixed seed together with temperature=0 to request reproducible, identical outputs for the same input. Note that reproducibility ultimately depends on the provider: not every model backend honors a seed, and results can still differ across model versions.
// Java: Deterministic responses with fixed seed
public class DeterministicSamplingExample {
    public void demonstrateDeterministicResponses() {
        McpClient client = new McpClient.Builder()
            .setServerUrl("https://mcp-server-example.com")
            .build();

        long fixedSeed = 12345;

        McpRequest request1 = new McpRequest.Builder()
            .setPrompt("Generate a random number between 1 and 100")
            .setSeed(fixedSeed)
            .setTemperature(0.0)   // Zero temperature = maximum determinism
            .build();

        McpRequest request2 = new McpRequest.Builder()
            .setPrompt("Generate a random number between 1 and 100")
            .setSeed(fixedSeed)
            .setTemperature(0.0)
            .build();

        McpResponse response1 = client.sendRequest(request1);
        McpResponse response2 = client.sendRequest(request2);

        System.out.println("Are responses identical: " +
            response1.getGeneratedText().equals(response2.getGeneratedText()));
        // Typically prints true when the provider honors the fixed seed
    }
}

Dynamic sampling configuration

Adapt sampling parameters based on task type and user preferences for optimal results.
# Python: Dynamic sampling based on request context
class DynamicSamplingService:
    def __init__(self, mcp_client):
        self.client = mcp_client

    async def generate_with_adaptive_sampling(
        self, prompt, task_type, user_preferences=None
    ):
        sampling_presets = {
            "creative":   {"temperature": 0.9, "top_p": 0.95, "frequency_penalty": 0.7},
            "factual":    {"temperature": 0.2, "top_p": 0.85, "frequency_penalty": 0.2},
            "code":       {"temperature": 0.3, "top_p": 0.90, "frequency_penalty": 0.5},
            "analytical": {"temperature": 0.4, "top_p": 0.92, "frequency_penalty": 0.3}
        }

        sampling_params = sampling_presets.get(task_type, sampling_presets["factual"])

        if user_preferences:
            if "creativity_level" in user_preferences:
                creativity = min(max(user_preferences["creativity_level"], 1), 10) / 10
                sampling_params["temperature"] = 0.1 + (0.9 * creativity)

            if "diversity" in user_preferences:
                diversity = min(max(user_preferences["diversity"], 1), 10) / 10
                sampling_params["top_p"] = 0.6 + (0.39 * diversity)

        response = await self.client.send_request(
            prompt=prompt,
            temperature=sampling_params["temperature"],
            top_p=sampling_params["top_p"],
            frequency_penalty=sampling_params["frequency_penalty"]
        )

        return {
            "text":             response.generated_text,
            "applied_sampling": sampling_params,
            "task_type":        task_type
        }

Sampling presets reference

| Task type | Temperature | Top-P | Frequency penalty | Best for |
| --- | --- | --- | --- | --- |
| Creative | 0.85 – 0.9 | 0.94 | 0.7 | Stories, brainstorming, art |
| Factual | 0.2 | 0.85 | 0.2 | Q&A, summaries, explanations |
| Code | 0.25 – 0.3 | 0.90 | 0.4 – 0.5 | Code generation, debugging |
| Conversational | 0.7 | 0.90 | 0.6 | Chatbots, support |
| Analytical | 0.4 | 0.92 | 0.3 | Data analysis, reports |

Security considerations

Never pass user-supplied sampling parameters directly to the model without validation. A malicious user could set extreme values (e.g., temperature=100) to degrade model behavior.

Validate all parameters

Clamp temperature to [0, 1] and top_p to [0, 1] before sending to the model
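A minimal sanitizer might look like this. The bounds follow the clamping rule above; the function name is ours, for illustration:

```python
def sanitize_sampling_params(params: dict) -> dict:
    """Clamp user-supplied sampling parameters into safe ranges."""
    bounds = {"temperature": (0.0, 1.0), "top_p": (0.0, 1.0)}
    safe = dict(params)
    for name, (lo, hi) in bounds.items():
        if name in safe:
            safe[name] = min(max(float(safe[name]), lo), hi)
    return safe

sanitize_sampling_params({"temperature": 100})  # → {"temperature": 1.0}
```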

Implement rate limits

Prevent abuse by limiting the number of sampling requests per user per minute
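One way to enforce this is a per-user sliding window. The sketch below assumes all sampling requests pass through a single choke point in the client; the class name and limits are illustrative:

```python
import time
from collections import defaultdict, deque

class SamplingRateLimiter:
    """Allow at most max_requests sampling calls per user per window."""

    def __init__(self, max_requests: int = 30, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self._history = defaultdict(deque)  # user_id -> recent request timestamps

    def allow(self, user_id: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        recent = self._history[user_id]
        while recent and now - recent[0] >= self.window:
            recent.popleft()            # drop timestamps outside the window
        if len(recent) >= self.max_requests:
            return False                # over the limit: reject this request
        recent.append(now)
        return True
```

Requests that return False can be rejected with an error, queued, or surfaced to the user for explicit approval.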

Monitor usage

Track sampling requests for unusual patterns that could indicate misuse

Control cost exposure

Set sensible maxTokens limits to prevent unexpectedly large completions
