Configure MCP sampling parameters to control LLM output quality — covering temperature, top-p, deterministic sampling, and adaptive sampling strategies for different task types.
Sampling is a powerful MCP feature that allows servers to request LLM completions through the client, enabling sophisticated agentic behaviors while maintaining security and privacy. The right sampling configuration can dramatically improve response quality and performance.
1. Server sends a `sampling/createMessage` request to the client
2. Client reviews the request (and can modify it)
3. Client samples from an LLM
4. Client reviews the completion
5. Client returns the result to the server
This human-in-the-loop design ensures users maintain control over what the LLM sees and generates.
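For concreteness, a simplified `sampling/createMessage` request might look like the sketch below. The field values are illustrative, and the shape is abbreviated; consult the MCP specification for the full schema:

```json
{
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": { "type": "text", "text": "Summarize the attached log file" }
      }
    ],
    "systemPrompt": "You are a concise assistant.",
    "temperature": 0.2,
    "maxTokens": 500
  }
}
```

Because the client sits between the server and the model, it can inspect or rewrite any of these fields (steps 2 and 4 above) before anything reaches the LLM or the server.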
Use a fixed seed together with temperature=0 to produce reproducible, identical outputs for the same input (note that seed support varies by model provider, so determinism is best-effort rather than guaranteed).
```java
// Java: Deterministic responses with a fixed seed
public class DeterministicSamplingExample {
    public void demonstrateDeterministicResponses() {
        McpClient client = new McpClient.Builder()
            .setServerUrl("https://mcp-server-example.com")
            .build();

        long fixedSeed = 12345;

        McpRequest request1 = new McpRequest.Builder()
            .setPrompt("Generate a random number between 1 and 100")
            .setSeed(fixedSeed)
            .setTemperature(0.0) // Zero temperature = maximum determinism
            .build();

        McpRequest request2 = new McpRequest.Builder()
            .setPrompt("Generate a random number between 1 and 100")
            .setSeed(fixedSeed)
            .setTemperature(0.0)
            .build();

        McpResponse response1 = client.sendRequest(request1);
        McpResponse response2 = client.sendRequest(request2);

        System.out.println("Are responses identical: "
            + response1.getGeneratedText().equals(response2.getGeneratedText()));
        // Output: true
    }
}
```
Never pass user-supplied sampling parameters directly to the model without validation. A malicious user could set extreme values (e.g., temperature=100) to degrade model behavior.
- **Validate all parameters**: Clamp `temperature` to [0, 1] and `top_p` to [0, 1] before sending to the model
- **Implement rate limits**: Prevent abuse by limiting the number of sampling requests per user per minute
- **Monitor usage**: Track sampling requests for unusual patterns that could indicate misuse
- **Control cost exposure**: Set sensible `maxTokens` limits to prevent unexpectedly large completions
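The parameter-validation rule can be sketched as a small guard class. The class name, method names, and the 4096-token cap below are hypothetical, not part of any MCP SDK; the point is simply to clamp out-of-range values before they reach the model, so a malicious `temperature=100` degrades gracefully to `1.0` instead of being forwarded:

```java
// Hypothetical guard that sanitizes client-supplied sampling parameters
// before they are forwarded to the model.
public class SamplingParamGuard {
    private static double clamp(double value, double min, double max) {
        return Math.max(min, Math.min(max, value));
    }

    // Clamp temperature to [0, 1]
    public static double sanitizeTemperature(double temperature) {
        return clamp(temperature, 0.0, 1.0);
    }

    // Clamp top_p to [0, 1]
    public static double sanitizeTopP(double topP) {
        return clamp(topP, 0.0, 1.0);
    }

    // Cap completion length to control cost exposure (4096 is an example limit)
    public static int sanitizeMaxTokens(int maxTokens) {
        return Math.max(1, Math.min(4096, maxTokens));
    }

    public static void main(String[] args) {
        System.out.println(sanitizeTemperature(100.0));   // 1.0
        System.out.println(sanitizeTopP(-0.5));           // 0.0
        System.out.println(sanitizeMaxTokens(1_000_000)); // 4096
    }
}
```

Clamping (rather than rejecting) keeps well-intentioned but slightly out-of-range requests working; servers that prefer strict validation can throw on out-of-range values instead.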
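The per-user rate limit can likewise be sketched as a fixed-window counter. This is an illustrative in-memory implementation (class and method names are hypothetical); production systems would typically use a shared store or a sliding-window algorithm instead:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-user rate limiter: allows at most N sampling
// requests per user within each fixed 60-second window.
public class SamplingRateLimiter {
    private final int maxRequestsPerMinute;
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    private static final class Window {
        long windowStartMillis;
        int count;
    }

    public SamplingRateLimiter(int maxRequestsPerMinute) {
        this.maxRequestsPerMinute = maxRequestsPerMinute;
    }

    /** Returns true if the request is allowed, false if it should be throttled. */
    public synchronized boolean tryAcquire(String userId, long nowMillis) {
        Window w = windows.computeIfAbsent(userId, id -> new Window());
        if (nowMillis - w.windowStartMillis >= 60_000) {
            // Start a fresh one-minute window for this user
            w.windowStartMillis = nowMillis;
            w.count = 0;
        }
        if (w.count >= maxRequestsPerMinute) {
            return false; // over the limit for this window
        }
        w.count++;
        return true;
    }
}
```

A server would call `tryAcquire` before issuing each `sampling/createMessage` request and return an error to the user when it reports the limit has been hit.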