# RequestOutput

The `RequestOutput` class represents the output of a completion request to the LLM. It contains the generated text, tokens, metadata, and optional performance metrics.
## Class Definition

`RequestOutput` instances are created internally by the executor. Users should not instantiate this class directly.

## Attributes
- Unique identifier for this generation request.
- The original prompt string that was provided to the LLM. May be `None` if the request was submitted as token IDs only.
- Token IDs of the input prompt after tokenization.
- List of completion outputs. Length equals `sampling_params.n` (the number of sequences requested). Each `CompletionOutput` contains:
  - Generated text
  - Token IDs
  - Log probabilities (if requested)
  - Finish reason
- Logits tensor for prompt tokens. Shape: `[prompt_length, vocab_size]`. Only available when `sampling_params.return_context_logits=True`.
- Parameters for disaggregated serving, including:
  - Request type (`context_only`, `generation_only`)
  - Multimodal embedding handles
  - Context request ID
  - First generated tokens
- Whether the entire request has finished processing.
## Methods

### result()

Wait for the generation to complete and return the result (for async requests).

Parameters:

- Maximum time to wait in seconds. `None` means wait indefinitely.

Returns:

- The completed request output.
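A minimal sketch of the blocking-wait pattern. A standard-library `Future` stands in for the real async request handle here, because its `result(timeout=...)` call has the same documented semantics: block until completion or until the timeout expires.

```python
from concurrent.futures import Future

request = Future()          # stand-in for a pending async request
request.set_result("done")  # a real executor completes it in the background

# result() blocks until generation completes (or the timeout expires).
output = request.result(timeout=5.0)
```

If the timeout expires before the request finishes, a timeout error is raised rather than a partial result being returned.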
## Streaming (Iteration)

For streaming requests, a `RequestOutput` can be iterated to get partial results.
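A sketch of the iteration pattern. The generator below is an illustrative stand-in for a real streaming request (the actual objects come from the executor and are never built by hand); the point is the `async for` loop, where each iteration carries the text generated so far.

```python
import asyncio
from dataclasses import dataclass


# Illustrative stand-ins for the documented classes.
@dataclass
class CompletionOutput:
    text: str


@dataclass
class RequestOutput:
    outputs: list
    finished: bool


async def fake_stream():
    # Stand-in for a real streaming request: yields partial
    # RequestOutput objects as tokens arrive.
    text = ""
    pieces = ["Hello", ", ", "world"]
    for i, piece in enumerate(pieces):
        text += piece
        yield RequestOutput(outputs=[CompletionOutput(text=text)],
                            finished=(i == len(pieces) - 1))


async def consume():
    final = None
    async for partial in fake_stream():
        final = partial  # each iteration carries the text so far
    return final


final = asyncio.run(consume())
```

The last yielded object has `finished=True` and holds the complete generated text.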
## CompletionOutput

Each `RequestOutput.outputs` entry is a `CompletionOutput` object containing the generated text and metadata.

### Attributes
- Index of this output in the request (0 to n-1).
- The generated text output.
- Token IDs of the generated output.
- Sum of log probabilities over all generated tokens. Used for ranking multiple outputs.
- Log probabilities for generated tokens (if `sampling_params.logprobs` was set). Each entry is a dictionary mapping token IDs to `Logprob` objects with:
  - `logprob`: Log probability value
  - `rank`: Rank among all tokens (1 = most likely)
- Log probabilities for prompt tokens (if `sampling_params.prompt_logprobs` was set). Same format as `logprobs`.
- Reason why generation stopped:
  - `'stop'`: Hit a stop string, stop token, or the EOS token
  - `'length'`: Reached the `max_tokens` limit
  - `'timeout'`: Request timed out
  - `'cancelled'`: Request was cancelled
  - `None`: Still generating (streaming mode)
- The specific stop string or token ID that caused generation to stop. Only set when `finish_reason='stop'`.
- Full logits tensor for generated tokens. Shape: `[num_tokens, vocab_size]`. Only available when `sampling_params.return_generation_logits=True`.
- Additional model outputs from the context (prompt) phase.
- Additional model outputs from the generation phase.
- Disaggregated serving parameters for this output.
- Performance metrics for this request (if `sampling_params.return_perf_metrics=True`):
  - Time to first token (TTFT)
  - Time per output token (TPOT)
  - End-to-end latency (E2E)
  - Queue time
  - Throughput
### Streaming Properties

- Number of tokens generated so far.
- Newly generated text since the last update (streaming mode).
- Newly generated token IDs since the last update (streaming mode).
- Log probabilities for newly generated tokens (streaming mode).
## Usage Examples

### Basic Output Access
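A runnable sketch of the access pattern. The attribute names (`request_id`, `prompt`, `prompt_token_ids`, `outputs`, `text`, `token_ids`) follow the attributes documented above but are assumptions here, and the hand-built objects are stand-ins; real instances come from the executor.

```python
from dataclasses import dataclass
from typing import Optional


# Stand-ins mirroring the documented attributes; real instances are
# returned by the LLM and never constructed by hand.
@dataclass
class CompletionOutput:
    index: int
    text: str
    token_ids: list


@dataclass
class RequestOutput:
    request_id: int
    prompt: Optional[str]
    prompt_token_ids: list
    outputs: list
    finished: bool = True


out = RequestOutput(
    request_id=0,
    prompt="The capital of France is",
    prompt_token_ids=[464, 3139, 286, 4881, 318],
    outputs=[CompletionOutput(index=0, text=" Paris.", token_ids=[6342, 13])],
)

# The first (and, with n=1, only) completion:
completion = out.outputs[0]
print(out.prompt + completion.text)
```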
### Multiple Outputs
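When `sampling_params.n > 1`, `outputs` holds one entry per requested sequence, and the cumulative log probability can rank them. The values below are illustrative stand-ins.

```python
from dataclasses import dataclass


# Stand-in for CompletionOutput with the ranking-relevant fields.
@dataclass
class CompletionOutput:
    index: int
    text: str
    cumulative_logprob: float


outputs = [
    CompletionOutput(0, " blue.", -1.2),
    CompletionOutput(1, " cloudy.", -2.7),
    CompletionOutput(2, " falling!", -5.9),
]

# Rank candidates by cumulative log probability (higher = more likely).
best = max(outputs, key=lambda o: o.cumulative_logprob)
for o in outputs:
    print(f"[{o.index}] {o.text!r} (logprob={o.cumulative_logprob})")
```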
### Log Probabilities
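A sketch of walking the `logprobs` structure documented above: one dictionary per generated token, mapping candidate token IDs to `Logprob` objects. The `Logprob` class and the values here are illustrative stand-ins.

```python
from dataclasses import dataclass


# Stand-in for the documented Logprob object.
@dataclass
class Logprob:
    logprob: float
    rank: int


# One dict per generated token, mapping candidate token IDs to Logprob.
logprobs = [
    {6342: Logprob(-0.1, 1), 1722: Logprob(-2.3, 2)},
    {13: Logprob(-0.4, 1)},
]

# Pull out the most likely (rank-1) token ID at each generation step.
top_ids = [next(tid for tid, lp in step.items() if lp.rank == 1)
           for step in logprobs]
```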
### Streaming with `text_diff`
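`text_diff` carries only the text added since the last update, so concatenating the diffs reconstructs the full output. The update sequence below is a simulated stand-in for what iterating a streaming request produces.

```python
from dataclasses import dataclass


@dataclass
class CompletionOutput:
    text: str       # full text generated so far
    text_diff: str  # only the text added since the last update


# Simulated streaming updates (stand-ins for real partial results).
updates = [
    CompletionOutput("Once", "Once"),
    CompletionOutput("Once upon", " upon"),
    CompletionOutput("Once upon a time", " a time"),
]

# Concatenating the diffs reconstructs the full generated text.
streamed = "".join(u.text_diff for u in updates)
assert streamed == updates[-1].text
```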
### Performance Metrics
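A sketch of reading per-request metrics. The `PerfMetrics` class and its field names are invented for illustration; the real metrics object (returned when `sampling_params.return_perf_metrics=True`) exposes the quantities listed above under its own names.

```python
from dataclasses import dataclass


# Illustrative stand-in; field names are assumptions, not the real API.
@dataclass
class PerfMetrics:
    ttft_s: float         # time to first token
    tpot_s: float         # time per output token
    e2e_latency_s: float  # end-to-end latency
    queue_time_s: float   # time spent waiting in the queue


metrics = PerfMetrics(ttft_s=0.05, tpot_s=0.01,
                      e2e_latency_s=0.55, queue_time_s=0.02)

num_generated = 50
# Throughput in tokens/second over the whole request.
throughput = num_generated / metrics.e2e_latency_s
```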
### Async/Await
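A sketch of the await pattern: awaiting an async request yields the finished output. The `generate_async` coroutine below is a stand-in for a real engine call, and the returned object is a `SimpleNamespace` mimicking the documented attributes.

```python
import asyncio
from types import SimpleNamespace


# Stand-in coroutine; a real async request runs in the background.
async def generate_async(prompt: str):
    await asyncio.sleep(0)  # simulate waiting on the engine
    completion = SimpleNamespace(text=" 4", finish_reason="stop")
    return SimpleNamespace(prompt=prompt, outputs=[completion],
                           finished=True)


async def main():
    # Awaiting the request yields the finished RequestOutput.
    return await generate_async("2 + 2 =")


out = asyncio.run(main())
```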
## See Also

- `LLM` - Main LLM class
- `SamplingParams` - Generation parameters
- `Tokenizer` - Tokenization interface