vLLM returns different output classes depending on the task: text generation, embeddings, classification, or scoring.

RequestOutput

The output of a text completion request.
class RequestOutput:
    request_id: str
    prompt: str | None
    prompt_token_ids: list[int] | None
    prompt_logprobs: PromptLogprobs | None
    outputs: list[CompletionOutput]
    finished: bool
    metrics: RequestStateStats | None
    num_cached_tokens: int | None

Attributes

request_id: str
    Unique identifier for the request.
prompt: str | None
    The prompt string of the request.
prompt_token_ids: list[int] | None
    Token IDs of the prompt.
prompt_logprobs: PromptLogprobs | None
    Log probabilities of the prompt tokens, if requested.
outputs: list[CompletionOutput]
    List of completion outputs. Length equals n from SamplingParams.
finished: bool
    Whether the request has finished generating.
metrics: RequestStateStats | None
    Performance metrics for the request.
num_cached_tokens: int | None
    Number of prompt tokens that hit the prefix cache.

Example

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
sampling_params = SamplingParams(temperature=0.8, max_tokens=100)

outputs = llm.generate(["Hello, my name is"], sampling_params)

for request_output in outputs:
    print(f"Request ID: {request_output.request_id}")
    print(f"Prompt: {request_output.prompt}")
    print(f"Finished: {request_output.finished}")
    
    for completion_output in request_output.outputs:
        print(f"Generated text: {completion_output.text}")
        print(f"Tokens: {completion_output.token_ids}")

CompletionOutput

A single completion output within a RequestOutput.
class CompletionOutput:
    index: int
    text: str
    token_ids: list[int]
    cumulative_logprob: float | None
    logprobs: SampleLogprobs | None
    finish_reason: str | None
    stop_reason: int | str | None

Attributes

index: int
    Index of this output in the request (0 to n-1).
text: str
    The generated text.
token_ids: list[int]
    Token IDs of the generated text.
cumulative_logprob: float | None
    Cumulative log probability of the generated sequence.
logprobs: SampleLogprobs | None
    Per-token log probabilities, if requested.
finish_reason: str | None
    Reason why generation stopped: "stop", "length", or "abort".
stop_reason: int | str | None
    The specific stop token or string that caused completion, or None.

PoolingRequestOutput

Base class for pooling model outputs.
class PoolingRequestOutput:
    request_id: str
    outputs: PoolingOutput
    prompt_token_ids: list[int]
    num_cached_tokens: int
    finished: bool

Attributes

request_id: str
    Unique identifier for the request.
outputs: PoolingOutput
    The pooling output (type varies by task).
prompt_token_ids: list[int]
    Token IDs of the input prompt.
num_cached_tokens: int
    Number of prompt tokens that hit the prefix cache.
finished: bool
    Whether pooling is complete.

EmbeddingRequestOutput

Output for embedding generation tasks.
class EmbeddingRequestOutput(PoolingRequestOutput):
    outputs: EmbeddingOutput

EmbeddingOutput

class EmbeddingOutput:
    embedding: list[float]

Attributes

embedding: list[float]
    The embedding vector. Length matches the model's embedding dimension.

Example

from vllm import LLM

llm = LLM(model="sentence-transformers/all-MiniLM-L6-v2", runner="pooling")
outputs = llm.embed(["Hello world", "How are you?"])

for output in outputs:
    embedding = output.outputs.embedding
    print(f"Embedding dimension: {len(embedding)}")
    print(f"Embedding vector: {embedding[:5]}...")  # First 5 values
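Embedding vectors are typically compared with cosine similarity. The following is a minimal, self-contained sketch of that comparison; the vectors here are tiny stand-ins, not real model output.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# identical directions score 1.0; orthogonal directions score 0.0
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```

In practice you would pass two output.outputs.embedding lists from llm.embed() to this function.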

ClassificationRequestOutput

Output for classification tasks.
class ClassificationRequestOutput(PoolingRequestOutput):
    outputs: ClassificationOutput

ClassificationOutput

class ClassificationOutput:
    probs: list[float]

Attributes

probs: list[float]
    Probability distribution over classes. Length equals the number of classes.

Example

llm = LLM(model="your-classifier", runner="pooling")
outputs = llm.classify(["This is a great product!"])

for output in outputs:
    probs = output.outputs.probs
    predicted_class = probs.index(max(probs))
    print(f"Predicted class: {predicted_class}")
    print(f"Probabilities: {probs}")

ScoringRequestOutput

Output for scoring/ranking tasks.
class ScoringRequestOutput(PoolingRequestOutput):
    outputs: ScoringOutput

ScoringOutput

class ScoringOutput:
    score: float

Attributes

score: float
    The similarity or relevance score.

Example

llm = LLM(model="your-reranker", runner="pooling")

query = "machine learning"
documents = [
    "ML is a subset of AI",
    "Python is a language",
]

pairs = [f"{query} [SEP] {doc}" for doc in documents]
outputs = llm.score(pairs)

for i, output in enumerate(outputs):
    print(f"Document {i} score: {output.outputs.score}")
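A typical next step after scoring is to rerank the documents by their scores. The sketch below shows that step on its own, using hypothetical scores in place of the values read from output.outputs.score.

```python
# the same documents as above, paired with hypothetical reranker scores
documents = [
    "ML is a subset of AI",
    "Python is a language",
]
scores = [0.92, 0.15]  # stand-ins for output.outputs.score values

# sort (document, score) pairs from most to least relevant
ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
for doc, score in ranked:
    print(f"{score:.2f}  {doc}")
```

The most relevant document for the query ends up first, regardless of the order in which the pairs were scored.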
