
Overview

Text completion with Llama 2 uses pretrained models to generate natural continuations of prompts. These models are not fine-tuned for chat or Q&A; instead, write prompts so that the expected answer reads as the natural continuation of the prompt.

Basic Usage

First, build a Llama instance and then call text_completion() with your prompts:
from llama import Llama
from typing import List

generator = Llama.build(
    ckpt_dir="llama-2-7b/",
    tokenizer_path="tokenizer.model",
    max_seq_len=128,
    max_batch_size=4,
)

prompts: List[str] = [
    "I believe the meaning of life is",
    "Simply put, the theory of relativity states that ",
]

results = generator.text_completion(
    prompts,
    max_gen_len=64,
    temperature=0.6,
    top_p=0.9,
)

for prompt, result in zip(prompts, results):
    print(prompt)
    print(f"> {result['generation']}")

Parameters

Build Parameters

ckpt_dir (str, required)
Path to the directory containing checkpoint files for the pretrained model.

tokenizer_path (str, required)
Path to the tokenizer model used for text encoding/decoding.

max_seq_len (int, required)
Maximum sequence length for input prompts. The text completion example uses 128. All Llama 2 models support sequences up to 4096 tokens, but the KV cache is pre-allocated based on this value, so set it no larger than needed.

max_batch_size (int, required)
Maximum batch size for generating sequences. The text completion example uses 4.
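The cache note above can be made concrete: pre-allocated KV-cache memory scales with max_seq_len × max_batch_size. The sketch below is a back-of-the-envelope estimate, not the library's allocation code; the 7B dimensions (32 layers, 32 heads, head dim 128) and fp16 storage are assumptions.

```python
def kv_cache_bytes(n_layers: int, max_batch_size: int, max_seq_len: int,
                   n_heads: int, head_dim: int, bytes_per_elem: int = 2) -> int:
    """Rough KV-cache estimate: keys and values for every layer, in fp16."""
    return 2 * n_layers * max_batch_size * max_seq_len * n_heads * head_dim * bytes_per_elem

# Assumed Llama 2 7B dimensions: 32 layers, 32 heads, head dim 128
size = kv_cache_bytes(32, 4, 128, 32, 128)
print(f"{size / 2**20:.0f} MiB")  # → 256 MiB
```

Doubling either max_seq_len or max_batch_size doubles this figure, which is why the build defaults are kept small for the examples.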

Generation Parameters

prompts (List[str], required)
List of text prompts for completion.

temperature (float, default: 0.6)
Temperature value controlling randomness in generation. Higher values (e.g., 1.0) make output more random; lower values (e.g., 0.1) make it more deterministic.

top_p (float, default: 0.9)
Top-p probability threshold for nucleus sampling. Controls diversity by sampling from the smallest set of tokens whose cumulative probability exceeds this threshold.

max_gen_len (int, default: 64)
Maximum length of generated sequences. If not provided, it is set to the model's maximum sequence length minus 1.

logprobs (bool, default: False)
Whether to compute and return token log probabilities.

echo (bool, default: False)
Whether to include prompt tokens in the generated output.
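To build intuition for how temperature and top_p interact, here is a minimal pure-Python sketch of temperature-scaled nucleus sampling over raw logits. This is an illustration only, not the library's implementation, which operates on GPU tensors:

```python
import math
import random

def sample(logits: list[float], temperature: float = 0.6, top_p: float = 0.9) -> int:
    """Temperature-scaled nucleus (top-p) sampling; returns a token index."""
    if temperature == 0:
        # Zero temperature degenerates to greedy decoding
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax with temperature scaling (higher temperature -> flatter distribution)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of tokens whose cumulative probability exceeds top_p
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum > top_p:
            break
    # Renormalize over the kept set and draw one token
    mass = sum(probs[i] for i in kept)
    r = random.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

Lowering temperature sharpens the distribution before the top-p cutoff is applied, so the two parameters compound: a low temperature with a low top_p can collapse sampling to a single token.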

Example Prompts

Natural Continuation

prompts = [
    "I believe the meaning of life is",
    "Simply put, the theory of relativity states that ",
]

Few-Shot Translation

prompts = [
    """Translate English to French:
    
    sea otter => loutre de mer
    peppermint => menthe poivrée
    plush girafe => girafe peluche
    cheese =>""",
]
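Few-shot prompts of this shape are easy to assemble programmatically. The helper below is hypothetical (not part of the llama package); it simply joins worked examples using the same `src => tgt` pattern shown above, leaving the final arrow open for the model to complete:

```python
def few_shot_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Build a few-shot prompt: task description, worked examples, open query."""
    lines = [f"{task}:", ""]
    lines += [f"{src} => {tgt}" for src, tgt in examples]
    lines.append(f"{query} =>")
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Translate English to French",
    [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")],
    "cheese",
)
```

The resulting string can be passed straight into the prompts list for text_completion().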

Message Completion

prompts = [
    """A brief message congratulating the team on the launch:

    Hi everyone,
    
    I just """,
]

Response Format

The text_completion() method returns a list of CompletionPrediction dictionaries:
[
    {
        "generation": str,  # The generated text
        "tokens": List[str],  # Optional: decoded tokens (if logprobs=True)
        "logprobs": List[float]  # Optional: log probabilities (if logprobs=True)
    }
]
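When logprobs=True, the extra fields can be post-processed directly; for example, the average token log probability is a rough confidence signal for a completion. The result list below is hand-written for illustration (the ▁-prefixed tokens mimic SentencePiece output), not actual model output:

```python
import math

# Hand-written example of the structure returned when logprobs=True
results = [
    {
        "generation": " to be happy.",
        "tokens": ["▁to", "▁be", "▁happy", "."],
        "logprobs": [-0.5, -0.3, -1.2, -0.1],
    }
]

for result in results:
    # Mean log probability per token; exponentiating its negation gives perplexity
    avg_lp = sum(result["logprobs"]) / len(result["logprobs"])
    print(f"{result['generation']!r}: avg logprob {avg_lp:.3f}, "
          f"perplexity {math.exp(-avg_lp):.2f}")
# → ' to be happy.': avg logprob -0.525, perplexity 1.69
```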

Running from Command Line

Run the example script with the appropriate model parallel value:
torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 4
See Model Parallel Configuration for the correct nproc_per_node value for your model size.
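For example, the 13B checkpoints are sharded for a model parallel size of 2 (7B uses 1, 70B uses 8), so the same script runs as:

```shell
torchrun --nproc_per_node 2 example_text_completion.py \
    --ckpt_dir llama-2-13b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 4
```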
