Overview
Text completion with Llama 2 uses pretrained models to generate natural continuations of prompts. These models are not fine-tuned for chat or Q&A; they should be prompted so that the expected answer is the natural continuation of the prompt.

Basic Usage
First, build a Llama instance and then call text_completion() with your prompts:
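A minimal sketch of this flow, assuming the llama package from Meta's reference repository; the checkpoint directory, tokenizer path, and generation settings below are placeholders, not required values:

```python
from llama import Llama

# Build the model once; paths here are placeholders for your local files.
generator = Llama.build(
    ckpt_dir="llama-2-7b/",
    tokenizer_path="tokenizer.model",
    max_seq_len=128,
    max_batch_size=4,
)

prompts = ["I believe the meaning of life is"]

# Each result is a dictionary whose "generation" key holds the completion text.
results = generator.text_completion(
    prompts,
    temperature=0.6,
    top_p=0.9,
    max_gen_len=64,
)

for prompt, result in zip(prompts, results):
    print(prompt + result["generation"])
```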
Parameters
Build Parameters
ckpt_dir: Path to the directory containing checkpoint files for the pretrained model.
tokenizer_path: Path to the tokenizer model used for text encoding/decoding.
max_seq_len: Maximum sequence length for input prompts. Defaults to 128 for text completion. All models support up to 4096 tokens, but the cache is pre-allocated based on this value.
max_batch_size: Maximum batch size for generating sequences. Defaults to 4.
Generation Parameters
prompts: List of text prompts for completion.
temperature: Temperature value controlling randomness in generation. Higher values (e.g., 1.0) make output more random; lower values (e.g., 0.1) make it more deterministic.
top_p: Top-p probability threshold for nucleus sampling. Controls diversity by sampling from the smallest set of tokens whose cumulative probability exceeds this threshold.
max_gen_len: Maximum length of generated sequences. If not provided, it is set to the model's maximum sequence length minus 1.
logprobs: Whether to compute and return token log probabilities.
echo: Whether to include prompt tokens in the generated output.
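The interaction of temperature and top_p can be illustrated with a small standalone sketch: temperature divides the logits before the softmax, then nucleus filtering keeps the smallest set of tokens whose cumulative probability exceeds top_p. This is illustrative only, not the library's internal sampling code:

```python
import math

def sample_filter(logits, temperature=0.6, top_p=0.9):
    """Return the token probabilities kept after temperature scaling
    and nucleus (top-p) filtering. Illustrative sketch only."""
    # Temperature scaling: divide logits before the softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    # Sort token probabilities from highest to lowest.
    probs = sorted(((p / total, i) for i, p in enumerate(exps)), reverse=True)
    # Keep the smallest prefix whose cumulative probability exceeds top_p,
    # then renormalize over that kept set.
    kept, cum = [], 0.0
    for p, i in probs:
        kept.append((i, p))
        cum += p
        if cum > top_p:
            break
    norm = sum(p for _, p in kept)
    return {i: p / norm for i, p in kept}

# With top_p=0.9, the lowest-probability token is filtered out here.
print(sample_filter([2.0, 1.0, 0.1], temperature=1.0, top_p=0.9))
```

Lowering the temperature concentrates probability mass on the top token, which is why low temperatures read as more deterministic.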
Example Prompts
Natural Continuation
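A prompt that reads like the start of a sentence invites a direct continuation; the prompt text below is illustrative:

```
I believe the meaning of life is
```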
Few-Shot Translation
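A few-shot prompt demonstrates the pattern with examples so the model completes the final line in the same format; the word pairs below are illustrative:

```
Translate English to French:

sea otter => loutre de mer
peppermint => menthe poivrée
plush girafe => girafe peluche
cheese =>
```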
Message Completion
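A partially written message prompts the model to finish it in the same voice; the message below is illustrative (note the trailing space, which cues a mid-sentence continuation):

```
A brief message congratulating the team on the launch:

Hi everyone,

I just 
```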
Response Format
The text_completion() method returns a list of CompletionPrediction dictionaries:
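By default each entry contains only the generated text; with logprobs=True, per-token strings and log probabilities are included as well. The shape below follows the llama reference implementation, with hypothetical values for illustration:

```python
# Hypothetical result showing the expected dictionary shape.
result = {
    "generation": " to be happy.",           # generated text
    "tokens": [" to", " be", " happy", "."],  # present when logprobs=True
    "logprobs": [-0.8, -1.2, -0.5, -0.3],     # one log probability per token
}

# Illustrative use: average log probability of the completion.
avg_logprob = sum(result["logprobs"]) / len(result["logprobs"])
print(round(avg_logprob, 2))  # -0.7
```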
Running from Command Line
Run the example script with the appropriate model-parallel (nproc_per_node) value for your model size.
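A typical invocation for the 7B model, which uses a model-parallel value of 1 (13B uses 2 and 70B uses 8); the script name and paths below are placeholders for your local setup:

```
torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 4
```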