## Installation
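The integration ships in the `langchain-fireworks` package (assuming a pip-based environment):

```shell
pip install -qU langchain-fireworks
```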
## Setup
Set your Fireworks API key as an environment variable:

## Usage
## Streaming
## API Reference
### ChatFireworks
Key constructor parameters:

| Parameter | Description |
| --- | --- |
| `model` | Model name to use (e.g., `accounts/fireworks/models/llama-v3p3-70b-instruct`). |
| `temperature` | Sampling temperature; controls randomness in generation. |
| `max_tokens` | Maximum number of tokens to generate. |
| `request_timeout` | Timeout for requests to the Fireworks completion API. |
| `api_key` | Fireworks API key. Automatically read from the `FIREWORKS_API_KEY` environment variable if not provided. |
| `base_url` | Base URL path for API requests. Leave blank unless using a proxy or service emulator. Read from `FIREWORKS_API_BASE` if not provided. |
| `streaming` | Whether to stream the results or not. |
| `n` | Number of chat completions to generate for each prompt. |
| `model_kwargs` | Additional model parameters valid for the create call that are not explicitly specified. |
Supported Models
Fireworks hosts a wide variety of open-source models:

- Llama 3.3 70B: Meta’s latest high-performance model
- Mixtral MoE 8x22B: Large mixture-of-experts model
- Qwen 2.5: Alibaba’s multilingual models
- DeepSeek: Reasoning and chat models
- Gemma 2: Google’s efficient models
## Features
- Fast inference with optimized serving
- Function/tool calling
- JSON mode
- Streaming
- Async support
- Fine-tuning support
- Custom model deployment
Fireworks AI specializes in fast, cost-effective inference for open-source models. They offer competitive pricing and support for custom fine-tuned models.