Overview
LlamaIndex.TS supports a wide range of LLM and embedding providers through dedicated packages. Each provider implements the common BaseLLM or BaseEmbedding interface, allowing you to switch providers with minimal code changes.
Provider Packages
All providers are published as separate npm packages following the pattern @llamaindex/<provider-name>:

```bash
npm install @llamaindex/openai
npm install @llamaindex/anthropic
npm install @llamaindex/google
```

Installing only the providers you need keeps your bundle size small.
Major LLM Providers
OpenAI
Anthropic
Google Gemini
Ollama
Groq
Mistral
OpenAI

OpenAI provides the GPT-4 and GPT-3.5 model families, with strong performance and tool-calling support.

Installation:

```bash
npm install @llamaindex/openai
```

Environment:

```bash
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://api.openai.com/v1" # Optional
```

LLM Usage:

```typescript
import { OpenAI } from "@llamaindex/openai";

const llm = new OpenAI({
  model: "gpt-4o", // or gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo
  temperature: 0.7,
  maxTokens: 1024,
  apiKey: process.env.OPENAI_API_KEY, // Optional if env var set
});

const response = await llm.chat({
  messages: [{ role: "user", content: "Hello!" }],
});
```

Embedding Usage:

```typescript
import { OpenAIEmbedding } from "@llamaindex/openai";

const embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-small", // or text-embedding-3-large
  dimensions: 1536, // Optional: customize dimensions
});

const embedding = await embedModel.getTextEmbedding("Hello world");
```
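A common downstream step is comparing two embeddings by cosine similarity. This is plain vector math over the `number[]` that `getTextEmbedding` returns, not a LlamaIndex API; a minimal sketch:

```typescript
// Cosine similarity between two embedding vectors.
// Works on any number[] pair of equal length, e.g. two getTextEmbedding results.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Identical directions score 1, orthogonal directions score 0.
console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```

Scores close to 1 indicate semantically similar texts; this is the comparison that vector stores perform internally during retrieval.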
Supported Features:
Function calling / tool use
Streaming responses
Vision (GPT-4 Vision models)
JSON mode / structured output
Multi-modal inputs (images, files)
Popular Models:
gpt-4o: Latest flagship model
gpt-4o-mini: Fast, cost-effective
gpt-4-turbo: Previous generation flagship
gpt-3.5-turbo: Fast and affordable
Anthropic (Claude)

Anthropic's Claude models excel at complex reasoning and long context windows.

Installation:

```bash
npm install @llamaindex/anthropic
```

Environment:

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
```

LLM Usage:

```typescript
import { Anthropic } from "@llamaindex/anthropic";

const llm = new Anthropic({
  model: "claude-3-7-sonnet",
  temperature: 0.7,
  maxTokens: 2048,
});

const response = await llm.chat({
  messages: [
    { role: "system", content: "You are helpful." },
    { role: "user", content: "Explain quantum computing." },
  ],
});
```

Streaming:

```typescript
const stream = await llm.chat({
  messages: [{ role: "user", content: "Write a story" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.delta);
}
```
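Each streamed chunk carries a `delta` with the newly generated text, so concatenating deltas reconstructs the full response. A self-contained sketch, with a mock async generator standing in for the real stream returned by `llm.chat({ ..., stream: true })`:

```typescript
// Minimal shape of a streamed chat chunk (real chunks carry more fields).
type ChatChunk = { delta: string };

// Collect a streamed response into a single string.
async function collectStream(stream: AsyncIterable<ChatChunk>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.delta; // same field the streaming loop above prints
  }
  return text;
}

// Mock stream standing in for a real provider stream.
async function* mockStream(): AsyncGenerator<ChatChunk> {
  for (const delta of ["Once ", "upon ", "a ", "time."]) {
    yield { delta };
  }
}

collectStream(mockStream()).then((full) => {
  console.log(full); // "Once upon a time."
});
```

The same accumulation pattern works for any provider in this guide that supports streaming, since they all yield delta-bearing chunks through the common interface.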
Supported Features:
Tool calling (Claude 3+ models)
Streaming
Vision (images, PDFs)
Extended thinking blocks
Prompt caching
Long context (200k tokens)
Popular Models:
claude-4-0-sonnet: Latest Claude 4 flagship
claude-3-7-sonnet: Latest Claude 3.7
claude-3-5-sonnet: Fast and intelligent
claude-3-5-haiku: Ultra-fast responses
claude-3-opus: Most capable Claude 3
Google Gemini

Google's Gemini models offer strong multi-modal capabilities and large context windows.

Installation:

```bash
npm install @llamaindex/google
```

Environment:

```bash
export GOOGLE_API_KEY="..."
```

LLM Usage:

```typescript
import { gemini, GEMINI_MODEL } from "@llamaindex/google";

const llm = gemini({
  model: GEMINI_MODEL.GEMINI_2_0_FLASH,
  temperature: 0.7,
});

const response = await llm.chat({
  messages: [{ role: "user", content: "Hello!" }],
});
```

Vertex AI (GCP):

```typescript
const llm = gemini({
  model: GEMINI_MODEL.GEMINI_2_0_FLASH,
  vertex: {
    project: "your-gcp-project",
    location: "us-central1",
  },
});
```

Embedding Usage:

```typescript
import { GeminiEmbedding, GEMINI_EMBEDDING_MODEL } from "@llamaindex/google";

const embedModel = new GeminiEmbedding({
  model: GEMINI_EMBEDDING_MODEL.TEXT_EMBEDDING_004,
});
```
Supported Features:
Function calling
Multi-modal (text, images, video, audio)
Large context windows (up to 2M tokens)
Live API for real-time conversations
Streaming
Popular Models:
gemini-2.0-flash: Latest fast model
gemini-1.5-pro: Balanced performance
gemini-1.5-flash: Fast inference
Ollama (Local Models)

Run LLMs locally with Ollama, which is great for privacy and offline use.

Installation:

```bash
npm install @llamaindex/ollama
# Also install Ollama itself: https://ollama.ai
```

Setup:

```bash
# Start the Ollama server
ollama serve

# Pull a model
ollama pull llama3.1
```

LLM Usage:

```typescript
import { Ollama } from "@llamaindex/ollama";

const llm = new Ollama({
  model: "llama3.1",
  config: {
    host: "http://localhost:11434",
  },
  options: {
    temperature: 0.7,
    num_ctx: 4096,
  },
});

const response = await llm.chat({
  messages: [{ role: "user", content: "Hello!" }],
});
```

Embedding Usage:

```typescript
import { OllamaEmbedding } from "@llamaindex/ollama";

const embedModel = new OllamaEmbedding({
  model: "nomic-embed-text",
  config: { host: "http://localhost:11434" },
});
```
Supported Features:
Tool calling
Streaming
Local/offline operation
Custom models
No API costs
Popular Models:
llama3.1: Meta’s latest Llama
mistral: Mistral 7B
codellama: Code generation
nomic-embed-text: Embeddings
Groq

Ultra-fast LLM inference on Groq's custom hardware.

Installation:

```bash
npm install @llamaindex/groq
```

Environment:

```bash
export GROQ_API_KEY="gsk_..."
```

LLM Usage:

```typescript
import { Groq } from "@llamaindex/groq";

const llm = new Groq({
  model: "llama-3.1-70b-versatile",
  temperature: 0.7,
});

const response = await llm.chat({
  messages: [{ role: "user", content: "Hello!" }],
});
```
Supported Features:
Extremely fast inference
Tool calling
Streaming
Open-source models
Popular Models:
llama-3.1-70b-versatile
llama-3.1-8b-instant
mixtral-8x7b-32768
gemma-7b-it
Mistral AI

High-performance European AI models.

Installation:

```bash
npm install @llamaindex/mistral
```

Environment:

```bash
export MISTRAL_API_KEY="..."
```

LLM Usage:

```typescript
import { MistralAI } from "@llamaindex/mistral";

const llm = new MistralAI({
  model: "mistral-large-latest",
  temperature: 0.7,
});

const response = await llm.chat({
  messages: [{ role: "user", content: "Hello!" }],
});
```
Supported Features:
Function calling
Streaming
JSON mode
Embeddings
Popular Models:
mistral-large-latest: Most capable
mistral-small-latest: Fast and efficient
mistral-embed: Embeddings
Additional Providers
DeepSeek

```bash
npm install @llamaindex/deepseek
```

```typescript
import { DeepSeek } from "@llamaindex/deepseek";

const llm = new DeepSeek({
  model: "deepseek-chat",
  apiKey: process.env.DEEPSEEK_API_KEY,
});
```
Fireworks AI

Fast inference for open-source models.

```bash
npm install @llamaindex/fireworks
```

```typescript
import { FireworksLLM } from "@llamaindex/fireworks";

const llm = new FireworksLLM({
  model: "accounts/fireworks/models/llama-v3-70b-instruct",
  apiKey: process.env.FIREWORKS_API_KEY,
});
```
Together AI

Run open-source models at scale.

```bash
npm install @llamaindex/together
```

```typescript
import { TogetherLLM } from "@llamaindex/together";

const llm = new TogetherLLM({
  model: "mistralai/Mixtral-8x7B-Instruct-v0.1",
  apiKey: process.env.TOGETHER_API_KEY,
});
```
Perplexity

Online LLMs with real-time web search.

```bash
npm install @llamaindex/perplexity
```

```typescript
import { PerplexityLLM } from "@llamaindex/perplexity";

const llm = new PerplexityLLM({
  model: "llama-3.1-sonar-large-128k-online",
  apiKey: process.env.PERPLEXITY_API_KEY,
});
```
Replicate

Run open-source models via cloud API.

```bash
npm install @llamaindex/replicate
```

```typescript
import { ReplicateLLM } from "@llamaindex/replicate";

const llm = new ReplicateLLM({
  model: "meta/llama-2-70b-chat",
  apiKey: process.env.REPLICATE_API_KEY,
});
```
xAI

Grok models from xAI.

```bash
npm install @llamaindex/xai
```

```typescript
import { XAI } from "@llamaindex/xai";

const llm = new XAI({
  model: "grok-beta",
  apiKey: process.env.XAI_API_KEY,
});
```
Vercel AI

Integration with the Vercel AI SDK.

```bash
npm install @llamaindex/vercel
```

```typescript
import { VercelLLM } from "@llamaindex/vercel";
import { openai } from "@ai-sdk/openai";

const llm = new VercelLLM({
  model: openai("gpt-4o"),
});
```
AWS Bedrock

LLMs through AWS Bedrock.

```bash
npm install @llamaindex/aws
```

```typescript
import { BedrockLLM } from "@llamaindex/aws";

const llm = new BedrockLLM({
  model: "anthropic.claude-3-sonnet-20240229-v1:0",
  region: "us-east-1",
});
```
vLLM

Self-hosted high-performance inference.

```bash
npm install @llamaindex/vllm
```

```typescript
import { VLLM } from "@llamaindex/vllm";

const llm = new VLLM({
  model: "mistralai/Mistral-7B-Instruct-v0.2",
  baseURL: "http://localhost:8000/v1",
});
```
Portkey AI

LLM gateway with observability and routing.

```bash
npm install @llamaindex/portkey-ai
```

```typescript
import { PortkeyLLM } from "@llamaindex/portkey-ai";

const llm = new PortkeyLLM({
  apiKey: process.env.PORTKEY_API_KEY,
  virtualKey: "openai-virtual-key",
});
```
Embedding Providers
OpenAI
Voyage AI
Cohere
HuggingFace
Jina AI
Mixedbread
CLIP
OpenAI

```typescript
import { OpenAIEmbedding } from "@llamaindex/openai";

const embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-small",
  dimensions: 1536,
});
```

Models:
text-embedding-3-small: 1536 dims (customizable)
text-embedding-3-large: 3072 dims (customizable)
text-embedding-ada-002: 1536 dims (legacy)

Voyage AI

Domain-optimized embedding models for better accuracy.

```typescript
import { VoyageAIEmbedding } from "@llamaindex/voyage-ai";

const embedModel = new VoyageAIEmbedding({
  model: "voyage-2",
  apiKey: process.env.VOYAGE_API_KEY,
});
```

Cohere

Strong multilingual support.

```typescript
import { CohereEmbedding } from "@llamaindex/cohere";

const embedModel = new CohereEmbedding({
  model: "embed-english-v3.0",
  apiKey: process.env.COHERE_API_KEY,
});
```

HuggingFace

Access thousands of open-source embedding models.

```typescript
import { HuggingFaceEmbedding } from "@llamaindex/huggingface";

const embedModel = new HuggingFaceEmbedding({
  modelType: "BAAI/bge-small-en-v1.5",
  apiKey: process.env.HUGGINGFACE_API_KEY,
});
```

Jina AI

Optimized for semantic search.

```typescript
import { JinaAIEmbedding } from "@llamaindex/jinaai";

const embedModel = new JinaAIEmbedding({
  model: "jina-embeddings-v2-base-en",
  apiKey: process.env.JINAAI_API_KEY,
});
```

Mixedbread

High-quality multilingual embeddings.

```typescript
import { MixedbreadEmbedding } from "@llamaindex/mixedbread";

const embedModel = new MixedbreadEmbedding({
  model: "mixedbread-ai/mxbai-embed-large-v1",
  apiKey: process.env.MIXEDBREAD_API_KEY,
});
```

CLIP

Multi-modal embeddings that place images and text in the same vector space.

```typescript
import { ClipEmbedding } from "@llamaindex/clip";

const embedModel = new ClipEmbedding();

// Embed images and text in the same space
const imageEmb = await embedModel.getImageEmbedding(imageBuffer);
const textEmb = await embedModel.getTextEmbedding("a cat");
```
Provider Comparison
| Provider  | LLM | Embeddings | Function Calling | Vision | Local | Cost |
|-----------|-----|------------|------------------|--------|-------|------|
| OpenAI    | ✅  | ✅         | ✅               | ✅     | ❌    | $$$  |
| Anthropic | ✅  | ❌         | ✅               | ✅     | ❌    | $$$  |
| Google    | ✅  | ✅         | ✅               | ✅     | ❌    | $$   |
| Ollama    | ✅  | ✅         | ✅               | ⚠️     | ✅    | Free |
| Groq      | ✅  | ❌         | ✅               | ❌     | ❌    | $    |
| Mistral   | ✅  | ✅         | ✅               | ❌     | ❌    | $$   |
| Voyage AI | ❌  | ✅         | N/A              | ❌     | ❌    | $    |
| Cohere    | ✅  | ✅         | ✅               | ❌     | ❌    | $$   |
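If you want to pick a provider programmatically, the comparison table can be encoded as plain data. A sketch, with the feature flags simply mirroring the table above (Ollama's partial vision support is treated as false here):

```typescript
// Feature matrix mirroring the provider comparison table.
type ProviderInfo = {
  name: string;
  llm: boolean;
  embeddings: boolean;
  functionCalling: boolean;
  vision: boolean;
  local: boolean;
};

const providers: ProviderInfo[] = [
  { name: "OpenAI", llm: true, embeddings: true, functionCalling: true, vision: true, local: false },
  { name: "Anthropic", llm: true, embeddings: false, functionCalling: true, vision: true, local: false },
  { name: "Google", llm: true, embeddings: true, functionCalling: true, vision: true, local: false },
  { name: "Ollama", llm: true, embeddings: true, functionCalling: true, vision: false, local: true },
  { name: "Groq", llm: true, embeddings: false, functionCalling: true, vision: false, local: false },
  { name: "Mistral", llm: true, embeddings: true, functionCalling: true, vision: false, local: false },
];

// Return the names of providers satisfying every required feature.
function matching(required: Partial<Omit<ProviderInfo, "name">>): string[] {
  return providers
    .filter((p) =>
      Object.entries(required).every(
        ([key, wanted]) => !wanted || p[key as keyof ProviderInfo] === wanted,
      ),
    )
    .map((p) => p.name);
}

console.log(matching({ local: true })); // ["Ollama"]
console.log(matching({ llm: true, embeddings: true, vision: true })); // ["OpenAI", "Google"]
```

This kind of table-as-data check is useful in configuration code that must refuse, say, a vision workload on a provider without image support.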
Switching Providers
Thanks to the unified interface, switching providers is simple:
```typescript
import { Settings, VectorStoreIndex } from "llamaindex";
import { OpenAI } from "@llamaindex/openai";
import { Anthropic } from "@llamaindex/anthropic";

// Use OpenAI
Settings.llm = new OpenAI({ model: "gpt-4o" });

// Switch to Anthropic
Settings.llm = new Anthropic({ model: "claude-3-7-sonnet" });

// All your code continues to work!
const index = await VectorStoreIndex.fromDocuments(docs);
const queryEngine = index.asQueryEngine();
const response = await queryEngine.query({ query: "What is RAG?" });
```
Best Practices
Use environment variables for API keys

```bash
# .env file
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...
```

LlamaIndex.TS automatically detects these standard environment variables.
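If you prefer to fail fast at startup rather than rely on auto-detection, a small guard is easy to write. This is a hypothetical helper, not a LlamaIndex API; `requireEnv` and its signature are illustrative:

```typescript
// Throw at startup if any required provider key is missing.
function requireEnv(names: string[], env: Record<string, string | undefined>): void {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}

// In an application you would pass process.env; an explicit map is used
// here so the example is self-contained.
requireEnv(["OPENAI_API_KEY"], { OPENAI_API_KEY: "sk-test" }); // passes
console.log("all required keys present");
```

Calling this once before constructing any LLM or embedding client turns a confusing mid-request auth failure into a clear boot-time error.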
Install only what you need

```bash
# Good: only install the providers you use
npm install @llamaindex/openai @llamaindex/anthropic

# Avoid: installing every provider package
```

This keeps your node_modules small and deploy times fast.
Consider cost vs performance

Flagship models cost noticeably more per token than their smaller siblings. Start with a fast, inexpensive model (such as gpt-4o-mini or claude-3-5-haiku) and move up to a flagship model only for the requests that need it.
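One pragmatic pattern is to route each request by difficulty: a cheap model for routine prompts, a flagship model for hard ones. A sketch using a deliberately crude length heuristic (the model names mirror those listed earlier; in practice the routing signal might be task type, user tier, or a classifier):

```typescript
// Pick a model tier from a simple prompt-length heuristic.
// The threshold and signal are illustrative, not a recommendation.
function pickModel(prompt: string): string {
  const CHEAP = "gpt-4o-mini"; // fast, cost-effective
  const FLAGSHIP = "gpt-4o"; // latest flagship
  return prompt.length > 500 ? FLAGSHIP : CHEAP;
}

console.log(pickModel("Summarize this sentence.")); // "gpt-4o-mini"
console.log(pickModel("x".repeat(1000))); // "gpt-4o"
```

The selected name can be passed straight to a provider constructor, e.g. `new OpenAI({ model: pickModel(prompt) })`.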
Use local models for privacy

For sensitive data, use Ollama or vLLM to run models on your own infrastructure:

```typescript
import { Ollama } from "@llamaindex/ollama";

// Data never leaves your servers
const llm = new Ollama({ model: "llama3.1" });
```
Test with multiple providers
Different providers excel at different tasks:

```typescript
import { Anthropic } from "@llamaindex/anthropic";
import { OpenAI } from "@llamaindex/openai";

// Claude for long-form reasoning
const claudeEngine = index.asQueryEngine({
  llm: new Anthropic({ model: "claude-3-7-sonnet" }),
});

// GPT-4 for structured output
const gptEngine = index.asQueryEngine({
  llm: new OpenAI({ model: "gpt-4o" }),
});
```
Next Steps
LLMs Learn about the LLM interface and capabilities
Embeddings Understand embedding models for semantic search