## Overview

Together AI provides fast inference for open-source LLMs and embedding models. The provider extends OpenAI's interface to target Together AI's API endpoints.
## Installation

```bash
npm install @llamaindex/together
```
## Basic Usage

### LLM

```typescript
import { TogetherLLM } from "@llamaindex/together";

const llm = new TogetherLLM({
  model: "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
  apiKey: process.env.TOGETHER_API_KEY,
});

const response = await llm.chat({
  messages: [{ role: "user", content: "Explain machine learning" }],
});

console.log(response.message.content);
```
### Embeddings

```typescript
import { TogetherEmbedding } from "@llamaindex/together";

const embedModel = new TogetherEmbedding({
  model: "togethercomputer/m2-bert-80M-32k-retrieval",
  apiKey: process.env.TOGETHER_API_KEY,
});

const embedding = await embedModel.getTextEmbedding(
  "LlamaIndex is a data framework for LLM applications",
);
```
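`getTextEmbedding` resolves to a plain `number[]`. As an illustration of what you can do with the result, here is a minimal cosine-similarity helper (not part of the package) for comparing two embeddings:

```typescript
// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Usage with two embeddings from the model above:
// const score = cosineSimilarity(embeddingA, embeddingB); // ~1 = same direction
```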
## Constructor Options

### TogetherLLM

| Option | Type | Default | Description |
|---|---|---|---|
| `model` | `string` | `"togethercomputer/llama-2-7b-chat"` | Together AI model name |
| `apiKey` | `string` | `TOGETHER_API_KEY` env variable | Together AI API key |
| `maxTokens` | `number` | – | Maximum tokens in the response |
| `topP` | `number` | – | Nucleus sampling parameter |
| `additionalSessionOptions` | `object` | – | Additional OpenAI client options (e.g., custom `baseURL`) |

### TogetherEmbedding

| Option | Type | Default | Description |
|---|---|---|---|
| `model` | `string` | `"togethercomputer/m2-bert-80M-32k-retrieval"` | Together AI embedding model name |
| `apiKey` | `string` | `TOGETHER_API_KEY` env variable | Together AI API key |
| `additionalSessionOptions` | `object` | – | Additional OpenAI client options |
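For reference, a constructor call using these options might look like the sketch below. `maxTokens` and `topP` follow the OpenAI-style interface this provider extends, so verify the names against your installed version:

```typescript
import { TogetherLLM } from "@llamaindex/together";

// All options together; apiKey falls back to TOGETHER_API_KEY if omitted.
const llm = new TogetherLLM({
  model: "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
  apiKey: process.env.TOGETHER_API_KEY,
  maxTokens: 512, // cap response length (and cost)
  topP: 0.9, // nucleus sampling
});
```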
## Supported Models

### Chat Models

**Llama 3.1**

- `meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo`: 405B, most capable
- `meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo`: 70B, balanced
- `meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo`: 8B, fast

**Llama 3**

- `meta-llama/Meta-Llama-3-70B-Instruct-Turbo`
- `meta-llama/Meta-Llama-3-8B-Instruct-Turbo`

**Llama 2**

- `togethercomputer/llama-2-7b-chat`: default model
- `togethercomputer/llama-2-13b-chat`
- `togethercomputer/llama-2-70b-chat`

**Mixtral**

- `mistralai/Mixtral-8x7B-Instruct-v0.1`
- `mistralai/Mixtral-8x22B-Instruct-v0.1`

**Qwen**

- `Qwen/Qwen2.5-72B-Instruct-Turbo`
- `Qwen/Qwen2.5-7B-Instruct-Turbo`

### Embedding Models

- `togethercomputer/m2-bert-80M-32k-retrieval`: default, 32K context
- `togethercomputer/m2-bert-80M-8k-retrieval`: 8K context
- `WhereIsAI/UAE-Large-V1`: 512 dimensions
- `BAAI/bge-large-en-v1.5`: BGE large English
## Streaming

```typescript
const stream = await llm.chat({
  messages: [{ role: "user", content: "Write a story about AI" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.delta);
}
```
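If you want the full text instead of writing chunks as they arrive, the stream can be accumulated. This small sketch assumes each chunk exposes a string `delta`, as in the loop above:

```typescript
// Collect a streamed chat response into one string.
async function collectStream(
  stream: AsyncIterable<{ delta: string }>,
): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.delta;
  }
  return text;
}

// Usage: const fullText = await collectStream(stream);
```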
## Function Calling

Together AI supports function calling on compatible models:

```typescript
import { TogetherLLM } from "@llamaindex/together";
import { tool } from "@llamaindex/core/tools";
import { z } from "zod";

const weatherTool = tool({
  name: "get_weather",
  description: "Get current weather",
  parameters: z.object({
    location: z.string(),
    units: z.enum(["celsius", "fahrenheit"]).optional(),
  }),
  execute: async ({ location, units = "celsius" }) => {
    return `Weather in ${location}: 22°${units === "celsius" ? "C" : "F"}`;
  },
});

const llm = new TogetherLLM({
  model: "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
});

const response = await llm.chat({
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
  tools: [weatherTool],
});
```
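The shape of tool-call results varies between library versions, so the snippet below is only a hedged sketch of dispatching a returned call to the matching tool by name. The `ToolCall` type here is an assumption, not the library's exact type:

```typescript
// Assumed shapes -- check your installed version's actual types.
type ToolCall = { name: string; input: Record<string, unknown> };
type ToolImpl = {
  name: string;
  execute: (input: Record<string, unknown>) => Promise<string>;
};

// Find the tool named in the call and run its execute() with the model's input.
async function runToolCall(call: ToolCall, tools: ToolImpl[]): Promise<string> {
  const match = tools.find((t) => t.name === call.name);
  if (!match) {
    throw new Error(`Unknown tool: ${call.name}`);
  }
  return match.execute(call.input);
}
```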
## With LlamaIndex

```typescript
import { Settings, VectorStoreIndex } from "llamaindex";
import { TogetherLLM, TogetherEmbedding } from "@llamaindex/together";

Settings.llm = new TogetherLLM({
  model: "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
});

Settings.embedModel = new TogetherEmbedding({
  model: "togethercomputer/m2-bert-80M-32k-retrieval",
});

// `documents` is an array of Document objects loaded elsewhere
const index = await VectorStoreIndex.fromDocuments(documents);
const queryEngine = index.asQueryEngine();
const response = await queryEngine.query({
  query: "What are the key features?",
});
```
## Convenience Functions

```typescript
import { together } from "@llamaindex/together";

// Quick LLM instance
const llm = together({
  model: "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
});
```
## Configuration

### Environment Variables

```bash
TOGETHER_API_KEY=your-api-key-here
```

### Custom Base URL

```typescript
import { TogetherLLM } from "@llamaindex/together";

const llm = new TogetherLLM({
  additionalSessionOptions: {
    baseURL: "https://custom-together-endpoint.com/v1",
  },
});
```

Default base URL: `https://api.together.xyz/v1`
## Model Selection Guide

| Use Case | Recommended Model | Why |
|---|---|---|
| Complex reasoning | `Meta-Llama-3.1-405B-Instruct-Turbo` | Best quality |
| General purpose | `Meta-Llama-3.1-70B-Instruct-Turbo` | Balanced |
| Speed critical | `Meta-Llama-3.1-8B-Instruct-Turbo` | Fastest |
| Long context | `togethercomputer/m2-bert-80M-32k-retrieval` | 32K embeddings |
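If you want this guide in code, a small lookup helper keeps model selection in one place. The use-case names below are made up for illustration:

```typescript
// Map the chat use cases from the table to Together AI model ids.
type ChatUseCase = "complex-reasoning" | "general-purpose" | "speed-critical";

const CHAT_MODEL_BY_USE_CASE: Record<ChatUseCase, string> = {
  "complex-reasoning": "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
  "general-purpose": "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
  "speed-critical": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
};

function pickChatModel(useCase: ChatUseCase): string {
  return CHAT_MODEL_BY_USE_CASE[useCase];
}

// Usage: new TogetherLLM({ model: pickChatModel("speed-critical") })
```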
## Performance

Together AI offers competitive inference speeds:

- **Turbo models**: optimized for low latency
- **Batch processing**: efficient for high throughput
- **Streaming**: real-time token generation
Error Handling
try {
const response = await llm.chat({ messages });
} catch (error) {
if (error.message.includes("TOGETHER_API_KEY")) {
console.error("API key not set or invalid");
} else {
console.error("API error:", error.message);
}
}
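For transient failures (rate limits, network blips), a retry wrapper with exponential backoff can sit around any call. This is a generic sketch, not a Together AI feature:

```typescript
// Retry an async operation, doubling the delay after each failed attempt.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 250,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < attempts - 1) {
        // Wait 250ms, 500ms, 1000ms, ... before the next attempt.
        await new Promise((resolve) =>
          setTimeout(resolve, baseDelayMs * 2 ** attempt),
        );
      }
    }
  }
  throw lastError;
}

// Usage: const response = await withRetry(() => llm.chat({ messages }));
```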
## Best Practices

- **Use Turbo models**: better performance for production
- **Match embedding context**: use the 32K model for long documents
- **Enable streaming**: better UX for chat applications
- **Choose the right model size**: balance cost against quality needs
- **Set token limits**: control response length and costs
## Pricing

Together AI offers competitive pricing for open-source models. Check the Together AI pricing page for current rates.

## See Also