Phoenix provides first-class support for LlamaIndex, a data framework for building retrieval-augmented generation (RAG) applications. LlamaIndex makes it easy to connect your LLM to your own data sources.

Installation

pip install openinference-instrumentation-llama-index "llama-index>=0.11.0"

Setup

1. Initialize the instrumentation

from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
from phoenix.otel import register

tracer_provider = register(
  project_name="my-llm-app"
)
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)
2. Use LlamaIndex as normal

All LlamaIndex operations will now be automatically traced!
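By default, `register` sends traces to a Phoenix instance on the local default port. If your collector runs elsewhere, the exporter can also be configured through environment variables before your app starts; a minimal sketch, assuming a local Phoenix deployment at `http://localhost:6006`:

```shell
# Optional: point the tracer at a non-default Phoenix collector
export PHOENIX_COLLECTOR_ENDPOINT="http://localhost:6006"

# The project name can be set via environment instead of in code
export PHOENIX_PROJECT_NAME="my-llm-app"
```

With these set, `register()` can be called without arguments and will pick up the endpoint and project name from the environment.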

Basic Example

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
import os

# Set your OpenAI API key
os.environ["OPENAI_API_KEY"] = "your-api-key"

# Load documents and create index
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Query the index - automatically traced!
query_engine = index.as_query_engine()
response = query_engine.query("What are the key insights from the documents?")
print(response)

What Gets Traced

Phoenix automatically captures:
  • Queries: Query text, retrieved documents, synthesis
  • Retrievals: Document chunks, similarity scores, ranking
  • LLM Calls: Prompts, completions, tokens, model parameters
  • Embeddings: Text being embedded, embedding model
  • Agents: Tool selection, reasoning steps, execution
  • Synthesis: Context assembly, prompt construction

Advanced Examples

RAG with Custom Retrieval

from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    get_response_synthesizer,
)
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

# Load and index documents
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Configure retriever with custom parameters
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=10,  # Retrieve the top 10 most similar nodes
)

# Build a response synthesizer for the custom query engine
response_synthesizer = get_response_synthesizer()
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[
        SimilarityPostprocessor(similarity_cutoff=0.7)
    ]
)

# Phoenix traces the entire retrieval pipeline
response = query_engine.query("What is Phoenix?")
print(response)

Agent with Tools

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI

def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the result."""
    return a * b

def add(a: float, b: float) -> float:
    """Add two numbers and return the result."""
    return a + b

# Create tools
multiply_tool = FunctionTool.from_defaults(fn=multiply)
add_tool = FunctionTool.from_defaults(fn=add)

# Create agent
llm = OpenAI(model="gpt-4")
agent = ReActAgent.from_tools(
    [multiply_tool, add_tool],
    llm=llm,
    verbose=True
)

# Phoenix traces the agent's reasoning and tool usage
response = agent.chat("What is (5 + 3) * 4?")
print(response)

Sub-Question Query Engine

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine

# Load documents for different topics
phoenix_docs = SimpleDirectoryReader("phoenix_docs").load_data()
llamaindex_docs = SimpleDirectoryReader("llamaindex_docs").load_data()

# Create separate indexes
phoenix_index = VectorStoreIndex.from_documents(phoenix_docs)
llamaindex_index = VectorStoreIndex.from_documents(llamaindex_docs)

# Create query engine tools
phoenix_engine = phoenix_index.as_query_engine()
llamaindex_engine = llamaindex_index.as_query_engine()

query_engine_tools = [
    QueryEngineTool(
        query_engine=phoenix_engine,
        metadata=ToolMetadata(
            name="phoenix_docs",
            description="Information about Phoenix observability"
        ),
    ),
    QueryEngineTool(
        query_engine=llamaindex_engine,
        metadata=ToolMetadata(
            name="llamaindex_docs",
            description="Information about LlamaIndex framework"
        ),
    ),
]

# Create sub-question query engine
query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
)

# Phoenix traces the decomposed questions and synthesis
response = query_engine.query(
    "How can I use Phoenix with LlamaIndex to monitor my RAG application?"
)
print(response)

Streaming Responses

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine(streaming=True)

# Stream the response
response = query_engine.query("Summarize the key points")
for text in response.response_gen:
    print(text, end="", flush=True)

# Phoenix captures the full streamed response

Chat Engine

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.memory import ChatMemoryBuffer

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Create chat engine with memory
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    memory=memory,
    verbose=True
)

# Phoenix traces the conversation context
response = chat_engine.chat("What is this document about?")
print(response)

response = chat_engine.chat("Tell me more about that topic")
print(response)

Observability in Phoenix

Once instrumented, you can:
  • Visualize the RAG pipeline from query to response
  • Inspect retrieved documents with similarity scores
  • Monitor embedding calls and vector search performance
  • Track LLM token usage and costs
  • Debug agent reasoning and tool selection
  • Analyze query latency across pipeline stages

Resources

  • Example Notebook: complete tutorial
  • OpenInference Package: view source code
  • Working Examples: more code examples
  • LlamaIndex Docs: official documentation

Legacy Versions

For LlamaIndex versions prior to 0.11.0, see the legacy integration guide in the Phoenix documentation.
