In this guide, we go through several examples of how to use gr.ChatInterface with popular LLM libraries and API providers. We will cover:
  • Llama Index
  • LangChain
  • OpenAI
  • Hugging Face transformers
  • SambaNova
  • Hyperbolic
  • Anthropic’s Claude
For many LLM libraries and providers, there exist community-maintained integration libraries that make it even easier to spin up Gradio apps. We reference these libraries in the appropriate sections below.

Llama Index

Let’s start by using llama-index on top of openai to build a RAG chatbot on any text or PDF files that you can demo and share in less than 30 lines of code. You’ll need to have an OpenAI key for this example (keep reading for the free, open-source equivalent!):
import gradio as gr
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

Settings.llm = OpenAI(model="gpt-4o-mini")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Create the chat engine once so it retains conversation memory across turns
chat_engine = index.as_chat_engine()

def chat(message, history):
    response = chat_engine.stream_chat(message)
    partial_response = ""
    for token in response.response_gen:
        partial_response += token
        yield partial_response

demo = gr.ChatInterface(
    fn=chat,
    type="messages",
    title="Chat with your documents using LlamaIndex",
    description="Upload your documents in the 'data' folder and start chatting!",
)

if __name__ == "__main__":
    demo.launch()
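Note that gr.ChatInterface treats each value yielded by a generator chat function as the full message so far, replacing what is currently displayed, rather than as an incremental delta. A minimal stdlib-only sketch of that accumulation pattern (the token list below stands in for a real stream such as response.response_gen above):

```python
def stream_reply(tokens):
    """Yield the cumulative response after each token, as gr.ChatInterface expects."""
    partial_response = ""
    for token in tokens:
        partial_response += token
        yield partial_response

# Each yield is the whole message so far, not just the newest token.
chunks = list(stream_reply(["Hel", "lo", ", wor", "ld!"]))
print(chunks[0])   # "Hel"
print(chunks[-1])  # "Hello, world!"
```

The same accumulate-and-yield loop appears in every streaming example in this guide.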

LangChain

Here’s an example using langchain on top of openai to build a general-purpose chatbot. As before, you’ll need to have an OpenAI key for this example:
import gradio as gr
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage

llm = ChatOpenAI(model="gpt-4o-mini")

def chat(message, history):
    # Convert history to LangChain message format
    lc_history = []
    for msg in history:
        if msg["role"] == "user":
            lc_history.append(HumanMessage(content=msg["content"]))
        else:
            lc_history.append(AIMessage(content=msg["content"]))
    
    # Add current message
    lc_history.append(HumanMessage(content=message))
    
    # Get response
    response = llm.stream(lc_history)
    partial_response = ""
    for chunk in response:
        partial_response += chunk.content
        yield partial_response

demo = gr.ChatInterface(
    fn=chat,
    type="messages",
    title="LangChain Chatbot",
)

if __name__ == "__main__":
    demo.launch()
For quick prototyping, the community-maintained langchain-gradio repo makes it even easier to build chatbots on top of LangChain.
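The history conversion above is the only Gradio-specific step in the example. Here is that step sketched as a standalone function, using plain (role, content) tuples as stand-ins for HumanMessage/AIMessage so it runs without langchain installed; it assumes messages-format history dicts with string content:

```python
def to_lc_history(history, message):
    """Map Gradio messages-format history plus the new user message to role-tagged pairs."""
    lc_history = [
        ("human" if msg["role"] == "user" else "ai", msg["content"])
        for msg in history
    ]
    # Append the current user turn so the model sees the full conversation
    lc_history.append(("human", message))
    return lc_history

history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello! How can I help?"},
]
print(to_lc_history(history, "Tell me a joke"))
```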

OpenAI

Of course, we could also use the openai library directly. Here’s a similar example to the LangChain one, but this time calling the OpenAI client directly, again with streaming:
import gradio as gr
from openai import OpenAI

client = OpenAI()

def chat(message, history):
    # Convert history to OpenAI message format
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    for msg in history:
        if msg["role"] == "user":
            messages.append({"role": "user", "content": msg["content"]})
        else:
            messages.append({"role": "assistant", "content": msg["content"]})
    
    # Add current message
    messages.append({"role": "user", "content": message})
    
    # Get streaming response
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True,
    )
    
    partial_response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            partial_response += chunk.choices[0].delta.content
            yield partial_response

demo = gr.ChatInterface(
    fn=chat,
    type="messages",
    title="OpenAI Chatbot",
)

if __name__ == "__main__":
    demo.launch()
For quick prototyping, the openai-gradio library makes it even easier to build chatbots on top of OpenAI models.
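Two details of the streaming loop above are easy to miss: chunks carry deltas (not cumulative text), and some chunks, such as the final one carrying the finish reason, have content set to None and must be skipped. A stdlib mock of the chunk shape makes the logic testable without an API key:

```python
from types import SimpleNamespace

def mock_chunk(content):
    # Mimics the nested chunk.choices[0].delta.content shape of the real client
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=content))])

def accumulate(stream):
    """Replicates the streaming loop: skip None deltas, yield cumulative text."""
    partial_response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            partial_response += chunk.choices[0].delta.content
            yield partial_response

stream = [mock_chunk("Hi"), mock_chunk(" there"), mock_chunk(None)]
print(list(accumulate(stream)))  # ['Hi', 'Hi there']
```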

Hugging Face transformers

Of course, in many cases you want to run a chatbot locally. Here’s the equivalent example, running the SmolLM2-135M-Instruct model locally with the Hugging Face transformers library:
import gradio as gr
from transformers import pipeline, TextIteratorStreamer
from threading import Thread

pipe = pipeline(
    "text-generation",
    model="HuggingFaceTB/SmolLM2-135M-Instruct",
    device_map="auto"
)

def chat(message, history):
    # Convert history to transformers format
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    for msg in history:
        if msg["role"] == "user":
            messages.append({"role": "user", "content": msg["content"]})
        else:
            messages.append({"role": "assistant", "content": msg["content"]})
    
    # Add current message
    messages.append({"role": "user", "content": message})
    
    # Generate response with streaming
    streamer = TextIteratorStreamer(pipe.tokenizer, skip_prompt=True, skip_special_tokens=True)
    generation_kwargs = dict(
        text_inputs=messages,
        max_new_tokens=256,
        streamer=streamer,
    )
    
    thread = Thread(target=pipe, kwargs=generation_kwargs)
    thread.start()
    
    partial_response = ""
    for token in streamer:
        partial_response += token
        yield partial_response

demo = gr.ChatInterface(
    fn=chat,
    type="messages",
    title="SmolLM2 Chatbot",
)

if __name__ == "__main__":
    demo.launch()
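The TextIteratorStreamer pattern works because generation runs in a background thread that pushes decoded tokens into a queue, while the main thread iterates over them and yields to Gradio; calling pipe(...) directly on the main thread would block until generation finished, leaving nothing to stream. A stdlib sketch of that producer-consumer handoff, with a sentinel marking end-of-stream much as the real streamer does internally:

```python
import queue
from threading import Thread

END = object()  # sentinel marking end-of-stream

class TokenStreamer:
    """Minimal stand-in for TextIteratorStreamer: producer puts, consumer iterates."""
    def __init__(self):
        self.q = queue.Queue()
    def put(self, token):
        self.q.put(token)
    def end(self):
        self.q.put(END)
    def __iter__(self):
        # Block on the queue until the producer signals completion
        while (token := self.q.get()) is not END:
            yield token

def fake_generate(streamer):
    # Stands in for pipe(...) running in the worker thread
    for token in ["Sm", "ol", "LM"]:
        streamer.put(token)
    streamer.end()

streamer = TokenStreamer()
Thread(target=fake_generate, args=(streamer,)).start()
print("".join(streamer))  # SmolLM
```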

SambaNova

The SambaNova Cloud API provides access to full-precision open-source models, such as the Llama family. Here’s an example of how to build a Gradio app around the SambaNova API:
import gradio as gr
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",
    api_key=os.environ.get("SAMBANOVA_API_KEY"),
)

def chat(message, history):
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    for msg in history:
        if msg["role"] == "user":
            messages.append({"role": "user", "content": msg["content"]})
        else:
            messages.append({"role": "assistant", "content": msg["content"]})
    
    messages.append({"role": "user", "content": message})
    
    stream = client.chat.completions.create(
        model="Meta-Llama-3.1-8B-Instruct",
        messages=messages,
        stream=True,
    )
    
    partial_response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            partial_response += chunk.choices[0].delta.content
            yield partial_response

demo = gr.ChatInterface(
    fn=chat,
    type="messages",
    title="SambaNova Chatbot",
)

if __name__ == "__main__":
    demo.launch()
For quick prototyping, the sambanova-gradio library makes it even easier to build chatbots on top of SambaNova models.

Hyperbolic

The Hyperbolic AI API provides access to many open-source models, such as the Llama family. Here’s an example of how to build a Gradio app around the Hyperbolic API:
import gradio as gr
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hyperbolic.xyz/v1",
    api_key=os.environ.get("HYPERBOLIC_API_KEY"),
)

def chat(message, history):
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    for msg in history:
        if msg["role"] == "user":
            messages.append({"role": "user", "content": msg["content"]})
        else:
            messages.append({"role": "assistant", "content": msg["content"]})
    
    messages.append({"role": "user", "content": message})
    
    stream = client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct",
        messages=messages,
        stream=True,
    )
    
    partial_response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            partial_response += chunk.choices[0].delta.content
            yield partial_response

demo = gr.ChatInterface(
    fn=chat,
    type="messages",
    title="Hyperbolic Chatbot",
)

if __name__ == "__main__":
    demo.launch()
For quick prototyping, the hyperbolic-gradio library makes it even easier to build chatbots on top of Hyperbolic models.
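The SambaNova and Hyperbolic examples are structurally identical to the OpenAI one: both expose OpenAI-compatible endpoints, so only the base_url, the API-key environment variable, and the model name change. That makes it easy to parameterize one chat function over several providers. A sketch, using the base URLs, env var names, and model names from the examples above (any other provider entries you add are your own assumptions):

```python
import os

# Each OpenAI-compatible provider differs only in endpoint, key, and model name
PROVIDERS = {
    "openai": {"base_url": None, "key_env": "OPENAI_API_KEY",
               "model": "gpt-4o-mini"},
    "sambanova": {"base_url": "https://api.sambanova.ai/v1", "key_env": "SAMBANOVA_API_KEY",
                  "model": "Meta-Llama-3.1-8B-Instruct"},
    "hyperbolic": {"base_url": "https://api.hyperbolic.xyz/v1", "key_env": "HYPERBOLIC_API_KEY",
                   "model": "meta-llama/Llama-3.3-70B-Instruct"},
}

def client_config(provider):
    """Return the kwargs for OpenAI(...) plus the model name for a provider."""
    cfg = PROVIDERS[provider]
    kwargs = {"api_key": os.environ.get(cfg["key_env"])}
    if cfg["base_url"]:  # OpenAI itself uses the client's default endpoint
        kwargs["base_url"] = cfg["base_url"]
    return kwargs, cfg["model"]

kwargs, model = client_config("sambanova")
print(kwargs["base_url"], model)
```

You can then build the client with OpenAI(**kwargs) and reuse the same chat function for every provider.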

Anthropic’s Claude

Anthropic’s Claude model can also be used via API. Here’s a simple 20 questions-style game built on top of the Anthropic API:
import gradio as gr
import anthropic
import os

client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))

def chat(message, history):
    # Convert history to Anthropic format
    messages = []
    for msg in history:
        if msg["role"] == "user":
            messages.append({"role": "user", "content": msg["content"]})
        else:
            messages.append({"role": "assistant", "content": msg["content"]})
    
    # Add current message
    messages.append({"role": "user", "content": message})
    
    # Get streaming response
    with client.messages.stream(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        system="You are playing 20 questions. Think of an object and only answer yes/no questions about it. Don't reveal the object until the user guesses correctly or uses all 20 questions.",
        messages=messages,
    ) as stream:
        partial_response = ""
        for text in stream.text_stream:
            partial_response += text
            yield partial_response

demo = gr.ChatInterface(
    fn=chat,
    type="messages",
    title="20 Questions with Claude",
    description="I'm thinking of an object. You have 20 questions to guess what it is!",
)

if __name__ == "__main__":
    demo.launch()
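One difference from the OpenAI format is worth noting: the Anthropic Messages API takes the system prompt as a separate system= parameter (as the code above does), requires the first message to have the user role, and expects user and assistant turns to alternate. A small stdlib check of the converted history can catch violations before the API call does (check_anthropic_messages is a hypothetical helper for illustration, not part of the anthropic SDK):

```python
def check_anthropic_messages(messages):
    """Return True if messages alternate user/assistant, starting and ending with 'user'."""
    if not messages or messages[0]["role"] != "user" or messages[-1]["role"] != "user":
        return False
    # No two consecutive turns may share a role
    return all(a["role"] != b["role"] for a, b in zip(messages, messages[1:]))

good = [{"role": "user", "content": "Is it alive?"},
        {"role": "assistant", "content": "No."},
        {"role": "user", "content": "Is it bigger than a toaster?"}]
bad = good + [{"role": "user", "content": "a second consecutive user turn"}]
print(check_anthropic_messages(good), check_anthropic_messages(bad))  # True False
```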
These examples demonstrate how easy it is to integrate various LLM providers with Gradio’s ChatInterface. You can mix and match different providers and customize the chat function to suit your specific needs.