
Overview

Agentic RAG combines Retrieval-Augmented Generation with intelligent agent capabilities using the Agno framework. This example demonstrates how to create a knowledge base from web URLs, store documents in a vector database, and query them using an AI agent with semantic search.

Key Features

  • Dynamic Knowledge Base: Load multiple web URLs into a persistent vector database
  • Intelligent Retrieval: Advanced semantic search using OpenAI embeddings
  • Conversational Interface: Streamlit-based chat for natural interactions
  • AI Observability: Integrated with Arize Phoenix for monitoring and tracing
  • Real-time Streaming: Get responses as they’re generated
  • Vector Search: Lightning-fast similarity search using LanceDB

Architecture

Vector Database Setup

LanceDB Configuration

from agno.vectordb.lancedb import LanceDb, SearchType
from agno.embedder.openai import OpenAIEmbedder

# Configure vector database
vector_db = LanceDb(
    table_name="mcp-docs-knowledge-base",  # Table name for storing vectors
    uri="tmp/lancedb",                     # Local storage path
    search_type=SearchType.vector,         # Vector similarity search
    embedder=OpenAIEmbedder(id="text-embedding-3-small")  # Embedding model
)

LanceDB provides high-performance vector storage with:
  • Fast similarity search
  • Columnar storage format
  • Native support for embeddings
  • ACID transactions
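Conceptually, what the vector database provides is nearest-neighbor search over stored (embedding, text) rows. A minimal in-memory sketch of that idea in pure Python (an illustration only, not the LanceDB API):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class ToyVectorStore:
    """In-memory stand-in for a vector DB: store rows, search by similarity."""

    def __init__(self):
        self.rows: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], text: str) -> None:
        self.rows.append((vector, text))

    def search(self, query: list[float], limit: int = 5) -> list[str]:
        # Rank all stored rows by cosine similarity to the query vector
        ranked = sorted(self.rows, key=lambda r: cosine(query, r[0]), reverse=True)
        return [text for _, text in ranked[:limit]]

store = ToyVectorStore()
store.add([1.0, 0.0], "chunk about vectors")
store.add([0.0, 1.0], "chunk about transactions")
print(store.search([0.9, 0.1], limit=1))  # → ['chunk about vectors']
```

LanceDB does the same thing at scale, with columnar storage and approximate-nearest-neighbor indexes instead of a linear scan.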

Implementation

Knowledge Base Loading

from agno.knowledge.url import UrlKnowledge
from agno.vectordb.lancedb import LanceDb, SearchType
from agno.embedder.openai import OpenAIEmbedder

def load_knowledge_base(urls: list[str] | None = None):
    """Load URLs into vector database."""
    knowledge_base = UrlKnowledge(
        urls=urls or [],
        vector_db=LanceDb(
            table_name="mcp-docs-knowledge-base",
            uri="tmp/lancedb",
            search_type=SearchType.vector,
            embedder=OpenAIEmbedder(id="text-embedding-3-small"),
        ),
    )
    knowledge_base.load()  # Downloads, chunks, embeds, and stores
    return knowledge_base

RAG Agent Creation

from agno.agent import Agent, RunResponseEvent
from agno.models.openai import OpenAIChat
from typing import Iterator

def agentic_rag_response(
    urls: list[str] | None = None,
    query: str = ""
) -> Iterator[RunResponseEvent]:
    """Create RAG agent and stream response."""
    # Load knowledge base
    knowledge_base = load_knowledge_base(urls)
    
    # Create agent with knowledge
    agent = Agent(
        model=OpenAIChat(id="gpt-4o"),
        knowledge=knowledge_base,
        search_knowledge=True,  # Enable knowledge search
        markdown=True,
    )
    
    # Stream response
    response: Iterator[RunResponseEvent] = agent.run(query, stream=True)
    return response

Embedding Pipeline

The knowledge loading process:
  1. URL Fetching: Downloads content from web URLs
  2. Text Extraction: Extracts readable text from HTML
  3. Chunking: Splits text into semantic chunks
  4. Embedding: Generates embeddings using OpenAI
  5. Storage: Stores vectors in LanceDB

# The knowledge base automatically:
# 1. Fetches URL content
# 2. Chunks text (typically 512-1024 tokens)
# 3. Generates embeddings via OpenAI API
# 4. Stores in LanceDB with metadata

knowledge_base.load()  # All steps happen here
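The exact chunker is internal to Agno, but the general idea behind step 3, fixed-size windows with overlap so that context isn't lost at chunk boundaries, can be sketched in plain Python (a simplified word-based sketch, not Agno's implementation):

```python
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into word windows of `chunk_size`, overlapping by `overlap` words."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already covers the end of the text
    return chunks

doc = " ".join(f"word{i}" for i in range(250))
chunks = chunk_text(doc, chunk_size=100, overlap=20)
print(len(chunks))  # → 3 (words 0-99, 80-179, 160-249)
```

Real chunkers typically count tokens rather than words and try to split on sentence or section boundaries, but the windowing logic is the same.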

Arize Phoenix Observability

Phoenix Integration

import os
from phoenix.otel import register

# Set environment variables for Arize Phoenix
os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={os.getenv('ARIZE_PHOENIX_API_KEY')}"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"

# Configure the Phoenix tracer
tracer_provider = register(
    project_name="default",
    auto_instrument=True,  # Automatically trace OpenAI calls
)

Arize Phoenix provides:
  • Request tracing for all API calls
  • Performance monitoring (latency, token usage)
  • Error tracking and debugging
  • Usage analytics and insights
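Phoenix instruments calls automatically via OpenTelemetry. As a toy analogue of what a trace span records (this is an illustration, not the Phoenix API), a latency-measuring decorator:

```python
import time
from functools import wraps

traces: list[dict] = []

def traced(fn):
    """Record the name and wall-clock latency of each call, like a minimal span."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        traces.append({"name": fn.__name__, "latency_s": time.perf_counter() - start})
        return result
    return wrapper

@traced
def embed(text: str) -> list[float]:
    # Stand-in for an embedding call
    return [float(len(text))]

embed("hello")
print(traces[0]["name"])  # → embed
```

With `auto_instrument=True`, Phoenix captures this kind of data (plus token counts, prompts, and errors) for every OpenAI call without any decorators on your side.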

Streamlit Application

import streamlit as st
from typing import Iterator

st.set_page_config(page_title="Agentic RAG", layout="wide")
st.title("Agentic RAG with Agno & GPT-4o")

# Sidebar: URL management
with st.sidebar:
    st.markdown("### 🧠 Knowledge Base URLs")
    
    if "urls" not in st.session_state:
        st.session_state.urls = [""]
    
    # URL input fields
    for i, url in enumerate(st.session_state.urls):
        st.session_state.urls[i] = st.text_input(
            f"URL {i+1}",
            value=url,
            key=f"url_{i}",
            label_visibility="collapsed"
        )
    
    # Add URL button
    if st.button("➕"):
        if st.session_state.urls and st.session_state.urls[-1].strip() != "":
            st.session_state.urls.append("")
    
    # Load knowledge base
    if st.button("Load Knowledge Base"):
        urls = [u for u in st.session_state.urls if u.strip()]
        urls = list(dict.fromkeys(urls))  # Remove duplicates
        
        if urls:
            with st.spinner("Loading knowledge base..."):
                try:
                    knowledge_base = load_knowledge_base(urls)
                    st.session_state.docs_loaded = True
                    st.session_state.loaded_urls = urls.copy()
                    st.success(f"Loaded {len(urls)} URL(s)!")
                except Exception as e:
                    st.error(f"Error: {str(e)}")
        else:
            st.warning("Please add at least one URL.")
    
    # Reset button
    if st.button("🔄 Reset KB"):
        st.session_state.docs_loaded = False
        if 'loaded_urls' in st.session_state:
            del st.session_state['loaded_urls']
        st.success("Knowledge base reset!")
        st.rerun()

# Chat interface
query = st.chat_input("Ask a question")

if query:
    if not st.session_state.get('docs_loaded', False):
        st.warning("Please load the knowledge base first.")
    else:
        loaded_urls = st.session_state.loaded_urls
        response = agentic_rag_response(loaded_urls, query)
        
        st.markdown("#### Answer")
        answer = ""
        answer_placeholder = st.empty()
        
        # Stream response
        for content in response:
            if hasattr(content, 'event') and content.event == "RunResponseContent":
                answer += content.content
                answer_placeholder.markdown(answer)

Vector Search Process

When a query is made:
  1. Query Embedding: User question is embedded using OpenAI
  2. Vector Search: LanceDB finds similar document chunks
  3. Context Retrieval: Top-k most relevant chunks are retrieved
  4. Augmented Prompt: Retrieved context is added to the prompt
  5. LLM Generation: GPT-4o generates answer with context

# This happens automatically when agent.run() is called
# with search_knowledge=True:

# 1. Query embedding
query_vector = embedder.embed(query)

# 2. Vector search in LanceDB
results = vector_db.search(
    query_vector,
    limit=5,  # Top 5 most similar chunks
    metric="cosine"  # Cosine similarity
)

# 3. Context is automatically added to prompt
# 4. LLM generates response with context
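Step 4 above, building the augmented prompt, amounts to concatenating the retrieved chunks with the user's question. A simplified sketch (the agent's actual prompt template is internal to Agno):

```python
def build_augmented_prompt(query: str, chunks: list[str]) -> str:
    """Combine retrieved chunks and the user question into a single prompt."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

prompt = build_augmented_prompt(
    "What is LanceDB?",
    ["LanceDB is an embedded vector database.", "It uses a columnar storage format."],
)
print(prompt)
```

Numbering the chunks makes it easy for the model to cite which retrieved passage supports each claim.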

Configuration Options

Vector Database Settings

  • table_name (string, required): Name of the LanceDB table for storing vectors
  • uri (string, default: "tmp/lancedb"): Local storage path for the vector database
  • search_type (SearchType, default: vector): Search algorithm; vector enables similarity search

Embedding Settings

  • embedder (Embedder, required): Embedding model, e.g. OpenAIEmbedder(id="text-embedding-3-small")

Agent Settings

  • search_knowledge (bool, default: false): Enable automatic knowledge base search for queries
  • markdown (bool, default: false): Format responses in markdown

Installation

git clone https://github.com/Arindam200/awesome-ai-apps.git
cd awesome-ai-apps/rag_apps/agentic_rag
uv sync

Environment Setup

Create a .env file:
OPENAI_API_KEY=your_openai_api_key_here
ARIZE_PHOENIX_API_KEY=your_phoenix_api_key_here  # Optional
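Since Phoenix is optional but OpenAI is not, a small startup check can fail fast when a required key is missing. A minimal sketch (the check_env helper is illustrative, not part of the app):

```python
import os

REQUIRED = ["OPENAI_API_KEY"]          # the app cannot run without these
OPTIONAL = ["ARIZE_PHOENIX_API_KEY"]   # tracing is simply skipped if absent

def check_env(env=os.environ) -> list[str]:
    """Return the required keys that are missing or empty."""
    return [key for key in REQUIRED if not env.get(key)]

missing = check_env({"OPENAI_API_KEY": "sk-..."})
print(missing)  # → []
```

Calling this at the top of main.py and surfacing the result with st.error gives a clearer failure than a mid-query OpenAI exception.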

Running the Application

uv run streamlit run main.py

Use Cases

Documentation Q&A

Load API docs and ask implementation questions

Research Assistant

Index research papers and query specific topics

Knowledge Base

Load internal documents for employee queries

Educational Content

Index course materials and ask study questions

Performance Optimization

  1. Choose Quality URLs: Select URLs with high-quality, relevant content for better retrieval
  2. Optimize Chunk Size: Adjust chunking parameters based on content type
  3. Use Appropriate Embeddings: text-embedding-3-small is fast and cost-effective for most use cases
  4. Monitor with Phoenix: Use Arize Phoenix to track performance and optimize retrieval

Best Practices

  • URL Selection: Choose authoritative, well-structured content sources
  • Knowledge Base Size: Balance comprehensiveness with query performance
  • Query Specificity: More specific questions yield better results
  • Regular Updates: Reload knowledge base when source content changes
  • Error Handling: Implement retry logic for URL fetching failures
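The last point, retry logic for URL fetching, can be sketched as exponential backoff around any fetch function (the flaky fetcher below is a stand-in for a real HTTP call):

```python
import time

def with_retries(fetch, url: str, attempts: int = 3, base_delay: float = 0.5):
    """Call fetch(url), retrying on failure with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the last error
            time.sleep(base_delay * (2 ** attempt))

calls = []
def flaky(url):
    """Simulated fetcher that fails twice, then succeeds."""
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError("transient failure")
    return "content"

print(with_retries(flaky, "https://example.com", base_delay=0.01))  # → content
```

In the Streamlit app this would wrap the knowledge-base load, so one flaky URL doesn't fail the whole batch.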

Troubleshooting

Common Issues

  • Knowledge base not loaded: Ensure you’ve clicked “Load Knowledge Base” after adding URLs, and check that the URLs are accessible
  • OpenAI API errors: Verify your API key is correct and has sufficient credits, and check internet connectivity
  • LanceDB corruption: Clear the tmp/lancedb directory if you encounter corruption, then restart the application
  • Poor retrieval quality: Try more specific queries, add more relevant URLs, or adjust the embedding model

Resources

  • Agno Documentation: Official Agno framework documentation
  • LanceDB: LanceDB vector database documentation
  • Arize Phoenix: AI observability platform
  • OpenAI Embeddings: OpenAI embeddings guide
