
Overview

Agentic RAG combines Retrieval-Augmented Generation with intelligent agent capabilities using the Agno framework. This example demonstrates how to create a knowledge base from web URLs, store documents in a vector database, and query them using an AI agent with semantic search.

Key Features

  • Dynamic Knowledge Base: Load multiple web URLs into a persistent vector database
  • Intelligent Retrieval: Advanced semantic search using OpenAI embeddings
  • Conversational Interface: Streamlit-based chat for natural interactions
  • AI Observability: Integrated with Arize Phoenix for monitoring and tracing
  • Real-time Streaming: Get responses as they’re generated
  • Vector Search: Lightning-fast similarity search using LanceDB

Architecture

Vector Database Setup

LanceDB Configuration

from agno.vectordb.lancedb import LanceDb, SearchType
from agno.embedder.openai import OpenAIEmbedder

# Configure vector database
vector_db = LanceDb(
    table_name="mcp-docs-knowledge-base",  # Table name for storing vectors
    uri="tmp/lancedb",                     # Local storage path
    search_type=SearchType.vector,         # Vector similarity search
    embedder=OpenAIEmbedder(id="text-embedding-3-small")  # Embedding model
)

LanceDB provides high-performance vector storage with:
  • Fast similarity search
  • Columnar storage format
  • Native support for embeddings
  • ACID transactions
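Conceptually, what the vector database provides is nearest-neighbor search over stored (embedding, text) rows. A minimal in-memory sketch of that idea in pure Python (an illustration only, not the LanceDB API):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class ToyVectorStore:
    """In-memory stand-in for a vector DB: store rows, search by similarity."""

    def __init__(self):
        self.rows: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], text: str) -> None:
        self.rows.append((vector, text))

    def search(self, query: list[float], limit: int = 5) -> list[str]:
        # Rank all stored rows by cosine similarity to the query vector
        ranked = sorted(self.rows, key=lambda r: cosine(query, r[0]), reverse=True)
        return [text for _, text in ranked[:limit]]

store = ToyVectorStore()
store.add([1.0, 0.0], "chunk about vectors")
store.add([0.0, 1.0], "chunk about transactions")
print(store.search([0.9, 0.1], limit=1))  # → ['chunk about vectors']
```

LanceDB does the same thing at scale, with columnar storage and approximate-nearest-neighbor indexes instead of a linear scan.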

Implementation

Knowledge Base Loading

from agno.knowledge.url import UrlKnowledge
from agno.vectordb.lancedb import LanceDb, SearchType
from agno.embedder.openai import OpenAIEmbedder

def load_knowledge_base(urls: list[str] | None = None):
    """Load URLs into vector database."""
    knowledge_base = UrlKnowledge(
        urls=urls or [],
        vector_db=LanceDb(
            table_name="mcp-docs-knowledge-base",
            uri="tmp/lancedb",
            search_type=SearchType.vector,
            embedder=OpenAIEmbedder(id="text-embedding-3-small"),
        ),
    )
    knowledge_base.load()  # Downloads, chunks, embeds, and stores
    return knowledge_base

RAG Agent Creation

from agno.agent import Agent, RunResponseEvent
from agno.models.openai import OpenAIChat
from typing import Iterator

def agentic_rag_response(
    urls: list[str] | None = None,
    query: str = ""
) -> Iterator[RunResponseEvent]:
    """Create RAG agent and stream response."""
    # Load knowledge base
    knowledge_base = load_knowledge_base(urls)
    
    # Create agent with knowledge
    agent = Agent(
        model=OpenAIChat(id="gpt-4o"),
        knowledge=knowledge_base,
        search_knowledge=True,  # Enable knowledge search
        markdown=True,
    )
    
    # Stream response
    response: Iterator[RunResponseEvent] = agent.run(query, stream=True)
    return response

Embedding Pipeline

The knowledge loading process:
  1. URL Fetching: Downloads content from web URLs
  2. Text Extraction: Extracts readable text from HTML
  3. Chunking: Splits text into semantic chunks
  4. Embedding: Generates embeddings using OpenAI
  5. Storage: Stores vectors in LanceDB

# The knowledge base automatically:
# 1. Fetches URL content
# 2. Chunks text (typically 512-1024 tokens)
# 3. Generates embeddings via OpenAI API
# 4. Stores in LanceDB with metadata

knowledge_base.load()  # All steps happen here
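The exact chunker is internal to Agno, but the general idea behind step 3, fixed-size windows with overlap so that context isn't lost at chunk boundaries, can be sketched in plain Python (a simplified word-based sketch, not Agno's implementation):

```python
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into word windows of `chunk_size`, overlapping by `overlap` words."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already covers the end of the text
    return chunks

doc = " ".join(f"word{i}" for i in range(250))
chunks = chunk_text(doc, chunk_size=100, overlap=20)
print(len(chunks))  # → 3 (words 0-99, 80-179, 160-249)
```

Real chunkers typically count tokens rather than words and try to split on sentence or section boundaries, but the windowing logic is the same.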

Arize Phoenix Observability

Phoenix Integration

import os
from phoenix.otel import register

# Set environment variables for Arize Phoenix
os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={os.getenv('ARIZE_PHOENIX_API_KEY')}"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"

# Configure the Phoenix tracer
tracer_provider = register(
    project_name="default",
    auto_instrument=True,  # Automatically trace OpenAI calls
)

Arize Phoenix provides:
  • Request tracing for all API calls
  • Performance monitoring (latency, token usage)
  • Error tracking and debugging
  • Usage analytics and insights
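Phoenix instruments calls automatically via OpenTelemetry. As a toy analogue of what a trace span records (this is an illustration, not the Phoenix API), a latency-measuring decorator:

```python
import time
from functools import wraps

traces: list[dict] = []

def traced(fn):
    """Record the name and wall-clock latency of each call, like a minimal span."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        traces.append({"name": fn.__name__, "latency_s": time.perf_counter() - start})
        return result
    return wrapper

@traced
def embed(text: str) -> list[float]:
    # Stand-in for an embedding call
    return [float(len(text))]

embed("hello")
print(traces[0]["name"])  # → embed
```

With `auto_instrument=True`, Phoenix captures this kind of data (plus token counts, prompts, and errors) for every OpenAI call without any decorators on your side.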

Streamlit Application

import streamlit as st
from typing import Iterator

st.set_page_config(page_title="Agentic RAG", layout="wide")
st.title("Agentic RAG with Agno & GPT-4o")

# Sidebar: URL management
with st.sidebar:
    st.markdown("### 🧠 Knowledge Base URLs")
    
    if "urls" not in st.session_state:
        st.session_state.urls = [""]
    
    # URL input fields
    for i, url in enumerate(st.session_state.urls):
        st.session_state.urls[i] = st.text_input(
            f"URL {i+1}",
            value=url,
            key=f"url_{i}",
            label_visibility="collapsed"
        )
    
    # Add URL button
    if st.button("➕"):
        if st.session_state.urls and st.session_state.urls[-1].strip() != "":
            st.session_state.urls.append("")
    
    # Load knowledge base
    if st.button("Load Knowledge Base"):
        urls = [u for u in st.session_state.urls if u.strip()]
        urls = list(dict.fromkeys(urls))  # Remove duplicates
        
        if urls:
            with st.spinner("Loading knowledge base..."):
                try:
                    knowledge_base = load_knowledge_base(urls)
                    st.session_state.docs_loaded = True
                    st.session_state.loaded_urls = urls.copy()
                    st.success(f"Loaded {len(urls)} URL(s)!")
                except Exception as e:
                    st.error(f"Error: {str(e)}")
        else:
            st.warning("Please add at least one URL.")
    
    # Reset button
    if st.button("🔄 Reset KB"):
        st.session_state.docs_loaded = False
        if 'loaded_urls' in st.session_state:
            del st.session_state['loaded_urls']
        st.success("Knowledge base reset!")
        st.rerun()

# Chat interface
query = st.chat_input("Ask a question")

if query:
    if not st.session_state.get('docs_loaded', False):
        st.warning("Please load the knowledge base first.")
    else:
        loaded_urls = st.session_state.loaded_urls
        response = agentic_rag_response(loaded_urls, query)
        
        st.markdown("#### Answer")
        answer = ""
        answer_placeholder = st.empty()
        
        # Stream response
        for content in response:
            if hasattr(content, 'event') and content.event == "RunResponseContent":
                answer += content.content
                answer_placeholder.markdown(answer)

Vector Search Process

When a query is made:
  1. Query Embedding: User question is embedded using OpenAI
  2. Vector Search: LanceDB finds similar document chunks
  3. Context Retrieval: Top-k most relevant chunks are retrieved
  4. Augmented Prompt: Retrieved context is added to the prompt
  5. LLM Generation: GPT-4o generates answer with context

# This happens automatically when agent.run() is called
# with search_knowledge=True:

# 1. Query embedding
query_vector = embedder.embed(query)

# 2. Vector search in LanceDB
results = vector_db.search(
    query_vector,
    limit=5,  # Top 5 most similar chunks
    metric="cosine"  # Cosine similarity
)

# 3. Context is automatically added to prompt
# 4. LLM generates response with context
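Step 4 above, building the augmented prompt, amounts to concatenating the retrieved chunks with the user's question. A simplified sketch (the agent's actual prompt template is internal to Agno):

```python
def build_augmented_prompt(query: str, chunks: list[str]) -> str:
    """Combine retrieved chunks and the user question into a single prompt."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

prompt = build_augmented_prompt(
    "What is LanceDB?",
    ["LanceDB is an embedded vector database.", "It uses a columnar storage format."],
)
print(prompt)
```

Numbering the chunks makes it easy for the model to cite which retrieved passage supports each claim.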

Configuration Options

Vector Database Settings

  • table_name (string, required): Name of the LanceDB table for storing vectors
  • uri (string, default: "tmp/lancedb"): Local storage path for the vector database
  • search_type (SearchType, default: vector): Search algorithm; vector enables similarity search

Embedding Settings

  • embedder (Embedder, required): Embedding model, e.g. OpenAIEmbedder(id="text-embedding-3-small")

Agent Settings

  • search_knowledge (bool, default: false): Enable automatic knowledge base search for queries
  • markdown (bool, default: false): Format responses in markdown

Installation

git clone https://github.com/Arindam200/awesome-ai-apps.git
cd awesome-ai-apps/rag_apps/agentic_rag
uv sync

Environment Setup

Create a .env file:
OPENAI_API_KEY=your_openai_api_key_here
ARIZE_PHOENIX_API_KEY=your_phoenix_api_key_here  # Optional
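Since Phoenix is optional but OpenAI is not, a small startup check can fail fast when a required key is missing. A minimal sketch (the check_env helper is illustrative, not part of the app):

```python
import os

REQUIRED = ["OPENAI_API_KEY"]          # the app cannot run without these
OPTIONAL = ["ARIZE_PHOENIX_API_KEY"]   # tracing is simply skipped if absent

def check_env(env=os.environ) -> list[str]:
    """Return the required keys that are missing or empty."""
    return [key for key in REQUIRED if not env.get(key)]

missing = check_env({"OPENAI_API_KEY": "sk-..."})
print(missing)  # → []
```

Calling this at the top of main.py and surfacing the result with st.error gives a clearer failure than a mid-query OpenAI exception.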

Running the Application

uv run streamlit run main.py

Use Cases

Documentation Q&A

Load API docs and ask implementation questions

Research Assistant

Index research papers and query specific topics

Knowledge Base

Load internal documents for employee queries

Educational Content

Index course materials and ask study questions

Performance Optimization

  1. Choose Quality URLs: Select URLs with high-quality, relevant content for better retrieval
  2. Optimize Chunk Size: Adjust chunking parameters based on content type
  3. Use Appropriate Embeddings: text-embedding-3-small is fast and cost-effective for most use cases
  4. Monitor with Phoenix: Use Arize Phoenix to track performance and optimize retrieval

Best Practices

  • URL Selection: Choose authoritative, well-structured content sources
  • Knowledge Base Size: Balance comprehensiveness with query performance
  • Query Specificity: More specific questions yield better results
  • Regular Updates: Reload knowledge base when source content changes
  • Error Handling: Implement retry logic for URL fetching failures
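The last point, retry logic for URL fetching, can be sketched as exponential backoff around any fetch function (the flaky fetcher below is a stand-in for a real HTTP call):

```python
import time

def with_retries(fetch, url: str, attempts: int = 3, base_delay: float = 0.5):
    """Call fetch(url), retrying on failure with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the last error
            time.sleep(base_delay * (2 ** attempt))

calls = []
def flaky(url):
    """Simulated fetcher that fails twice, then succeeds."""
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError("transient failure")
    return "content"

print(with_retries(flaky, "https://example.com", base_delay=0.01))  # → content
```

In the Streamlit app this would wrap the knowledge-base load, so one flaky URL doesn't fail the whole batch.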

Troubleshooting

Common Issues

  • Knowledge base not loaded: Ensure you’ve clicked “Load Knowledge Base” after adding URLs, and check that the URLs are accessible
  • OpenAI API errors: Verify your API key is correct and has sufficient credits, and check internet connectivity
  • LanceDB corruption: Clear the tmp/lancedb directory if you encounter corruption, then restart the application
  • Poor retrieval quality: Try more specific queries, add more relevant URLs, or adjust the embedding model

Resources

  • Agno Documentation: Official Agno framework documentation
  • LanceDB: LanceDB vector database documentation
  • Arize Phoenix: AI observability platform
  • OpenAI Embeddings: OpenAI embeddings guide
