Overview
Chat with X applications use Retrieval Augmented Generation (RAG) to enable conversations with various data sources. These tutorials show you how to build interactive chat interfaces for documents, codebases, emails, and multimedia content.
Chat with PDF: extract and query PDF documents
Chat with GitHub: search and analyze codebases
Chat with Gmail: query your email inbox
Chat with YouTube: analyze video transcripts
Chat with Research: search academic papers
Chat with Substack: query newsletter archives
Core RAG Architecture
All “Chat with X” applications follow a common pattern:
Data Ingestion
Load and preprocess content from the target source (PDF, GitHub, Gmail, etc.)
Chunking & Embedding
Split content into chunks and generate vector embeddings
Vector Storage
Store embeddings in a vector database (Chroma, Qdrant, etc.)
Retrieval
Find relevant chunks using semantic similarity search
Generation
Pass retrieved context to LLM for answer generation
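The five steps above can be sketched end to end in plain Python. This toy uses a bag-of-words count vector in place of a learned embedding model and an in-memory list in place of a vector database, purely to illustrate the data flow; none of these names are Embedchain APIs.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector. Real apps use a
    # learned embedding model; this only illustrates the pipeline shape.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank stored chunks by similarity to the query and keep the top k
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

# "Ingested" chunks standing in for a vector store
chunks = [
    "Chroma stores vector embeddings on disk.",
    "Streamlit renders the chat interface.",
    "PDF text is split into overlapping chunks.",
]

print(retrieve("where are embeddings stored?", chunks, k=1))
# → ['Chroma stores vector embeddings on disk.']
```

In a real application the top-k chunks would then be passed to the LLM as context for answer generation.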
Chat with PDF
Build a RAG application to query PDF documents in just 30 lines of Python
Implementation
OpenAI + Embedchain
Local with Llama 3.2
```python
import os
import tempfile

import streamlit as st
from embedchain import App

def embedchain_bot(db_path, api_key):
    return App.from_config(
        config={
            "llm": {"provider": "openai", "config": {"api_key": api_key}},
            "vectordb": {"provider": "chroma", "config": {"dir": db_path}},
            "embedder": {"provider": "openai", "config": {"api_key": api_key}},
        }
    )

st.title("Chat with PDF")

openai_access_token = st.text_input("OpenAI API Key", type="password")
if openai_access_token:
    db_path = tempfile.mkdtemp()
    app = embedchain_bot(db_path, openai_access_token)

    pdf_file = st.file_uploader("Upload a PDF file", type="pdf")
    if pdf_file:
        # Write the upload to disk so Embedchain can read it by path
        with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as f:
            f.write(pdf_file.getvalue())
        app.add(f.name, data_type="pdf_file")
        os.remove(f.name)
        st.success(f"Added {pdf_file.name} to knowledge base!")

    prompt = st.text_input("Ask a question about the PDF")
    if prompt:
        answer = app.chat(prompt)
        st.write(answer)
```
Key Features
Extracts text from multi-page PDFs
Handles embedded images and tables
Preserves document structure
Supports scanned PDFs with OCR (optional)
```python
# Typical chunking parameters for PDFs
chunk_size = 1000     # tokens per chunk
chunk_overlap = 200   # overlapping tokens preserve context across chunk boundaries

# Embedchain handles chunking automatically:
app.add(pdf_path, data_type="pdf_file")
```
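Under the hood, overlap chunking is a sliding window over the text. A minimal character-based sketch (Embedchain's own chunkers work on tokens; `chunk_text` is a hypothetical helper, shown only to make the overlap idea concrete):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    # Sliding window: each chunk starts (chunk_size - overlap) characters
    # after the previous one, so consecutive chunks share `overlap` characters.
    # Requires overlap < chunk_size.
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

print(chunk_text("abcdefgh", chunk_size=4, overlap=2))
# → ['abcd', 'cdef', 'efgh']
```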
Effective prompt patterns:
“Summarize the key findings in section 3”
“What methodology was used in the research?”
“Compare the results from pages 5 and 10”
“Extract all statistics about X”
Setup
```shell
pip install streamlit embedchain openai chromadb
streamlit run chat_pdf.py
```
Chat with GitHub Repos
Query codebases, understand architecture, and find implementations using natural language
Implementation
```python
import os

import streamlit as st
from embedchain.pipeline import Pipeline as App
from embedchain.loaders.github import GithubLoader

loader = GithubLoader(
    config={
        "token": "your_github_token",
    }
)

st.title("Chat with GitHub Repository 💬")
st.caption("Query codebases using natural language")

openai_access_token = st.text_input("OpenAI API Key", type="password")
if openai_access_token:
    os.environ["OPENAI_API_KEY"] = openai_access_token
    app = App()

    git_repo = st.text_input("Enter GitHub Repo (e.g., username/repo)")
    if git_repo:
        # Add the repository to the knowledge base
        app.add(f"repo:{git_repo} type:repo", data_type="github", loader=loader)
        st.success(f"Added {git_repo} to knowledge base!")

        # Ask questions
        prompt = st.text_input("Ask about the repository")
        if prompt:
            answer = app.chat(prompt)
            st.write(answer)
```
Example Queries
Architecture: “How is the authentication system structured?”
Implementation: “Show me how error handling is implemented”
Dependencies: “What external libraries does this project use?”
Best Practices: “How does the codebase handle configuration?”
GitHub Access Configuration
Generate Personal Access Token: go to GitHub Settings → Developer settings → Personal access tokens
Set Permissions: enable the repo scope for accessing repository contents
Configure Loader:

```python
loader = GithubLoader(config={"token": "ghp_your_token_here"})
```
Never commit GitHub tokens to version control. Use environment variables or secrets management.
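A small sketch of the environment-variable approach (`github_token_from_env` is a hypothetical helper, not part of Embedchain):

```python
import os

def github_token_from_env(var: str = "GITHUB_TOKEN") -> str:
    # Read the token from the environment, failing loudly if it is missing
    # rather than silently building an unauthenticated loader.
    token = os.environ.get(var)
    if not token:
        raise RuntimeError(f"Set the {var} environment variable first")
    return token

# Usage: loader = GithubLoader(config={"token": github_token_from_env()})
```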
Chat with Gmail
Search and analyze your email inbox using natural language queries
Implementation
```python
import tempfile

import streamlit as st
from embedchain import App

def embedchain_bot(db_path, api_key):
    return App.from_config(
        config={
            "llm": {"provider": "openai", "config": {"api_key": api_key}},
            "vectordb": {"provider": "chroma", "config": {"dir": db_path}},
            "embedder": {"provider": "openai", "config": {"api_key": api_key}},
        }
    )

st.title("Chat with your Gmail Inbox 📧")

openai_access_token = st.text_input("OpenAI API Key", type="password")

# Gmail filter syntax
gmail_filter = "to:me label:inbox"

if openai_access_token:
    db_path = tempfile.mkdtemp()
    app = embedchain_bot(db_path, openai_access_token)

    # Add Gmail data
    app.add(gmail_filter, data_type="gmail")
    st.success("Added emails from Inbox to knowledge base!")

    prompt = st.text_input("Ask about your emails")
    if prompt:
        answer = app.query(prompt)
        st.write(answer)
```
Gmail API Setup
Complete OAuth configuration:

Create Google Cloud Project
Enable Gmail API: navigate to APIs & Services → Library → search for “Gmail API” → Enable
Configure OAuth Consent: go to APIs & Services → OAuth consent screen, select the “External” user type, fill in the app information, add test users (your email), and publish the consent screen
Create OAuth Credentials: go to APIs & Services → Credentials → Create Credentials, select “OAuth client ID”, set the application type to “Desktop app”, and download the credentials as credentials.json
Place Credentials: save credentials.json in your project directory
Gmail Query Filters
Gmail search operators for filtering emails:

```python
# All inbox emails
"label:inbox"

# Unread emails
"is:unread"

# From a specific sender
"from:[email protected]"

# Date range
"after:2024/01/01 before:2024/12/31"

# Has attachment
"has:attachment"

# Combine filters (implicit AND)
"from:[email protected] is:unread has:attachment"
```
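Because filters combine by simple space separation (implicit AND), a tiny helper can build them programmatically. `build_gmail_filter` is a hypothetical convenience function, not part of Embedchain or the Gmail API:

```python
def build_gmail_filter(criteria: dict[str, str]) -> str:
    # Turn {"operator": "value"} pairs into Gmail's "operator:value"
    # search syntax, joined by spaces (Gmail treats that as AND).
    return " ".join(f"{op}:{val}" for op, val in criteria.items())

# Hypothetical example address; substitute your own
query = build_gmail_filter({"from": "[email protected]", "has": "attachment"})
print(query)
# → from:[email protected] has:attachment
```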
Example Queries
Business queries:
“Summarize emails from my manager this week”
“Find all emails about the Q4 project”
“What action items were mentioned in recent emails?”
Chat with YouTube Videos
Analyze video content through transcripts without watching the entire video
Implementation
```python
import tempfile

import streamlit as st
from embedchain import App

st.title("Chat with YouTube Videos 📽️")

openai_key = st.text_input("OpenAI API Key", type="password")
if openai_key:
    db_path = tempfile.mkdtemp()
    app = App.from_config(
        config={
            "llm": {"provider": "openai", "config": {"api_key": openai_key}},
            "vectordb": {"provider": "chroma", "config": {"dir": db_path}},
            "embedder": {"provider": "openai", "config": {"api_key": openai_key}},
        }
    )

    youtube_url = st.text_input("YouTube Video URL")
    if youtube_url:
        # Add the video transcript
        app.add(youtube_url, data_type="youtube_video")
        st.success("Video transcript added!")

        query = st.text_input("Ask about the video")
        if query:
            answer = app.chat(query)
            st.write(answer)

        # Display the video
        st.video(youtube_url)
```
Transcript Processing
Extract Transcript : Uses youtube-transcript-api to fetch captions
Chunk Text : Splits transcript into semantic chunks with timestamps
Generate Embeddings : Creates vector representations
Query : Retrieves relevant segments based on question
Context : Includes timestamp information in responses
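Assuming caption entries shaped like those returned by youtube-transcript-api (dicts with "text" and "start" keys), timestamp-preserving chunking can be sketched as follows; `chunk_captions` is an illustrative helper, not Embedchain's internal implementation:

```python
def chunk_captions(captions: list[dict], max_chars: int = 300) -> list[dict]:
    # Group short caption snippets into larger chunks, keeping the start
    # timestamp of each chunk so answers can cite a position in the video.
    chunks, buf, start = [], [], None
    for cap in captions:
        if start is None:
            start = cap["start"]
        buf.append(cap["text"])
        if sum(len(t) for t in buf) >= max_chars:
            chunks.append({"start": start, "text": " ".join(buf)})
            buf, start = [], None
    if buf:  # flush any trailing partial chunk
        chunks.append({"start": start, "text": " ".join(buf)})
    return chunks
```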
Use Cases
Tutorial Videos: “What tools were used in this tutorial?”
Lectures: “Summarize the key concepts explained”
Podcasts: “What did they say about AI regulation?”
Product Reviews: “List all pros and cons mentioned”
Chat with Research Papers
Search and query arXiv papers using conversational AI
Implementation
```python
import os

import streamlit as st
from embedchain import App

st.title("Chat with Arxiv Research Papers 🔎")

openai_key = st.text_input("OpenAI API Key", type="password")
if openai_key:
    os.environ["OPENAI_API_KEY"] = openai_key
    app = App()

    # arXiv search topic
    topic = st.text_input("Research topic (e.g., 'transformers in NLP')")
    if topic:
        # Search for and add matching papers
        app.add(f"arxiv:{topic}", data_type="arxiv")
        st.success(f"Added papers about '{topic}'")

        query = st.text_input("Ask about the research")
        if query:
            answer = app.chat(query)
            st.write(answer)
```
Research Queries
“What datasets were used in these papers?”
“How do the approaches differ?”
“What evaluation metrics are common?”
“Compare the results across different papers”
“Which method achieved the best performance?”
“What are the main limitations discussed?”
“What architectures are used?”
“List the hyperparameters mentioned”
“What preprocessing steps are described?”
Chat with Substack
Query newsletter archives and extract insights from blog posts
Implementation
```python
import os

import streamlit as st
from embedchain import App

st.title("Chat with Substack Newsletter 📝")

openai_key = st.text_input("OpenAI API Key", type="password")
if openai_key:
    os.environ["OPENAI_API_KEY"] = openai_key
    app = App()

    substack_url = st.text_input("Substack Blog URL")
    if substack_url:
        # Add the Substack content
        app.add(substack_url, data_type="web_page")
        st.success("Substack newsletter added!")

        query = st.text_input("Ask about the content")
        if query:
            answer = app.chat(query)
            st.write(answer)
```
Common Patterns & Best Practices
Embedchain Configuration
The main configuration options for an Embedchain app:
LLM provider: openai, anthropic, cohere, ollama
Model name (e.g., gpt-4o, claude-3-5-sonnet-20241022)
Vector database: chroma, qdrant, pinecone, weaviate
Embedding provider: openai, cohere, huggingface
Optimization Tips
```python
# Suggested chunk sizes by content type
chunk_configs = {
    "pdf": {"chunk_size": 1000, "overlap": 200},
    "code": {"chunk_size": 1500, "overlap": 300},
    "email": {"chunk_size": 800, "overlap": 100},
    "transcript": {"chunk_size": 1200, "overlap": 200},
}
```
Use a compact embedding model (e.g., OpenAI's text-embedding-3-small) to reduce cost
Cache vector databases between sessions
Limit retrieval to top 3-5 chunks
Implement query optimization
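One way to cache the vector database between sessions is to replace `tempfile.mkdtemp()` (which creates a fresh, empty directory on every run, forcing re-embedding) with a stable path, assuming Chroma persists to the configured "dir". `persistent_db_path` is a hypothetical helper:

```python
import os
import tempfile

def persistent_db_path(app_name: str = "chat-with-x") -> str:
    # Reuse one on-disk directory across runs so previously computed
    # embeddings survive the session instead of being rebuilt each time.
    path = os.path.join(tempfile.gettempdir(), app_name + "-chroma")
    os.makedirs(path, exist_ok=True)
    return path

# Usage: db_path = persistent_db_path() instead of tempfile.mkdtemp()
```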
```python
# Improve responses with better prompts
system_prompt = """
You are a helpful assistant analyzing {data_type}.
Always cite specific sections when answering.
If information is not in the context, say so clearly.
"""

app = App.from_config({
    "llm": {
        "provider": "openai",
        "config": {
            "system_prompt": system_prompt,
            "temperature": 0.3,  # lower temperature for factual accuracy
        }
    }
})
```
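The `{data_type}` placeholder in a prompt like the one above is meant to be filled once, when the app knows which source it is serving, e.g. with `str.format`:

```python
system_prompt_template = """
You are a helpful assistant analyzing {data_type}.
Always cite specific sections when answering.
If information is not in the context, say so clearly.
"""

# Fill the placeholder at startup for the concrete data source
system_prompt = system_prompt_template.format(data_type="PDF documents")
print(system_prompt)
```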
Multi-Source Chat
Combine Multiple Data Sources
```python
import tempfile

import streamlit as st
from embedchain import App

st.title("Multi-Source Chat")

api_key = st.text_input("OpenAI API Key", type="password")
if api_key:
    db_path = tempfile.mkdtemp()
    app = App.from_config({
        "llm": {"provider": "openai", "config": {"api_key": api_key}},
        "vectordb": {"provider": "chroma", "config": {"dir": db_path}},
        "embedder": {"provider": "openai", "config": {"api_key": api_key}},
    })

    # Add multiple sources
    pdf = st.file_uploader("Upload PDF", type="pdf")
    youtube = st.text_input("YouTube URL")
    github = st.text_input("GitHub Repo")

    if pdf:
        # Write the upload to disk first; app.add expects a file path
        with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as f:
            f.write(pdf.getvalue())
        app.add(f.name, data_type="pdf_file")
    if youtube:
        app.add(youtube, data_type="youtube_video")
    if github:
        app.add(f"repo:{github} type:repo", data_type="github")

    # Query across all sources
    query = st.text_input("Ask anything")
    if query:
        answer = app.chat(query)
        st.write(answer)
```
Resources
Embedchain Docs: complete framework documentation
Example Repository: all Chat with X implementations
RAG Tutorial: step-by-step RAG guide
Gmail Tutorial: complete Gmail RAG tutorial