How It Works
Chat with Webpage extracts text content from the current page and provides it to your AI model as context. You can choose between two processing modes:

- RAG Mode (Recommended)
- Normal Mode
RAG Mode uses Retrieval-Augmented Generation (RAG) with vector embeddings:
- Page content is extracted and split into chunks
- Chunks are converted to vector embeddings using your configured embedding model
- When you ask a question, relevant chunks are retrieved
- Only relevant context is sent to the AI, enabling longer documents
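The pipeline above can be sketched in a few lines of Python. The bag-of-words "embedding" and the word-overlap similarity here are toy stand-ins for your configured embedding model, used only to make the retrieval idea concrete:

```python
from collections import Counter
import math

def split_into_chunks(text, size=1000, overlap=200):
    """Split text into overlapping character chunks (defaults match the docs)."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
        start += size - overlap
    return chunks

def embed(text):
    """Toy 'embedding': a word-count vector (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, chunks, k=4):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

page = ("Ollama runs models locally. " * 30 +
        "Embeddings turn text into vectors for similarity search. " * 30)
chunks = split_into_chunks(page, size=200, overlap=50)
top = retrieve("What are embeddings used for?", chunks, k=2)
```

Only the top-scoring chunks, not the whole page, are then sent to the model as context, which is why RAG handles documents far larger than the model's context window.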
Getting Started
Enable Chat with Webpage
In the Sidebar:
- Open the sidebar on any webpage (`Ctrl+Shift+Y`)
- Click the webpage icon in the input area to enable
- The icon highlights when active

In the Web UI:
- Open the Web UI (`Ctrl+Shift+L`)
- Navigate to the webpage you want to analyze
- Enable the webpage chat mode
Configure RAG (Optional)
For better performance with long pages:
- Go to Settings → RAG Settings
- Select an embedding model (recommended: `nomic-embed-text`)
- Configure chunk size (default: 1000)
- Configure chunk overlap (default: 200)
- Save settings
Sidebar Configuration
Customize how the sidebar handles webpage content.

Using RAG Mode
By default, the sidebar uses RAG with vector embeddings:

- Open the sidebar and click the settings icon
- Find “Copilot Chat With Website Settings”
- Ensure “Chat with website using vector embeddings” is enabled
- Configure your embedding model in Settings → RAG Settings
Using Normal Mode
For simpler, faster processing without embeddings, disable RAG:
- Open sidebar settings
- Find “Copilot Chat With Website Settings”
- Disable “Chat with website using vector embeddings”
Normal mode is limited by your model’s context window. For GPT-3.5, keep content under 4000 tokens. For GPT-4 or Claude, you can use much larger values.
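A common rough heuristic for English text is about 4 characters per token (an assumption here, not the extension's exact accounting). Trimming page content to a token budget before sending it might look like:

```python
def trim_to_token_budget(text, max_tokens=4000, chars_per_token=4):
    """Roughly trim text to a token budget, assuming ~4 chars per token."""
    max_chars = max_tokens * chars_per_token
    if len(text) <= max_chars:
        return text
    # Prefer cutting at the last sentence boundary before the limit.
    cut = text.rfind(". ", 0, max_chars)
    return text[:cut + 1] if cut != -1 else text[:max_chars]

page = "Some sentence. " * 5000   # far over a 4000-token budget
trimmed = trim_to_token_budget(page, max_tokens=4000)
```

Real tokenizers vary by model, so treat the 4-chars-per-token figure as a safety margin rather than an exact limit.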
Enable by Default
Automatically enable chat with webpage when opening the sidebar:

- Open sidebar settings
- Find “Enable Chat with Website by default (Copilot)”
- Toggle on
- The sidebar will now always start in webpage mode
RAG Configuration
Optimize RAG settings for webpage analysis.

Embedding Model Selection
Choose the right embedding model.

For Ollama (Local):
- `nomic-embed-text`: Best all-around, fast and accurate
- `mxbai-embed-large`: High-quality embeddings
- `all-minilm`: Lightweight and fast

Install a local model with `ollama pull nomic-embed-text`.

For OpenAI:
- `text-embedding-3-small`: Cost-effective
- `text-embedding-3-large`: Highest quality
- `text-embedding-ada-002`: Legacy but reliable
Chunk Settings
Optimize how content is split:

| Setting | Recommended Value | Description |
|---|---|---|
| Chunk Size | 1000 | Characters per chunk |
| Chunk Overlap | 200 | Overlap between chunks |
| Retrieved Docs | 4-6 | Number of relevant chunks to use |
| Splitting Strategy | RecursiveCharacterTextSplitter | Best for web content |
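A recursive character splitter tries larger natural separators first (paragraph breaks, then line breaks, then sentences) and only falls back to hard character cuts. A simplified sketch of the idea, without the overlap handling of the real implementation:

```python
def recursive_split(text, size=1000, seps=("\n\n", "\n", ". ", " ")):
    """Simplified recursive splitter: prefer natural boundaries over hard cuts."""
    if len(text) <= size:
        return [text]
    for sep in seps:
        cut = text.rfind(sep, 0, size)
        if cut > 0:
            head = text[:cut + len(sep)]
            return [head] + recursive_split(text[cut + len(sep):], size, seps)
    # No separator found within the limit: hard character split.
    return [text[:size]] + recursive_split(text[size:], size, seps)
```

Because cuts land on paragraph or sentence boundaries whenever possible, each chunk tends to stay self-contained, which improves retrieval quality on web content.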
Understanding Chunk Size
Chunk Size determines how page content is divided:
- Smaller chunks (500-800): More precise retrieval, better for specific questions
- Larger chunks (1000-1500): More context per chunk, better for summaries
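As a worked example of how size and overlap interact, the number of chunks an overlapping splitter produces can be computed directly:

```python
import math

def chunk_count(length, size=1000, overlap=200):
    """Number of overlapping chunks a page of `length` characters produces."""
    if length <= size:
        return 1
    stride = size - overlap          # each new chunk advances by size - overlap
    return math.ceil((length - overlap) / stride)

chunk_count(10_000)                  # defaults: stride 800, ceil(9800/800) = 13
chunk_count(10_000, size=500)        # smaller chunks: 33 finer-grained pieces
```

Smaller chunks mean more pieces to embed and search, so retrieval is more precise but slower; larger chunks do the opposite.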
Custom RAG Prompts
Customize the system prompt for webpage analysis:

- Go to Settings → RAG Settings
- Scroll to “Configure RAG Prompt”
- Select the RAG tab
- Edit the system and question prompts
- Available variables:
- `{context}`: Retrieved webpage chunks (don’t remove)
- `{question}`: User’s question
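At query time, the variables are substituted with the retrieved chunks and your question. A hypothetical template (the wording is illustrative, not the extension's built-in prompt) shows how the substitution works:

```python
# Hypothetical prompt template using the documented {context} and {question} variables.
SYSTEM_TEMPLATE = (
    "Answer using only the webpage excerpts below. "
    "If the answer is not in them, say so.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

chunks = ["Chunk one about pricing.", "Chunk two about features."]
prompt = SYSTEM_TEMPLATE.format(
    context="\n---\n".join(chunks),
    question="What does the page say about pricing?",
)
```

This is why removing `{context}` breaks the feature: without it, the retrieved chunks never reach the model.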
Use Cases
Research
- Summarize research papers
- Extract key findings
- Compare multiple sources
- Generate citations
Learning
- Understand complex documentation
- Get explanations in simple terms
- Generate study notes
- Create quiz questions
Shopping
- Compare product features
- Extract specifications
- Summarize reviews
- Find best deals
News
- Summarize articles
- Extract key points
- Fact-check claims
- Get different perspectives
Advanced Techniques
Combining with Internet Search
Use both webpage chat and internet search together:

- Enable chat with webpage
- Enable internet search (globe icon)
- Ask questions that require both page context and external info
- Example: “How do this article’s claims compare to recent research?”
Using with Knowledge Base
Combine webpage content with your documents:

- Enable chat with webpage
- Select knowledge base (database icon)
- Ask questions that cross-reference both sources
- Example: “How does this webpage’s approach compare to my notes?”
Multi-Page Analysis
Analyze multiple pages in one conversation:

- Enable chat with webpage on the first page
- Ask questions and get responses
- Navigate to another page (keep sidebar open)
- New page context automatically replaces old context
- Continue asking questions about the new page
Each page replaces the previous context. To compare pages, copy relevant information into your messages.
Performance Optimization
For Large Pages
Use RAG Mode:
- Enable vector embeddings
- Increase chunk size to 1500
- Increase retrieved docs to 6-8
- Use a capable embedding model like `nomic-embed-text`
For Speed
Use Normal Mode:
- Disable RAG
- Set content size to 8000-10000
- Use faster models (GPT-3.5, local Ollama models)
- If you keep RAG enabled, limit retrieved docs to 3-4
For Accuracy
Optimize RAG:
- Use high-quality embedding models
- Smaller chunk size (800)
- Higher overlap (300)
- More retrieved docs (6-8)
- Use advanced models (GPT-4, Claude)
Troubleshooting
No content extracted
Causes:
- Page uses JavaScript rendering
- Content behind authentication
- Page blocks content extraction
Solutions:
- Wait for the page to fully load
- Disable RAG and try normal mode
- Try refreshing the page
- Use vision mode instead
Responses not relevant
Causes:
- Poor chunk retrieval
- Embedding model not configured
- Too few retrieved docs

Solutions:
- Check that an embedding model is set
- Increase retrieved docs count
- Adjust chunk size and overlap
- Try normal mode instead
Processing too slow
Causes:
- Large page content
- Slow embedding generation
- Network latency
Solutions:
- Use local Ollama embedding models
- Reduce chunk size
- Limit retrieved docs
- Try normal mode for simpler pages
Context window errors
Causes:
- Too much content for model’s limit
- Large chunks with long conversation
Solutions:
- Reduce chunk size
- Reduce retrieved docs
- Use a model with larger context (GPT-4, Claude)
- Start a new chat
Privacy and Security
Data Processing: All webpage content is processed locally in your browser before being sent to your AI provider.
- Webpage content is extracted client-side
- Embeddings are generated locally or via your provider
- Only relevant chunks are sent to AI (in RAG mode)
- No data is stored on Page Assist servers (we don’t have any)
Next Steps
Vision
Analyze webpage screenshots and images
Knowledge Base
Upload documents for persistent context
Internet Search
Combine with real-time web search
Configuration Settings
Configure embedding and retrieval settings