Overview
ChatbotAI-Free lets you attach PDF files directly into the conversation context. The app extracts the full text, counts tokens, and injects it into the AI’s memory, with no external vector database or RAG pipeline required.

How to Use PDF Chat
Review the confirmation dialog
A dialog appears showing:
- File name and file size
- Extracted text (first 500 characters preview)
- Token count (using tiktoken’s cl100k_base tokenizer)
- Context window stats: document_tokens / model_context_size (percentage)
The dialog shows exactly how much of your model’s context window the document will consume.
Confirm to inject
Click OK to inject the document into the conversation history. The AI receives the full extracted text as part of the context and responds with a brief acknowledgment.
Text Extraction
PyMuPDF (fitz)
ChatbotAI-Free uses PyMuPDF to extract text from PDFs. Extraction process:
- Open the PDF file with fitz.open(pdf_path)
- Iterate through all pages
- Extract text from each page using page.get_text("text")
- Concatenate all pages into a single string

What gets extracted:
- Plain text
- Formatted paragraphs
- Tables (extracted as plain text)
- Headers and footers
Token Counting
tiktoken Tokenizer
The app uses OpenAI’s tiktoken library with the cl100k_base encoding (used by GPT-4 and GPT-3.5-turbo).
Why tiktoken?
- Provides a close approximation of token counts for most LLMs
- Ollama models (Llama, Mistral, Gemma) use similar tokenization schemes
- Fast and deterministic
Token counts are estimates. Different models may tokenize text slightly differently, but tiktoken provides a good ballpark figure.
Context Window Management
How It Works
When you inject a PDF:
- Token counting: The extracted text is tokenized
- Context calculation: The app queries your current model’s context window size (e.g., 4096, 8192, 32768, 128000)
- Usage percentage: document_tokens / model_context_size × 100

Example:
- Document: 12,000 tokens
- Model: llama3.1:8b (128K context)
- Usage: 12000 / 131072 ≈ 9.2%
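The usage formula above, as a trivial helper (illustrative, not the app’s code):

```python
def context_usage_pct(document_tokens: int, model_context_size: int) -> float:
    """Usage percentage: document_tokens / model_context_size × 100."""
    return document_tokens / model_context_size * 100
```

For the example above, context_usage_pct(12000, 131072) gives roughly 9.2.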
Context Window Indicator
After injecting a PDF, the context donut (bottom-right) updates to show:
- Green (under 50%): Plenty of room for questions
- Yellow (50-80%): Approaching limit
- Red (over 80%): High risk of losing earlier context
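The thresholds above can be sketched as a simple mapping (a hypothetical helper, not the app’s code):

```python
def donut_color(usage_pct: float) -> str:
    """Map context usage percentage to the indicator color."""
    if usage_pct < 50:
        return "green"   # plenty of room for questions
    if usage_pct <= 80:
        return "yellow"  # approaching limit
    return "red"         # high risk of losing earlier context
```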
When the context window fills up, older messages are dropped by the LLM. Your PDF will eventually be evicted if you ask too many follow-up questions.
Model Context Sizes
The app automatically queries your model’s context size using ollama.show(model_name).
Common sizes:
- Llama 3.1 (8B): 128K tokens (131,072)
- Mistral 7B: 8K tokens (8,192)
- Gemma 2: 8K tokens (8,192)
- DeepSeek R1: 64K tokens (65,536)
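The lookup can be sketched as below. The parsing function assumes the mapping returned for a model (e.g. via ollama.show(model_name)) contains a key of the form "<arch>.context_length", such as "llama.context_length"; that key naming is an assumption about ollama’s response format, and the fallback default is arbitrary.

```python
def context_size_from_info(model_info: dict, default: int = 4096) -> int:
    """Pick the '<arch>.context_length' entry from a model-info mapping."""
    for key, value in model_info.items():
        if key.endswith(".context_length"):
            return int(value)
    return default  # fall back when the model does not report a size
```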
You can override the context size in Settings by setting a custom num_ctx value. This takes priority over the model’s default.

Workflow Example
Attaching a Research Paper
Example: Analyzing a 20-page PDF
Scenario: You have a 20-page research paper (PDF) and want to ask questions about the methodology.

Steps:
- Attach PDF
  - Click 📎 Attach
  - Select research_paper.pdf (20 pages, 45,000 characters)
- Review confirmation
  - File name: research_paper.pdf
  - Tokens: ~11,250 (tiktoken estimate)
  - Model: llama3.1:8b (128K context)
  - Usage: 11250 / 131072 = 8.6%
- Inject
  - Click OK
  - AI confirms: “Got it! I’ve read the document research_paper.pdf. Feel free to ask me anything about it.”
- Ask questions
  - “What methodology did the authors use?”
  - “Summarize the key findings in 3 bullet points”
  - “What are the limitations mentioned in the discussion section?”
- Monitor context
  - Context donut shows 18% after 3 questions (conversation history + document)
  - Still plenty of room for more questions
Limitations
Token Budget
Problem: Large documents consume significant context.

Solution:
- Use models with large context windows (Llama 3.1 128K, GPT-4 Turbo 128K)
- Split large documents into smaller sections
- Start a new chat if context fills up (the document will be lost, but you can re-attach it)
Text-Only Extraction
Not supported:
- Images, charts, diagrams
- Scanned PDFs (no embedded text)
- Complex layouts (multi-column, rotated text)
No Persistence
Behavior: PDF contents are only stored in the current conversation.

Implication:
- If you start a new chat, the PDF is not carried over
- Chat history is saved as Markdown files, but the PDF text is embedded as a single message
- Re-opening a saved chat does restore the PDF context (it’s in the conversation history)
Tips for Best Results
Large Documents
For PDFs >100 pages:
- Use a model with 128K+ context (e.g., Llama 3.1)
- Ask specific questions to avoid forcing the AI to reprocess the entire document
- Consider splitting the PDF into chapters or sections
Technical Details
Implementation
The PDF injection logic is in ai_manager.py:516-537:
- The document text is treated as a user message
- The AI’s acknowledgment is an assistant message
- Subsequent questions reference the injected context via the conversation history
- No external vector DB or embeddings required
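In outline, the approach looks like the sketch below: the document text is appended as a user message and a canned acknowledgment as the assistant message. This is an illustration of the pattern described above, not the code in ai_manager.py; the function name and message wording (beyond the acknowledgment quoted earlier) are assumptions.

```python
def inject_pdf(history: list, filename: str, document_text: str) -> list:
    """Append the document as a user message plus a canned assistant reply."""
    history.append({
        "role": "user",
        "content": f"Here is the document '{filename}':\n\n{document_text}",
    })
    history.append({
        "role": "assistant",
        "content": (f"Got it! I’ve read the document {filename}. "
                    "Feel free to ask me anything about it."),
    })
    return history
```

Because both messages live in the ordinary conversation history, later questions “see” the document for free, but they also compete with it for context space.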
This approach is simple but limited by context window size. For massive document collections, consider using a RAG pipeline with vector embeddings (not currently supported).