Overview
ChatbotAI-Free lets you attach PDF files directly into the conversation context. The app extracts the full text, counts tokens, and injects it into the AI’s memory, with no external vector database or RAG pipeline required.

How to Use PDF Chat
Review the confirmation dialog
A dialog appears showing:
- File name and file size
- Extracted text (first 500 characters preview)
- Token count (using tiktoken’s cl100k_base tokenizer)
- Context window stats: document_tokens / model_context_size (percentage)
The dialog shows exactly how much of your model’s context window the document will consume.
Confirm to inject
Click OK to inject the document into the conversation history. The AI receives the full extracted text as part of the context and responds with a brief acknowledgment.
Text Extraction
PyMuPDF (fitz)
ChatbotAI-Free uses PyMuPDF to extract text from PDFs. Extraction process:
- Open the PDF file with fitz.open(pdf_path)
- Iterate through all pages
- Extract text from each page using page.get_text("text")
- Concatenate all pages into a single string

What gets extracted:
- Plain text
- Formatted paragraphs
- Tables (extracted as plain text)
- Headers and footers
Token Counting
tiktoken Tokenizer
The app uses OpenAI’s tiktoken library with the cl100k_base encoding (used by GPT-4 and GPT-3.5-turbo).
Why tiktoken?
- Provides a close approximation of token counts for most LLMs
- Ollama models (Llama, Mistral, Gemma) use similar tokenization schemes
- Fast and deterministic
Token counts are estimates. Different models may tokenize text slightly differently, but tiktoken provides a good ballpark figure.
Context Window Management
How It Works
When you inject a PDF:
- Token counting: The extracted text is tokenized
- Context calculation: The app queries your current model’s context window size (e.g., 4096, 8192, 32768, 128000)
- Usage percentage: document_tokens / model_context_size × 100

Example:
- Document: 12,000 tokens
- Model: llama3.1:8b (128K context)
- Usage: 12000 / 131072 ≈ 9.2%
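The usage formula above, as a trivial helper (illustrative, not the app’s code):

```python
def context_usage_pct(document_tokens: int, model_context_size: int) -> float:
    """Usage percentage: document_tokens / model_context_size × 100."""
    return document_tokens / model_context_size * 100
```

For the example above, context_usage_pct(12000, 131072) gives roughly 9.2.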
Context Window Indicator
After injecting a PDF, the context donut (bottom-right) updates to show:
- Green (under 50%): Plenty of room for questions
- Yellow (50-80%): Approaching limit
- Red (over 80%): High risk of losing earlier context
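The thresholds above can be sketched as a simple mapping (a hypothetical helper, not the app’s code):

```python
def donut_color(usage_pct: float) -> str:
    """Map context usage percentage to the indicator color."""
    if usage_pct < 50:
        return "green"   # plenty of room for questions
    if usage_pct <= 80:
        return "yellow"  # approaching limit
    return "red"         # high risk of losing earlier context
```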
When the context window fills up, older messages are dropped by the LLM. Your PDF will eventually be evicted if you ask too many follow-up questions.
Model Context Sizes
The app automatically queries your model’s context size using ollama.show(model_name).
Common sizes:
- Llama 3.1 (8B): 128K tokens (131,072)
- Mistral 7B: 8K tokens (8,192)
- Gemma 2: 8K tokens (8,192)
- DeepSeek R1: 64K tokens (65,536)
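The lookup can be sketched as below. The parsing function assumes the mapping returned for a model (e.g. via ollama.show(model_name)) contains a key of the form "<arch>.context_length", such as "llama.context_length"; that key naming is an assumption about ollama’s response format, and the fallback default is arbitrary.

```python
def context_size_from_info(model_info: dict, default: int = 4096) -> int:
    """Pick the '<arch>.context_length' entry from a model-info mapping."""
    for key, value in model_info.items():
        if key.endswith(".context_length"):
            return int(value)
    return default  # fall back when the model does not report a size
```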
You can override the context size in Settings by setting a custom num_ctx value. This takes priority over the model’s default.

Workflow Example
Attaching a Research Paper
Example: Analyzing a 20-page PDF
Scenario: You have a 20-page research paper (PDF) and want to ask questions about the methodology.

Steps:
- Attach PDF
  - Click 📎 Attach
  - Select research_paper.pdf (20 pages, 45,000 characters)
- Review confirmation
  - File name: research_paper.pdf
  - Tokens: ~11,250 (tiktoken estimate)
  - Model: llama3.1:8b (128K context)
  - Usage: 11250 / 131072 = 8.6%
- Inject
  - Click OK
  - AI confirms: “Got it! I’ve read the document research_paper.pdf. Feel free to ask me anything about it.”
- Ask questions
  - “What methodology did the authors use?”
  - “Summarize the key findings in 3 bullet points”
  - “What are the limitations mentioned in the discussion section?”
- Monitor context
  - Context donut shows 18% after 3 questions (conversation history + document)
  - Still plenty of room for more questions
Limitations
Token Budget
Problem: Large documents consume significant context.

Solution:
- Use models with large context windows (Llama 3.1 128K, GPT-4 Turbo 128K)
- Split large documents into smaller sections
- Start a new chat if context fills up (the document will be lost, but you can re-attach it)
Text-Only Extraction
Not supported:
- Images, charts, diagrams
- Scanned PDFs (no embedded text)
- Complex layouts (multi-column, rotated text)
No Persistence
Behavior: PDF contents are only stored in the current conversation.

Implication:
- If you start a new chat, the PDF is not carried over
- Chat history is saved as Markdown files, but the PDF text is embedded as a single message
- Re-opening a saved chat does restore the PDF context (it’s in the conversation history)
Tips for Best Results
Large Documents
For PDFs >100 pages:
- Use a model with 128K+ context (e.g., Llama 3.1)
- Ask specific questions to avoid forcing the AI to reprocess the entire document
- Consider splitting the PDF into chapters or sections
Technical Details
Implementation
The PDF injection logic is in ai_manager.py:516-537:
- The document text is treated as a user message
- The AI’s acknowledgment is an assistant message
- Subsequent questions reference the injected context via the conversation history
- No external vector DB or embeddings required
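In outline, the approach looks like the sketch below: the document text is appended as a user message and a canned acknowledgment as the assistant message. This is an illustration of the pattern described above, not the code in ai_manager.py; the function name and message wording (beyond the acknowledgment quoted earlier) are assumptions.

```python
def inject_pdf(history: list, filename: str, document_text: str) -> list:
    """Append the document as a user message plus a canned assistant reply."""
    history.append({
        "role": "user",
        "content": f"Here is the document '{filename}':\n\n{document_text}",
    })
    history.append({
        "role": "assistant",
        "content": (f"Got it! I’ve read the document {filename}. "
                    "Feel free to ask me anything about it."),
    })
    return history
```

Because both messages live in the ordinary conversation history, later questions “see” the document for free, but they also compete with it for context space.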
This approach is simple but limited by context window size. For massive document collections, consider using a RAG pipeline with vector embeddings (not currently supported).