Overview
RAG Chat allows you to upload PDF documents that will be processed, chunked, and stored in a vector database for intelligent question answering. This guide walks you through the upload process and best practices.
Upload Interface
The file upload interface is located in the sidebar of the RAG Chat application:
app.py
You can upload multiple PDF files at once. All files will be processed and added to the same vector store.
How Document Processing Works
Text Chunking
The extracted text is split into manageable chunks for better retrieval:
app.py
- chunk_size: 1000 characters per chunk
- chunk_overlap: 400 characters overlap between chunks to preserve context
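To make the effect of these two settings concrete, here is a minimal sketch of overlapping character chunking in plain Python. It is not the splitter used in app.py: the function name `split_text` is hypothetical, and it cuts at fixed character offsets, whereas a real splitter may prefer to break on paragraph or sentence boundaries.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 400) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Illustrative only: assumes plain character offsets, not the
    separator-aware splitting a production splitter may perform.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    # Each new chunk starts this many characters after the previous one,
    # so consecutive chunks share chunk_overlap characters.
    step = chunk_size - chunk_overlap

    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the values above, each chunk begins 600 characters after the previous one, so the last 400 characters of one chunk are repeated at the start of the next. A larger overlap preserves more context across chunk boundaries at the cost of storing more duplicated text.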
Processing Flow
When you upload files, the system processes them automatically:
app.py
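The overall flow can be pictured with the following sketch, which assumes the text has already been extracted from each PDF. Every name here (`InMemoryStore`, `chunk`, `process_uploads`) is hypothetical, and the in-memory list is only a stand-in for the vector database; embedding and persistence are left out.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the vector store; keeps raw chunks only."""
    chunks: list = field(default_factory=list)

    def add(self, new_chunks):
        self.chunks.extend(new_chunks)


def chunk(text, size=1000, overlap=400):
    """Cut text into overlapping character windows (simplified)."""
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def process_uploads(files, store):
    """Chunk every uploaded document and add it to the same store.

    `files` maps a filename to its already-extracted text (an assumption
    of this sketch). Returns the number of chunks added.
    """
    added = 0
    for _name, text in files.items():
        pieces = chunk(text)
        store.add(pieces)
        added += len(pieces)
    return added
```

Calling `process_uploads` a second time with the same store appends to what is already there, which mirrors how multiple files and later uploads end up in one shared vector store.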
Best Practices
Tips for Best Results
- Multiple Related Documents: Upload documents on related topics together for better context understanding
- Clean PDFs: Use PDFs with clear text extraction (not scanned images)
- Reasonable Size: While there’s no strict limit, extremely large documents may take longer to process
- Incremental Uploads: You can upload additional documents at any time; they are added to the existing vector store
Persistent Storage
Documents are stored persistently in the db directory:
app.py
Once uploaded, documents persist across sessions. You don’t need to re-upload them every time you restart the application.
Troubleshooting
Upload Not Working
- Ensure the file is a valid PDF
- Check that the PDF is not password-protected
- Verify you have sufficient disk space
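These three checks can be run before uploading with a short script like the one below. It is a rough sketch, not part of RAG Chat: the function name `check_pdf`, the `storage_dir` parameter, and the 100 MB threshold are assumptions, and scanning for `/Encrypt` is only a heuristic for detecting encrypted or password-protected files.

```python
import shutil
from pathlib import Path


def check_pdf(path, storage_dir=".", min_free_bytes=100 * 1024 * 1024):
    """Return a list of likely upload problems for the file at `path`.

    An empty list means none of the three checks found an issue.
    """
    problems = []
    data = Path(path).read_bytes()

    # A PDF file begins with the "%PDF-" header.
    if not data.startswith(b"%PDF-"):
        problems.append("not a valid PDF (missing %PDF- header)")

    # Encrypted PDFs reference an /Encrypt dictionary (heuristic check;
    # it can also match ordinary content that contains this string).
    if b"/Encrypt" in data:
        problems.append("PDF appears to be encrypted or password-protected")

    # Free space on the volume that holds the storage directory.
    free = shutil.disk_usage(storage_dir).free
    if free < min_free_bytes:
        problems.append("low disk space: %d bytes free" % free)

    return problems
```

For example, `check_pdf("report.pdf", storage_dir="db")` would report on a hypothetical `report.pdf`, measuring free space where the `db` directory lives.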
Slow Processing
- Large PDFs take longer to process
- The spinner shows “Carregando arquivos…” (Portuguese for “Loading files…”) during processing
- Wait for the process to complete before asking questions
Next Steps
Asking Questions
Learn how to ask questions about your uploaded documents
Configuration
Customize chunk size and other processing parameters