Overview
MedMitra uses LlamaParse for:
- Lab Report Parsing: Extract text from PDF medical reports
- Structure Preservation: Maintain tables, headers, and formatting
- Multi-page Processing: Handle complex medical documents
- Markdown Output: Clean, structured text output
Why LlamaParse?
- Medical Document Optimized: Better than generic PDF parsers
- Table Extraction: Preserves lab values in table format
- Fast Processing: Async processing for multiple pages
- Reliable: Handles various PDF formats and layouts
Prerequisites
- A LlamaCloud account (sign up at cloud.llamaindex.ai)
- Python 3.9+ for backend integration
Setup Instructions
1. Get a LlamaParse API Key
- Visit cloud.llamaindex.ai
- Sign up or log in to your account
- Navigate to the API Keys section
- Click Generate API Key
- Copy your API key
LlamaParse offers a free tier with limited credits. Monitor your usage in the dashboard.
2. Configure Environment Variables
Add your key to backend/.env as the LLAMAPARSE_API_KEY variable.
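A minimal sketch of reading the key at startup, assuming the backend already loads backend/.env into the process environment (for example through a dotenv loader). The helper name `load_llamaparse_key` is hypothetical and not part of MedMitra or the LlamaParse SDK.

```python
import os
from typing import Mapping


def load_llamaparse_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the LlamaParse API key with stray whitespace removed.

    Hypothetical helper: fails early with a clear message when the
    variable is missing, instead of failing later inside a parse call.
    """
    key = env.get("LLAMAPARSE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("LLAMAPARSE_API_KEY is not set; add it to backend/.env")
    return key
```

Stripping whitespace here guards against the "extra spaces in key" problem listed under Troubleshooting.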
3. Install LlamaParse SDK
The LlamaParse SDK is included in the project dependencies.
Implementation
Parser Configuration
Location: backend/parsers/parse.py
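The parser setup might look roughly like the sketch below. It is not the actual contents of parse.py: the option values (4 workers, English, verbose on) are assumptions chosen for illustration, and `build_parser_options` and `create_parser` are hypothetical helper names. The keyword arguments are the ones the `LlamaParse` constructor accepts.

```python
from typing import Any, Dict


def build_parser_options(api_key: str) -> Dict[str, Any]:
    """Collect LlamaParse constructor options in one place (values are assumptions)."""
    return {
        "api_key": api_key,         # key from LlamaCloud
        "num_workers": 4,           # parallel workers for processing pages
        "verbose": True,            # detailed logging for debugging
        "language": "en",           # ISO 639-1 code of the documents
        "result_type": "markdown",  # or "text" / "json"
    }


def create_parser(api_key: str):
    """Build a LlamaParse instance from the options above."""
    # Imported lazily so this module can be loaded without the SDK installed.
    from llama_parse import LlamaParse

    return LlamaParse(**build_parser_options(api_key))
```

Keeping the options in a plain dictionary makes them easy to inspect or override in tests without contacting the service.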
Configuration Options
- api_key: Your LlamaParse API key from LlamaCloud
- num_workers: Number of parallel workers for processing pages
- verbose: Enable detailed logging for debugging
- language: Language of the documents (ISO 639-1 code)
- result_type: Output format: markdown, text, or json
Async PDF Processing
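A minimal sketch of asynchronous parsing, assuming a parser object that exposes the SDK's `aload_data` coroutine and returns documents with a `text` attribute. The function name `parse_pdf` and the choice to join documents with a blank line are assumptions for illustration.

```python
import asyncio
from typing import Any


async def parse_pdf(parser: Any, file_path: str) -> str:
    """Parse one PDF and return its extracted text as a single string."""
    # aload_data returns a list of documents; each carries its text.
    documents = await parser.aload_data(file_path)
    return "\n\n".join(doc.text for doc in documents)
```

From synchronous code this could be driven with `asyncio.run(parse_pdf(parser, "report.pdf"))`, where `report.pdf` is a placeholder path.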
Usage in MedMitra
Document Upload Flow
- User uploads PDF → Saved to Supabase Storage
- Get file URL → Public URL from Supabase
- Parse PDF → LlamaParse extracts text
- Store results → Text saved to database
- AI Analysis → Groq processes extracted text
Example: Processing Lab Report
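The upload flow can be sketched as a small pipeline. This is an illustration under stated assumptions, not MedMitra's actual code: `parse`, `store`, and `analyze` are hypothetical callables standing in for the LlamaParse call, the database write, and the Groq analysis step.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def process_lab_report(
    file_url: str,
    parse: Callable[[str], Awaitable[str]],
    store: Callable[[str, str], Awaitable[None]],
    analyze: Callable[[str], Awaitable[str]],
) -> Dict[str, str]:
    """Parse an uploaded report, persist the text, then run AI analysis."""
    text = await parse(file_url)
    if not text.strip():
        # Stop early rather than storing or analyzing an empty result.
        raise ValueError(f"No text extracted from {file_url}")
    await store(file_url, text)
    insights = await analyze(text)
    return {"file_url": file_url, "text": text, "insights": insights}
```

Passing the three steps in as callables keeps the flow testable without network access to Supabase, LlamaParse, or Groq.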
Output Format
Markdown Output
LlamaParse returns structured markdown that preserves the structure of the source document.
Advantages of Markdown Format
- Structured: Headers, tables, and lists preserved
- Readable: Easy to display in UI or process with AI
- Parseable: Can be further processed for specific data
- Convertible: Easy to convert to HTML or other formats
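To illustrate the "Parseable" point, here is a minimal sketch that turns the first pipe-delimited markdown table into a list of row dictionaries. The function name `parse_markdown_table` is hypothetical, and the sample report with its values is made up for illustration; real LlamaParse output may be laid out differently.

```python
from typing import Dict, List


def parse_markdown_table(markdown: str) -> List[Dict[str, str]]:
    """Parse the first pipe-delimited markdown table into row dictionaries."""
    rows: List[List[str]] = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            if rows:
                break  # the first table has ended
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the separator row, e.g. |------|:----:|
        if all(cell and set(cell) <= set("-: ") for cell in cells):
            continue
        rows.append(cells)
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample, for illustration only.
SAMPLE = """
## Complete Blood Count

| Test | Result | Unit |
|------|--------|------|
| Hemoglobin | 13.5 | g/dL |
| WBC | 7.2 | 10^3/uL |
"""

lab_rows = parse_markdown_table(SAMPLE)
```

Each entry in `lab_rows` maps a column header to the cell value, which makes it straightforward to look up a specific test by name.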
Integration with AI Workflow
Extracted text is used in the Medical Insights Agent.
Best Practices
File Handling
- Upload PDFs to Supabase Storage first
- Use public URLs for LlamaParse access
- Store extracted text in database for caching
- Keep original PDFs for reference
Error Handling
- Always check parse status before using results
- Implement retry logic for failed parses
- Log parsing errors for debugging
- Have fallback for unparseable documents
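The retry and fallback advice above could be sketched as follows. The function name `parse_with_retry`, the default of three attempts, and the exponential backoff schedule are assumptions; `parse` is any coroutine that takes a file path and returns extracted text.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger(__name__)


async def parse_with_retry(
    parse: Callable[[str], Awaitable[str]],
    file_path: str,
    attempts: int = 3,
    base_delay: float = 1.0,
) -> Optional[str]:
    """Retry a parse call with exponential backoff; return None if all attempts fail."""
    for attempt in range(1, attempts + 1):
        try:
            text = await parse(file_path)
            if text.strip():
                return text
            logger.warning("Empty parse result for %s (attempt %d)", file_path, attempt)
        except Exception:
            # Log the full traceback so failures can be debugged later.
            logger.exception("Parse failed for %s (attempt %d)", file_path, attempt)
        if attempt < attempts:
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    return None  # caller decides on the fallback for unparseable documents
```

Returning `None` rather than raising lets the caller choose a fallback, such as flagging the document for manual review.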
Performance
- Use async processing for multiple files
- Set appropriate num_workers for your use case
- Cache extracted text to avoid re-parsing
- Monitor LlamaParse usage and credits
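Caching extracted text could look like the sketch below, which keys results by a SHA-256 hash of the PDF bytes so the same file is never parsed twice. The `ParseCache` class is hypothetical, and the in-memory dictionary stands in for the database table MedMitra would actually use.

```python
import hashlib
from typing import Callable, Dict


class ParseCache:
    """Cache extracted text keyed by a hash of the PDF contents."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}  # stand-in for a database table

    @staticmethod
    def key_for(pdf_bytes: bytes) -> str:
        return hashlib.sha256(pdf_bytes).hexdigest()

    def get_or_parse(self, pdf_bytes: bytes, parse: Callable[[bytes], str]) -> str:
        """Return cached text, calling parse only on a cache miss."""
        key = self.key_for(pdf_bytes)
        if key not in self._store:
            self._store[key] = parse(pdf_bytes)
        return self._store[key]
```

Hashing the contents rather than the file name means a re-uploaded copy of the same report does not consume additional LlamaParse credits.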
Data Quality
- Validate extracted text structure
- Check for missing tables or sections
- Handle multi-page reports properly
- Preserve page boundaries for context
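A minimal sketch of a structure check on the extracted markdown. The function name `check_extracted_text` and the specific checks (presence of a heading and of a pipe-delimited table) are assumptions; adapt them to what your reports are expected to contain.

```python
from typing import List


def check_extracted_text(text: str) -> List[str]:
    """Return a list of detected problems; an empty list means no issue was found."""
    if not text.strip():
        return ["extracted text is empty"]
    problems: List[str] = []
    lines = text.splitlines()
    # Lab values are expected in a pipe-delimited markdown table.
    if not any(line.lstrip().startswith("|") for line in lines):
        problems.append("no markdown table found")
    # Section headings are expected as markdown headers.
    if not any(line.startswith("#") for line in lines):
        problems.append("no headings found")
    return problems
```

Running this check before the text reaches the AI step makes missing tables or sections visible early.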
Supported PDF Types
LlamaParse works well with:
- Lab Reports: CBC, Chemistry panels, Lipid panels
- Pathology Reports: Biopsy results, Cytology
- Radiology Reports: Text-based findings and impressions
- Medical Records: Patient history, Discharge summaries
- Test Results: Any structured medical test data
Cost & Usage Limits
Free Tier
- 1,000 pages/month free
- No credit card required
- Basic support
Paid Tiers
- Pay-as-you-go: $0.003 per page
- Volume discounts available
- Priority support
Troubleshooting
API Key Issues
Error: Invalid API key
- Verify LLAMAPARSE_API_KEY in .env file
- Check key hasn’t expired
- Ensure no extra spaces in key
- Regenerate key from dashboard
Parsing Failures
Error: Failed to parse PDF
- Check PDF is not corrupted
- Verify file URL is accessible
- Ensure PDF is not password-protected
- Check file size is within limits
- Try with verbose=True for debugging
Empty Results
Error: Extracted text is empty
- PDF may be image-based (needs OCR)
- Check if PDF has extractable text
- Verify PDF is not encrypted
- Try different result_type
Rate Limiting
Error: Rate limit exceeded
- Reduce num_workers
- Add delays between requests
- Upgrade to paid tier
- Implement request queuing
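One way to add delays between requests is a small throttle that enforces a minimum interval between consecutive calls. The `RequestThrottle` class is hypothetical, and a suitable interval depends on your plan's actual limits, which this sketch does not know.

```python
import asyncio
import time
from typing import Optional


class RequestThrottle:
    """Enforce a minimum interval between the start of consecutive requests."""

    def __init__(self, min_interval: float) -> None:
        self._min_interval = min_interval
        self._lock: Optional[asyncio.Lock] = None
        self._last_start: Optional[float] = None

    async def wait(self) -> None:
        """Block until enough time has passed since the previous request."""
        if self._lock is None:
            # Created lazily so the lock belongs to the running event loop.
            self._lock = asyncio.Lock()
        async with self._lock:
            if self._last_start is not None:
                remaining = self._min_interval - (time.monotonic() - self._last_start)
                if remaining > 0:
                    await asyncio.sleep(remaining)
            self._last_start = time.monotonic()
```

Calling `await throttle.wait()` immediately before each parse request queues concurrent callers behind the lock, so requests are released one at a time at the configured pace.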
Advanced Configuration
Custom Parser Settings
Batch Processing
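A minimal sketch of batch processing with bounded concurrency, assuming `parse` is a coroutine that takes a file path and returns extracted text. The function name `parse_batch` and the default limit of four concurrent parses are assumptions for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


async def parse_batch(
    parse: Callable[[str], Awaitable[str]],
    file_paths: List[str],
    max_concurrent: int = 4,
) -> Tuple[Dict[str, str], Dict[str, Exception]]:
    """Parse many files concurrently; return (parsed, failed) keyed by path."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def parse_one(path: str) -> str:
        async with semaphore:  # cap the number of in-flight requests
            return await parse(path)

    # return_exceptions=True keeps one failure from cancelling the whole batch.
    results = await asyncio.gather(
        *(parse_one(path) for path in file_paths), return_exceptions=True
    )
    parsed: Dict[str, str] = {}
    failed: Dict[str, Exception] = {}
    for path, result in zip(file_paths, results):
        if isinstance(result, Exception):
            failed[path] = result
        else:
            parsed[path] = result
    return parsed, failed
```

Separating successes from failures lets the caller store what parsed cleanly and retry or report only the files that did not.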
Alternative Parsing Options
If LlamaParse doesn’t meet your needs:
- PyMuPDF: Free, but less intelligent structure extraction
- PyPDF2: Simple, but limited table support
- Tabula: Good for tables, but requires Java
- AWS Textract: Enterprise option with OCR
Next Steps
- Gladia Integration: Set up speech-to-text for medical dictation
- Medical AI Agents: Learn how parsed data is used by AI agents
