Upload RFP Document
Unique identifier for the proposal
PDF file to upload (max 10MB)
Indicates if the upload was successful
Success or error message
Name of the uploaded file
S3 object key where the file is stored
File size in bytes
File Validation
- Supported formats: PDF only
- Maximum size: 10MB
- Validation checks:
  - File must not be empty
  - Must have valid PDF header (`%PDF`)
  - File extension must be `.pdf`
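The RFP checks listed here can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the service's actual code: the function name `validate_rfp_upload` is hypothetical, the extension match is assumed to be case-insensitive, "10MB" is read as 10 * 1024 * 1024 bytes, and the returned strings simply mirror the error names in the Common Error Codes table.

```python
from typing import Optional

MAX_RFP_BYTES = 10 * 1024 * 1024  # assumption: "10MB" means 10 MiB


def validate_rfp_upload(filename: str, data: bytes) -> Optional[str]:
    """Return an error message, or None if the file passes every check."""
    # Extension check (assumed case-insensitive).
    if not filename.lower().endswith(".pdf"):
        return "Invalid file type"
    # The file must not be empty.
    if len(data) == 0:
        return "Empty file"
    # Size limit for RFP documents.
    if len(data) > MAX_RFP_BYTES:
        return "File too large"
    # A valid PDF starts with the %PDF header.
    if not data.startswith(b"%PDF"):
        return "Invalid PDF format"
    return None
```

Running the same checks on the client before uploading avoids a round trip for files the server would reject anyway.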
Storage Details
S3 Path Structure:
S3 Metadata:
- `proposal-id`: Proposal UUID
- `uploaded-by`: User ID who uploaded the file
- `original-size`: File size in bytes

The upload also records the filename in the `uploaded_files.rfp-document` array.
Vectorization
RFP documents are not automatically vectorized during upload. Vectorization occurs during the “Analyze & Continue” step to optimize upload performance.
Upload Concept Document
Unique identifier for the proposal
PDF, DOC, or DOCX file (max 10MB)
Indicates if the upload was successful
Success message
Name of the uploaded file
S3 object key where the file is stored
File size in bytes
File Validation
- Supported formats: PDF, DOC, DOCX
- Maximum size: 10MB
- No vectorization: Concept documents are stored but not vectorized
Upload Reference Proposal
Unique identifier for the proposal
PDF or DOCX file (max 5MB)
Donor organization name (e.g., “USAID”, “World Bank”)
Sector category (e.g., “Health”, “Education”)
Year of the reference proposal (e.g., “2023”)
Proposal status (e.g., “Funded”, “Rejected”)
Always true for successful uploads
Status message indicating vectorization in progress
Name of the uploaded file
S3 object key
File size in bytes
Initial status: `"pending"`
Async Vectorization Process
- Upload Phase (Fast):
  - Validates file format and size
  - Uploads to S3 with metadata
  - Updates DynamoDB with `vectorization_status: pending`
  - Returns immediately
- Vectorization Phase (Async):
  - Triggers Lambda worker with `InvocationType: Event`
  - Lambda extracts text using PyPDF2 or python-docx
  - Chunks text (1000 chars, 200 char overlap)
  - Generates embeddings using Amazon Titan Embed v2
  - Stores vectors in S3 Vectors `reference-proposals-index`
  - Updates `vectorization_status` to `completed` or `failed`
- Client Polling:
  - Poll `/api/proposals/{proposal_id}/documents/vectorization-status`
  - Check for `all_completed: true` or individual file status
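The chunking step in the vectorization phase (1000 characters with a 200-character overlap) can be illustrated with a short Python sketch. The function name `chunk_text` is hypothetical, and the handling of the final, shorter chunk is an assumption; the worker's real implementation may differ.

```python
from typing import List


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into fixed-size character chunks that overlap."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    step = size - overlap  # each chunk starts 800 characters after the previous one
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once a chunk reaches the end of the text (assumed behavior),
        # so no trailing chunk consists purely of already-covered characters.
        if start + size >= len(text):
            break
    return chunks
```

With these defaults, consecutive chunks share 200 characters, which helps keep sentences that straddle a chunk boundary retrievable from at least one chunk.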
Vector Storage
Index: `reference-proposals-index`
Vector Key Format (Metadata Encoded):
Embedding Model: `amazon.titan-embed-text-v2:0` (1024 dimensions)
S3 Metadata:
- `proposal-id`
- `uploaded-by`
- `original-size`
- `donor`, `sector`, `year`, `status`
Upload Supporting Document
Unique identifier for the proposal
PDF or DOCX file (max 5MB)
Organization name
Type of project (e.g., “Infrastructure”, “Capacity Building”)
Geographic region (e.g., “East Africa”, “Horn of Africa”)
Indicates successful upload
Status message
Uploaded filename
S3 key
File size in bytes
Initial status: `"pending"`
Metadata echoed back
Vector Storage
Index: `existing-work-index`
Vector Key Format:
Save Concept Text
Unique identifier for the proposal
Plain text concept (minimum 50 characters)
- S3: `{proposal_code}/documents/initial_concept/concept_text.txt`
- DynamoDB: `text_inputs.initial-concept`
Save Work Text
Unique identifier for the proposal
Plain text existing work (minimum 50 characters)
Organization name
Project type
Geographic region
Upload success indicator
Success message
Character count
Whether vectorization succeeded
Text Chunking & Vectorization
- Chunking: Text is split into 1000-character chunks with 200-character overlap
- Vector Storage: Each chunk is stored separately in `existing-work-index`
- Key Format: `{proposal_id}-work-chunk-{idx}|{org}|{type}|{region}|existing_work_text|{idx}|{total}`
- Synchronous: Unlike file uploads, text vectorization happens immediately
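To make the key format concrete, the following Python sketch builds one key per chunk. The function name `build_work_chunk_keys` is hypothetical, and two details are assumptions rather than documented behavior: chunk indices are taken to be zero-based, and metadata values are inserted as-is with no escaping of the `|` separator.

```python
from typing import List


def build_work_chunk_keys(
    proposal_id: str, org: str, project_type: str, region: str, total: int
) -> List[str]:
    """Build the metadata-encoded vector key for each of `total` chunks."""
    return [
        # Fields follow the documented order; idx appears twice by design.
        f"{proposal_id}-work-chunk-{idx}|{org}|{project_type}|{region}"
        f"|existing_work_text|{idx}|{total}"
        for idx in range(total)  # assumption: indices start at 0
    ]
```

For example, `build_work_chunk_keys("p1", "Org", "Infrastructure", "East Africa", 2)` yields keys ending in `|existing_work_text|0|2` and `|existing_work_text|1|2`.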
Check Vectorization Status
Get All File Status
Unique identifier for the proposal
Request success indicator
Map of filenames to status objects
Whether all files are vectorized
Whether any files are still processing
Whether any vectorizations failed
Status Values
| Status | Description |
|---|---|
| `pending` | Waiting to start |
| `processing` | Currently vectorizing |
| `completed` | Successfully vectorized |
| `failed` | Vectorization failed |
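The summary flags in the response can be derived from the per-file status values, as this Python sketch shows. It is an illustration only: the function name `summarize_statuses` and the key `has_pending` are hypothetical (the source names only `all_completed` and `has_failed`), and treating `processing` as still pending and an empty file map as not completed are both assumptions.

```python
from typing import Dict


def summarize_statuses(statuses: Dict[str, str]) -> Dict[str, bool]:
    """Derive aggregate flags from a filename -> status mapping."""
    values = list(statuses.values())
    return {
        # Assumption: with no files at all, nothing counts as completed.
        "all_completed": bool(values) and all(v == "completed" for v in values),
        # Assumption: both "pending" and "processing" mean work is outstanding.
        "has_pending": any(v in ("pending", "processing") for v in values),
        "has_failed": any(v == "failed" for v in values),
    }
```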
Get Single File Status
Proposal ID
Filename to check
Error Handling
Common Error Codes
| Code | Error | Solution |
|---|---|---|
| 400 | Invalid file type | Use a supported format: PDF (RFP), PDF/DOC/DOCX (concept), or PDF/DOCX (reference/supporting) |
| 400 | File too large | Max 10MB (RFP/concept) or 5MB (reference/supporting) |
| 400 | Empty file | Ensure file has content |
| 400 | Invalid PDF format | File must start with `%PDF` |
| 400 | Text too short | Minimum 50 characters for text inputs |
| 404 | Proposal not found | Check proposal ID and ownership |
| 500 | Upload verification failed | File size mismatch after upload |
| 500 | S3 bucket not configured | Contact system administrator |
Validation Examples
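As one possible illustration, the Python sketch below applies the documented format and size limits on the client before a request is sent. The names `UPLOAD_RULES`, `check_file`, and `check_text` and the document-kind labels are hypothetical; reading "MB" as 1024 * 1024 bytes and matching extensions case-insensitively are assumptions. The returned strings mirror the error names in the table above.

```python
import os
from typing import Optional

MB = 1024 * 1024  # assumption: limits are expressed in MiB

# Allowed extensions and size limit per document kind, as documented.
UPLOAD_RULES = {
    "rfp": ({".pdf"}, 10 * MB),
    "concept": ({".pdf", ".doc", ".docx"}, 10 * MB),
    "reference": ({".pdf", ".docx"}, 5 * MB),
    "supporting": ({".pdf", ".docx"}, 5 * MB),
}
MIN_TEXT_CHARS = 50


def check_file(kind: str, filename: str, size_bytes: int) -> Optional[str]:
    """Return the matching error name, or None if the file looks acceptable."""
    extensions, max_bytes = UPLOAD_RULES[kind]
    extension = os.path.splitext(filename)[1].lower()
    if extension not in extensions:
        return "Invalid file type"
    if size_bytes == 0:
        return "Empty file"
    if size_bytes > max_bytes:
        return "File too large"
    return None


def check_text(text: str) -> Optional[str]:
    """Text inputs need at least 50 characters."""
    return "Text too short" if len(text) < MIN_TEXT_CHARS else None
```

For instance, `check_file("reference", "old_proposal.pdf", 6 * MB)` returns `"File too large"`, while the same file would pass as a concept document.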
Best Practices
1. File Size Optimization
- Compress PDFs before upload
- Remove unnecessary images from documents
- Use PDF/A format for better text extraction
2. Polling for Vectorization
Polling Example
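A polling loop might look like the following Python sketch, which uses only the standard library. Several details are assumptions and not part of the documented API: the helper names `fetch_status` and `wait_for_vectorization`, the bearer-token `Authorization` header, the `base_url` parameter, and the interval and timeout values. The endpoint path and the `all_completed` and `has_failed` fields come from this page.

```python
import json
import time
import urllib.request
from typing import Callable, Dict


def fetch_status(base_url: str, proposal_id: str, token: str) -> Dict:
    """Fetch the vectorization status for every file of a proposal."""
    url = f"{base_url}/api/proposals/{proposal_id}/documents/vectorization-status"
    # Assumption: the API accepts a bearer token.
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_vectorization(
    fetch: Callable[[], Dict],
    interval_s: float = 3.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until all files are vectorized, one fails, or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch()
        if status.get("has_failed"):
            raise RuntimeError("A file failed vectorization; re-upload it to retry")
        if status.get("all_completed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("Vectorization did not finish in time")
        sleep(interval_s)
```

A caller would typically pass `lambda: fetch_status(base_url, proposal_id, token)` as `fetch`. Taking the fetch function as a parameter keeps the loop easy to test without network access.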
3. Metadata Best Practices
- Reference Proposals:
  - Always include `donor` for better retrieval
  - Use consistent `sector` naming
  - Include `year` for temporal filtering
- Supporting Documents:
  - Use `organization` to identify source
  - Standardize `project_type` values
  - Include `region` for geographic filtering
4. Error Recovery
- Failed vectorizations can be retried by re-uploading
- Check vectorization status before proceeding to analysis
- Use the `has_failed` flag to detect issues early
5. Content Type Headers
- Always use `multipart/form-data` for file uploads
- Don’t manually set `Content-Type` when using FormData in browsers
- Python/cURL: Let the library set boundaries automatically