The Document Management system handles all file operations for the proposal workflow, including upload, storage in S3, vectorization for AI retrieval, and deletion with cleanup.
Supported document types
The platform accepts six document types, each with its own accepted formats, size limit, and vectorization behavior:
| Document Type | Formats | Max Size | Vectorized | Purpose |
| --- | --- | --- | --- | --- |
| RFP Document | PDF | 10 MB | No | The Request for Proposal to respond to |
| Concept Document | PDF, DOC, DOCX | 10 MB | No | Your initial project concept |
| Concept Text | Plain text | - | Yes | Text-based concept input |
| Reference Proposals | PDF, DOCX | 5 MB | Yes | Previously successful proposals |
| Supporting Documents | PDF, DOCX | 5 MB | Yes | Additional context materials |
| Work Text | Plain text | - | Yes | Existing work descriptions |
Upload workflow
Upload RFP document
curl -X POST https://api.igad-hub.com/api/proposals/{proposal_id}/documents/upload \
-H "Authorization: Bearer YOUR_TOKEN" \
-F "file=@rfp-2024.pdf"
The RFP document is:
Validated for PDF format (%PDF header check)
Stored in S3 at {proposal_code}/documents/rfp-document/{filename}
Metadata updated in DynamoDB
Text extracted for analysis (not vectorized)
Upload concept document
curl -X POST https://api.igad-hub.com/api/proposals/{proposal_id}/documents/upload-concept-file \
-H "Authorization: Bearer YOUR_TOKEN" \
-F "file=@concept.docx"
Concept documents support PDF, DOC, and DOCX formats.
Save concept text
For text-based concepts:
curl -X POST https://api.igad-hub.com/api/proposals/{proposal_id}/documents/save-concept-text \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"text": "Climate-smart agriculture project focusing on..."}'
Concept text is immediately vectorized for AI retrieval. Minimum length: 50 characters.
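The 50-character minimum can be checked on the client before the request is sent. The following is a minimal sketch, not the platform's own validation: the function name `build_concept_payload` is hypothetical, and it assumes the limit counts raw characters (whether the server trims whitespace first is not stated here).

```python
import json

MIN_CONCEPT_TEXT_LENGTH = 50  # minimum stated in this guide


def build_concept_payload(text: str) -> str:
    """Return the JSON body for save-concept-text, or raise if the text is too short.

    Assumption: length is measured on the raw string, without trimming.
    """
    if len(text) < MIN_CONCEPT_TEXT_LENGTH:
        raise ValueError(
            f"Concept text must be at least {MIN_CONCEPT_TEXT_LENGTH} characters, "
            f"got {len(text)}"
        )
    return json.dumps({"text": text})
```

The returned string can be passed as the `-d` body of the request shown above.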
Upload reference proposals
curl -X POST https://api.igad-hub.com/api/proposals/{proposal_id}/documents/upload-reference-file \
-H "Authorization: Bearer YOUR_TOKEN" \
-F "file=@reference-proposal-2023.pdf"
Reference proposals are:
Stored in S3
Asynchronously vectorized by a Lambda worker
Indexed in reference-proposals-index for AI retrieval
Vectorization process
How vectorization works
Documents marked for vectorization undergo this process:
Lambda worker triggered
When a file is uploaded, a Lambda worker is invoked asynchronously to handle vectorization.
Text extraction
Text is extracted from PDF files with PyPDF2 and from DOCX files with python-docx.
Chunking
Text is split into chunks (1000 characters with 200-character overlap) to maintain context.
Embedding generation
Each chunk is embedded using Amazon Titan Embed v2 (1024 dimensions).
Vector storage
Vectors are stored in S3 Express One Zone with metadata:
proposal_id
document_type
chunk_index
source_filename
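The chunking step and the per-chunk metadata described above can be sketched as follows. This is an illustration under stated assumptions rather than the worker's actual code: the function names are hypothetical, chunks are cut at fixed character offsets (the real worker may split on sentence or paragraph boundaries), and the embedding call itself is left out.

```python
from typing import Dict, List

CHUNK_SIZE = 1000    # characters per chunk, as stated above
CHUNK_OVERLAP = 200  # characters shared by consecutive chunks


def chunk_text(text: str, size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def chunk_metadata(proposal_id: str, document_type: str,
                   source_filename: str, chunks: List[str]) -> List[Dict]:
    """Build one metadata record per chunk, using the four fields listed above."""
    return [
        {
            "proposal_id": proposal_id,
            "document_type": document_type,
            "chunk_index": index,
            "source_filename": source_filename,
        }
        for index, _ in enumerate(chunks)
    ]
```

With these settings each new chunk starts 800 characters after the previous one, so a 2,500-character document yields three chunks.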
Check vectorization status
curl -X GET https://api.igad-hub.com/api/proposals/{proposal_id}/documents/vectorization-status \
-H "Authorization: Bearer YOUR_TOKEN"
Response:
{
  "reference_proposals": [
    {
      "filename": "reference-proposal-2023.pdf",
      "status": "completed",
      "vector_count": 45,
      "completed_at": "2024-01-15T10:30:00Z"
    },
    {
      "filename": "reference-proposal-2024.pdf",
      "status": "processing",
      "started_at": "2024-01-15T10:35:00Z"
    }
  ],
  "supporting_documents": [ ... ]
}
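Because vectorization runs asynchronously, a client typically polls this endpoint until no file is still in progress. The sketch below only interprets an already-parsed response; the helper name `pending_files` is hypothetical, and it assumes that any status other than "completed" means the file is not yet ready (only "completed" and "processing" appear in the example above).

```python
from typing import Dict, List


def pending_files(status_response: Dict) -> List[str]:
    """Return filenames whose vectorization has not completed yet."""
    pending: List[str] = []
    for group in ("reference_proposals", "supporting_documents"):
        for entry in status_response.get(group, []):
            # Assumption: every non-"completed" status counts as pending.
            if entry.get("status") != "completed":
                pending.append(entry["filename"])
    return pending
```

An empty list means every tracked file has finished vectorizing.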
Vector indexes
The platform maintains separate vector indexes:
reference-proposals-index: Vectors from reference proposals
existing-work-index: Vectors from supporting documents and work text
concept-index: Vectors from concept text
These indexes are queried during AI generation to retrieve relevant context.
Storage architecture
S3 bucket structure
igad-{environment}-documents/
└── {proposal_code}/
    └── documents/
        ├── rfp-document/
        │   └── {filename}.pdf
        ├── concept-document/
        │   └── {filename}.docx
        ├── concept-text/
        │   └── concept-text.txt
        ├── reference-proposals/
        │   ├── reference-1.pdf
        │   └── reference-2.pdf
        └── supporting-files/
            ├── supporting-1.pdf
            └── supporting-2.docx
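The layout above maps directly onto object keys. As a rough sketch (the helper names are hypothetical and the environment value used in the usage line is an assumption, since the valid environment names are not listed here), keys and bucket names can be built like this:

```python
# Folder names taken from the bucket structure shown above.
DOCUMENT_FOLDERS = {
    "rfp-document",
    "concept-document",
    "concept-text",
    "reference-proposals",
    "supporting-files",
}


def documents_bucket(environment: str) -> str:
    """Return the documents bucket name for an environment."""
    return f"igad-{environment}-documents"


def document_key(proposal_code: str, folder: str, filename: str) -> str:
    """Return the S3 object key for a file in one of the known folders."""
    if folder not in DOCUMENT_FOLDERS:
        raise ValueError(f"Unknown document folder: {folder}")
    return f"{proposal_code}/documents/{folder}/{filename}"
```

For example, `document_key("PROP-20240115-A1B2", "rfp-document", "rfp-2024.pdf")` gives `PROP-20240115-A1B2/documents/rfp-document/rfp-2024.pdf`.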
S3 vectors bucket
igad-{environment}-vectors/
└── {proposal_code}/
    ├── reference-proposals-index/
    │   └── {filename}/
    │       ├── chunk-0
    │       ├── chunk-1
    │       └── ...
    └── existing-work-index/
        └── {filename}/
            ├── chunk-0
            └── ...
DynamoDB storage
Proposal documents are tracked in DynamoDB:
{
  "PK": "PROPOSAL#PROP-20240115-A1B2",
  "SK": "METADATA",
  "uploaded_files": {
    "rfp-document": [ "rfp-2024.pdf" ],
    "concept-document": [ "concept.docx" ],
    "reference-proposals": [
      "reference-proposal-2023.pdf",
      "reference-proposal-2024.pdf"
    ],
    "supporting-files": [ "existing-work.pdf" ]
  },
  "text_inputs": {
    "initial-concept": "Climate-smart agriculture project...",
    "existing-work": "Our organization has implemented..."
  }
}
Document deletion
Delete uploaded files
curl -X DELETE https://api.igad-hub.com/api/proposals/{proposal_id}/documents/rfp-document/{filename} \
-H "Authorization: Bearer YOUR_TOKEN"
Deletion process:
S3 file deletion: File removed from documents bucket
Vector cleanup: All vectors for this file removed from vectors bucket
DynamoDB update: File removed from uploaded_files list
Analysis cleanup: Related analysis data cleared (for RFP documents)
Deleting an RFP document also clears rfp_analysis data and resets workflow progress.
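The DynamoDB side of this process can be illustrated on a plain dictionary shaped like the metadata record shown earlier. This is a sketch under assumptions, not the service's code: the function name is hypothetical, `rfp_analysis` is assumed to be a top-level attribute of the record, and the workflow progress reset is omitted because its field name is not given in this guide.

```python
import copy
from typing import Dict


def remove_file_from_item(item: Dict, doc_type: str, filename: str) -> Dict:
    """Return a copy of the metadata item with the file removed.

    For RFP documents the related analysis data is cleared as well.
    """
    updated = copy.deepcopy(item)  # leave the caller's record untouched
    files = updated.get("uploaded_files", {}).get(doc_type, [])
    if filename in files:
        files.remove(filename)
    if doc_type == "rfp-document":
        # Assumption: analysis results live under a top-level "rfp_analysis" key.
        updated.pop("rfp_analysis", None)
    return updated
```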
Delete concept text
curl -X DELETE https://api.igad-hub.com/api/proposals/{proposal_id}/documents/concept-text \
-H "Authorization: Bearer YOUR_TOKEN"
This clears the concept text from both DynamoDB and the vector indexes.
Delete entire proposal
Deleting a proposal performs comprehensive cleanup:
curl -X DELETE https://api.igad-hub.com/api/proposals/{proposal_id} \
-H "Authorization: Bearer YOUR_TOKEN"
Cleanup steps:
Delete vectors
All vectors in reference-proposals-index and existing-work-index for this proposal are removed.
Delete S3 files
All files under {proposal_code}/ are deleted from the documents bucket.
Delete DynamoDB metadata
The proposal metadata record is removed from the table.
Response:
{
  "message": "Proposal deleted successfully",
  "proposal_code": "PROP-20240115-A1B2",
  "cleanup_summary": {
    "vectors_deleted": "attempted",
    "s3_files_deleted": "attempted",
    "dynamodb_deleted": "completed"
  }
}
Vector and S3 deletion are non-critical. If they fail, DynamoDB deletion still proceeds.
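The ordering and failure handling described above amount to a best-effort cleanup followed by one critical step. The sketch below shows that pattern with the three steps passed in as callables; the function name and the callable parameters are hypothetical stand-ins, not the service's actual interface.

```python
from typing import Callable, Dict


def delete_proposal(delete_vectors: Callable[[], None],
                    delete_s3_files: Callable[[], None],
                    delete_metadata: Callable[[], None]) -> Dict[str, str]:
    """Run the non-critical cleanup steps, then the critical metadata deletion."""
    summary: Dict[str, str] = {}
    non_critical = (
        ("vectors_deleted", delete_vectors),
        ("s3_files_deleted", delete_s3_files),
    )
    for name, step in non_critical:
        try:
            step()
        except Exception:
            # Non-critical: a real service would log the failure and continue.
            pass
        summary[name] = "attempted"
    delete_metadata()  # critical: any error here propagates to the caller
    summary["dynamodb_deleted"] = "completed"
    return summary
```

Reporting the first two steps as "attempted" in either case matches the shape of the cleanup_summary in the response above.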
List documents
List all documents
curl -X GET https://api.igad-hub.com/api/proposals/{proposal_id}/documents \
-H "Authorization: Bearer YOUR_TOKEN"
Response:
{
  "documents": [
    {
      "type": "rfp-document",
      "filename": "rfp-2024.pdf",
      "uploaded_at": "2024-01-15T09:00:00Z",
      "size_bytes": 2457600
    },
    {
      "type": "reference-proposals",
      "filename": "reference-proposal.pdf",
      "uploaded_at": "2024-01-15T09:30:00Z",
      "size_bytes": 1024000,
      "vectorization_status": "completed"
    }
  ]
}
File validation
PDF validation
PDF files are validated by checking the header:
# From source code
if not file_content.startswith(b"%PDF"):
    raise HTTPException(
        status_code=400,
        detail="Invalid PDF file. File must start with %PDF header."
    )
Size limits
RFP/Concept documents: 10 MB
Reference/Supporting documents: 5 MB
Text inputs: No hard limit, but minimum 50 characters for concept text
Supported MIME types
application/pdf
application/msword (DOC)
application/vnd.openxmlformats-officedocument.wordprocessingml.document (DOCX)
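The size limits and MIME types above can be combined into a single pre-upload check on the client. This is a sketch only: the function and table names are hypothetical, the document type keys reuse the S3 folder names as an assumption, and 1 MB is assumed to mean 1024 * 1024 bytes (the server may count 1,000,000 bytes instead).

```python
MB = 1024 * 1024  # assumption: binary megabytes

PDF = "application/pdf"
DOC = "application/msword"
DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

# Allowed MIME types and maximum size per document type, per the tables in this guide.
UPLOAD_RULES = {
    "rfp-document": ({PDF}, 10 * MB),
    "concept-document": ({PDF, DOC, DOCX}, 10 * MB),
    "reference-proposals": ({PDF, DOCX}, 5 * MB),
    "supporting-files": ({PDF, DOCX}, 5 * MB),
}


def validate_upload(doc_type: str, mime_type: str, size_bytes: int) -> None:
    """Raise ValueError if the file breaks the format or size rule for its type."""
    if doc_type not in UPLOAD_RULES:
        raise ValueError(f"Unknown document type: {doc_type}")
    allowed_types, max_bytes = UPLOAD_RULES[doc_type]
    if mime_type not in allowed_types:
        raise ValueError(f"{mime_type} is not accepted for {doc_type}")
    if size_bytes > max_bytes:
        raise ValueError(f"{doc_type} files are limited to {max_bytes // MB} MB")
```

A client-side check like this only saves a round trip; the server still performs its own validation, including the %PDF header check.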
Next steps
Upload API: Complete API reference for document uploads
Proposal workflow: Learn how documents fit into the proposal workflow
Architecture: Understand S3, Lambda, and vector storage architecture
Delete API: Document deletion and cleanup API