Upload Documents

Upload RFP Document

Request parameters:

proposal_id (string, required): Unique identifier for the proposal
file (UploadFile, required): PDF file to upload (max 10MB)

Response fields:

success (boolean): Indicates if the upload was successful
message (string): Success or error message
filename (string): Name of the uploaded file
document_key (string): S3 object key where the file is stored
size (number): File size in bytes

curl -X POST "https://api.igadinnovationhub.org/api/proposals/{proposal_id}/documents/upload" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@rfp_document.pdf"

{
  "success": true,
  "message": "PDF uploaded successfully",
  "filename": "rfp_document.pdf",
  "document_key": "PROP-2024-001/documents/rfp/rfp_document.pdf",
  "size": 2458624
}

File Validation

Supported formats: PDF only
Maximum size: 10MB
Validation checks:
- File must not be empty
- Must have valid PDF header (%PDF)
- File extension must be .pdf
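
The checks listed above can be mirrored on the client before sending the request, which avoids a round trip for files that would be rejected. The following is a minimal sketch, not the server's implementation: the function name `check_rfp_pdf` is hypothetical, and it assumes "10MB" means 10 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Assumption: "10MB" is interpreted as 10 MiB; the server may use a different definition.
MAX_RFP_BYTES = 10 * 1024 * 1024


def check_rfp_pdf(filename: str, data: bytes) -> list[str]:
    """Return a list of validation problems; an empty list means the file looks acceptable."""
    problems = []
    if Path(filename).suffix.lower() != ".pdf":
        problems.append("File extension must be .pdf")
    if len(data) == 0:
        problems.append("File must not be empty")
    elif not data.startswith(b"%PDF"):
        problems.append("Missing %PDF header")
    if len(data) > MAX_RFP_BYTES:
        problems.append("File exceeds 10MB")
    return problems
```

An empty result only means the file passes these local checks; the server remains the final authority.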

Storage Details

S3 Path Structure:

{proposal_code}/documents/rfp/{filename}

S3 Metadata:

proposal-id: Proposal UUID
uploaded-by: User ID who uploaded the file
original-size: File size in bytes

DynamoDB Update: Updates uploaded_files.rfp-document array with the filename.

Vectorization

RFP documents are not automatically vectorized during upload. Vectorization occurs during the “Analyze & Continue” step to optimize upload performance.

Upload Concept Document

Request parameters:

proposal_id (string, required): Unique identifier for the proposal
file (UploadFile, required): PDF, DOC, or DOCX file (max 10MB)

Response fields:

success (boolean): Indicates if the upload was successful
message (string): Success message
filename (string): Name of the uploaded file
document_key (string): S3 object key where the file is stored
size (number): File size in bytes

curl -X POST "https://api.igadinnovationhub.org/api/proposals/{proposal_id}/documents/upload-concept-file" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@concept.docx"

{
  "success": true,
  "message": "Concept file uploaded successfully",
  "filename": "concept.docx",
  "document_key": "PROP-2024-001/documents/initial_concept/concept.docx",
  "size": 45120
}

File Validation

Supported formats: PDF, DOC, DOCX
Maximum size: 10MB
No vectorization: Concept documents are stored but not vectorized

S3 Path:

{proposal_code}/documents/initial_concept/{filename}

Upload Reference Proposal

Request parameters:

proposal_id (string, required): Unique identifier for the proposal
file (UploadFile, required): PDF or DOCX file (max 5MB)
donor (string): Donor organization name (e.g., “USAID”, “World Bank”)
sector (string): Sector category (e.g., “Health”, “Education”)
year (string): Year of the reference proposal (e.g., “2023”)
status (string): Proposal status (e.g., “Funded”, “Rejected”)

Response fields:

success (boolean): Always true for successful uploads
message (string): Status message indicating vectorization in progress
filename (string): Name of the uploaded file
document_key (string): S3 object key
size (number): File size in bytes
vectorization_status (string): Initial status: "pending"

curl -X POST "https://api.igadinnovationhub.org/api/proposals/{proposal_id}/documents/upload-reference-file" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@reference.pdf" \
  -F "donor=USAID" \
  -F "sector=Health" \
  -F "year=2023" \
  -F "status=Funded"

{
  "success": true,
  "message": "Reference proposal uploaded, vectorization in progress",
  "filename": "reference.pdf",
  "document_key": "PROP-2024-001/documents/references/reference.pdf",
  "size": 1234567,
  "vectorization_status": "pending"
}

Async Vectorization Process

Upload Phase (Fast):
- Validates file format and size
- Uploads to S3 with metadata
- Updates DynamoDB with vectorization_status: pending
- Returns immediately
Vectorization Phase (Async):
- Triggers Lambda worker with InvocationType: Event
- Lambda extracts text using PyPDF2 or python-docx
- Chunks text (1000 chars, 200 char overlap)
- Generates embeddings using Amazon Titan Embed v2
- Stores vectors in S3 Vectors reference-proposals-index
- Updates vectorization_status to completed or failed
Client Polling:
- Poll /api/proposals/{proposal_id}/documents/vectorization-status
- Check for all_completed: true or individual file status
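
To make the chunking step concrete, here is a minimal sketch of fixed-size chunking with overlap, using the 1000-character size and 200-character overlap stated above. The function name `chunk_text` and the handling of the final short chunk are assumptions; the Lambda worker's actual splitting logic may differ.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of chunk_size characters, each sharing `overlap` characters with the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each new chunk starts 800 characters after the previous one
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks
```

Under these assumptions, a 2048-character text yields three chunks starting at offsets 0, 800, and 1600.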

Vector Storage

Index: reference-proposals-index

Vector Key Format (Metadata Encoded):

{proposal_id}|{donor}|{sector}|{year}|{document_name}|{chunk_index}|{total_chunks}

Example:

PROP-2024-001-chunk-0|USAID|Health|2023|reference.pdf|0|15
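
Because the key carries its metadata as pipe-separated fields, it can be split back into its parts. The sketch below is illustrative only: `ReferenceVectorKey`, `parse_reference_key`, and `format_reference_key` are hypothetical names, and the first field is treated as an opaque string because the example key shows a `-chunk-0` suffix there.

```python
from typing import NamedTuple


class ReferenceVectorKey(NamedTuple):
    proposal_part: str  # first field, kept as-is (the example shows "PROP-2024-001-chunk-0")
    donor: str
    sector: str
    year: str
    document_name: str
    chunk_index: int
    total_chunks: int


def parse_reference_key(key: str) -> ReferenceVectorKey:
    """Split a pipe-delimited vector key into its seven fields."""
    parts = key.split("|")
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}")
    first, donor, sector, year, name, idx, total = parts
    return ReferenceVectorKey(first, donor, sector, year, name, int(idx), int(total))


def format_reference_key(key: ReferenceVectorKey) -> str:
    """Join the fields back into the pipe-delimited form."""
    return "|".join(str(field) for field in key)
```

Note that a metadata value containing `|` would shift the fields, so this parsing only works when the values are free of that character.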

Embedding Model: amazon.titan-embed-text-v2:0 (1024 dimensions)

S3 Metadata:

proposal-id
uploaded-by
original-size
donor, sector, year, status

Upload Supporting Document

Request parameters:

proposal_id (string, required): Unique identifier for the proposal
file (UploadFile, required): PDF or DOCX file (max 5MB)
organization (string): Organization name
project_type (string): Type of project (e.g., “Infrastructure”, “Capacity Building”)
region (string): Geographic region (e.g., “East Africa”, “Horn of Africa”)

Response fields:

success (boolean): Indicates successful upload
message (string): Status message
filename (string): Uploaded filename
document_key (string): S3 key
size (number): File size in bytes
vectorization_status (string): Initial status: "pending"
metadata (object): Metadata echoed back

curl -X POST "https://api.igadinnovationhub.org/api/proposals/{proposal_id}/documents/upload-supporting-file" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@existing_work.pdf" \
  -F "organization=IGAD" \
  -F "project_type=Capacity Building" \
  -F "region=East Africa"

{
  "success": true,
  "message": "Supporting document uploaded, vectorization in progress",
  "filename": "existing_work.pdf",
  "document_key": "PROP-2024-001/documents/supporting/existing_work.pdf",
  "size": 891234,
  "vectorization_status": "pending",
  "metadata": {
    "organization": "IGAD",
    "project_type": "Capacity Building",
    "region": "East Africa"
  }
}

Vector Storage

Index: existing-work-index

Vector Key Format:

{proposal_id}|{organization}|{project_type}|{region}|{document_name}|{chunk_index}|{total_chunks}

S3 Path:

{proposal_code}/documents/supporting/{filename}
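
The S3 paths given for each upload type share one layout, so a client can predict the `document_key` it should receive. This is a small sketch under assumptions: `document_key` as a function name and the `kind` labels are hypothetical, and the `references` folder is inferred from the example response for reference uploads rather than from a stated path.

```python
# Folder per upload type; "references" is inferred from the example document_key.
DOCUMENT_FOLDERS = {
    "rfp": "rfp",
    "concept": "initial_concept",
    "reference": "references",
    "supporting": "supporting",
}


def document_key(proposal_code: str, kind: str, filename: str) -> str:
    """Build the expected S3 object key for an uploaded document."""
    folder = DOCUMENT_FOLDERS[kind]  # raises KeyError for an unknown upload type
    return f"{proposal_code}/documents/{folder}/{filename}"
```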

Save Concept Text

Request parameters:

proposal_id (string, required): Unique identifier for the proposal
concept_text (string, required): Plain text concept (minimum 50 characters)

{
  "success": true,
  "message": "Concept text saved successfully",
  "text_length": 1024
}

Storage:

S3: {proposal_code}/documents/initial_concept/concept_text.txt
DynamoDB: text_inputs.initial-concept

Save Work Text

Request parameters:

proposal_id (string, required): Unique identifier for the proposal
work_text (string, required): Plain text existing work (minimum 50 characters)
organization (string): Organization name
project_type (string): Project type
region (string): Geographic region

Response fields:

success (boolean): Upload success indicator
message (string): Success message
text_length (number): Character count
vectorized (boolean): Whether vectorization succeeded

curl -X POST "https://api.igadinnovationhub.org/api/proposals/{proposal_id}/documents/save-work-text" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "work_text=Our organization has implemented 15 water sanitation projects..." \
  -F "organization=IGAD" \
  -F "project_type=WASH" \
  -F "region=East Africa"

{
  "success": true,
  "message": "Work text saved successfully",
  "text_length": 2048,
  "vectorized": true
}

Text Chunking & Vectorization

Chunking: Text is split into 1000-character chunks with 200-character overlap
Vector Storage: Each chunk is stored separately in existing-work-index
Key Format: {proposal_id}-work-chunk-{idx}|{org}|{type}|{region}|existing_work_text|{idx}|{total}
Synchronous: Unlike file uploads, text vectorization happens immediately
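
As an illustration of the key format above, the following sketch generates the keys for every chunk of one work text. The function name `work_chunk_keys` is hypothetical, and zero-based chunk indices are an assumption based on the reference-proposal example key.

```python
def work_chunk_keys(
    proposal_id: str,
    organization: str,
    project_type: str,
    region: str,
    total_chunks: int,
) -> list[str]:
    """Return one vector key per chunk, following the documented key format."""
    return [
        f"{proposal_id}-work-chunk-{idx}|{organization}|{project_type}|{region}"
        f"|existing_work_text|{idx}|{total_chunks}"
        for idx in range(total_chunks)  # assumption: indices start at 0
    ]
```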

Check Vectorization Status

Get All File Status

Request parameters:

proposal_id (string, required): Unique identifier for the proposal

Response fields:

success (boolean): Request success indicator
vectorization_status (object): Map of filenames to status objects
all_completed (boolean): Whether all files are vectorized
has_pending (boolean): Whether any files are still processing
has_failed (boolean): Whether any vectorizations failed

curl -X GET "https://api.igadinnovationhub.org/api/proposals/{proposal_id}/documents/vectorization-status" \
  -H "Authorization: Bearer YOUR_TOKEN"

{
  "success": true,
  "vectorization_status": {
    "reference1.pdf": {
      "status": "completed",
      "started_at": "2024-03-15T10:30:00Z",
      "completed_at": "2024-03-15T10:32:15Z",
      "chunks_processed": 12,
      "total_chunks": 12
    },
    "reference2.pdf": {
      "status": "processing",
      "started_at": "2024-03-15T10:35:00Z",
      "chunks_processed": 5,
      "total_chunks": 18
    },
    "reference3.pdf": {
      "status": "failed",
      "started_at": "2024-03-15T10:40:00Z",
      "error": "Failed to extract text from PDF"
    }
  },
  "all_completed": false,
  "has_pending": true,
  "has_failed": true
}

Status Values

Status	Description
`pending`	Waiting to start
`processing`	Currently vectorizing
`completed`	Successfully vectorized
`failed`	Vectorization failed
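
The three summary flags can be reproduced from the per-file statuses, which helps when caching or testing against the status map. This sketch is a guess at the relationship, not the server's logic: `summarize` is a hypothetical name, `processing` is counted as pending because the example response reports `has_pending: true` for a file in that state, and an empty map is treated as not completed.

```python
def summarize(status_map: dict) -> dict:
    """Derive all_completed, has_pending, and has_failed from a filename-to-status map."""
    statuses = [entry["status"] for entry in status_map.values()]
    return {
        # Assumption: with no files at all, nothing counts as completed.
        "all_completed": bool(statuses) and all(s == "completed" for s in statuses),
        "has_pending": any(s in ("pending", "processing") for s in statuses),
        "has_failed": any(s == "failed" for s in statuses),
    }
```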

Get Single File Status

Request parameters:

proposal_id (string, required): Proposal ID
filename (string, required): Filename to check

curl -X GET "https://api.igadinnovationhub.org/api/proposals/{proposal_id}/documents/vectorization-status/{filename}" \
  -H "Authorization: Bearer YOUR_TOKEN"

{
  "success": true,
  "filename": "reference1.pdf",
  "status": "completed",
  "chunks_processed": 12,
  "total_chunks": 12,
  "started_at": "2024-03-15T10:30:00Z",
  "completed_at": "2024-03-15T10:32:15Z"
}
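
The `chunks_processed` and `total_chunks` fields are enough to show progress to a user. A minimal sketch, assuming both fields may be absent (as in the failed-file example in the all-files response); `progress_percent` is a hypothetical helper name.

```python
from typing import Optional


def progress_percent(status: dict) -> Optional[float]:
    """Return vectorization progress as a percentage, or None when it cannot be computed."""
    total = status.get("total_chunks")
    done = status.get("chunks_processed")
    if not total or done is None:
        return None  # missing fields or zero total chunks
    return round(100 * done / total, 1)
```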

Error Handling

Common Error Codes

Code	Error	Solution
`400`	Invalid file type	Use PDF only (RFP), PDF/DOC/DOCX (concept), or PDF/DOCX (reference/supporting)
`400`	File too large	Max 10MB (RFP/concept) or 5MB (reference/supporting)
`400`	Empty file	Ensure file has content
`400`	Invalid PDF format	File must start with `%PDF`
`400`	Text too short	Minimum 50 characters for text inputs
`404`	Proposal not found	Check proposal ID and ownership
`500`	Upload verification failed	File size mismatch after upload
`500`	S3 bucket not configured	Contact system administrator

Validation Examples

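import requests

# Placeholder values; substitute your own proposal ID, token, and file path
proposal_id = "YOUR_PROPOSAL_ID"
url = f"https://api.igadinnovationhub.org/api/proposals/{proposal_id}/documents/upload"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

try:
    # Open the file in binary mode; requests builds the multipart body
    with open("rfp_document.pdf", "rb") as f:
        response = requests.post(url, headers=headers, files={"file": f})
    response.raise_for_status()
    result = response.json()
    print(f"Uploaded: {result['filename']}")
except requests.exceptions.HTTPError as e:
    if e.response.status_code == 400:
        print(f"Validation error: {e.response.json()['detail']}")
    else:
        print(f"Upload failed: {e}")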
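
Several of the 400 errors in the table can be anticipated before the request is sent. The sketch below encodes the documented formats and size limits per upload type; `UPLOAD_RULES`, `preflight`, and the `kind` labels are hypothetical, megabytes are assumed to be binary (1024 × 1024 bytes), and the order in which the server applies its checks is not known.

```python
from pathlib import Path
from typing import Optional

MB = 1024 * 1024  # assumption: the documented limits are binary megabytes

# Allowed extensions and maximum size per upload type, as documented for each endpoint
UPLOAD_RULES = {
    "rfp": ({".pdf"}, 10 * MB),
    "concept": ({".pdf", ".doc", ".docx"}, 10 * MB),
    "reference": ({".pdf", ".docx"}, 5 * MB),
    "supporting": ({".pdf", ".docx"}, 5 * MB),
}


def preflight(kind: str, filename: str, size: int) -> Optional[str]:
    """Return the error the upload would likely trigger, or None if it passes these checks."""
    extensions, max_bytes = UPLOAD_RULES[kind]
    if Path(filename).suffix.lower() not in extensions:
        return "Invalid file type"
    if size == 0:
        return "Empty file"
    if size > max_bytes:
        return "File too large"
    return None
```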

Best Practices

1. File Size Optimization

Compress PDFs before upload
Remove unnecessary images from documents
Use PDF/A format for better text extraction

2. Polling for Vectorization

Polling Example

// Placeholder base URL; adjust for your environment
const API_URL = 'https://api.igadinnovationhub.org'

const pollVectorizationStatus = async (proposalId: string, token: string): Promise<void> => {
  const maxAttempts = 60 // 60 attempts x 5 seconds = 5 minutes
  let attempts = 0

  while (attempts < maxAttempts) {
    const response = await fetch(
      `${API_URL}/api/proposals/${proposalId}/documents/vectorization-status`,
      { headers: { Authorization: `Bearer ${token}` } }
    )
    if (!response.ok) {
      throw new Error(`Status check failed: HTTP ${response.status}`)
    }
    const { all_completed, has_failed } = await response.json()

    if (all_completed) {
      console.log('All documents vectorized successfully')
      return
    }

    if (has_failed) {
      console.error('Some documents failed to vectorize')
      return
    }

    await new Promise(resolve => setTimeout(resolve, 5000)) // Wait 5 seconds
    attempts++
  }

  console.warn('Vectorization timeout - still processing')
}
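
For Python clients, a similar loop can be written with the HTTP call injected as a function, which keeps the polling logic testable without network access. This is a sketch under assumptions: `wait_for_vectorization` and `fetch_status` are hypothetical names, and unlike the example above it keeps waiting while other files are still pending before reporting a failure.

```python
import time
from typing import Callable


def wait_for_vectorization(
    fetch_status: Callable[[], dict],
    interval: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until vectorization finishes; returns "completed", "failed", or "timeout"."""
    for _ in range(max_attempts):
        status = fetch_status()  # expected to return the parsed vectorization-status response
        if status.get("all_completed"):
            return "completed"
        # Report failure only once nothing else is still in progress.
        if status.get("has_failed") and not status.get("has_pending"):
            return "failed"
        sleep(interval)
    return "timeout"
```

In real use, `fetch_status` would wrap an authenticated GET to the vectorization-status endpoint and return its JSON body.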

3. Metadata Best Practices

Reference Proposals:
- Always include donor for better retrieval
- Use consistent sector naming
- Include year for temporal filtering
Supporting Documents:
- Use organization to identify source
- Standardize project_type values
- Include region for geographic filtering
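
One way to keep metadata values consistent is to normalize them before upload. The sketch below is illustrative: `SECTOR_ALIASES` is a hypothetical vocabulary, not a list defined by the API, and stripping the `|` character is a precaution based on the pipe-delimited vector key format rather than a documented requirement.

```python
# Hypothetical alias table; replace with your organization's agreed sector names.
SECTOR_ALIASES = {
    "health": "Health",
    "healthcare": "Health",
    "education": "Education",
}


def normalize_sector(raw: str) -> str:
    """Collapse whitespace, drop pipe characters, and map known aliases to one spelling."""
    cleaned = " ".join(raw.replace("|", " ").split())
    return SECTOR_ALIASES.get(cleaned.lower(), cleaned)
```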

4. Error Recovery

Failed vectorizations can be retried by re-uploading
Check vectorization status before proceeding to analysis
Use has_failed flag to detect issues early
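
To decide which files to re-upload, the failed entries can be pulled out of the `vectorization_status` map. A minimal sketch; `failed_files` is a hypothetical helper, and the fallback text for a missing `error` field is an assumption.

```python
def failed_files(status_map: dict) -> dict:
    """Map each failed filename to its error message."""
    return {
        name: entry.get("error", "unknown error")
        for name, entry in status_map.items()
        if entry.get("status") == "failed"
    }
```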

5. Content Type Headers

Always use multipart/form-data for file uploads
Don’t manually set Content-Type when using FormData in browsers
Python/cURL: Let the library set boundaries automatically
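
The reason for not setting Content-Type by hand is that the header must carry the same boundary string that separates the parts of the body. The sketch below builds a multipart body manually to show that coupling; it illustrates what HTTP clients do for you and is not something the API requires you to write. The function name `encode_multipart` is hypothetical.

```python
import uuid


def encode_multipart(fields: dict, file_field: str, filename: str, data: bytes) -> tuple:
    """Build a multipart/form-data body and the matching Content-Type header value."""
    boundary = uuid.uuid4().hex  # random separator that must also appear in the header
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        data,
        f"--{boundary}--".encode(),  # closing delimiter
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"
```

A hand-written `Content-Type: multipart/form-data` header without the boundary parameter leaves the server unable to split the body, which is why the library-generated header should be left in place.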
