Upload RFP Document

Parameters:
  • proposal_id (string, required): Unique identifier for the proposal
  • file (UploadFile, required): PDF file to upload (max 10MB)
Response:
  • success (boolean): Indicates if the upload was successful
  • message (string): Success or error message
  • filename (string): Name of the uploaded file
  • document_key (string): S3 object key where the file is stored
  • size (number): File size in bytes
Example:
curl -X POST "https://api.igadinnovationhub.org/api/proposals/{proposal_id}/documents/upload" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@rfp_document.pdf"
{
  "success": true,
  "message": "PDF uploaded successfully",
  "filename": "rfp_document.pdf",
  "document_key": "PROP-2024-001/documents/rfp/rfp_document.pdf",
  "size": 2458624
}

File Validation

  • Supported formats: PDF only
  • Maximum size: 10MB
  • Validation checks:
    • File must not be empty
    • Must have valid PDF header (%PDF)
    • File extension must be .pdf
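
The checks listed above can be mirrored on the client before sending the request, which saves a round trip for obviously invalid files. The sketch below is one possible pre-flight check, not the server's implementation. The function name `validate_rfp_pdf` is hypothetical, and the code assumes that "10MB" means 10 × 1024 × 1024 bytes.

```python
from typing import List

MAX_RFP_BYTES = 10 * 1024 * 1024  # assumption: "10MB" is interpreted as 10 MiB


def validate_rfp_pdf(filename: str, data: bytes) -> List[str]:
    """Return a list of problems found; an empty list means the file looks uploadable."""
    problems = []
    if not filename.lower().endswith(".pdf"):
        problems.append("File extension must be .pdf")
    if len(data) == 0:
        problems.append("File must not be empty")
    elif not data.startswith(b"%PDF"):
        problems.append("Missing %PDF header")
    if len(data) > MAX_RFP_BYTES:
        problems.append("File exceeds 10MB")
    return problems
```

An empty result only means the file passes these local checks; the server remains the authority and may still reject the upload.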

Storage Details

S3 Path Structure:
{proposal_code}/documents/rfp/{filename}
S3 Metadata:
  • proposal-id: Proposal UUID
  • uploaded-by: User ID who uploaded the file
  • original-size: File size in bytes
DynamoDB Update: Adds the filename to the uploaded_files.rfp-document array.

Vectorization

RFP documents are not vectorized at upload time. To keep uploads fast, vectorization is deferred to the “Analyze & Continue” step.

Upload Concept Document

Parameters:
  • proposal_id (string, required): Unique identifier for the proposal
  • file (UploadFile, required): PDF, DOC, or DOCX file (max 10MB)
Response:
  • success (boolean): Indicates if the upload was successful
  • message (string): Success message
  • filename (string): Name of the uploaded file
  • document_key (string): S3 object key where the file is stored
  • size (number): File size in bytes
Example:
curl -X POST "https://api.igadinnovationhub.org/api/proposals/{proposal_id}/documents/upload-concept-file" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@concept.docx"
{
  "success": true,
  "message": "Concept file uploaded successfully",
  "filename": "concept.docx",
  "document_key": "PROP-2024-001/documents/initial_concept/concept.docx",
  "size": 45120
}

File Validation

  • Supported formats: PDF, DOC, DOCX
  • Maximum size: 10MB
  • No vectorization: Concept documents are stored but not vectorized
S3 Path:
{proposal_code}/documents/initial_concept/{filename}

Upload Reference Proposal

Parameters:
  • proposal_id (string, required): Unique identifier for the proposal
  • file (UploadFile, required): PDF or DOCX file (max 5MB)
  • donor (string): Donor organization name (e.g., “USAID”, “World Bank”)
  • sector (string): Sector category (e.g., “Health”, “Education”)
  • year (string): Year of the reference proposal (e.g., “2023”)
  • status (string): Proposal status (e.g., “Funded”, “Rejected”)
Response:
  • success (boolean): Always true for successful uploads
  • message (string): Status message indicating vectorization in progress
  • filename (string): Name of the uploaded file
  • document_key (string): S3 object key
  • size (number): File size in bytes
  • vectorization_status (string): Initial status: "pending"
Example:
curl -X POST "https://api.igadinnovationhub.org/api/proposals/{proposal_id}/documents/upload-reference-file" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@reference.pdf" \
  -F "donor=USAID" \
  -F "sector=Health" \
  -F "year=2023" \
  -F "status=Funded"
{
  "success": true,
  "message": "Reference proposal uploaded, vectorization in progress",
  "filename": "reference.pdf",
  "document_key": "PROP-2024-001/documents/references/reference.pdf",
  "size": 1234567,
  "vectorization_status": "pending"
}

Async Vectorization Process

  1. Upload Phase (Fast):
    • Validates file format and size
    • Uploads to S3 with metadata
    • Updates DynamoDB with vectorization_status: pending
    • Returns immediately
  2. Vectorization Phase (Async):
    • Triggers Lambda worker with InvocationType: Event
    • Lambda extracts text using PyPDF2 or python-docx
    • Chunks text (1000 chars, 200 char overlap)
    • Generates embeddings using Amazon Titan Embed v2
    • Stores vectors in S3 Vectors reference-proposals-index
    • Updates vectorization_status to completed or failed
  3. Client Polling:
    • Poll /api/proposals/{proposal_id}/documents/vectorization-status
    • Check for all_completed: true or individual file status

Vector Storage

Index: reference-proposals-index
Vector Key Format (Metadata Encoded):
{proposal_id}|{donor}|{sector}|{year}|{document_name}|{chunk_index}|{total_chunks}
Example:
PROP-2024-001-chunk-0|USAID|Health|2023|reference.pdf|0|15
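
Because the metadata is encoded in the key itself, a consumer can recover it by splitting on the pipe character. The sketch below is illustrative only: `ReferenceVectorKey` and `parse_reference_key` are hypothetical names. Since the first field of the example carries a `-chunk-0` suffix that the template does not show, the code treats that field as an opaque string. It also assumes that metadata values never contain `|`.

```python
from typing import NamedTuple


class ReferenceVectorKey(NamedTuple):
    id_part: str          # first field, kept as-is (may include a chunk suffix)
    donor: str
    sector: str
    year: str
    document_name: str
    chunk_index: int
    total_chunks: int


def parse_reference_key(key: str) -> ReferenceVectorKey:
    """Split a pipe-delimited vector key into its seven fields."""
    parts = key.split("|")
    if len(parts) != 7:
        raise ValueError(f"expected 7 pipe-separated fields, got {len(parts)}")
    id_part, donor, sector, year, document_name, chunk_index, total_chunks = parts
    return ReferenceVectorKey(id_part, donor, sector, year, document_name,
                              int(chunk_index), int(total_chunks))
```

A donor or sector value containing `|` would shift the fields, which is one reason to keep metadata values simple.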
Embedding Model: amazon.titan-embed-text-v2:0 (1024 dimensions)
S3 Metadata:
  • proposal-id
  • uploaded-by
  • original-size
  • donor, sector, year, status

Upload Supporting Document

Parameters:
  • proposal_id (string, required): Unique identifier for the proposal
  • file (UploadFile, required): PDF or DOCX file (max 5MB)
  • organization (string): Organization name
  • project_type (string): Type of project (e.g., “Infrastructure”, “Capacity Building”)
  • region (string): Geographic region (e.g., “East Africa”, “Horn of Africa”)
Response:
  • success (boolean): Indicates successful upload
  • message (string): Status message
  • filename (string): Uploaded filename
  • document_key (string): S3 key
  • size (number): File size in bytes
  • vectorization_status (string): Initial status: "pending"
  • metadata (object): Metadata echoed back
Example:
curl -X POST "https://api.igadinnovationhub.org/api/proposals/{proposal_id}/documents/upload-supporting-file" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@existing_work.pdf" \
  -F "organization=IGAD" \
  -F "project_type=Capacity Building" \
  -F "region=East Africa"
{
  "success": true,
  "message": "Supporting document uploaded, vectorization in progress",
  "filename": "existing_work.pdf",
  "document_key": "PROP-2024-001/documents/supporting/existing_work.pdf",
  "size": 891234,
  "vectorization_status": "pending",
  "metadata": {
    "organization": "IGAD",
    "project_type": "Capacity Building",
    "region": "East Africa"
  }
}

Vector Storage

Index: existing-work-index
Vector Key Format:
{proposal_id}|{organization}|{project_type}|{region}|{document_name}|{chunk_index}|{total_chunks}
S3 Path:
{proposal_code}/documents/supporting/{filename}
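
The S3 paths documented for each upload type share one pattern, so a client can predict the `document_key` it should receive and compare it with the response. The sketch below is a convenience helper under that assumption. The function name `expected_document_key` and the short kind labels are hypothetical; the folder names come from the S3 paths and `document_key` examples on this page.

```python
# Folder names are taken from the S3 paths and document_key examples on this page.
# The dictionary keys ("rfp", "concept", ...) are hypothetical labels for this sketch.
DOCUMENT_FOLDERS = {
    "rfp": "rfp",
    "concept": "initial_concept",
    "reference": "references",
    "supporting": "supporting",
}


def expected_document_key(proposal_code: str, kind: str, filename: str) -> str:
    """Build the S3 object key a given upload is expected to produce."""
    folder = DOCUMENT_FOLDERS[kind]  # raises KeyError for an unknown kind
    return f"{proposal_code}/documents/{folder}/{filename}"
```

This is useful mainly as a sanity check; the key returned by the API remains the source of truth.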

Save Concept Text

Parameters:
  • proposal_id (string, required): Unique identifier for the proposal
  • concept_text (string, required): Plain text concept (minimum 50 characters)
Example response:
{
  "success": true,
  "message": "Concept text saved successfully",
  "text_length": 1024
}
Storage:
  • S3: {proposal_code}/documents/initial_concept/concept_text.txt
  • DynamoDB: text_inputs.initial-concept

Save Work Text

Parameters:
  • proposal_id (string, required): Unique identifier for the proposal
  • work_text (string, required): Plain text existing work (minimum 50 characters)
  • organization (string): Organization name
  • project_type (string): Project type
  • region (string): Geographic region
Response:
  • success (boolean): Upload success indicator
  • message (string): Success message
  • text_length (number): Character count
  • vectorized (boolean): Whether vectorization succeeded
Example:
curl -X POST "https://api.igadinnovationhub.org/api/proposals/{proposal_id}/documents/save-work-text" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "work_text=Our organization has implemented 15 water sanitation projects..." \
  -F "organization=IGAD" \
  -F "project_type=WASH" \
  -F "region=East Africa"
{
  "success": true,
  "message": "Work text saved successfully",
  "text_length": 2048,
  "vectorized": true
}

Text Chunking & Vectorization

  1. Chunking: Text is split into 1000-character chunks with 200-character overlap
  2. Vector Storage: Each chunk is stored separately in existing-work-index
  3. Key Format: {proposal_id}-work-chunk-{idx}|{org}|{type}|{region}|existing_work_text|{idx}|{total}
  4. Synchronous: Unlike file uploads, text vectorization happens immediately
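
To make the key format in step 3 concrete, the sketch below generates the keys for a text that was split into a given number of chunks. The function name `work_text_keys` is hypothetical, and zero-based chunk indices are an assumption, chosen to match the index `0` in the reference-proposal key example.

```python
from typing import List


def work_text_keys(proposal_id: str, org: str, project_type: str,
                   region: str, total: int) -> List[str]:
    """Build one vector key per chunk, following the key format listed above."""
    return [
        # assumption: chunk indices start at 0
        f"{proposal_id}-work-chunk-{idx}|{org}|{project_type}|{region}"
        f"|existing_work_text|{idx}|{total}"
        for idx in range(total)
    ]
```

For instance, `work_text_keys("PROP-2024-001", "IGAD", "WASH", "East Africa", 3)` yields three keys that differ only in the chunk index.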

Check Vectorization Status

Get All File Status

Parameters:
  • proposal_id (string, required): Unique identifier for the proposal
Response:
  • success (boolean): Request success indicator
  • vectorization_status (object): Map of filenames to status objects
  • all_completed (boolean): Whether all files are vectorized
  • has_pending (boolean): Whether any files are still processing
  • has_failed (boolean): Whether any vectorizations failed
Example:
curl -X GET "https://api.igadinnovationhub.org/api/proposals/{proposal_id}/documents/vectorization-status" \
  -H "Authorization: Bearer YOUR_TOKEN"
{
  "success": true,
  "vectorization_status": {
    "reference1.pdf": {
      "status": "completed",
      "started_at": "2024-03-15T10:30:00Z",
      "completed_at": "2024-03-15T10:32:15Z",
      "chunks_processed": 12,
      "total_chunks": 12
    },
    "reference2.pdf": {
      "status": "processing",
      "started_at": "2024-03-15T10:35:00Z",
      "chunks_processed": 5,
      "total_chunks": 18
    },
    "reference3.pdf": {
      "status": "failed",
      "started_at": "2024-03-15T10:40:00Z",
      "error": "Failed to extract text from PDF"
    }
  },
  "all_completed": false,
  "has_pending": true,
  "has_failed": true
}

Status Values

Status | Description
pending | Waiting to start
processing | Currently vectorizing
completed | Successfully vectorized
failed | Vectorization failed
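
The aggregate flags in the response can also be derived from the per-file statuses, which helps when a client caches the status map. The sketch below shows one plausible derivation, not the server's actual logic. The function name `summarize_statuses` is hypothetical. It assumes that `processing` counts toward `has_pending`, as the sample response suggests, and that an empty map does not count as all completed.

```python
from typing import Dict


def summarize_statuses(statuses: Dict[str, dict]) -> Dict[str, bool]:
    """Derive aggregate flags from a filename -> status-object map."""
    values = [entry.get("status") for entry in statuses.values()]
    return {
        # assumption: an empty map does not count as "all completed"
        "all_completed": bool(values) and all(v == "completed" for v in values),
        # assumption: "processing" counts as pending, as the sample response suggests
        "has_pending": any(v in ("pending", "processing") for v in values),
        "has_failed": any(v == "failed" for v in values),
    }
```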

Get Single File Status

Parameters:
  • proposal_id (string, required): Proposal ID
  • filename (string, required): Filename to check
Example:
curl -X GET "https://api.igadinnovationhub.org/api/proposals/{proposal_id}/documents/vectorization-status/{filename}" \
  -H "Authorization: Bearer YOUR_TOKEN"
{
  "success": true,
  "filename": "reference1.pdf",
  "status": "completed",
  "chunks_processed": 12,
  "total_chunks": 12,
  "started_at": "2024-03-15T10:30:00Z",
  "completed_at": "2024-03-15T10:32:15Z"
}
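
Because the filename is part of the URL path, names containing spaces or other reserved characters need percent-encoding before the request is sent. The sketch below builds the single-file status URL with the standard library. The function name `file_status_url` is hypothetical, and it is an assumption that the server decodes the encoded segment back to the stored filename.

```python
from urllib.parse import quote

BASE_URL = "https://api.igadinnovationhub.org"


def file_status_url(proposal_id: str, filename: str) -> str:
    """Build the single-file status URL with both path segments percent-encoded."""
    return (
        f"{BASE_URL}/api/proposals/{quote(proposal_id, safe='')}"
        f"/documents/vectorization-status/{quote(filename, safe='')}"
    )
```

Passing `safe=''` also encodes `/`, so a filename can never be mistaken for extra path segments.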

Error Handling

Common Error Codes

Code | Error | Solution
400 | Invalid file type | Use PDF for RFPs; PDF, DOC, or DOCX for concept documents; PDF or DOCX for reference and supporting documents
400 | File too large | Max 10MB (RFP/concept) or 5MB (reference/supporting)
400 | Empty file | Ensure file has content
400 | Invalid PDF format | File must start with %PDF
400 | Text too short | Minimum 50 characters for text inputs
404 | Proposal not found | Check proposal ID and ownership
500 | Upload verification failed | File size mismatch after upload
500 | S3 bucket not configured | Contact system administrator

Validation Examples

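import requests

# Placeholder values; substitute your own proposal ID and token
url = "https://api.igadinnovationhub.org/api/proposals/YOUR_PROPOSAL_ID/documents/upload"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

try:
    # Open in binary mode; requests sets the multipart Content-Type and boundary
    with open("rfp_document.pdf", "rb") as f:
        response = requests.post(url, headers=headers, files={"file": f})
    response.raise_for_status()
    result = response.json()
    print(f"Uploaded: {result['filename']}")
except requests.exceptions.HTTPError as e:
    if e.response.status_code == 400:
        # Validation failures carry the reason in the "detail" field
        print(f"Validation error: {e.response.json()['detail']}")
    else:
        print(f"Upload failed: {e}")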

Best Practices

1. File Size Optimization

  • Compress PDFs before upload
  • Remove unnecessary images from documents
  • Use PDF/A format for better text extraction

2. Polling for Vectorization

Polling Example
const API_URL = 'https://api.igadinnovationhub.org'

// Polls every 5 seconds until all files finish, one fails, or 5 minutes elapse
const pollVectorizationStatus = async (proposalId: string, token: string): Promise<void> => {
  const maxAttempts = 60 // 60 attempts x 5 seconds = 5 minutes
  let attempts = 0

  while (attempts < maxAttempts) {
    const response = await fetch(
      `${API_URL}/api/proposals/${proposalId}/documents/vectorization-status`,
      { headers: { Authorization: `Bearer ${token}` } }
    )
    if (!response.ok) {
      throw new Error(`Status check failed with HTTP ${response.status}`)
    }
    const { all_completed, has_failed } = await response.json()

    if (all_completed) {
      console.log('All documents vectorized successfully')
      return
    }

    if (has_failed) {
      console.error('Some documents failed to vectorize')
      return
    }

    await new Promise(resolve => setTimeout(resolve, 5000)) // Wait 5 seconds
    attempts++
  }

  console.warn('Vectorization timeout - still processing')
}

3. Metadata Best Practices

  • Reference Proposals:
    • Always include donor for better retrieval
    • Use consistent sector naming
    • Include year for temporal filtering
  • Supporting Documents:
    • Use organization to identify source
    • Standardize project_type values
    • Include region for geographic filtering
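
Consistent naming is easiest to enforce in code, before the value is sent as form data. The sketch below normalizes free-form sector input to a single spelling. The alias table and the function name `normalize_sector` are hypothetical; the API itself is not known to perform or require this normalization.

```python
# Hypothetical alias table; replace with the spellings your team has agreed on.
SECTOR_ALIASES = {
    "health": "Health",
    "healthcare": "Health",
    "education": "Education",
    "edu": "Education",
}


def normalize_sector(raw: str) -> str:
    """Map free-form sector input to one canonical spelling."""
    cleaned = " ".join(raw.split()).lower()  # trim and collapse internal whitespace
    # Fall back to title case for values that are not in the alias table
    return SECTOR_ALIASES.get(cleaned, cleaned.title())
```

The same approach would apply to `project_type` and `region` values for supporting documents.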

4. Error Recovery

  • Failed vectorizations can be retried by re-uploading the affected file
  • Check vectorization status before proceeding to analysis
  • Use has_failed flag to detect issues early
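
When `has_failed` is true, the next step is to find out which files need to be uploaded again. The sketch below extracts those filenames from a parsed status response. The function name `files_to_retry` is hypothetical, and the response shape is assumed to match the vectorization-status example on this page.

```python
from typing import Dict, List


def files_to_retry(status_response: Dict) -> List[str]:
    """Return the filenames whose vectorization status is "failed"."""
    statuses = status_response.get("vectorization_status", {})
    return sorted(
        name for name, entry in statuses.items()
        if entry.get("status") == "failed"
    )
```

Each returned filename can then be re-uploaded through the endpoint it was originally sent to.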

5. Content Type Headers

  • Always use multipart/form-data for file uploads
  • Don’t manually set Content-Type when using FormData in browsers
  • Python/cURL: Let the library or tool set the multipart boundary automatically
