Document management

The Document Management system handles all file operations for the proposal workflow, including upload, storage in S3, vectorization for AI retrieval, and deletion with cleanup.

Supported document types

The platform accepts six input types. Each has its own allowed formats, size limit, and vectorization behavior:

Document Type	Formats	Max Size	Vectorized	Purpose
RFP Document	PDF	10 MB	No	The Request for Proposal to respond to
Concept Document	PDF, DOC, DOCX	10 MB	No	Your initial project concept
Concept Text	Plain text	-	Yes	Text-based concept input
Reference Proposals	PDF, DOCX	5 MB	Yes	Previously successful proposals
Supporting Documents	PDF, DOCX	5 MB	Yes	Additional context materials
Work Text	Plain text	-	Yes	Existing work descriptions

Upload workflow

Upload RFP document

curl -X POST https://api.igad-hub.com/api/proposals/{proposal_id}/documents/upload \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@rfp-2024.pdf"

The RFP document is:

Validated for PDF format (%PDF header check)
Stored in S3 at {proposal_code}/documents/rfp-document/{filename}
Metadata updated in DynamoDB
Text extracted for analysis (not vectorized)

Upload concept document

curl -X POST https://api.igad-hub.com/api/proposals/{proposal_id}/documents/upload-concept-file \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@concept.docx"

Concept documents support PDF, DOC, and DOCX formats.

Save concept text

For text-based concepts:

curl -X POST https://api.igad-hub.com/api/proposals/{proposal_id}/documents/save-concept-text \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"text": "Climate-smart agriculture project focusing on..."}'

Concept text is immediately vectorized for AI retrieval. Minimum length: 50 characters.
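
The minimum length can be checked on the client before the request is sent. The following is a minimal sketch, not the platform's own validation: the function name `build_concept_payload` is hypothetical, and trimming whitespace before counting is an assumption (the server may count characters differently).

```python
import json

MIN_CONCEPT_LENGTH = 50  # minimum stated in this guide


def build_concept_payload(text: str) -> str:
    """Return the JSON body for save-concept-text, or raise if the text is too short.

    Assumption: surrounding whitespace is trimmed before counting.
    """
    cleaned = text.strip()
    if len(cleaned) < MIN_CONCEPT_LENGTH:
        raise ValueError(
            f"Concept text has {len(cleaned)} characters; "
            f"at least {MIN_CONCEPT_LENGTH} are required."
        )
    return json.dumps({"text": cleaned})
```

The returned string can be passed as the `-d` body of the request shown above.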

Upload reference proposals

curl -X POST https://api.igad-hub.com/api/proposals/{proposal_id}/documents/upload-reference-file \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@reference-proposal-2023.pdf"

Reference proposals are:

Stored in S3
Asynchronously vectorized by Lambda worker
Indexed in reference-proposals-index for AI retrieval

Vectorization process

How vectorization works

Documents marked for vectorization undergo this process:

Lambda worker triggered

When a file is uploaded, a Lambda worker is invoked asynchronously to handle vectorization.

Text extraction

Text is extracted from PDF files with PyPDF2 and from DOCX files with python-docx.

Chunking

Text is split into chunks (1000 characters with 200-character overlap) to maintain context.

Embedding generation

Each chunk is embedded using Amazon Titan Embed v2 (1024 dimensions).

Vector storage

Vectors are stored in S3 Express One Zone with metadata:

proposal_id
document_type
chunk_index
source_filename
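
The chunking step and the metadata attached to each vector can be sketched as follows. This is an illustration under stated assumptions, not the worker's actual code: the function names and the `text`/`metadata` record shape are hypothetical, while the chunk size, overlap, and the four metadata fields come from the steps above.

```python
from typing import Dict, List

CHUNK_SIZE = 1000    # characters per chunk, per this guide
CHUNK_OVERLAP = 200  # characters shared by consecutive chunks


def chunk_text(text: str, size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def chunk_records(text: str, proposal_id: str, document_type: str,
                  source_filename: str) -> List[Dict]:
    """Pair each chunk with the metadata fields listed in this guide."""
    return [
        {
            "text": chunk,  # hypothetical field holding the chunk content
            "metadata": {
                "proposal_id": proposal_id,
                "document_type": document_type,
                "chunk_index": index,
                "source_filename": source_filename,
            },
        }
        for index, chunk in enumerate(chunk_text(text))
    ]
```

With these settings each new chunk starts 800 characters after the previous one, so a 2,500-character document yields three chunks.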

Check vectorization status

curl -X GET https://api.igad-hub.com/api/proposals/{proposal_id}/documents/vectorization-status \
  -H "Authorization: Bearer YOUR_TOKEN"

Response:

{
  "reference_proposals": [
    {
      "filename": "reference-proposal-2023.pdf",
      "status": "completed",
      "vector_count": 45,
      "completed_at": "2024-01-15T10:30:00Z"
    },
    {
      "filename": "reference-proposal-2024.pdf",
      "status": "processing",
      "started_at": "2024-01-15T10:35:00Z"
    }
  ],
  "supporting_documents": [...]
}
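
A client that waits for vectorization to finish needs to know which files are still in progress. The sketch below parses a response of the shape shown above. The helper name `pending_files` is hypothetical, the sample uses an empty `supporting_documents` list purely for illustration, and only the `completed` and `processing` status values are confirmed by this guide, so any other value is treated as not finished.

```python
import json
from typing import Dict, List


def pending_files(status: Dict[str, List[dict]]) -> List[str]:
    """Return filenames whose status is anything other than "completed"."""
    return [
        entry["filename"]
        for entries in status.values()
        for entry in entries
        if entry.get("status") != "completed"
    ]


sample = json.loads("""
{
  "reference_proposals": [
    {"filename": "reference-proposal-2023.pdf", "status": "completed", "vector_count": 45},
    {"filename": "reference-proposal-2024.pdf", "status": "processing"}
  ],
  "supporting_documents": []
}
""")

print(pending_files(sample))  # ['reference-proposal-2024.pdf']
```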

Vector indexes

The platform maintains separate vector indexes:

reference-proposals-index: Vectors from reference proposals
existing-work-index: Vectors from supporting documents and work text
concept-index: Vectors from concept text

These indexes are queried during AI generation to retrieve relevant context.
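
The routing from input type to index can be written as a lookup table. This is a sketch only: the keys reuse the folder and text-input names that appear in the DynamoDB record in this guide, and treating them as the routing identifiers is an assumption.

```python
from typing import Optional

# Input name -> vector index, following the list above.
INDEX_BY_SOURCE = {
    "reference-proposals": "reference-proposals-index",
    "supporting-files": "existing-work-index",
    "existing-work": "existing-work-index",  # work text
    "initial-concept": "concept-index",      # concept text
}


def index_for(source: str) -> Optional[str]:
    """Return the index for an input, or None if the input is not vectorized."""
    return INDEX_BY_SOURCE.get(source)
```

RFP and concept documents are not vectorized, so `index_for("rfp-document")` returns `None`.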

Storage architecture

S3 bucket structure

igad-{environment}-documents/
└── {proposal_code}/
    └── documents/
        ├── rfp-document/
        │   └── {filename}.pdf
        ├── concept-document/
        │   └── {filename}.docx
        ├── concept-text/
        │   └── concept-text.txt
        ├── reference-proposals/
        │   ├── reference-1.pdf
        │   └── reference-2.pdf
        └── supporting-files/
            ├── supporting-1.pdf
            └── supporting-2.docx
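
Object keys in the documents bucket follow directly from this layout. The helpers below are a hypothetical sketch of how a key could be assembled; the function names are illustrative and the folder names are taken from the tree above.

```python
DOCUMENT_FOLDERS = {
    "rfp-document",
    "concept-document",
    "concept-text",
    "reference-proposals",
    "supporting-files",
}


def documents_bucket(environment: str) -> str:
    """Bucket name for an environment, following igad-{environment}-documents."""
    return f"igad-{environment}-documents"


def document_key(proposal_code: str, folder: str, filename: str) -> str:
    """Build the object key {proposal_code}/documents/{folder}/{filename}."""
    if folder not in DOCUMENT_FOLDERS:
        raise ValueError(f"Unknown document folder: {folder}")
    return f"{proposal_code}/documents/{folder}/{filename}"
```

For example, `document_key("PROP-20240115-A1B2", "rfp-document", "rfp-2024.pdf")` gives `PROP-20240115-A1B2/documents/rfp-document/rfp-2024.pdf`.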

S3 vectors bucket

igad-{environment}-vectors/
└── {proposal_code}/
    ├── reference-proposals-index/
    │   └── {filename}/
    │       ├── chunk-0
    │       ├── chunk-1
    │       └── ...
    └── existing-work-index/
        └── {filename}/
            ├── chunk-0
            └── ...
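
Because every chunk of a file sits under one prefix, cleanup for a single file can target that prefix. A minimal sketch of the key scheme, assuming the layout above (the function names are hypothetical):

```python
def file_vector_prefix(proposal_code: str, index: str, filename: str) -> str:
    """Prefix under which all chunks of one file are stored."""
    return f"{proposal_code}/{index}/{filename}/"


def chunk_key(proposal_code: str, index: str, filename: str, chunk_index: int) -> str:
    """Key of a single chunk, for example .../chunk-0."""
    return f"{file_vector_prefix(proposal_code, index, filename)}chunk-{chunk_index}"
```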

DynamoDB storage

Proposal documents are tracked in DynamoDB:

{
  "PK": "PROPOSAL#PROP-20240115-A1B2",
  "SK": "METADATA",
  "uploaded_files": {
    "rfp-document": ["rfp-2024.pdf"],
    "concept-document": ["concept.docx"],
    "reference-proposals": [
      "reference-proposal-2023.pdf",
      "reference-proposal-2024.pdf"
    ],
    "supporting-files": ["existing-work.pdf"]
  },
  "text_inputs": {
    "initial-concept": "Climate-smart agriculture project...",
    "existing-work": "Our organization has implemented..."
  }
}

Document deletion

Delete uploaded files

curl -X DELETE https://api.igad-hub.com/api/proposals/{proposal_id}/documents/rfp-document/{filename} \
  -H "Authorization: Bearer YOUR_TOKEN"

Deletion process:

S3 file deletion: File removed from documents bucket
Vector cleanup: All vectors for this file removed from vectors bucket
DynamoDB update: File removed from uploaded_files list
Analysis cleanup: Related analysis data cleared (for RFP documents)

Deleting an RFP document also clears rfp_analysis data and resets workflow progress.
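
The DynamoDB side of this process can be illustrated on an in-memory copy of the metadata item. This sketch is not the service's code: `remove_file` is a hypothetical helper, it covers only the `uploaded_files` update and the `rfp_analysis` cleanup, and it leaves out the S3, vector, and workflow-progress steps.

```python
import copy


def remove_file(item: dict, folder: str, filename: str) -> dict:
    """Return a copy of the metadata item with one uploaded file removed."""
    updated = copy.deepcopy(item)
    files = updated.get("uploaded_files", {}).get(folder, [])
    if filename not in files:
        raise KeyError(f"{filename} is not listed under {folder}")
    files.remove(filename)
    if folder == "rfp-document":
        # Deleting the RFP also clears its analysis data.
        updated.pop("rfp_analysis", None)
    return updated
```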

Delete concept text

curl -X DELETE https://api.igad-hub.com/api/proposals/{proposal_id}/documents/concept-text \
  -H "Authorization: Bearer YOUR_TOKEN"

This clears the concept text from both DynamoDB and the vector indexes.

Delete entire proposal

Deleting a proposal performs comprehensive cleanup:

curl -X DELETE https://api.igad-hub.com/api/proposals/{proposal_id} \
  -H "Authorization: Bearer YOUR_TOKEN"

Cleanup steps:

Delete vectors

All vectors in reference-proposals-index and existing-work-index for this proposal are removed.

Delete S3 files

All files under {proposal_code}/ are deleted from the documents bucket.

Delete DynamoDB metadata

The proposal metadata record is removed from the table.

Response:

{
  "message": "Proposal deleted successfully",
  "proposal_code": "PROP-20240115-A1B2",
  "cleanup_summary": {
    "vectors_deleted": "attempted",
    "s3_files_deleted": "attempted",
    "dynamodb_deleted": "completed"
  }
}

Vector and S3 deletion are best-effort. If either fails, the DynamoDB deletion still proceeds, which is why the response reports them as "attempted" rather than "completed".
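
The ordering and failure handling described above can be sketched as follows. This is a hypothetical outline, not the service's implementation: the three callables stand in for the real vector, S3, and DynamoDB operations, and printing the error is a placeholder for whatever logging the service uses.

```python
from typing import Callable, Dict


def delete_proposal(delete_vectors: Callable[[], None],
                    delete_s3_files: Callable[[], None],
                    delete_metadata: Callable[[], None]) -> Dict[str, str]:
    """Run the cleanup steps in order and build the cleanup summary."""
    summary: Dict[str, str] = {}
    best_effort_steps = (
        ("vectors_deleted", delete_vectors),
        ("s3_files_deleted", delete_s3_files),
    )
    for key, step in best_effort_steps:
        try:
            step()
        except Exception as exc:
            # Non-critical: report the failure and continue with the next step.
            print(f"Cleanup step {key} failed: {exc}")
        summary[key] = "attempted"
    # Critical step: an exception here propagates to the caller.
    delete_metadata()
    summary["dynamodb_deleted"] = "completed"
    return summary
```

A failure in either of the first two steps leaves orphaned objects behind, so a caller that needs a guarantee would have to verify the buckets separately.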

List documents

List all documents

curl -X GET https://api.igad-hub.com/api/proposals/{proposal_id}/documents \
  -H "Authorization: Bearer YOUR_TOKEN"

Response:

{
  "documents": [
    {
      "type": "rfp-document",
      "filename": "rfp-2024.pdf",
      "uploaded_at": "2024-01-15T09:00:00Z",
      "size_bytes": 2457600
    },
    {
      "type": "reference-proposals",
      "filename": "reference-proposal.pdf",
      "uploaded_at": "2024-01-15T09:30:00Z",
      "size_bytes": 1024000,
      "vectorization_status": "completed"
    }
  ]
}

File validation

PDF validation

PDF files are validated by checking the header:

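# From source code; the import and the validate_pdf wrapper are added here for context
from fastapi import HTTPException

def validate_pdf(file_content: bytes) -> None:
    # A PDF file begins with the magic bytes "%PDF".
    if not file_content.startswith(b"%PDF"):
        raise HTTPException(
            status_code=400,
            detail="Invalid PDF file. File must start with %PDF header."
        )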

Size limits

RFP/Concept documents: 10 MB
Reference/Supporting documents: 5 MB
Text inputs: No hard limit, but minimum 50 characters for concept text
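
These limits can be enforced before an upload is attempted. The sketch below assumes that 1 MB means 1,048,576 bytes (the guide does not say whether limits are binary or decimal) and uses the storage folder names as document-type keys. `check_size` is a hypothetical helper.

```python
MB = 1024 * 1024  # assumption: binary megabytes

MAX_BYTES = {
    "rfp-document": 10 * MB,
    "concept-document": 10 * MB,
    "reference-proposals": 5 * MB,
    "supporting-files": 5 * MB,
}


def check_size(folder: str, size_bytes: int) -> None:
    """Raise ValueError if a file exceeds the limit for its document type."""
    limit = MAX_BYTES[folder]
    if size_bytes > limit:
        raise ValueError(
            f"{folder} files are limited to {limit // MB} MB; got {size_bytes} bytes."
        )
```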

Supported MIME types

application/pdf
application/msword (DOC)
application/vnd.openxmlformats-officedocument.wordprocessingml.document (DOCX)
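
Combined with the formats listed in the supported document types table, the MIME types give a per-type allow list. A minimal sketch, with the storage folder names used as keys and `is_allowed` as a hypothetical helper:

```python
PDF = "application/pdf"
DOC = "application/msword"
DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

ALLOWED_MIME_TYPES = {
    "rfp-document": {PDF},
    "concept-document": {PDF, DOC, DOCX},
    "reference-proposals": {PDF, DOCX},
    "supporting-files": {PDF, DOCX},
}


def is_allowed(folder: str, mime_type: str) -> bool:
    """Return True if the MIME type is accepted for the given document type."""
    return mime_type in ALLOWED_MIME_TYPES.get(folder, set())
```

A declared MIME type can be wrong, so a check like this complements the header validation rather than replacing it.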

Next steps

Upload API

Complete API reference for document uploads

Proposal workflow

Learn how documents fit into the proposal workflow

Architecture

Understand S3, Lambda, and vector storage architecture

Delete API

Document deletion and cleanup API
