The CGIAR Risk Intelligence Tool is built on a serverless AWS architecture with a multi-agent AI pipeline for automated risk assessment.
High-Level Architecture
Component Details
Frontend - Next.js 15 Static Export
Framework: Next.js 15 (App Router)
Runtime: React 19
Language: TypeScript 5.7
Styling: Tailwind CSS v4
Components: shadcn/ui
State: React Query (TanStack Query)
Forms: React Hook Form + Zod
Build: Static export (output: 'export')
Hosting : Deployed to S3 as static HTML/CSS/JS, served via CloudFront CDN with SPA fallback routing.
The frontend uses output: 'export', which requires all routes to be known at build time. Dynamic [id] segments are not supported; all entity IDs are passed via query parameters (e.g., /assessments/upload?id=uuid).
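Since dynamic segments are unavailable under static export, the query-parameter convention can be centralized in small helpers. A minimal sketch; these helper names are illustrative, not from the codebase:

```typescript
// Hypothetical helpers for the query-parameter routing convention.
// Route paths must match routes known at build time.

/** Build a static-export-safe URL carrying the entity ID as a query parameter. */
function assessmentRoute(page: "upload" | "review" | "report", id: string): string {
  return `/assessments/${page}?id=${encodeURIComponent(id)}`;
}

/** Read the ID back out on the client (e.g., from useSearchParams().toString()). */
function parseAssessmentId(search: string): string | null {
  return new URLSearchParams(search).get("id");
}
```

Keeping the convention in one place avoids scattering hand-built query strings across pages.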
Backend - NestJS 10 REST API
Framework: NestJS 10
Runtime: Node.js 20 (ARM64)
Language: TypeScript 5.7
ORM: Prisma 6
Database: PostgreSQL 15
Validation: class-validator
Module: CommonJS
Entry Points :
main.ts - Local development server (port 3001)
lambda.ts - AWS Lambda handler for API Gateway
worker.ts - Background job processor
API Lambda Environment Variables
DATABASE_URL=postgresql://user:pass@host:5432/alliance_risk
COGNITO_USER_POOL_ID=us-east-1_ABC123
COGNITO_CLIENT_ID=abc123def456
COGNITO_REGION=us-east-1
WORKER_LAMBDA_NAME=alliance-risk-worker
FILE_BUCKET_NAME=alliance-risk-files-dev
ENVIRONMENT=development
AWS_REGION=us-east-1
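A fail-fast startup check over the variables listed above can catch misconfigured deployments early. This is an illustrative sketch, not code from the repository:

```typescript
// Illustrative startup validation for the API Lambda's required environment variables.
const REQUIRED_ENV_VARS = [
  "DATABASE_URL",
  "COGNITO_USER_POOL_ID",
  "COGNITO_CLIENT_ID",
  "COGNITO_REGION",
  "WORKER_LAMBDA_NAME",
  "FILE_BUCKET_NAME",
] as const;

/** Returns the names of any required variables missing from the given environment. */
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => !env[name]);
}

// At bootstrap: const missing = missingEnvVars(process.env);
// if (missing.length) throw new Error(`Missing env vars: ${missing.join(", ")}`);
```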
Database - PostgreSQL 15 via Prisma
Schema Overview (see packages/api/prisma/schema.prisma):
Core Models
Risk Data
Job System
Prompts (Admin)
model User {
  id        String @id @default(uuid())
  cognitoId String @unique // Sync from Cognito
  email     String @unique
}

model Assessment {
  id               String           @id @default(uuid())
  name             String
  companyName      String
  status           AssessmentStatus // DRAFT | ANALYZING | ACTION_REQUIRED | COMPLETE
  intakeMode       IntakeMode       // UPLOAD | GUIDED_INTERVIEW | MANUAL_ENTRY
  overallRiskScore Float?
  overallRiskLevel RiskLevel?       // LOW | MODERATE | HIGH | CRITICAL
  userId           String
}

model AssessmentDocument {
  id           String         @id @default(uuid())
  assessmentId String
  fileName     String
  s3Key        String
  status       DocumentStatus // PENDING_UPLOAD | UPLOADED | PARSING | PARSED | FAILED
  parseJobId   String?        @unique
}
model GapField {
  id             String         @id @default(uuid())
  assessmentId   String
  category       RiskCategory   // FINANCIAL | CLIMATE_ENVIRONMENTAL | etc.
  field          String
  label          String
  extractedValue String?        // From AI parser
  correctedValue String?        // Manual correction
  status         GapFieldStatus // MISSING | PARTIAL | VERIFIED
  isMandatory    Boolean
}

model RiskScore {
  id            String       @id @default(uuid())
  assessmentId  String
  category      RiskCategory
  score         Float        // 0-100
  level         RiskLevel    // LOW | MODERATE | HIGH | CRITICAL
  subcategories Json         // Array of 5 subcategory scores
  evidence      String?
  narrative     String?
}

model Recommendation {
  id          String                 @id @default(uuid())
  riskScoreId String
  text        String                 // AI-generated
  priority    RecommendationPriority // HIGH | MEDIUM | LOW
  isEdited    Boolean                @default(false)
  editedText  String?                // Analyst override
}

model Job {
  id          String    @id @default(uuid())
  type        JobType   // AI_PREVIEW | PARSE_DOCUMENT | GAP_DETECTION | RISK_ANALYSIS | REPORT_GENERATION
  status      JobStatus // PENDING | PROCESSING | COMPLETED | FAILED
  input       Json      // Serialized request payload
  result      Json?     // Output from worker
  error       String?
  attempts    Int       @default(0)
  maxAttempts Int       @default(3)
  createdById String
}
model Prompt {
  id                 String       @id @default(uuid())
  name               String
  section            AgentSection // parser | gap_detector | risk_analysis | report_generation
  version            Int          @default(1)
  isActive           Boolean      @default(true)
  systemPrompt       String       @db.Text
  userPromptTemplate String       @db.Text
  tone               String?
  outputFormat       String?
  fewShot            Json?        // Few-shot examples
}

model PromptVersion {
  id       String @id @default(uuid())
  promptId String
  version  Int
  // Snapshot of all prompt fields at this version
}

model PromptComment {
  id       String  @id @default(uuid())
  promptId String
  parentId String? // For threaded replies
  content  String
  authorId String
}
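The Job model's statuses imply a lifecycle. The transition map below is a sketch inferred from the statuses and the retry behavior described later in this document, not code from the repository:

```typescript
type JobStatus = "PENDING" | "PROCESSING" | "COMPLETED" | "FAILED";

// Assumed legal transitions; a retryable failure sends the job back to PENDING
// until attempts reaches maxAttempts, after which it lands in FAILED.
const JOB_TRANSITIONS: Record<JobStatus, JobStatus[]> = {
  PENDING: ["PROCESSING"],
  PROCESSING: ["COMPLETED", "FAILED", "PENDING"],
  COMPLETED: [], // terminal
  FAILED: [],    // terminal
};

function canTransition(from: JobStatus, to: JobStatus): boolean {
  return JOB_TRANSITIONS[from].includes(to);
}
```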
VPC Configuration : The RDS instance resides in a private VPC with no public internet access. Only Lambda functions deployed in the same VPC can connect.
Migrations cannot run from local machines against the deployed RDS. Use the remote migration script:
pnpm migrate:remote  # Sends SQL via the Worker Lambda's run-sql action
Authentication - AWS Cognito
User Pool Configuration :
Email-based sign-in (username = email)
Password policy: 8+ chars, uppercase, lowercase, number, special char
MFA: Optional (disabled by default)
Admin group: admin (checked by AdminGuard in NestJS)
JWT Token Flow :
Token Lifecycle :
Access Token : 60 minutes (short-lived for API requests)
Refresh Token : 30 days (if Remember Me checked) or session-only
Auto-Refresh : API client intercepts 401 responses, uses refresh token, retries request
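The auto-refresh flow can be sketched as a fetch wrapper. This is a simplified illustration; the injected `refreshTokens` callback stands in for whatever the real API client does to exchange the refresh token:

```typescript
// Simplified sketch of the 401-intercept-and-retry flow. The Fetcher and
// refreshTokens signatures are illustrative assumptions, not the real client API.
type Fetcher = (url: string, init?: { headers?: Record<string, string> }) => Promise<{ status: number }>;

async function fetchWithAutoRefresh(
  fetcher: Fetcher,
  refreshTokens: () => Promise<string>, // exchanges the refresh token for a new access token
  url: string,
  accessToken: string
) {
  const withAuth = (token: string) =>
    fetcher(url, { headers: { Authorization: `Bearer ${token}` } });

  let response = await withAuth(accessToken);
  if (response.status === 401) {
    const newToken = await refreshTokens(); // use the stored refresh token
    response = await withAuth(newToken);    // retry the original request once
  }
  return response;
}
```

Retrying only once avoids an infinite loop if the refresh token itself has expired.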
Token Storage Implementation
// packages/web/src/lib/token-manager.ts
export const TokenManager = {
  getAccessToken(): string | null {
    return sessionStorage.getItem('accessToken') ?? localStorage.getItem('accessToken');
  },

  setTokens(tokens: Tokens, rememberMe: boolean) {
    const storage = rememberMe ? localStorage : sessionStorage;
    storage.setItem('accessToken', tokens.accessToken);
    storage.setItem('idToken', tokens.idToken);
    if (tokens.refreshToken) {
      storage.setItem('refreshToken', tokens.refreshToken);
    }
  },

  clearTokens() {
    sessionStorage.clear();
    localStorage.clear();
  }
};
AI Pipeline - AWS Bedrock Multi-Agent System
Model Configuration (see packages/shared/src/constants/bedrock.config.ts):
export const BEDROCK_MODELS = {
parser: {
modelId: 'anthropic.claude-3-5-sonnet-20241022-v2:0'
},
gap_detector: {
modelId: 'anthropic.claude-3-5-sonnet-20241022-v2:0'
},
risk_analysis: {
modelId: 'anthropic.claude-3-5-sonnet-20241022-v2:0'
},
report_generation: {
modelId: 'anthropic.claude-3-5-sonnet-20241022-v2:0'
}
};
Agent Pipeline :
Parser Agent
Input: S3 URI to the uploaded PDF/DOCX
Process:
Fetch document from S3
Extract text using AWS Textract (for PDFs) or raw text (DOCX)
Send text to Bedrock with parser prompt
Extract structured data across 35 risk indicators
Output: JSON object with extracted fields mapped to the database schema
{
  "financial": {
    "revenue_projection_year_1": "5000000 KES",
    "cost_structure": "70% COGS, 20% OpEx, 10% debt service",
    "credit_access": "Line of credit with KCB Bank, 2M limit",
    // ... 2 more fields
  },
  // ... 6 more categories
}
Gap Detector Agent
Input: Parsed data from Step 1 + expected field list (35 indicators)
Process:
Compare extracted data against required fields
Validate data quality (completeness, format, confidence)
Mark each field as VERIFIED, PARTIAL, or MISSING
Output: Array of GapField records inserted into the database
[
  {
    category: 'FINANCIAL',
    field: 'revenue_projection_year_1',
    label: 'Revenue Projection - Year 1',
    extractedValue: '5000000 KES',
    status: 'VERIFIED',
    isMandatory: true
  },
  {
    category: 'FINANCIAL',
    field: 'liquidity_ratio',
    label: 'Current Liquidity Ratio',
    extractedValue: null,
    status: 'MISSING',
    isMandatory: true
  }
]
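The VERIFIED/PARTIAL/MISSING classification can be approximated with a rule of thumb like the following. This is an illustrative heuristic only; the real gap detector combines LLM output with validation rules:

```typescript
type GapFieldStatus = "MISSING" | "PARTIAL" | "VERIFIED";

// Illustrative classifier, not the actual gap-detector logic:
// no value at all -> MISSING; a value that fails validation -> PARTIAL.
function classifyField(extractedValue: string | null, passesValidation: boolean): GapFieldStatus {
  if (extractedValue === null || extractedValue.trim() === "") return "MISSING";
  return passesValidation ? "VERIFIED" : "PARTIAL";
}
```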
Risk Analysis Agent
Input: All gap fields (extracted + corrected values)
Process:
For each of 7 categories:
Score 5 subcategories (0-100 scale)
Map to risk level: below 33 Low, 33-66 Moderate, 67-100 High
Generate evidence text citing data sources
Calculate weighted average for category score
Generate narrative explaining risk drivers
Create 3-5 prioritized recommendations per category
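The level mapping and weighted average from the steps above can be written as standalone functions (the mapping mirrors mapScoreToLevel from the worker handler shown later; note it never returns CRITICAL):

```typescript
type RiskLevel = "LOW" | "MODERATE" | "HIGH" | "CRITICAL";

// Score bands: below 33 LOW, 33-66 MODERATE, 67 and above HIGH.
function mapScoreToLevel(score: number): RiskLevel {
  if (score < 33) return "LOW";
  if (score < 67) return "MODERATE";
  return "HIGH";
}

/** Weighted average of subcategory scores; weights are expected to sum to 1. */
function weightedCategoryScore(subs: { score: number; weight: number }[]): number {
  return subs.reduce((sum, s) => sum + s.score * s.weight, 0);
}
```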
Output: 7 RiskScore records + associated Recommendation records
{
  category: 'FINANCIAL',
  score: 68.8,
  level: 'HIGH',
  subcategories: [
    { name: 'Revenue Stability', score: 72, level: 'HIGH', weight: 0.25 },
    { name: 'Cost Management', score: 65, level: 'MODERATE', weight: 0.20 },
    { name: 'Credit Access', score: 60, level: 'MODERATE', weight: 0.20 },
    { name: 'Liquidity', score: 78, level: 'HIGH', weight: 0.20 },
    { name: 'Capital Structure', score: 68, level: 'HIGH', weight: 0.15 }
  ],
  evidence: "Revenue projections show 40% dependency on single buyer...",
  narrative: "Financial risk is elevated due to revenue concentration and...",
  recommendations: [
    {
      text: "Diversify customer base to reduce concentration risk below 25% per customer",
      priority: 'HIGH'
    },
    // ... more recommendations
  ]
}
Report Generator Agent
Input: Assessment ID (fetches all data from the database)
Process:
Query all gap fields, risk scores, recommendations, comments
Generate HTML report template with:
Executive summary
Company profile
Risk scorecard (7 categories + 35 subcategories)
Evidence and narratives
Prioritized recommendations
Appendices (methodology, data sources)
Convert HTML to PDF using headless Chrome (Puppeteer in Lambda)
Upload PDF to S3
Generate pre-signed download URL (1-hour expiry)
Output: S3 URI + pre-signed URL
{
  "s3Uri": "s3://alliance-risk-files/reports/assessment-abc123.pdf",
  "downloadUrl": "https://alliance-risk-files.s3.amazonaws.com/reports/assessment-abc123.pdf?X-Amz-Signature=...",
  "expiresAt": "2026-03-04T15:30:00Z"
}
Agent Chaining : The job system automatically chains PARSE_DOCUMENT → GAP_DETECTION. Analysts manually trigger RISK_ANALYSIS and REPORT_GENERATION after reviewing gaps.
Asynchronous Job System
Fire-and-Forget Pattern :
Job Handler Interface (packages/api/src/jobs/job-handler.interface.ts):
export interface JobHandler<TInput = unknown, TOutput = unknown> {
  execute(input: TInput): Promise<TOutput>;
  onFailure?(id: string, error: Error): Promise<void>; // Optional cleanup
}
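A toy handler implementing this interface shows the shape the real handlers follow (EchoHandler is hypothetical, not one of the actual job handlers):

```typescript
interface JobHandler<TInput = unknown, TOutput = unknown> {
  execute(input: TInput): Promise<TOutput>;
  onFailure?(id: string, error: Error): Promise<void>;
}

// Hypothetical minimal handler; real handlers (e.g., for RISK_ANALYSIS)
// would query the database and call Bedrock inside execute().
class EchoHandler implements JobHandler<{ message: string }, { echoed: string }> {
  async execute(input: { message: string }) {
    return { echoed: input.message.toUpperCase() };
  }

  async onFailure(id: string, error: Error) {
    // Cleanup such as rolling back a document's status would go here.
    console.error(`job ${id} failed:`, error.message);
  }
}
```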
Retry Logic :
Max attempts: 3 (configurable per job)
Backoff: Exponential (1s, 2s, 4s)
Permanent failure: Status set to FAILED, error message stored
Document status updated: PARSING → FAILED
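The exponential backoff schedule above (1s, 2s, 4s) corresponds to a simple doubling formula:

```typescript
/** Delay before retry `attempt` (1-based), doubling from a 1-second base. */
function backoffDelayMs(attempt: number, baseMs = 1000): number {
  return baseMs * Math.pow(2, attempt - 1);
}
// attempt 1 -> 1000ms, attempt 2 -> 2000ms, attempt 3 -> 4000ms
```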
Infrastructure - AWS CloudFormation
Stack Resources (infra/lib/alliance-risk-stack.ts):
Compute
Storage
Networking
Auth & CDN
API Lambda :
Runtime : nodejs20.x
Architecture : arm64
Memory : 1024 MB
Timeout : 30 seconds
Handler : lambda.handler
VPC : Enabled (access to RDS)
Policies : RDS, S3, Cognito, Lambda:InvokeFunction
Worker Lambda :
Runtime : nodejs20.x
Architecture : arm64
Memory : 1024 MB
Timeout : 15 minutes
Handler : worker.handler
VPC : Enabled (access to RDS)
Policies : RDS, S3, Bedrock, Textract
API Gateway HTTP API :
Timeout : 30 seconds
CORS : Enabled
Default Route : ANY /{proxy+} → API Lambda
RDS PostgreSQL :
Engine : postgres 15.8
Instance : db.t3.micro (2 vCPU, 1 GB RAM)
Storage : 20 GB gp3 (encrypted)
VPC : Private subnets (no public access)
Backup : 7-day retention
Credentials : Secrets Manager
S3 File Bucket :
Name : alliance-risk-files-{env}
Versioning : Enabled
Encryption : AES256
Lifecycle : Delete after 90 days
S3 Web Bucket :
Name : alliance-risk-web-{env}
Static Hosting : Enabled
Public Access : Blocked (CloudFront only)
VPC :
CIDR : 10.0.0.0/16
Public Subnets : 2 AZs (10.0.1.0/24, 10.0.2.0/24)
Private Subnets : 2 AZs (10.0.11.0/24, 10.0.12.0/24)
VPC Endpoints (Interface) :
- Cognito IDP (com.amazonaws.us-east-1.cognito-idp)
Security Groups :
- Lambda SG : Outbound all, Inbound none
- RDS SG : Inbound 5432 from Lambda SG
Note : Bedrock and S3 accessible without VPC endpoints
(Lambdas route through NAT Gateway for public AWS services)
Cognito User Pool :
Attributes : email (required, verified)
Password Policy : 8+ chars, mixed case, number, special
MFA : Optional (disabled by default)
Groups : admin
CloudFront Distribution :
Origin : S3 Web Bucket (OAI)
Price Class : PriceClass_100 (NA + EU)
Default Root : index.html
Error Pages : 403, 404 → /index.html (SPA fallback)
Caching :
- *.html, *.json: max-age=0, must-revalidate
- *.js, *.css, /assets/*: max-age=31536000, immutable
Deployment Scripts :
API Deployment
Web Deployment
Database Migrations
# scripts/deploy-api.sh
# 1. Bundle API Lambda with esbuild
esbuild packages/api/src/lambda.ts \
--bundle \
--platform=node \
--target=node20 \
--external:@prisma/client \
--external:pg \
--outfile=dist/lambda.js
# 2. Bundle Worker Lambda
esbuild packages/api/src/worker.ts \
--bundle \
--platform=node \
--target=node20 \
--external:@prisma/client \
--external:pg \
--outfile=dist/worker.js
# 3. Copy external dependencies (Prisma, pg)
cp -r packages/api/node_modules/@prisma dist/node_modules/
cp -r packages/api/node_modules/pg dist/node_modules/
# 4. Zip and upload to S3
cd dist && zip -r lambda.zip . && cd ..
aws s3 cp dist/lambda.zip s3://alliance-risk-deploy/api-$(git rev-parse --short HEAD).zip
# 5. Update Lambda functions
aws lambda update-function-code \
--function-name alliance-risk-api \
--s3-bucket alliance-risk-deploy \
--s3-key api-$(git rev-parse --short HEAD).zip
aws lambda update-function-code \
--function-name alliance-risk-worker \
--s3-bucket alliance-risk-deploy \
--s3-key api-$(git rev-parse --short HEAD).zip
Data Flow Examples
Example 1: Document Upload and Parsing
Example 2: Risk Analysis and Scoring
// Frontend triggers risk analysis
const { mutate: triggerAnalysis } = useMutation({
  mutationFn: async (assessmentId: string) => {
    const response = await apiClient.post(`/api/assessments/${assessmentId}/analyze`);
    return response.data;
  },
  onSuccess: (data) => {
    // Start polling the job
    pollJob(data.jobId);
  }
});
// Backend creates the job and invokes the worker
@Post(':id/analyze')
async triggerAnalysis(
  @Param('id') id: string,
  @CurrentUser() user: UserClaims
) {
  const jobId = await this.jobsService.create(
    JobType.RISK_ANALYSIS,
    { assessmentId: id },
    user.userId
  );
  return { jobId };
}
// Worker executes the risk analysis handler
export class RiskAnalysisHandler implements JobHandler {
  async execute(input: { assessmentId: string }) {
    // 1. Fetch all gap fields from the DB
    const gaps = await this.prisma.gapField.findMany({
      where: { assessmentId: input.assessmentId }
    });

    // 2. For each category, build a prompt with the data
    for (const category of RISK_CATEGORIES) {
      const categoryGaps = gaps.filter(g => g.category === category.key);
      const prompt = `
        Analyze ${category.label} for this business:
        ${categoryGaps.map(g => `${g.label}: ${g.correctedValue ?? g.extractedValue ?? 'MISSING'}`).join('\n')}
        Score each of 5 subcategories on a 0-100 scale.
        Output JSON: {subcategories: [{name, score, weight, level, rationale}], evidence, narrative, recommendations: [{text, priority}]}
      `;

      // 3. Call Bedrock
      const response = await this.bedrockService.converse({
        modelId: BEDROCK_MODELS.risk_analysis.modelId,
        systemPrompt: riskAnalysisPrompt.systemPrompt,
        userMessage: prompt
      });

      // 4. Parse the response and calculate the weighted category score
      const parsed = JSON.parse(response.content);
      const categoryScore = parsed.subcategories.reduce(
        (sum, sub) => sum + (sub.score * sub.weight),
        0
      );

      // 5. Insert the risk score (capture the record so recommendations can reference it)
      const riskScore = await this.prisma.riskScore.create({
        data: {
          assessmentId: input.assessmentId,
          category: category.key,
          score: categoryScore,
          level: this.mapScoreToLevel(categoryScore),
          subcategories: parsed.subcategories,
          evidence: parsed.evidence,
          narrative: parsed.narrative
        }
      });

      // 6. Insert the recommendations
      for (const rec of parsed.recommendations) {
        await this.prisma.recommendation.create({
          data: {
            riskScoreId: riskScore.id,
            text: rec.text,
            priority: rec.priority
          }
        });
      }
    }

    // 7. Calculate the overall risk score (weighted average of 7 categories)
    const overallScore = await this.calculateOverallScore(input.assessmentId);

    // 8. Update the assessment
    await this.prisma.assessment.update({
      where: { id: input.assessmentId },
      data: {
        overallRiskScore: overallScore,
        overallRiskLevel: this.mapScoreToLevel(overallScore),
        status: 'COMPLETE'
      }
    });

    return { overallScore, categories: RISK_CATEGORIES.length };
  }

  private mapScoreToLevel(score: number): RiskLevel {
    if (score < 33) return 'LOW';
    if (score < 67) return 'MODERATE';
    return 'HIGH';
  }
}
Performance Characteristics
Document Parsing
Typical: 30-60 seconds for 10-30 page PDFs
Variables:
Document length (pages)
Text density
Textract processing time
Bedrock throttling
Timeout : 15 minutes (Worker Lambda)
Gap Detection
Typical: 10-20 seconds
Variables:
Number of extracted fields
Validation complexity
Timeout : 15 minutes (Worker Lambda)
Risk Analysis
Typical: 60-90 seconds (7 categories × 10-15s each)
Variables:
Bedrock API latency
Prompt complexity
Number of recommendations
Timeout : 15 minutes (Worker Lambda)
Report Generation
Typical: 20-30 seconds
Variables:
Report length
PDF rendering complexity
S3 upload speed
Timeout : 15 minutes (Worker Lambda)
Cold Start Impact :
API Lambda: 1-2 seconds (arm64, 1024MB, bundled with esbuild)
Worker Lambda: 2-3 seconds (arm64, 1024MB, Prisma client initialization)
Mitigation: Provisioned concurrency for production (not enabled in MVP)
Security Model
Authentication
AWS Cognito User Pool with email + password
JWT tokens (access: 60min, refresh: 30 days)
Token rotation on refresh
Rate limiting: 5 req/min on auth endpoints (NestJS Throttler)
Authorization
Global JwtAuthGuard on all API routes (except @Public())
JWT signature verification against Cognito JWKS
User roles via Cognito groups (admin group)
AdminGuard checks cognito:groups claim
Resource ownership validation (userId must match createdById)
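The ownership check can be sketched as a standalone assertion. This is an illustration; in the real API it would presumably live in a guard or service method, and ForbiddenError stands in for NestJS's ForbiddenException:

```typescript
// Hypothetical ownership assertion, not the repository's actual guard code.
class ForbiddenError extends Error {}

function assertOwnership(resourceOwnerId: string, requestUserId: string, isAdmin = false): void {
  if (isAdmin) return; // members of the admin group may access any resource
  if (resourceOwnerId !== requestUserId) {
    throw new ForbiddenError("You do not own this resource");
  }
}
```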
Data Encryption
In Transit : HTTPS only (CloudFront, API Gateway, S3 pre-signed URLs)
At Rest : RDS encrypted with AWS KMS, S3 SSE-AES256
Secrets : Database credentials in Secrets Manager, auto-rotation enabled
Network Isolation
RDS in private VPC subnets (no internet gateway)
Lambda functions in VPC to access RDS
VPC endpoint for Cognito (interface endpoint)
S3 and Bedrock accessed via NAT Gateway (or VPC endpoints in production)
Input Validation
NestJS ValidationPipe with class-validator on all DTOs
File upload limits: 10MB max, PDF/DOCX only (MIME type validation)
SQL injection prevention: Prisma ORM (parameterized queries)
XSS prevention: React auto-escaping, CSP headers on CloudFront
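The upload limits above translate to a check like the following (a sketch; the exact MIME list and error-message wording are assumptions):

```typescript
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024; // 10 MB

// Assumed accepted MIME types for PDF and DOCX.
const ALLOWED_MIME_TYPES = new Set([
  "application/pdf",
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document", // .docx
]);

/** Returns an error message if the file is rejected, or null if it passes. */
function validateUpload(file: { sizeBytes: number; mimeType: string }): string | null {
  if (file.sizeBytes > MAX_UPLOAD_BYTES) return "File exceeds 10MB limit";
  if (!ALLOWED_MIME_TYPES.has(file.mimeType)) return "Only PDF and DOCX files are accepted";
  return null;
}
```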
Production Hardening Checklist (not implemented in MVP):
Enable CloudTrail for API audit logs
Add WAF rules to CloudFront (rate limiting, geo-blocking)
Implement RBAC beyond admin/analyst (e.g., viewer, editor roles)
Add MFA enforcement for admin accounts
Enable VPC Flow Logs
Set up CloudWatch alarms for error rates, latency, failed auth
Monitoring and Observability
CloudWatch Logs :
/aws/lambda/alliance-risk-api - API request/response, errors
/aws/lambda/alliance-risk-worker - Job processing, Bedrock calls, failures
/aws/rds/instance/alliance-risk/postgresql - Slow queries, errors
Key Metrics :
Lambda invocations, duration, errors, throttles
RDS connections, CPU, memory, storage
Bedrock API latency, throttling, errors
S3 request counts, error rates
Cost Breakdown (estimated monthly for dev environment):
RDS db.t3.micro: $15
Lambda (API + Worker): $5-10 (low traffic)
Bedrock (Claude 3.5 Sonnet): $20-50 (100-200 assessments/month)
S3: $2
CloudFront: $5
Total : ~$50-80/month
Use AWS Cost Explorer to track Bedrock usage. Each assessment consumes approximately:
Parsing: 50K input tokens + 5K output tokens
Gap Detection: 10K input + 2K output
Risk Analysis: 30K input + 10K output (×7 categories)
Report Generation: 20K input + 15K output
Total per assessment : ~200K tokens = ~$0.30 at Claude 3.5 Sonnet v2 pricing
Development Workflow
Local Setup
Testing
Deployment
Debugging
# 1. Clone repository
git clone <repo-url>
cd alliance-risk-analysis-tool
# 2. Install dependencies
pnpm install
# 3. Set up local PostgreSQL
createdb alliance_risk
# 4. Configure API environment
cp packages/api/.env.example packages/api/.env
# Edit .env with local DATABASE_URL
# 5. Run migrations
pnpm --filter @alliance-risk/api exec prisma migrate deploy
# 6. Seed database
npx --prefix packages/api tsx prisma/seed.ts
# 7. Start dev servers (API :3001 + Web :3000)
pnpm dev
# Run all tests
pnpm test
# Test single package
pnpm --filter @alliance-risk/api test
pnpm --filter @alliance-risk/web test
# Test specific file
pnpm --filter @alliance-risk/api test -- --testPathPattern=auth.controller
# Coverage
pnpm --filter @alliance-risk/api test -- --coverage
# Full deployment (infrastructure + code)
pnpm --filter @alliance-risk/infra cfn:deploy dev
pnpm migrate:remote
pnpm deploy:all # API + Web
# Code-only deployment
pnpm deploy:api # Build + upload Lambda bundles
pnpm deploy:web # Build static export + sync S3 + invalidate CloudFront
# Database changes
pnpm --filter @alliance-risk/api exec prisma migrate dev --name add_column
pnpm migrate:remote # Apply to deployed RDS
# Local API with breakpoints
pnpm --filter @alliance-risk/api dev
# Attach debugger on port 9229
# Lambda logs (deployed)
aws logs tail /aws/lambda/alliance-risk-api --follow
aws logs tail /aws/lambda/alliance-risk-worker --follow
# Database queries (local)
pnpm --filter @alliance-risk/api exec prisma studio
# Database queries (deployed RDS via Worker Lambda)
aws lambda invoke \
--function-name alliance-risk-worker \
--payload "{\"action\":\"run-sql\",\"sql\":\"SELECT * FROM jobs WHERE status='FAILED' LIMIT 10\"}" \
/tmp/result.json && cat /tmp/result.json
Scalability Considerations
Current Limits (MVP configuration):
RDS: 100 concurrent connections (db.t3.micro)
Lambda: 10 concurrent executions (soft limit, can increase)
Bedrock: 5 requests/second (default throttle per model)
S3: 5,500 GET/3,500 PUT per second per prefix (effectively unlimited)
Scaling Strategies (for production):
RDS : Upgrade to db.r6g.xlarge for 100+ concurrent users
Lambda : Increase reserved concurrency for API and Worker
Bedrock : Request quota increase (up to 1000 req/s per model)
CloudFront : No action needed, auto-scales globally
Database : Add indexes on assessmentId, userId, status columns
Caching : Add Redis/ElastiCache for session storage, prompt caching
CDN : Cache API responses for read-heavy endpoints (GET /api/prompts/section/:section)
Bedrock : Batch subcategory scoring (1 call instead of 5 per category)
Lambda : Use ARM64 (20% cheaper, already implemented)
RDS : Enable auto-pause for dev/staging (Aurora Serverless v2)
S3 : Lifecycle policy to Glacier after 90 days
Bedrock : Switch to Claude 3 Haiku for parsing (5x cheaper, acceptable accuracy)