The CGIAR Risk Intelligence Tool is built on a serverless AWS architecture with a multi-agent AI pipeline for automated risk assessment.

High-Level Architecture

Component Details

Frontend - Next.js 15 Static Export

Framework: Next.js 15 (App Router)
Runtime: React 19
Language: TypeScript 5.7
Styling: Tailwind CSS v4
Components: shadcn/ui
State: React Query (TanStack Query)
Forms: React Hook Form + Zod
Build: Static export (output: 'export')
Hosting: Deployed to S3 as static HTML/CSS/JS, served via CloudFront CDN with SPA fallback routing.
The frontend uses output: 'export' which requires all routes to be known at build time. Dynamic [id] segments are not supported - all entity IDs are passed via query parameters (e.g., /assessments/upload?id=uuid).
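Since dynamic segments are unavailable under static export, links must carry entity IDs in the query string. A minimal sketch of the pattern (helper names are illustrative, not from the codebase):

```typescript
// Build and parse query-parameter routes for a static-export Next.js app.
// `buildEntityUrl` and `readEntityId` are hypothetical helpers for illustration.

export function buildEntityUrl(basePath: string, id: string): string {
  const params = new URLSearchParams({ id });
  return `${basePath}?${params.toString()}`;
}

export function readEntityId(search: string): string | null {
  // `search` is e.g. window.location.search or useSearchParams().toString()
  return new URLSearchParams(search).get('id');
}
```

For example, `buildEntityUrl('/assessments/upload', 'uuid')` produces `/assessments/upload?id=uuid`, a route that exists at build time.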

Backend - NestJS 10 REST API

Framework: NestJS 10
Runtime: Node.js 20 (ARM64)
Language: TypeScript 5.7
ORM: Prisma 6
Database: PostgreSQL 15
Validation: class-validator
Module: CommonJS
Entry Points:
  • main.ts - Local development server (port 3001)
  • lambda.ts - AWS Lambda handler for API Gateway
  • worker.ts - Background job processor
Key environment variables (packages/api/.env):
DATABASE_URL=postgresql://user:pass@host:5432/alliance_risk
COGNITO_USER_POOL_ID=us-east-1_ABC123
COGNITO_CLIENT_ID=abc123def456
COGNITO_REGION=us-east-1
WORKER_LAMBDA_NAME=alliance-risk-worker
FILE_BUCKET_NAME=alliance-risk-files-dev
ENVIRONMENT=development
AWS_REGION=us-east-1
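A missing variable is easier to catch at startup than at first use. A minimal presence check over the keys listed above (the helper is an illustration, not the project's actual bootstrap code):

```typescript
// Minimal startup check that the variables listed above are present.
// The key list mirrors the example .env; the helper name is illustrative.
export const REQUIRED_ENV_KEYS = [
  'DATABASE_URL', 'COGNITO_USER_POOL_ID', 'COGNITO_CLIENT_ID', 'COGNITO_REGION',
  'WORKER_LAMBDA_NAME', 'FILE_BUCKET_NAME', 'ENVIRONMENT', 'AWS_REGION',
] as const;

export function missingEnvKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_KEYS.filter((k) => !env[k] || env[k]!.trim() === '');
}
```

Calling `missingEnvKeys(process.env)` at boot and throwing when the result is non-empty fails fast with a readable message.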

Database - PostgreSQL 15 via Prisma

Schema Overview (see packages/api/prisma/schema.prisma):
model User {
  id        String @id @default(uuid())
  cognitoId String @unique  // Sync from Cognito
  email     String @unique
}

model Assessment {
  id               String @id @default(uuid())
  name             String
  companyName      String
  status           AssessmentStatus  // DRAFT | ANALYZING | ACTION_REQUIRED | COMPLETE
  intakeMode       IntakeMode        // UPLOAD | GUIDED_INTERVIEW | MANUAL_ENTRY
  overallRiskScore Float?
  overallRiskLevel RiskLevel?        // LOW | MODERATE | HIGH | CRITICAL
  userId           String
}

model AssessmentDocument {
  id           String @id @default(uuid())
  assessmentId String
  fileName     String
  s3Key        String
  status       DocumentStatus  // PENDING_UPLOAD | UPLOADED | PARSING | PARSED | FAILED
  parseJobId   String? @unique
}
VPC Configuration: The RDS instance resides in a private VPC with no public internet access. Only Lambda functions deployed in the same VPC can connect.
Migrations cannot run from local machines against the deployed RDS. Use the remote migration script:
pnpm migrate:remote  # Sends SQL via Worker Lambda's run-sql action

Authentication - AWS Cognito

User Pool Configuration:
  • Email-based sign-in (username = email)
  • Password policy: 8+ chars, uppercase, lowercase, number, special char
  • MFA: Optional (disabled by default)
  • Admin group: admin (checked by AdminGuard in NestJS)
JWT Token Lifecycle:
  • Access Token: 60 minutes (short-lived for API requests)
  • Refresh Token: 30 days (if Remember Me checked) or session-only
  • Auto-Refresh: API client intercepts 401 responses, uses refresh token, retries request
// packages/web/src/lib/token-manager.ts
export const TokenManager = {
  getAccessToken(): string | null {
    return sessionStorage.getItem('accessToken') ?? localStorage.getItem('accessToken');
  },
  
  setTokens(tokens: Tokens, rememberMe: boolean) {
    const storage = rememberMe ? localStorage : sessionStorage;
    storage.setItem('accessToken', tokens.accessToken);
    storage.setItem('idToken', tokens.idToken);
    if (tokens.refreshToken) {
      storage.setItem('refreshToken', tokens.refreshToken);
    }
  },
  
  clearTokens() {
    sessionStorage.clear();
    localStorage.clear();
  }
};
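The 401 intercept-and-retry behavior described above can be sketched as a fetch wrapper. This is an illustrative sketch with injected dependencies, not the project's actual API client; the `refresh` signature is an assumption:

```typescript
// Minimal shapes so the sketch stays self-contained.
interface HttpResponse { status: number }
type Fetcher = (url: string, init: { headers: Record<string, string> }) => Promise<HttpResponse>;

export async function fetchWithRefresh(
  url: string,
  deps: {
    fetcher: Fetcher;
    getAccessToken: () => string | null;
    refresh: () => Promise<string>; // assumed: exchanges the refresh token for a new access token
  },
): Promise<HttpResponse> {
  const call = (token: string | null) =>
    deps.fetcher(url, { headers: { Authorization: `Bearer ${token ?? ''}` } });

  let res = await call(deps.getAccessToken());
  if (res.status === 401) {
    const fresh = await deps.refresh(); // one refresh attempt
    res = await call(fresh);            // retry the original request once
  }
  return res;
}
```

Retrying at most once avoids an infinite loop when the refresh token itself has expired; at that point the caller should clear tokens and redirect to login.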

AI Pipeline - AWS Bedrock Multi-Agent System

Model Configuration (see packages/shared/src/constants/bedrock.config.ts):
export const BEDROCK_MODELS = {
  parser: {
    modelId: 'anthropic.claude-3-5-sonnet-20241022-v2:0'
  },
  gap_detector: {
    modelId: 'anthropic.claude-3-5-sonnet-20241022-v2:0'
  },
  risk_analysis: {
    modelId: 'anthropic.claude-3-5-sonnet-20241022-v2:0'
  },
  report_generation: {
    modelId: 'anthropic.claude-3-5-sonnet-20241022-v2:0'
  }
};
Agent Pipeline:

1. Parser Agent

Input: S3 URI to uploaded PDF/DOCX
Process:
  1. Fetch document from S3
  2. Extract text using AWS Textract (for PDFs) or raw text (DOCX)
  3. Send text to Bedrock with parser prompt
  4. Extract structured data across 35 risk indicators
Output: JSON object with extracted fields mapped to database schema
{
  "financial": {
    "revenue_projection_year_1": "5000000 KES",
    "cost_structure": "70% COGS, 20% OpEx, 10% debt service",
    "credit_access": "Line of credit with KCB Bank, 2M limit",
    // ... 2 more fields
  },
  // ... 6 more categories
}
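Model responses sometimes arrive wrapped in markdown code fences, so the structured output should be extracted defensively before use. A sketch (illustrative helper, not the project's actual parsing code):

```typescript
// Strip an optional ```json fence and parse the model's structured output.
export function parseModelJson<T = unknown>(raw: string): T {
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  const body = (fenced ? fenced[1] : raw).trim();
  return JSON.parse(body) as T;
}
```

A `JSON.parse` failure here is a good signal to retry the Bedrock call rather than persist garbage.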

2. Gap Detector Agent

Input: Parsed data from Step 1 + expected field list (35 indicators)
Process:
  1. Compare extracted data against required fields
  2. Validate data quality (completeness, format, confidence)
  3. Mark each field as VERIFIED, PARTIAL, or MISSING
Output: Array of GapField records inserted into database
[
  {
    category: 'FINANCIAL',
    field: 'revenue_projection_year_1',
    label: 'Revenue Projection - Year 1',
    extractedValue: '5000000 KES',
    status: 'VERIFIED',
    isMandatory: true
  },
  {
    category: 'FINANCIAL',
    field: 'liquidity_ratio',
    label: 'Current Liquidity Ratio',
    extractedValue: null,
    status: 'MISSING',
    isMandatory: true
  }
]
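The VERIFIED/PARTIAL/MISSING decision can be sketched as a pure classifier. The confidence threshold here is an assumption for illustration; the real agent weighs model judgment, not a fixed cutoff:

```typescript
type GapStatus = 'VERIFIED' | 'PARTIAL' | 'MISSING';

// Illustrative rule-based classifier. The 0.8 threshold is an assumption.
export function classifyField(
  extractedValue: string | null,
  confidence: number, // 0..1, assumed to come from the extraction step
): GapStatus {
  if (extractedValue === null || extractedValue.trim() === '') return 'MISSING';
  return confidence >= 0.8 ? 'VERIFIED' : 'PARTIAL';
}
```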

3. Risk Analysis Agent

Input: All gap fields (extracted + corrected values)
Process:
  1. For each of 7 categories:
    • Score 5 subcategories (0-100 scale)
  • Map to risk level: 0-33 LOW, 34-66 MODERATE, 67-100 HIGH
    • Generate evidence text citing data sources
  2. Calculate weighted average for category score
  3. Generate narrative explaining risk drivers
  4. Create 3-5 prioritized recommendations per category
Output: 7 RiskScore records + associated Recommendation records
{
  category: 'FINANCIAL',
  score: 68.8,
  level: 'HIGH',
  subcategories: [
    { name: 'Revenue Stability', score: 72, level: 'HIGH', weight: 0.25 },
    { name: 'Cost Management', score: 65, level: 'MODERATE', weight: 0.20 },
    { name: 'Credit Access', score: 60, level: 'MODERATE', weight: 0.20 },
    { name: 'Liquidity', score: 78, level: 'HIGH', weight: 0.20 },
    { name: 'Capital Structure', score: 68, level: 'HIGH', weight: 0.15 }
  ],
  evidence: "Revenue projections show 40% dependency on single buyer...",
  narrative: "Financial risk is elevated due to revenue concentration and...",
  recommendations: [
    {
      text: "Diversify customer base to reduce concentration risk below 25% per customer",
      priority: 'HIGH'
    },
    // ... more recommendations
  ]
}
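The weighted-average step above is simple enough to verify directly. A sketch of the calculation (subcategory shape trimmed to what the formula needs):

```typescript
interface Subcategory { name: string; score: number; weight: number }

// Weighted average of subcategory scores; weights are expected to sum to 1.0.
export function computeCategoryScore(subs: Subcategory[]): number {
  return subs.reduce((sum, s) => sum + s.score * s.weight, 0);
}
```

For the five FINANCIAL subcategories above (weights 0.25/0.20/0.20/0.20/0.15 against scores 72/65/60/78/68), this yields 68.8.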

4. Report Generator Agent

Input: Assessment ID (fetches all data from database)
Process:
  1. Query all gap fields, risk scores, recommendations, comments
  2. Generate HTML report template with:
    • Executive summary
    • Company profile
    • Risk scorecard (7 categories + 35 subcategories)
    • Evidence and narratives
    • Prioritized recommendations
    • Appendices (methodology, data sources)
  3. Convert HTML to PDF using headless Chrome (Puppeteer in Lambda)
  4. Upload PDF to S3
  5. Generate pre-signed download URL (1-hour expiry)
Output: S3 URI + pre-signed URL
{
  "s3Uri": "s3://alliance-risk-files/reports/assessment-abc123.pdf",
  "downloadUrl": "https://alliance-risk-files.s3.amazonaws.com/reports/assessment-abc123.pdf?X-Amz-Signature=...",
  "expiresAt": "2026-03-04T15:30:00Z"
}
Agent Chaining: The job system automatically chains PARSE_DOCUMENT → GAP_DETECTION. Analysts manually trigger RISK_ANALYSIS and REPORT_GENERATION after reviewing gaps.

Asynchronous Job System

Jobs follow a fire-and-forget pattern: the API creates a job record, invokes the Worker Lambda asynchronously, and returns immediately. Job handler interface (packages/api/src/jobs/job-handler.interface.ts):
export interface JobHandler<TInput = unknown, TOutput = unknown> {
  execute(input: TInput): Promise<TOutput>;
  onFailure?(id: string, error: Error): Promise<void>;  // Optional cleanup
}
Retry Logic:
  • Max attempts: 3 (configurable per job)
  • Backoff: Exponential (1s, 2s, 4s)
  • Permanent failure: Status set to FAILED, error message stored
  • Document status updated: PARSING → FAILED
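The retry policy can be sketched as a generic wrapper around a handler's `execute`. This is an illustrative sketch, not the worker's actual dispatcher; the `baseDelayMs` parameter is added here so the schedule is visible and testable:

```typescript
// Exponential backoff: 1s, 2s, 4s by default (baseDelayMs * 2^(attempt - 1)).
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function runWithRetry<TIn, TOut>(
  execute: (input: TIn) => Promise<TOut>,
  input: TIn,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<TOut> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await execute(input);
    } catch (err) {
      lastError = err;
      // Back off before the next attempt, but not after the final one.
      if (attempt < maxAttempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError; // permanent failure: caller marks the job FAILED and stores the message
}
```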

Infrastructure - AWS CloudFormation

Stack Resources (infra/lib/alliance-risk-stack.ts):
API Lambda:
  Runtime: nodejs20.x
  Architecture: arm64
  Memory: 1024 MB
  Timeout: 30 seconds
  Handler: lambda.handler
  VPC: Enabled (access to RDS)
  Policies: RDS, S3, Cognito, Lambda:InvokeFunction

Worker Lambda:
  Runtime: nodejs20.x
  Architecture: arm64
  Memory: 1024 MB
  Timeout: 15 minutes
  Handler: worker.handler
  VPC: Enabled (access to RDS)
  Policies: RDS, S3, Bedrock, Textract

API Gateway HTTP API:
  Timeout: 30 seconds
  CORS: Enabled
  Default Route: ANY /{proxy+} → API Lambda
Deployment Scripts:
# scripts/deploy-api.sh

# 1. Bundle API Lambda with esbuild
esbuild packages/api/src/lambda.ts \
  --bundle \
  --platform=node \
  --target=node20 \
  --external:@prisma/client \
  --external:pg \
  --outfile=dist/lambda.js

# 2. Bundle Worker Lambda
esbuild packages/api/src/worker.ts \
  --bundle \
  --platform=node \
  --target=node20 \
  --external:@prisma/client \
  --external:pg \
  --outfile=dist/worker.js

# 3. Copy external dependencies (Prisma, pg)
cp -r packages/api/node_modules/@prisma dist/node_modules/
cp -r packages/api/node_modules/pg dist/node_modules/

# 4. Zip and upload to S3
cd dist && zip -r lambda.zip . && cd ..
aws s3 cp dist/lambda.zip s3://alliance-risk-deploy/api-$(git rev-parse --short HEAD).zip

# 5. Update Lambda functions
aws lambda update-function-code \
  --function-name alliance-risk-api \
  --s3-bucket alliance-risk-deploy \
  --s3-key api-$(git rev-parse --short HEAD).zip

aws lambda update-function-code \
  --function-name alliance-risk-worker \
  --s3-bucket alliance-risk-deploy \
  --s3-key api-$(git rev-parse --short HEAD).zip

Data Flow Examples

Example 1: Document Upload and Parsing

Example 2: Risk Analysis and Scoring

// Frontend triggers risk analysis
const { mutate: triggerAnalysis } = useMutation({
  mutationFn: async (assessmentId: string) => {
    const response = await apiClient.post(`/api/assessments/${assessmentId}/analyze`);
    return response.data;
  },
  onSuccess: (data) => {
    // Start polling job
    pollJob(data.jobId);
  }
});

// Backend creates job and invokes worker
@Post(':id/analyze')
async triggerAnalysis(
  @Param('id') id: string,
  @CurrentUser() user: UserClaims
) {
  const jobId = await this.jobsService.create(
    JobType.RISK_ANALYSIS,
    { assessmentId: id },
    user.userId
  );
  return { jobId };
}

// Worker executes risk analysis handler
export class RiskAnalysisHandler implements JobHandler {
  async execute(input: { assessmentId: string }) {
    // 1. Fetch all gap fields from DB
    const gaps = await this.prisma.gapField.findMany({
      where: { assessmentId: input.assessmentId }
    });
    
    // 2. For each category, build prompt with data
    for (const category of RISK_CATEGORIES) {
      const categoryGaps = gaps.filter(g => g.category === category.key);
      
      const prompt = `
        Analyze ${category.label} for this business:
        ${categoryGaps.map(g => `${g.label}: ${g.correctedValue ?? g.extractedValue ?? 'MISSING'}`).join('\n')}
        
        Score each of 5 subcategories on 0-100 scale.
        Output JSON: {subcategories: [{name, score, level, rationale}], evidence, narrative, recommendations: [{text, priority}]}
      `;
      
      // 3. Call Bedrock
      const response = await this.bedrockService.converse({
        modelId: BEDROCK_MODELS.risk_analysis.modelId,
        systemPrompt: riskAnalysisPrompt.systemPrompt,
        userMessage: prompt
      });
      
      // 4. Parse response and calculate weighted score
      const parsed = JSON.parse(response.content);
      const categoryScore = parsed.subcategories.reduce(
        (sum, sub) => sum + (sub.score * sub.weight),
        0
      );
      
      // 5. Insert risk score (capture the created record so recommendations can reference it)
      const riskScore = await this.prisma.riskScore.create({
        data: {
          assessmentId: input.assessmentId,
          category: category.key,
          score: categoryScore,
          level: this.mapScoreToLevel(categoryScore),
          subcategories: parsed.subcategories,
          evidence: parsed.evidence,
          narrative: parsed.narrative
        }
      });
      
      // 6. Insert recommendations
      for (const rec of parsed.recommendations) {
        await this.prisma.recommendation.create({
          data: {
            riskScoreId: riskScore.id,
            text: rec.text,
            priority: rec.priority
          }
        });
      }
    }
    
    // 7. Calculate overall risk score (weighted average of 7 categories)
    const overallScore = await this.calculateOverallScore(input.assessmentId);
    
    // 8. Update assessment
    await this.prisma.assessment.update({
      where: { id: input.assessmentId },
      data: {
        overallRiskScore: overallScore,
        overallRiskLevel: this.mapScoreToLevel(overallScore),
        status: 'COMPLETE'
      }
    });
    
    return { overallScore, categories: RISK_CATEGORIES.length };
  }
  
  private mapScoreToLevel(score: number): RiskLevel {
    if (score <= 33) return 'LOW';
    if (score <= 66) return 'MODERATE';
    return 'HIGH';
  }
}

Performance Characteristics

Document Parsing

Typical: 30-60 seconds for 10-30 page PDFs
Variables:
  • Document length (pages)
  • Text density
  • Textract processing time
  • Bedrock throttling
Timeout: 15 minutes (Worker Lambda)

Gap Detection

Typical: 10-20 seconds
Variables:
  • Number of extracted fields
  • Validation complexity
Timeout: 15 minutes (Worker Lambda)

Risk Analysis

Typical: 60-90 seconds (7 categories × 10-15s each)
Variables:
  • Bedrock API latency
  • Prompt complexity
  • Number of recommendations
Timeout: 15 minutes (Worker Lambda)

Report Generation

Typical: 20-30 seconds
Variables:
  • Report length
  • PDF rendering complexity
  • S3 upload speed
Timeout: 15 minutes (Worker Lambda)
Cold Start Impact:
  • API Lambda: 1-2 seconds (arm64, 1024MB, bundled with esbuild)
  • Worker Lambda: 2-3 seconds (arm64, 1024MB, Prisma client initialization)
  • Mitigation: Provisioned concurrency for production (not enabled in MVP)

Security Model

1. Authentication

  • AWS Cognito User Pool with email + password
  • JWT tokens (access: 60min, refresh: 30 days)
  • Token rotation on refresh
  • Rate limiting: 5 req/min on auth endpoints (NestJS Throttler)
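The real API enforces the 5 req/min limit with NestJS Throttler; the underlying fixed-window idea can be sketched as a small standalone class (an illustration of the concept, not the library's implementation):

```typescript
// Fixed-window counter: at most `limit` requests per `windowMs` per key (e.g. client IP).
export class FixedWindowLimiter {
  private hits = new Map<string, { windowStart: number; count: number }>();

  constructor(private limit = 5, private windowMs = 60_000) {}

  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // First request in a fresh window.
      this.hits.set(key, { windowStart: now, count: 1 });
      return true;
    }
    entry.count++;
    return entry.count <= this.limit;
  }
}
```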

2. Authorization

  • Global JwtAuthGuard on all API routes (except @Public())
  • JWT signature verification against Cognito JWKS
  • User roles via Cognito groups (admin group)
  • AdminGuard checks cognito:groups claim
  • Resource ownership validation (userId must match createdById)
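The group and ownership checks reduce to pure predicates over the JWT claims. A sketch (the `cognito:groups` claim name is standard Cognito; the helper names and admin-bypass rule are illustrative):

```typescript
interface UserClaims {
  userId: string;
  'cognito:groups'?: string[];
}

// AdminGuard-style check against the Cognito groups claim.
export const isAdmin = (claims: UserClaims): boolean =>
  (claims['cognito:groups'] ?? []).includes('admin');

// Ownership check: admins bypass (an assumption here), others must own the resource.
export const canAccess = (claims: UserClaims, resourceOwnerId: string): boolean =>
  isAdmin(claims) || claims.userId === resourceOwnerId;
```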

3. Data Encryption

  • In Transit: HTTPS only (CloudFront, API Gateway, S3 pre-signed URLs)
  • At Rest: RDS encrypted with AWS KMS, S3 SSE-AES256
  • Secrets: Database credentials in Secrets Manager, auto-rotation enabled

4. Network Isolation

  • RDS in private VPC subnets (no internet gateway)
  • Lambda functions in VPC to access RDS
  • VPC endpoint for Cognito (interface endpoint)
  • S3 and Bedrock accessed via NAT Gateway (or VPC endpoints in production)

5. Input Validation

  • NestJS ValidationPipe with class-validator on all DTOs
  • File upload limits: 10MB max, PDF/DOCX only (MIME type validation)
  • SQL injection prevention: Prisma ORM (parameterized queries)
  • XSS prevention: React auto-escaping, CSP headers on CloudFront
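The upload limits above can be expressed as a pure pre-check. The real API validates these in a DTO via class-validator; this standalone helper is an illustration:

```typescript
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024; // 10 MB, per the limit above
const ALLOWED_MIME_TYPES = new Set([
  'application/pdf',
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document', // DOCX
]);

// Returns an error message, or null when the file is acceptable.
export function validateUpload(file: { sizeBytes: number; mimeType: string }): string | null {
  if (file.sizeBytes > MAX_UPLOAD_BYTES) return 'File exceeds 10MB limit';
  if (!ALLOWED_MIME_TYPES.has(file.mimeType)) return 'Only PDF and DOCX files are accepted';
  return null;
}
```

Note that MIME type is client-supplied; server-side content sniffing (e.g. magic-byte checks) is a reasonable hardening step beyond this.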
Production Hardening Checklist (not implemented in MVP):
  • Enable CloudTrail for API audit logs
  • Add WAF rules to CloudFront (rate limiting, geo-blocking)
  • Implement RBAC beyond admin/analyst (e.g., viewer, editor roles)
  • Add MFA enforcement for admin accounts
  • Enable VPC Flow Logs
  • Set up CloudWatch alarms for error rates, latency, failed auth

Monitoring and Observability

CloudWatch Logs:
  • /aws/lambda/alliance-risk-api - API request/response, errors
  • /aws/lambda/alliance-risk-worker - Job processing, Bedrock calls, failures
  • /aws/rds/cluster/alliance-risk/postgresql - Slow queries, errors
Key Metrics:
  • Lambda invocations, duration, errors, throttles
  • RDS connections, CPU, memory, storage
  • Bedrock API latency, throttling, errors
  • S3 request counts, error rates
Cost Breakdown (estimated monthly for dev environment):
  • RDS db.t3.micro: $15
  • Lambda (API + Worker): $5-10 (low traffic)
  • Bedrock (Claude 3.5 Sonnet): $20-50 (100-200 assessments/month)
  • S3: $2
  • CloudFront: $5
  • Total: ~$50-80/month
Use AWS Cost Explorer to track Bedrock usage. Each assessment consumes approximately:
  • Parsing: 50K input tokens + 5K output tokens
  • Gap Detection: 10K input + 2K output
  • Risk Analysis: 30K input + 10K output (×7 categories)
  • Report Generation: 20K input + 15K output
Total per assessment (reading Risk Analysis as per-category, ×7): ~380K tokens (≈290K input + ≈92K output) ≈ $2.25 at Claude 3.5 Sonnet v2 pricing ($3/M input, $15/M output)
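The per-assessment cost follows directly from token counts and per-million pricing. A sketch of the arithmetic (the pricing constants are assumptions based on published Claude 3.5 Sonnet rates, not values from this project):

```typescript
// Token cost estimator. Pricing figures are assumptions (USD per million tokens).
const PRICE_PER_M_INPUT = 3;   // assumed $3 / 1M input tokens
const PRICE_PER_M_OUTPUT = 15; // assumed $15 / 1M output tokens

export function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1e6) * PRICE_PER_M_INPUT + (outputTokens / 1e6) * PRICE_PER_M_OUTPUT;
}
```

Summing the per-stage numbers above with risk analysis read as ×7 gives roughly 290K input and 92K output tokens, i.e. about $2.25 per assessment under these assumed rates.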

Development Workflow

# 1. Clone repository
git clone <repo-url>
cd alliance-risk-analysis-tool

# 2. Install dependencies
pnpm install

# 3. Set up local PostgreSQL
createdb alliance_risk

# 4. Configure API environment
cp packages/api/.env.example packages/api/.env
# Edit .env with local DATABASE_URL

# 5. Run migrations
pnpm --filter @alliance-risk/api exec prisma migrate deploy

# 6. Seed database
pnpm --filter @alliance-risk/api exec tsx prisma/seed.ts

# 7. Start dev servers (API :3001 + Web :3000)
pnpm dev

Scalability Considerations

Current Limits (MVP configuration):
  • RDS: 100 concurrent connections (db.t3.micro)
  • Lambda: 10 concurrent executions (soft limit, can increase)
  • Bedrock: 5 requests/second (default throttle per model)
  • S3: 5,500 GET/3,500 PUT per second per prefix (effectively unlimited)
Scaling Strategies (for production):
  • RDS: Upgrade to db.r6g.xlarge for 100+ concurrent users
  • Lambda: Increase reserved concurrency for API and Worker
  • Bedrock: Request quota increase (up to 1000 req/s per model)
  • CloudFront: No action needed, auto-scales globally
  • Database: Add indexes on assessmentId, userId, status columns
  • Caching: Add Redis/ElastiCache for session storage, prompt caching
  • CDN: Cache API responses for read-heavy endpoints (GET /api/prompts/section/:section)
Cost Optimizations (for production):
  • Bedrock: Batch subcategory scoring (1 call instead of 5 per category)
  • Lambda: Use ARM64 (20% cheaper, already implemented)
  • RDS: Enable auto-pause for dev/staging (Aurora Serverless v2)
  • S3: Lifecycle policy to Glacier after 90 days
  • Bedrock: Switch to Claude 3 Haiku for parsing (5x cheaper, acceptable accuracy)
