Knowledge Base Configuration - Workshop Cloud Chat

Overview

The Knowledge Base synchronization feature syncs PDF documents from your S3 bucket into an Amazon Bedrock Knowledge Base. The AI agent can then retrieve and reference information from your uploaded documents when answering questions.

Configure your AWS credentials and S3 settings before you set up Knowledge Base synchronization.

Required Parameters

region

string

required

The AWS region where your Bedrock Knowledge Base is deployed.

Example: us-east-1, us-west-2

Placeholder: us-east-1

This should typically be the same region as your S3 bucket and Bedrock agent.

knowledgeBaseId

string

required

The unique identifier for your Bedrock Knowledge Base.

Example: ABCD12EFGH

This is the Knowledge Base ID shown in the Amazon Bedrock console.

dataSourceId

string

required

The identifier for the data source within your Knowledge Base.

Example: XYZ789MNOP

This connects to the specific S3 data source in your Knowledge Base configuration.

description

string

default:"Sincronización manual desde UI"

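Optional description for the synchronization job.

Default: Sincronización manual desde UI (“Manual sync from UI”)

The description is stored with the ingestion job in AWS and helps you identify syncs started manually from the UI. If the field is left empty, the API route falls back to “Sincronización ejecutada desde la interfaz de chat” (“Sync run from the chat interface”).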
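Before saving, the dialog values can be checked against the requirements above. The sketch below is illustrative, not the app’s code: SyncConfig, validateSyncConfig, and BEDROCK_ID_PATTERN are hypothetical names, and the 10-character alphanumeric ID check is an assumption based on the ID format given in the Bedrock API reference.

```typescript
// Hypothetical shape of the sync settings described above.
interface SyncConfig {
  region: string;
  knowledgeBaseId: string;
  dataSourceId: string;
  description?: string;
}

interface ValidationResult {
  ok: boolean;
  errors: string[];
  config: SyncConfig;
}

// Default shown in the dialog ("Manual sync from UI").
const DEFAULT_DESCRIPTION = 'Sincronización manual desde UI';

// Assumption: Knowledge Base and data source IDs are 10 alphanumeric characters.
const BEDROCK_ID_PATTERN = /^[0-9a-zA-Z]{10}$/;

function validateSyncConfig(input: SyncConfig): ValidationResult {
  // Trim every field so whitespace pasted from the console does not break API calls.
  const config: SyncConfig = {
    region: input.region.trim(),
    knowledgeBaseId: input.knowledgeBaseId.trim(),
    dataSourceId: input.dataSourceId.trim(),
    description: input.description?.trim() || DEFAULT_DESCRIPTION,
  };
  const errors: string[] = [];

  if (!config.region) errors.push('region is required');
  if (!BEDROCK_ID_PATTERN.test(config.knowledgeBaseId)) {
    errors.push('knowledgeBaseId must be 10 alphanumeric characters');
  }
  if (!BEDROCK_ID_PATTERN.test(config.dataSourceId)) {
    errors.push('dataSourceId must be 10 alphanumeric characters');
  }
  return { ok: errors.length === 0, errors, config };
}
```

For example, validateSyncConfig({ region: 'us-east-1', knowledgeBaseId: 'ABCD12EFGH', dataSourceId: 'XYZ789MNOP' }) passes and fills in the default description.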

Configuration Steps

Open Sync Configuration Dialog

Click the gear icon (⚙) next to “Sincronización de Knowledge Base” (Knowledge Base Sync) in the right sidebar panel.

The “Configuración de sincronización KB” (KB sync settings) dialog opens.

Enter Knowledge Base Details

Fill in the Knowledge Base configuration fields (“requerido” means required):

Región (requerido): us-east-1
Knowledge Base ID (requerido): ABCD12EFGH
Data Source ID (requerido): XYZ789MNOP
Descripción: Sincronización manual desde UI

From src/pages/index.astro:1048-1053:

<label>Región (requerido)<input name="region" placeholder="us-east-1" required /></label>
<label>Knowledge Base ID (requerido)<input name="knowledgeBaseId" required /></label>
<label>Data Source ID (requerido)<input name="dataSourceId" required /></label>
<label>Descripción<input name="description" value="Sincronización manual desde UI" /></label>

Save Configuration

Click “Guardar” (Save) to store your Knowledge Base settings.

The configuration is persisted to the browser’s localStorage.
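A minimal sketch of persisting these settings, assuming a JSON-serialized object under a single key. The key kbSyncConfig and the helpers saveSyncConfig, loadSyncConfig, and clearSyncConfig are hypothetical; the app’s real key and helpers are not shown in this page. The functions take a storage object instead of touching window.localStorage directly, so they can also run outside a browser.

```typescript
// Subset of the Web Storage API used here; window.localStorage satisfies it.
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

interface SyncConfig {
  region: string;
  knowledgeBaseId: string;
  dataSourceId: string;
  description?: string;
}

// Hypothetical storage key.
const SYNC_CONFIG_KEY = 'kbSyncConfig';

function saveSyncConfig(storage: KeyValueStorage, config: SyncConfig): void {
  storage.setItem(SYNC_CONFIG_KEY, JSON.stringify(config));
}

function loadSyncConfig(storage: KeyValueStorage): SyncConfig | null {
  const raw = storage.getItem(SYNC_CONFIG_KEY);
  if (raw === null) return null;
  try {
    const parsed = JSON.parse(raw) as Partial<SyncConfig>;
    // Anything missing a required field counts as "not configured".
    if (!parsed.region || !parsed.knowledgeBaseId || !parsed.dataSourceId) return null;
    return parsed as SyncConfig;
  } catch {
    // Corrupted or non-object JSON also counts as "not configured".
    return null;
  }
}

// Mirrors "Borrar configuración": remove the stored settings entirely.
function clearSyncConfig(storage: KeyValueStorage): void {
  storage.removeItem(SYNC_CONFIG_KEY);
}
```

In the browser you would pass window.localStorage; in tests, a Map-backed object with the same three methods works.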

Verify Configuration

Check the “Resumen de configuración” (Configuration summary) panel. You should see:

KB Sync: us-east-1 · KB ABCD12EFGH · DS XYZ789MNOP

From src/pages/index.astro:1336-1338:

el.summarySync.textContent = state.sync.knowledgeBaseId
  ? `KB Sync: ${state.sync.region} · KB ${state.sync.knowledgeBaseId} · DS ${state.sync.dataSourceId}`
  : 'KB Sync: sin configurar.';

Creating a Knowledge Base

If you don’t have a Knowledge Base yet:

Open Bedrock Console

Navigate to the Amazon Bedrock Console.

Navigate to Knowledge Bases

In the left sidebar, click Knowledge bases under the Orchestration section.

Create Knowledge Base

Click the Create knowledge base button.

Name: Give your knowledge base a descriptive name
Description: Optional description of what documents this KB contains
IAM role: Choose an existing role or create a new one with required permissions

Configure Data Source

Data source name: Name your S3 data source
S3 URI: Enter your bucket URI (e.g., s3://workshop-docs-bucket/documentos/)
Chunking strategy: Choose how documents are split (default or custom)
Embeddings model: Select a model for generating embeddings (e.g., amazon.titan-embed-text-v1)

Complete Setup

Review your configuration
Click Create knowledge base
Wait for the knowledge base to be created

Note IDs

After creation:

Copy the Knowledge base ID from the details page
Navigate to the Data sources tab
Copy the Data source ID for your S3 data source

Running a Synchronization

Once configured, you can sync your S3 documents to the Knowledge Base:

Start Sync Job

In the “Sincronización de Knowledge Base” panel, click “Ejecutar sincronización” (Run sync).

The application starts a new ingestion job in AWS.

Monitor Progress

The sync status updates automatically:

PENDIENTE (pending): the job is being created
EN_EJECUCION (running): ingestion is in progress
COMPLETADO (completed): the sync finished successfully
FALLIDO (failed): the sync failed with errors

From src/pages/api/sync.ts:8:

type SyncStatus = 'PENDIENTE' | 'EN_EJECUCION' | 'COMPLETADO' | 'FALLIDO';

View Logs

Each execution appears as a collapsible section showing:

Job ID
Current status
Start time
Completion time (when finished)
Detailed logs of the sync process

Click the section to expand and view full logs.

Refresh History

Click the refresh icon (↻) to fetch the latest execution history from AWS.

This retrieves up to 40 recent ingestion jobs from the Bedrock Knowledge Base service.

How Synchronization Works

Starting a Sync Job

When you click “Ejecutar sincronización”, the application:

Creates an execution record with a unique ID
Starts an ingestion job by calling the Bedrock StartIngestionJob API
Polls the job status every 4 seconds
Updates the logs in real time

From src/pages/api/sync.ts:56-62:

const startResponse = await client.send(
  new StartIngestionJobCommand({
    knowledgeBaseId: payload.knowledgeBaseId,
    dataSourceId: payload.dataSourceId,
    description: payload.description || 'Sincronización ejecutada desde la interfaz de chat'
  })
);

Polling for Status Updates

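While the job status is STARTING or IN_PROGRESS, the sync process waits 4 seconds and checks the status again. The loop has no upper bound on the number of checks.

From src/pages/api/sync.ts:74-87: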

while (currentStatus === 'STARTING' || currentStatus === 'IN_PROGRESS') {
  await wait(4000);

  const statusResponse = await client.send(
    new GetIngestionJobCommand({
      knowledgeBaseId: payload.knowledgeBaseId,
      dataSourceId: payload.dataSourceId,
      ingestionJobId
    })
  );

  currentStatus = statusResponse.ingestionJob?.status;
  addLog(executionId, `Estado actual: ${currentStatus || 'DESCONOCIDO'}`);
}
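The start-and-poll flow can be condensed into one function with the AWS calls injected, which makes it testable without credentials. This is a sketch under stated assumptions, not the code in src/pages/api/sync.ts: runSync, SyncDeps, and maxPolls are hypothetical. In a real deployment, startJob would wrap StartIngestionJobCommand and getJobStatus would wrap GetIngestionJobCommand. Unlike the loop shown above, it stops after maxPolls checks and leaves the execution in EN_EJECUCION, without a finish time, if AWS still reports the job as running.

```typescript
type SyncStatus = 'PENDIENTE' | 'EN_EJECUCION' | 'COMPLETADO' | 'FALLIDO';

type SyncExecution = {
  id: string;
  status: SyncStatus;
  logs: string[];
  startedAt: string;
  finishedAt?: string;
};

// Hypothetical injected operations.
interface SyncDeps {
  startJob(): Promise<string | undefined>; // resolves to the ingestionJobId
  getJobStatus(jobId: string): Promise<string | undefined>; // resolves to the AWS status
  sleep(ms: number): Promise<void>;
}

// Same mapping as sync-history.ts.
const mapAwsStatus = (status?: string): SyncStatus => {
  if (status === 'COMPLETE') return 'COMPLETADO';
  if (status === 'FAILED') return 'FALLIDO';
  if (status === 'IN_PROGRESS' || status === 'STARTING') return 'EN_EJECUCION';
  return 'PENDIENTE';
};

let executionCounter = 0;

async function runSync(deps: SyncDeps, pollMs = 4000, maxPolls = 900): Promise<SyncExecution> {
  // 1. Create an execution record with an ID unique within this process.
  const execution: SyncExecution = {
    id: `exec-${Date.now()}-${++executionCounter}`,
    status: 'PENDIENTE',
    logs: [],
    startedAt: new Date().toISOString(),
  };
  const log = (message: string): void => {
    execution.logs.push(message);
  };
  log('Ejecución creada.');

  try {
    // 2. Start the ingestion job; a missing job ID is treated as a failure.
    const jobId = await deps.startJob();
    if (!jobId) throw new Error('No se recibió un ingestionJobId en la respuesta');
    log(`Ingestion Job iniciado: ${jobId}`);
    execution.status = 'EN_EJECUCION';

    // 3. Poll while AWS reports STARTING/IN_PROGRESS, at most maxPolls times
    //    (900 polls x 4 s is about one hour).
    let awsStatus: string | undefined = 'STARTING';
    for (let i = 0; i < maxPolls && (awsStatus === 'STARTING' || awsStatus === 'IN_PROGRESS'); i++) {
      await deps.sleep(pollMs);
      awsStatus = await deps.getJobStatus(jobId);
      // 4. Log every observed status.
      log(`Estado actual: ${awsStatus ?? 'DESCONOCIDO'}`);
    }
    execution.status = mapAwsStatus(awsStatus);
    if (execution.status === 'COMPLETADO') log('Sincronización completada correctamente.');
  } catch (err) {
    execution.status = 'FALLIDO';
    log(err instanceof Error ? err.message : String(err));
  }

  // Set a finish time only once the job has left the running state.
  if (execution.status !== 'EN_EJECUCION') execution.finishedAt = new Date().toISOString();
  return execution;
}
```

Injecting sleep also lets tests pass a no-op delay instead of waiting 4 seconds per poll.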

Status Mapping

AWS statuses are mapped to UI statuses as follows. Any status not listed, such as STOPPING or STOPPED, falls through to PENDIENTE.

From src/pages/api/sync-history.ts:16-21:

const mapAwsStatusToUiStatus = (status?: string): UiSyncStatus => {
  if (status === 'COMPLETE') return 'COMPLETADO';
  if (status === 'FAILED') return 'FALLIDO';
  if (status === 'IN_PROGRESS' || status === 'STARTING') return 'EN_EJECUCION';
  return 'PENDIENTE';
};

Retrieving Sync History

The refresh button fetches recent jobs from AWS.

From src/pages/api/sync-history.ts:51-57:

const listResponse = await client.send(
  new ListIngestionJobsCommand({
    knowledgeBaseId: payload.knowledgeBaseId,
    dataSourceId: payload.dataSourceId,
    maxResults
  })
);
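ListIngestionJobs returns results in pages, with a nextToken when more are available. The following sketch follows those tokens until enough jobs are collected, with the API call injected as fetchPage. listRecentJobs, JobSummary, and JobPage are hypothetical; the field names are modeled on the ListIngestionJobs response (ingestionJobSummaries, nextToken), but check the SDK types before relying on them.

```typescript
// Hypothetical subset of an ingestion job summary.
interface JobSummary {
  ingestionJobId: string;
  status: string;
  startedAt: Date;
}

// Hypothetical page shape: job summaries plus an optional continuation token.
interface JobPage {
  ingestionJobSummaries: JobSummary[];
  nextToken?: string;
}

type FetchPage = (nextToken: string | undefined) => Promise<JobPage>;

// Follow nextToken until `limit` jobs are collected or no pages remain.
async function listRecentJobs(fetchPage: FetchPage, limit = 40): Promise<JobSummary[]> {
  const jobs: JobSummary[] = [];
  let token: string | undefined;
  do {
    const page = await fetchPage(token);
    jobs.push(...page.ingestionJobSummaries);
    token = page.nextToken;
  } while (token && jobs.length < limit);

  // Newest first, then trim. This orders only the jobs fetched so far,
  // not necessarily the newest jobs that exist in AWS.
  return jobs
    .sort((a, b) => b.startedAt.getTime() - a.startedAt.getTime())
    .slice(0, limit);
}
```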

Execution History

The sync panel displays execution history with two types of entries:

Local Executions

Created when you start a sync from the UI
Stored in localStorage
Show real-time progress with detailed logs
Identified without “AWS” prefix

AWS Executions

Retrieved from the Amazon Bedrock Knowledge Base API (ListIngestionJobs)
Include historical jobs that may have been started elsewhere, such as from the AWS console
Show summary information
Identified with “AWS” prefix in the job title

From src/pages/index.astro:1452:

const sourceLabel = exec.source ? `${exec.source} · ` : '';
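One way to build the combined list is sketched here. It reuses the source-label rule from index.astro, but the HistoryEntry shape, the jobId field on local entries, and the deduplication step are assumptions; this page does not say whether the app removes AWS entries that duplicate a local run.

```typescript
interface HistoryEntry {
  id: string;        // local execution ID, or the ingestion job ID for AWS entries
  jobId?: string;    // AWS ingestion job ID, when known (assumption)
  status: string;
  startedAt: string; // ISO 8601 timestamp
  source?: string;   // "AWS" for entries fetched from AWS; unset for local runs
}

// Same labeling rule as index.astro: prefix "<source> · " only when a source is set.
function titleFor(exec: HistoryEntry): string {
  const sourceLabel = exec.source ? `${exec.source} · ` : '';
  return `${sourceLabel}${exec.id}`;
}

// Combine both lists newest first, skipping AWS entries whose job ID a local run already tracks.
function mergeHistory(local: HistoryEntry[], aws: HistoryEntry[]): HistoryEntry[] {
  const trackedJobIds = new Set<string>();
  for (const entry of local) {
    if (entry.jobId) trackedJobIds.add(entry.jobId);
  }
  const fromAws = aws.filter((e) => !(e.jobId && trackedJobIds.has(e.jobId)));
  return [...local, ...fromAws].sort((a, b) => Date.parse(b.startedAt) - Date.parse(a.startedAt));
}
```

Keeping the local copy of a duplicated job preserves its detailed logs, which the AWS summary lacks.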

Execution Details

Each execution record contains the following fields.

From src/pages/api/sync.ts:10-16:

type SyncExecution = {
  id: string;
  status: SyncStatus;
  logs: string[];
  startedAt: string;
  finishedAt?: string;
};
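A sketch of how such records might be created, completed, and timed in memory. createExecution, finishExecution, and durationSeconds are hypothetical helpers, and storing startedAt and finishedAt as ISO 8601 strings is an assumption; the excerpt only shows that they are strings.

```typescript
type SyncStatus = 'PENDIENTE' | 'EN_EJECUCION' | 'COMPLETADO' | 'FALLIDO';

type SyncExecution = {
  id: string;
  status: SyncStatus;
  logs: string[];
  startedAt: string;
  finishedAt?: string;
};

// In-memory registry keyed by execution ID.
const executions = new Map<string, SyncExecution>();

function createExecution(id: string, now: Date = new Date()): SyncExecution {
  const execution: SyncExecution = { id, status: 'PENDIENTE', logs: [], startedAt: now.toISOString() };
  executions.set(id, execution);
  return execution;
}

// Returns undefined for unknown IDs, similar to the
// "No se encontró la ejecución solicitada" error case.
function finishExecution(
  id: string,
  status: 'COMPLETADO' | 'FALLIDO',
  now: Date = new Date(),
): SyncExecution | undefined {
  const execution = executions.get(id);
  if (!execution) return undefined;
  execution.status = status;
  execution.finishedAt = now.toISOString();
  return execution;
}

// Elapsed seconds for finished executions; undefined while still running.
function durationSeconds(execution: SyncExecution): number | undefined {
  if (!execution.finishedAt) return undefined;
  return (Date.parse(execution.finishedAt) - Date.parse(execution.startedAt)) / 1000;
}
```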

Log Messages

Each log entry is prefixed with a timestamp formatted for the es-ES locale.

From src/pages/api/sync.ts:32-36:

const addLog = (executionId: string, message: string): void => {
  const execution = executions.get(executionId);
  if (!execution) return;
  execution.logs.push(`[${new Date().toLocaleString('es-ES')}] ${message}`);
};

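Typical log sequence:

“Ejecución creada.” (Execution created.)
“Iniciando tarea de sincronización en Bedrock Knowledge Base…” (Starting sync task in Bedrock Knowledge Base…)
“Ingestion Job iniciado: [job-id]” (Ingestion job started)
Multiple “Estado actual: [status]” (Current status) updates
“Sincronización completada correctamente.” (Sync completed successfully.) or an error message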

Validation Logic

The sync section is enabled only when the region, Knowledge Base ID, and Data Source ID are set and AWS credentials are configured.

From src/pages/index.astro:1530-1531:

const isSyncConfigured = () =>
  Boolean(state.sync.region && state.sync.knowledgeBaseId && state.sync.dataSourceId && isAwsConfigured());

Both the sync and refresh buttons are disabled when the configuration is incomplete.

From src/pages/index.astro:1557-1558:

el.startSync.disabled = !syncReady;
el.refreshSyncHistory.disabled = !syncReady;

Managing Configuration

Clear Sync Configuration

To remove Knowledge Base settings:

Open the sync configuration dialog
Click “Borrar configuración” (Clear configuration)
This clears the region, Knowledge Base ID, and Data Source ID from localStorage

Cancel Without Saving

To close the dialog without applying changes:

Click “Cancelar sin guardar” (Cancel without saving)
The dialog closes and the previous sync settings remain unchanged

Troubleshooting

Sync Button Disabled

Symptom: The “Ejecutar sincronización” button is grayed out.

Solutions:

Verify AWS credentials are configured
Ensure Knowledge Base ID and Data Source ID are provided
Check the configuration summary panel
Verify region is specified

“Faltan parámetros de sincronización” (Missing sync parameters) Error

Symptom: The API returns a 400 error when you start a sync.

Solutions:

Re-open sync configuration dialog
Verify all required fields are filled
Ensure AWS credentials are configured
Save configuration again

“No se recibió un ingestionJobId en la respuesta” (No ingestionJobId received in the response) Error

Symptom: The sync starts but fails immediately with this error.

Solutions:

Verify that the Knowledge Base ID and Data Source ID are correct
Check that the Knowledge Base exists in the specified region
Ensure the Data Source ID belongs to the specified Knowledge Base
Check the Knowledge Base status in the AWS console

Sync Stuck in “EN_EJECUCION”

Symptom: The sync status shows “EN_EJECUCION” for an extended period.

Solutions:

Large document sets can take 10-30+ minutes to process
Check the Amazon Bedrock console for the actual job status
If log delivery is enabled for the Knowledge Base, review the CloudWatch logs for detailed progress
If the job appears stuck, it may have failed; check its status in the AWS console

“No se encontró la ejecución solicitada” (Requested execution not found) Error

Symptom: An error appears when checking the execution status.

Solutions:

The execution may have been started in a different browser session
Refresh the page to reload execution history
Click the refresh button (↻) to fetch from AWS

Permission Denied Errors

Symptom: The sync fails with an access denied or permission error.

Solutions:

Check that the IAM user or role has the bedrock:StartIngestionJob permission
Verify that it has the bedrock:GetIngestionJob permission
Ensure that it has the bedrock:ListIngestionJobs permission
Check that the Knowledge Base service role has S3 read permissions

Required IAM Permissions

Your IAM user or role needs these Bedrock permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:StartIngestionJob",
        "bedrock:GetIngestionJob",
        "bedrock:ListIngestionJobs"
      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1:123456789012:knowledge-base/*"
      ]
    }
  ]
}

Replace us-east-1 and 123456789012 with your actual region and AWS account ID.
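If you generate this policy programmatically, you can also scope it to a single Knowledge Base instead of knowledge-base/*. buildSyncPolicy is a hypothetical helper; it assumes the standard aws partition (not AWS GovCloud or China Regions) and grants the same three actions as the policy above.

```typescript
interface PolicyStatement {
  Effect: 'Allow';
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: '2012-10-17';
  Statement: PolicyStatement[];
}

// Builds the identity policy above; pass a knowledgeBaseId to restrict it to one KB.
function buildSyncPolicy(region: string, accountId: string, knowledgeBaseId = '*'): PolicyDocument {
  if (!/^\d{12}$/.test(accountId)) {
    throw new Error('accountId must be a 12-digit AWS account ID');
  }
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        Action: ['bedrock:StartIngestionJob', 'bedrock:GetIngestionJob', 'bedrock:ListIngestionJobs'],
        // Assumes the standard "aws" partition.
        Resource: [`arn:aws:bedrock:${region}:${accountId}:knowledge-base/${knowledgeBaseId}`],
      },
    ],
  };
}
```

For example, JSON.stringify(buildSyncPolicy('us-east-1', '123456789012', 'ABCD12EFGH'), null, 2) yields a policy limited to that one Knowledge Base.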

The Knowledge Base service role needs S3 permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name/*",
        "arn:aws:s3:::your-bucket-name"
      ]
    }
  ]
}
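The two resources serve different actions: s3:ListBucket applies to the bucket ARN, while s3:GetObject applies to object ARNs. A hypothetical helper that makes this split explicit, and optionally limits object reads to a prefix such as the documentos/ folder from the data source example, could look like this. It covers only the S3 read permissions described above.

```typescript
interface PolicyStatement {
  Effect: 'Allow';
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: '2012-10-17';
  Statement: PolicyStatement[];
}

function buildS3ReadPolicy(bucket: string, prefix = ''): PolicyDocument {
  const bucketArn = `arn:aws:s3:::${bucket}`;
  return {
    Version: '2012-10-17',
    Statement: [
      // ListBucket is evaluated against the bucket itself (it still lists the whole bucket).
      { Effect: 'Allow', Action: ['s3:ListBucket'], Resource: [bucketArn] },
      // GetObject is evaluated against objects; an empty prefix allows every object.
      { Effect: 'Allow', Action: ['s3:GetObject'], Resource: [`${bucketArn}/${prefix}*`] },
    ],
  };
}
```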

Understanding Ingestion Jobs

What Happens During Sync

Discovery: Bedrock scans the S3 data source for new/changed files
Processing: Documents are parsed and split into chunks
Embedding: Each chunk is converted to vector embeddings
Indexing: Embeddings are stored in the Knowledge Base vector store
Completion: Knowledge Base is ready to answer questions about the documents
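To make the Processing step concrete, here is a generic illustration of fixed-size chunking with overlap. It is not Bedrock’s implementation: Bedrock’s fixed-size strategy is configured on the data source in tokens (a maximum token count and an overlap percentage), whereas this hypothetical chunkWords function counts whitespace-separated words.

```typescript
// Split text into chunks of `chunkSize` words, each sharing `overlap` words
// with the previous chunk so context is not lost at chunk boundaries.
function chunkWords(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error('require chunkSize > 0 and 0 <= overlap < chunkSize');
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= words.length) break; // this chunk reached the end
  }
  return chunks;
}
```

With a chunk size of 3 and an overlap of 1, the text "a b c d e f g" becomes three chunks, "a b c", "c d e", and "e f g".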

Sync Duration

Ingestion time depends on:

Number of documents
Total size of documents
Complexity of content (text, tables, images)
Chunking strategy
Embeddings model selected

Typical times:

1-5 PDFs: 1-3 minutes
10-50 PDFs: 5-15 minutes
100+ PDFs: 30-60+ minutes

Best Practices

Sync After Uploads

Run a sync job after uploading new documents to make them available to the AI agent.

Monitor Sync Logs

Review sync logs to understand processing and catch any errors early.

Schedule Regular Syncs

For production use, consider scheduling automatic syncs with Amazon EventBridge.

Optimize Document Size

Keep PDFs reasonably sized (under 50 MB each) for faster processing.

Advanced: Automatic Sync

For production deployments, consider setting up automatic synchronization:

S3 Event Notifications: Configure S3 to invoke an AWS Lambda function when new documents are uploaded
Lambda Function: Create a function that calls StartIngestionJob
EventBridge Schedule: Set up periodic syncs (e.g., daily at 2 AM)

This keeps documents synchronized without manual intervention.
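For the S3-triggered option, the Lambda function first needs to decide whether an event warrants a sync. A sketch of that filtering step follows; pdfUploads and the S3EventLike type are hypothetical, and the type covers only the fields used here. The handler would call StartIngestionJob with the Knowledge Base and Data Source IDs only when the returned list is non-empty. Because a batch upload can produce many events, batching or debouncing those calls is worth considering.

```typescript
// Minimal subset of the S3 event notification payload delivered to Lambda.
interface S3EventLike {
  Records: Array<{
    eventName: string;
    s3: { bucket: { name: string }; object: { key: string } };
  }>;
}

// Returns the decoded keys of PDFs created under `prefix`.
// Deleted objects are ignored here; include ObjectRemoved events too
// if removals should also trigger a sync.
function pdfUploads(event: S3EventLike, prefix: string): string[] {
  return event.Records
    .filter((r) => r.eventName.startsWith('ObjectCreated:'))
    // S3 URL-encodes object keys in notifications and encodes spaces as '+'.
    .map((r) => decodeURIComponent(r.s3.object.key.replace(/\+/g, ' ')))
    .filter((key) => key.startsWith(prefix) && key.toLowerCase().endsWith('.pdf'));
}
```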

Next Steps

Learn about Using the Chat Interface with your Knowledge Base
Understand Document Management Best Practices
Explore Monitoring and Debugging
