Overview
The Knowledge Base synchronization feature lets you sync PDF documents from your S3 bucket into an Amazon Bedrock Knowledge Base. This enables the AI agent to retrieve and reference information from your uploaded documents when answering questions.
Make sure you have configured your AWS credentials and S3 bucket before setting up Knowledge Base synchronization.
Required Parameters
Region
The AWS region where your Bedrock Knowledge Base is deployed. Examples: us-east-1, us-west-2. Placeholder: us-east-1. This should typically be the same region as your S3 bucket and Bedrock agent.
Knowledge Base ID
The unique identifier for your Bedrock Knowledge Base, for example ABCD12EFGH. This is the Knowledge Base ID shown in the Amazon Bedrock console.
Data Source ID
The identifier for the data source within your Knowledge Base, for example XYZ789MNOP. This connects to the specific S3 data source in your Knowledge Base configuration.
Description
Optional description for the synchronization job. Default: “Sincronización manual desde UI” (“Manual sync from UI”). This description appears in AWS CloudWatch logs and can help you identify manual syncs started from the UI.
Configuration Steps
Open Sync Configuration Dialog
Click the gear icon (⚙) next to “Sincronización de Knowledge Base” in the right sidebar panel. The dialog titled “Configuración de sincronización KB” will open.
Enter Knowledge Base Details
Fill in the Knowledge Base configuration fields (see src/pages/index.astro:1048-1053).
Save Configuration
Click the “Guardar” button to save your Knowledge Base settings. The configuration is persisted to browser localStorage.
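That persistence step can be sketched as follows; the storage key and JSON shape are assumptions for illustration, not the app's actual ones:

```typescript
// Minimal sketch of persisting sync settings to localStorage.
// The storage key and config shape are assumptions, not the app's actual ones.
const KB_CONFIG_KEY = "kbSyncConfig"; // hypothetical key

interface KbConfig {
  region: string;
  knowledgeBaseId: string;
  dataSourceId: string;
}

// A narrow interface so the sketch works outside a browser too.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

function saveKbConfig(store: StorageLike, cfg: KbConfig): void {
  store.setItem(KB_CONFIG_KEY, JSON.stringify(cfg));
}

function loadKbConfig(store: StorageLike): KbConfig | null {
  const raw = store.getItem(KB_CONFIG_KEY);
  return raw ? (JSON.parse(raw) as KbConfig) : null;
}
```

In the browser, `store` would simply be `window.localStorage`; injecting a `StorageLike` just keeps the sketch testable outside a browser.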
Creating a Knowledge Base
If you don’t have a Knowledge Base yet:
Open Bedrock Console
Navigate to the Amazon Bedrock Console.
Navigate to Knowledge Bases
In the left sidebar, click Knowledge bases under the Orchestration section.
Create Knowledge Base
Click the Create knowledge base button and provide the basic details:
- Name: Give your knowledge base a descriptive name
- Description: Optional description of what documents this KB contains
- IAM role: Choose an existing role or create a new one with required permissions
Configure Data Source
- Data source name: Name your S3 data source
- S3 URI: Enter your bucket URI (e.g., s3://workshop-docs-bucket/documentos/)
- Chunking strategy: Choose how documents are split (default or custom)
- Embeddings model: Select a model for generating embeddings (e.g., amazon.titan-embed-text-v1)
Complete Setup
- Review your configuration
- Click Create knowledge base
- Wait for the knowledge base to be created
Running a Synchronization
Once configured, you can sync your S3 documents to the Knowledge Base:
Start Sync Job
In the “Sincronización de Knowledge Base” panel, click the “Ejecutar sincronización” button. The application creates a new ingestion job in AWS.
Monitor Progress
The sync status updates automatically:
- PENDIENTE: Job is being created
- EN_EJECUCION: Ingestion is in progress
- COMPLETADO: Sync finished successfully
- FALLIDO: Sync failed with errors
These status values are defined in src/pages/api/sync.ts:8.
View Logs
Each execution appears as a collapsible section showing:
- Job ID
- Current status
- Start time
- Completion time (when finished)
- Detailed logs of the sync process
How Synchronization Works
Starting a Sync Job
When you click “Ejecutar sincronización”, the application:
- Creates an execution record with a unique ID
- Initiates an ingestion job via the AWS API
- Polls the job status every 4 seconds
- Updates logs in real time
See src/pages/api/sync.ts:56-62 for the implementation.
Polling for Status Updates
The sync process continuously checks the job status every 4 seconds (see src/pages/api/sync.ts:74-87).
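The polling loop can be sketched as follows, with the status fetcher injected so the loop can be exercised without AWS (the function names, timeout, and status strings here are illustrative, not the app's actual code):

```typescript
// Sketch of a 4-second polling loop for an ingestion job. The status
// fetcher is injected so the loop can be tested without calling AWS.
type JobStatus = "STARTING" | "IN_PROGRESS" | "COMPLETE" | "FAILED";

async function pollUntilDone(
  getStatus: () => Promise<JobStatus>,
  intervalMs = 4000,
  maxAttempts = 450, // ~30 minutes at 4 s per attempt
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    // Stop as soon as the job reaches a terminal state.
    if (status === "COMPLETE" || status === "FAILED") return status;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Polling timed out");
}
```

In the real app, `getStatus` would call the Bedrock GetIngestionJob API; here it is any async function returning a status.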
Status Mapping
AWS statuses are mapped to UI-friendly statuses (see src/pages/api/sync-history.ts:16-21).
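A minimal sketch of such a mapping, assuming the standard Bedrock ingestion-job status names (STARTING, IN_PROGRESS, COMPLETE, FAILED); the app's actual table lives in src/pages/api/sync-history.ts:16-21:

```typescript
// Map Bedrock ingestion-job statuses to the UI statuses shown in the panel.
// The AWS-side names are the Bedrock API's status values; the UI labels
// mirror the ones documented above (PENDIENTE, EN_EJECUCION, ...).
const STATUS_MAP: Record<string, string> = {
  STARTING: "PENDIENTE",
  IN_PROGRESS: "EN_EJECUCION",
  COMPLETE: "COMPLETADO",
  FAILED: "FALLIDO",
};

function mapAwsStatus(awsStatus: string): string {
  // Pass through any status we don't recognize rather than hiding it.
  return STATUS_MAP[awsStatus] ?? awsStatus;
}
```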
Retrieving Sync History
The refresh button fetches recent jobs from AWS (see src/pages/api/sync-history.ts:51-57).
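The jobs returned from AWS can be turned into history entries roughly like this; the input shape below is an assumption loosely modeled on the ListIngestionJobs response, not the app's actual types:

```typescript
// Convert AWS ingestion-job summaries into UI history entries.
// The AwsJobSummary shape is an assumption for illustration only.
interface AwsJobSummary {
  ingestionJobId: string;
  status: string;
  startedAt?: string;
}

interface HistoryEntry {
  title: string; // "AWS"-prefixed so remote jobs are distinguishable from local ones
  status: string;
  startedAt?: string;
}

function toHistoryEntries(jobs: AwsJobSummary[]): HistoryEntry[] {
  return jobs.map((job) => ({
    title: `AWS ${job.ingestionJobId}`,
    status: job.status,
    startedAt: job.startedAt,
  }));
}
```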
Execution History
The sync panel displays execution history with two types of entries:
Local Executions
- Created when you start a sync from the UI
- Stored in localStorage
- Show real-time progress with detailed logs
- Identified without “AWS” prefix
AWS Executions
- Retrieved from AWS Bedrock Knowledge Base API
- Historical jobs that may have been started elsewhere
- Show summary information
- Identified with an “AWS” prefix in the job title (see src/pages/index.astro:1452)
Execution Details
Each execution record contains the fields defined in src/pages/api/sync.ts:10-16 (job ID, status, start and completion times, and logs).
Log Messages
Logs include timestamps and status updates (see src/pages/api/sync.ts:32-36). Typical messages:
- “Ejecución creada.” (“Execution created.”)
- “Iniciando tarea de sincronización en Bedrock Knowledge Base…” (“Starting sync task in Bedrock Knowledge Base…”)
- “Ingestion Job iniciado: [job-id]” (“Ingestion job started: [job-id]”)
- Multiple “Estado actual: [status]” (“Current status: [status]”) updates
- “Sincronización completada correctamente.” (“Sync completed successfully.”) or an error message
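A timestamped log line like those above could be produced by a small helper of this shape (purely illustrative; the app's actual format may differ):

```typescript
// Append a timestamped log line in a "[time] message" style.
// The exact timestamp format used by the app may differ.
function appendLog(logs: string[], message: string, now: Date = new Date()): string[] {
  return [...logs, `[${now.toISOString()}] ${message}`];
}
```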
Validation Logic
The sync section is enabled only when a region, Knowledge Base ID, and Data Source ID have all been configured (see src/pages/index.astro:1530-1531 and src/pages/index.astro:1557-1558).
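The check can be sketched as a small predicate; the field names here are illustrative, not the app's actual ones:

```typescript
// Sketch of the enable/disable check: the sync button should only be
// active when all three required values are present and non-empty.
function isSyncConfigured(cfg: {
  region?: string;
  knowledgeBaseId?: string;
  dataSourceId?: string;
}): boolean {
  return [cfg.region, cfg.knowledgeBaseId, cfg.dataSourceId].every(
    (value) => typeof value === "string" && value.trim().length > 0,
  );
}
```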
Managing Configuration
Clear Sync Configuration
To remove Knowledge Base settings:
- Open the sync configuration dialog
- Click “Borrar configuración”
- This clears region, Knowledge Base ID, and Data Source ID from localStorage
Cancel Without Saving
To close the dialog without applying changes:
- Click “Cancelar sin guardar”
- The dialog closes and previous sync settings remain unchanged
Troubleshooting
Sync Button Disabled
Symptoms: The “Ejecutar sincronización” button is grayed out.
Solutions:
- Verify AWS credentials are configured
- Ensure Knowledge Base ID and Data Source ID are provided
- Check the configuration summary panel
- Verify region is specified
“Faltan parámetros de sincronización” Error
Symptoms: The API returns a 400 error when starting a sync (“sync parameters are missing”).
Solutions:
- Re-open the sync configuration dialog
- Verify all required fields are filled
- Ensure AWS credentials are configured
- Save configuration again
“No se recibió un ingestionJobId en la respuesta” Error
Symptoms: The sync starts but fails immediately with this error (“no ingestionJobId was received in the response”).
Solutions:
- Verify the Knowledge Base ID and Data Source ID are correct
- Check that the Knowledge Base exists in the specified region
- Ensure the Data Source ID belongs to the specified Knowledge Base
- Check AWS console for Knowledge Base status
Sync Stuck in “EN_EJECUCION”
Symptoms: The sync status shows “EN_EJECUCION” for an extended period.
Solutions:
- Large document sets can take 10-30+ minutes to process
- Check AWS Bedrock console for actual job status
- Review CloudWatch logs for detailed progress
- If truly stuck, the job may have failed - check AWS console
“No se encontró la ejecución solicitada” Error
Symptoms: An error appears when checking execution status (“the requested execution was not found”).
Solutions:
- The execution may have been started in a different browser session
- Refresh the page to reload execution history
- Click the refresh button (↻) to fetch from AWS
Permission Denied Errors
Symptoms: The sync fails with an access denied or permission error.
Solutions:
- Check that the IAM user/role has the bedrock:StartIngestionJob permission
- Verify the IAM user has the bedrock:GetIngestionJob permission
- Ensure the IAM user has the bedrock:ListIngestionJobs permission
- Check that the Knowledge Base IAM role has S3 read permissions
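A minimal identity-based policy covering these three calls might look like the following sketch (in production, scope Resource to your Knowledge Base ARN rather than using a wildcard):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:StartIngestionJob",
        "bedrock:GetIngestionJob",
        "bedrock:ListIngestionJobs"
      ],
      "Resource": "*"
    }
  ]
}
```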
Required IAM Permissions
Your IAM user or role needs the Bedrock permissions listed above: bedrock:StartIngestionJob, bedrock:GetIngestionJob, and bedrock:ListIngestionJobs.
Understanding Ingestion Jobs
What Happens During Sync
- Discovery: Bedrock scans the S3 data source for new/changed files
- Processing: Documents are parsed and split into chunks
- Embedding: Each chunk is converted to vector embeddings
- Indexing: Embeddings are stored in the Knowledge Base vector store
- Completion: Knowledge Base is ready to answer questions about the documents
Sync Duration
Ingestion time depends on:
- Number of documents
- Total size of documents
- Complexity of content (text, tables, images)
- Chunking strategy
- Embeddings model selected
As a rough guide:
- 1-5 PDFs: 1-3 minutes
- 10-50 PDFs: 5-15 minutes
- 100+ PDFs: 30-60+ minutes
Best Practices
Sync After Uploads
Run a sync job after uploading new documents to make them available to the AI agent.
Monitor Sync Logs
Review sync logs to understand processing and catch any errors early.
Schedule Regular Syncs
For production use, consider scheduling automatic syncs via AWS EventBridge.
Optimize Document Size
Keep PDFs reasonably sized (< 50MB each) for faster processing.
Advanced: Automatic Sync
For production deployments, consider setting up automatic synchronization:
- S3 Event Notifications: Configure S3 to trigger a Lambda function on new uploads
- Lambda Function: Create a function that calls StartIngestionJob
- EventBridge Schedule: Set up periodic syncs (e.g., daily at 2 AM)
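The Lambda piece could be sketched like this, with the actual AWS call injected so the handler logic stays testable; in a real deployment the injected function would wrap the Bedrock StartIngestionJob API (all names here are illustrative):

```typescript
// Lambda-style handler sketch for automatic syncs. The AWS call is
// injected as `startJob`; in production it would wrap the Bedrock
// StartIngestionJob API. The env shape is illustrative.
interface SyncEnv {
  knowledgeBaseId: string;
  dataSourceId: string;
}

async function handleSyncTrigger(
  env: SyncEnv,
  startJob: (kbId: string, dsId: string) => Promise<string>, // returns the job ID
): Promise<{ ok: boolean; jobId?: string; error?: string }> {
  try {
    const jobId = await startJob(env.knowledgeBaseId, env.dataSourceId);
    return { ok: true, jobId };
  } catch (err) {
    // Return the failure so it is visible in CloudWatch logs
    // and the invocation can be retried.
    return { ok: false, error: String(err) };
  }
}
```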
Next Steps
- Learn about Using the Chat Interface with your Knowledge Base
- Understand Document Management Best Practices
- Explore Monitoring and Debugging