
Overview

The Connector Generator follows a multi-stage pipeline to transform API documentation into working connector code. Each stage builds upon the results of the previous stages, with all state tracked in a session.

Stage 1: Session Creation

Every workflow begins by creating a session to track all processing state.
curl -X POST "http://localhost:8000/session" \
  -H "Content-Type: application/json"
Store the sessionId; you'll use it in every subsequent API call throughout the workflow.
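A minimal sketch of capturing the session ID with jq. The sample payload here stands in for a live response; the sessionId field name matches the workflow example later on this page:
```shell
# Sample response payload; a live call would pipe curl output instead
RESPONSE='{"sessionId":"8f2c5d90-3a17-4b3e-9c4e-7fa8b1d6e8a2"}'
SESSION_ID=$(echo "$RESPONSE" | jq -r '.sessionId')
echo "$SESSION_ID"
```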

Stage 2: Documentation Acquisition

You have two options for providing API documentation to the system: uploading specification files directly, or scraping documentation pages (see Best Practices below for scraping guidance).
To upload OpenAPI/Swagger files directly:
curl -X POST "http://localhost:8000/session/{session_id}/documentation" \
  -F "[email protected]"
When to use:
  • You have existing OpenAPI/Swagger specifications
  • Documentation is available as downloadable files
  • You want the fastest processing time
Result: Returns a job ID for monitoring the chunking and processing progress.
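The returned job can be looked up in the session's jobs listing, as the complete workflow script later on this page does. A sketch against a sample payload:
```shell
# Sample jobs listing; a live call would be: curl -s ".../session/$SESSION_ID/jobs"
JOBS='{"jobs":[{"jobId":"doc-1","status":"finished"},{"jobId":"doc-2","status":"running"}]}'
DOC_STATUS=$(echo "$JOBS" | jq -r '.jobs[] | select(.jobId=="doc-1") | .status')
echo "$DOC_STATUS"
```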

Stage 3: Schema Extraction (Digester)

Once documentation is loaded, extract structured schema information:

Extract Object Classes

Identify and extract all object types (resources) from the documentation:
curl -X POST "http://localhost:8000/digester/{session_id}/digester" \
  -H "Content-Type: application/json" \
  -d '{
    "applicationName": "Salesforce",
    "applicationVersion": "v1",
    "instructionsForSorter": "Focus on core business objects like Account, Contact, Lead",
    "instructionsForFilter": "Exclude internal or deprecated objects"
  }'
What gets extracted:
  • Business entities like User, Account, Order with their properties and types
  • Field definitions including data types, required/optional status, and constraints
  • Connections between objects (foreign keys, references, hierarchies)
  • Available CRUD operations and custom endpoints for each object

Monitor Extraction Progress

The digester processes documentation through multiple sub-stages:
curl -X GET "http://localhost:8000/digester/{session_id}/digester?jobId={job_id}"
Example response:
{
  "jobId": "8f2c5d90-3a17-4b3e-9c4e-7fa8b1d6e8a2",
  "status": "running",
  "progress": {
    "stage": "sorting",
    "message": "Identifying object classes in documentation"
  }
}
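The status payload above can be polled until it reaches finished. A reusable helper sketch; the finished terminal state matches the workflow script later on this page:
```shell
# Poll a status-printing command until it reports "finished" (sketch)
wait_for_finished() {
  local status
  while true; do
    status=$(eval "$1")          # e.g. a curl | jq -r '.status' pipeline
    echo "status: $status"
    [ "$status" = "finished" ] && break
    sleep "${2:-5}"              # poll interval, default 5 seconds
  done
}
RESULT=$(wait_for_finished 'echo finished' 1)
echo "$RESULT"
```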

Retrieve Extracted Schema

Access the extracted schema from session data:
curl -X GET "http://localhost:8000/session/{session_id}"
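To review what was found, the extracted object classes can be pulled out with jq. The .data.objectClasses path here is an assumption based on the Best Practices note below, and the payload is a sample:
```shell
# Sample session payload; the .data.objectClasses field path is assumed, not confirmed
SESSION='{"data":{"objectClasses":[{"name":"Account"},{"name":"Contact"}]}}'
NAMES=$(echo "$SESSION" | jq -r '.data.objectClasses[].name')
echo "$NAMES"
```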

Stage 4: Code Generation

Generate connector code from the extracted schema:
curl -X POST "http://localhost:8000/codegen/{session_id}/codegen" \
  -H "Content-Type: application/json" \
  -d '{
    "applicationName": "Salesforce",
    "applicationVersion": "v1",
    "objectClassNames": ["Account", "Contact", "Lead"],
    "connectorName": "salesforce-connector"
  }'

Code Generation Options

  • objectClassNames (array): List of object class names to include. If not specified, generates code for all extracted objects.
  • connectorName (string): Custom name for the generated connector. Defaults to applicationName.
  • includeTests (boolean, default: true): Whether to generate test files alongside the connector code.
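The options combine into a single request body. A sketch that assembles one with jq; the values are illustrative:
```shell
# Assemble a codegen request body from the documented options
BODY=$(jq -n '{
  applicationName: "Salesforce",
  applicationVersion: "v1",
  objectClassNames: ["Account", "Contact"],
  connectorName: "salesforce-connector",
  includeTests: true
}')
echo "$BODY" | jq -r '.connectorName'
```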

Monitor Generation Progress

curl -X GET "http://localhost:8000/codegen/{session_id}/codegen?jobId={job_id}"

Stage 5: Retrieve Generated Code

Download the complete connector code:
curl -X GET "http://localhost:8000/session/{session_id}"
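The complete workflow script at the end of this page extracts the code via .data.generatedCode. A sketch against a sample payload; the shape of the generatedCode value is illustrative:
```shell
# Sample session payload; a live call would pipe curl output instead
SESSION='{"data":{"generatedCode":{"files":["connector.py"]}}}'
echo "$SESSION" | jq -r '.data.generatedCode' > generated_connector.json
cat generated_connector.json
```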

Complete Workflow Example

Here’s a complete example workflow using cURL:
#!/bin/bash

# Step 1: Create session
SESSION_RESPONSE=$(curl -s -X POST "http://localhost:8000/session")
SESSION_ID=$(echo $SESSION_RESPONSE | jq -r '.sessionId')
echo "Created session: $SESSION_ID"

# Step 2: Upload documentation
DOC_RESPONSE=$(curl -s -X POST \
  "http://localhost:8000/session/$SESSION_ID/documentation" \
  -F "[email protected]")
DOC_JOB_ID=$(echo $DOC_RESPONSE | jq -r '.jobId')
echo "Documentation processing job: $DOC_JOB_ID"

# Wait for documentation processing
while true; do
  STATUS=$(curl -s "http://localhost:8000/session/$SESSION_ID/jobs" | \
    jq -r ".jobs[] | select(.jobId==\"$DOC_JOB_ID\") | .status")
  echo "Documentation status: $STATUS"
  [[ "$STATUS" == "finished" ]] && break
  sleep 5
done

# Step 3: Extract schema
DIGEST_RESPONSE=$(curl -s -X POST \
  "http://localhost:8000/digester/$SESSION_ID/digester" \
  -H "Content-Type: application/json" \
  -d '{
    "applicationName": "MyAPI",
    "applicationVersion": "v1"
  }')
DIGEST_JOB_ID=$(echo $DIGEST_RESPONSE | jq -r '.jobId')
echo "Schema extraction job: $DIGEST_JOB_ID"

# Wait for schema extraction
while true; do
  STATUS=$(curl -s "http://localhost:8000/digester/$SESSION_ID/digester?jobId=$DIGEST_JOB_ID" | \
    jq -r '.status')
  echo "Schema extraction status: $STATUS"
  [[ "$STATUS" == "finished" ]] && break
  sleep 5
done

# Step 4: Generate code
CODEGEN_RESPONSE=$(curl -s -X POST \
  "http://localhost:8000/codegen/$SESSION_ID/codegen" \
  -H "Content-Type: application/json" \
  -d '{
    "applicationName": "MyAPI",
    "applicationVersion": "v1",
    "connectorName": "myapi-connector"
  }')
CODEGEN_JOB_ID=$(echo $CODEGEN_RESPONSE | jq -r '.jobId')
echo "Code generation job: $CODEGEN_JOB_ID"

# Wait for code generation
while true; do
  STATUS=$(curl -s "http://localhost:8000/codegen/$SESSION_ID/codegen?jobId=$CODEGEN_JOB_ID" | \
    jq -r '.status')
  echo "Code generation status: $STATUS"
  [[ "$STATUS" == "finished" ]] && break
  sleep 5
done

# Step 5: Retrieve generated code
curl -s "http://localhost:8000/session/$SESSION_ID" | \
  jq -r '.data.generatedCode' > generated_connector.json

echo "Connector generated successfully!"

Job Dependencies

Understanding job dependencies is crucial for workflow orchestration:
Important: Each stage requires the previous stage to complete successfully. Always check job status before proceeding to the next stage.

Workflow Optimization

Result Caching

The system automatically caches job results to avoid redundant processing:
{
  "usePreviousSessionData": true
}
When enabled:
  • Documentation processing: Reuses processed chunks from previous uploads of the same content
  • Schema extraction: Reuses identified object classes if documentation hasn’t changed
  • Code generation: Reuses generated code structures for unchanged schemas
Result caching can reduce processing time by 70-90% for repeated operations with similar inputs. The system checks for compatible previous jobs within a configurable time window (default: 24 hours for digester, 7 days for discovery).
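To opt in, include the flag in the request body of the relevant stage. A sketch that merges it into a digester request with jq; that each endpoint accepts the flag this way is an assumption based on the caching description above:
```shell
# Add the caching flag to an existing request body (sketch)
BASE='{"applicationName":"MyAPI","applicationVersion":"v1"}'
BODY=$(echo "$BASE" | jq '. + {usePreviousSessionData: true}')
echo "$BODY" | jq -r '.usePreviousSessionData'
```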

Parallel Processing

Some stages support parallel execution:
  • Multiple documentation files can be uploaded simultaneously
  • Schema extraction for different object classes runs in parallel
  • Code generation for multiple object classes is parallelized
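In shell, parallel uploads reduce to backgrounding the curl calls and waiting. A sketch with a stand-in upload function; replace the echo with the Stage 2 curl upload:
```shell
# Stand-in for the real curl -F upload from Stage 2
upload() { echo "uploaded $1"; }

# Run both uploads in the background, then block until both complete
upload spec-a.json & upload spec-b.json &
wait
```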

Error Recovery

If a job fails:
  1. Check error details from the job status endpoint
  2. Review session state to identify which stage failed
  3. Fix the issue (e.g., provide better instructions, adjust parameters)
  4. Retry from the failed stage; there is no need to restart the entire workflow
For example, re-check a failed digester job's status:
curl -X GET "http://localhost:8000/digester/{session_id}/digester?jobId={job_id}"
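A failed job's payload can be inspected the same way as a running one. A sketch against a sample payload; the error field name here is an assumption:
```shell
# Sample failed-job payload; the "error" field name is assumed, not confirmed
JOB='{"jobId":"abc","status":"failed","error":"schema extraction timed out"}'
if [ "$(echo "$JOB" | jq -r '.status')" = "failed" ]; then
  echo "$JOB" | jq -r '.error'
fi
```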

Best Practices

Poll job status endpoints every 5-10 seconds during processing. Jobs may take several minutes depending on documentation size and complexity.
When using the digester, provide clear instructionsForSorter and instructionsForFilter to guide the extraction process. Specific instructions yield better results.
Check the extracted schema before proceeding to code generation. Review objectClasses in the session data to ensure all expected objects were found.
For scraping, start with conservative maxIterations (10-20). Increase only if coverage is insufficient. Higher values increase processing time and costs.
Delete sessions after downloading generated code to free storage. Sessions can accumulate significant data.

Sessions

Understand session management and data structure

Job Status

Learn about job lifecycle and progress tracking
