
Overview

The Connector Generator follows a multi-stage pipeline to transform API documentation into working connector code. Each stage builds upon the results of the previous stages, with all state tracked in a session.

Stage 1: Session Creation

Every workflow begins by creating a session to track all processing state.
curl -X POST "http://localhost:8000/session" \
  -H "Content-Type: application/json"
Store the sessionId; you'll use it in every subsequent API call throughout the workflow.
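A minimal sketch of capturing the session ID with jq. The sample payload here stands in for a live response; the sessionId field name matches the workflow example later on this page:
```shell
# Sample response payload; a live call would pipe curl output instead
RESPONSE='{"sessionId":"8f2c5d90-3a17-4b3e-9c4e-7fa8b1d6e8a2"}'
SESSION_ID=$(echo "$RESPONSE" | jq -r '.sessionId')
echo "$SESSION_ID"
```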

Stage 2: Documentation Acquisition

You have two options for providing API documentation to the system: uploading specification files directly, or scraping documentation pages (see Best Practices below for scraping guidance).
To upload OpenAPI/Swagger files directly:
curl -X POST "http://localhost:8000/session/{session_id}/documentation" \
  -F "[email protected]"
When to use:
  • You have existing OpenAPI/Swagger specifications
  • Documentation is available as downloadable files
  • You want the fastest processing time
Result: Returns a job ID for monitoring the chunking and processing progress.
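The returned job can be looked up in the session's jobs listing, as the complete workflow script later on this page does. A sketch against a sample payload:
```shell
# Sample jobs listing; a live call would be: curl -s ".../session/$SESSION_ID/jobs"
JOBS='{"jobs":[{"jobId":"doc-1","status":"finished"},{"jobId":"doc-2","status":"running"}]}'
DOC_STATUS=$(echo "$JOBS" | jq -r '.jobs[] | select(.jobId=="doc-1") | .status')
echo "$DOC_STATUS"
```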

Stage 3: Schema Extraction (Digester)

Once documentation is loaded, extract structured schema information:

Extract Object Classes

Identify and extract all object types (resources) from the documentation:
curl -X POST "http://localhost:8000/digester/{session_id}/digester" \
  -H "Content-Type: application/json" \
  -d '{
    "applicationName": "Salesforce",
    "applicationVersion": "v1",
    "instructionsForSorter": "Focus on core business objects like Account, Contact, Lead",
    "instructionsForFilter": "Exclude internal or deprecated objects"
  }'
What gets extracted:
  • Business entities like User, Account, Order with their properties and types
  • Field definitions including data types, required/optional status, and constraints
  • Connections between objects (foreign keys, references, hierarchies)
  • Available CRUD operations and custom endpoints for each object

Monitor Extraction Progress

The digester processes documentation through multiple sub-stages:
curl -X GET "http://localhost:8000/digester/{session_id}/digester?jobId={job_id}"
Example response:
{
  "jobId": "8f2c5d90-3a17-4b3e-9c4e-7fa8b1d6e8a2",
  "status": "running",
  "progress": {
    "stage": "sorting",
    "message": "Identifying object classes in documentation"
  }
}
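The status payload above can be polled until it reaches finished. A reusable helper sketch; the finished terminal state matches the workflow script later on this page:
```shell
# Poll a status-printing command until it reports "finished" (sketch)
wait_for_finished() {
  local status
  while true; do
    status=$(eval "$1")          # e.g. a curl | jq -r '.status' pipeline
    echo "status: $status"
    [ "$status" = "finished" ] && break
    sleep "${2:-5}"              # poll interval, default 5 seconds
  done
}
RESULT=$(wait_for_finished 'echo finished' 1)
echo "$RESULT"
```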

Retrieve Extracted Schema

Access the extracted schema from session data:
curl -X GET "http://localhost:8000/session/{session_id}"
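To review what was found, the extracted object classes can be pulled out with jq. The .data.objectClasses path here is an assumption based on the Best Practices note below, and the payload is a sample:
```shell
# Sample session payload; the .data.objectClasses field path is assumed, not confirmed
SESSION='{"data":{"objectClasses":[{"name":"Account"},{"name":"Contact"}]}}'
NAMES=$(echo "$SESSION" | jq -r '.data.objectClasses[].name')
echo "$NAMES"
```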

Stage 4: Code Generation

Generate connector code from the extracted schema:
curl -X POST "http://localhost:8000/codegen/{session_id}/codegen" \
  -H "Content-Type: application/json" \
  -d '{
    "applicationName": "Salesforce",
    "applicationVersion": "v1",
    "objectClassNames": ["Account", "Contact", "Lead"],
    "connectorName": "salesforce-connector"
  }'

Code Generation Options

  • objectClassNames (array): List of object class names to include. If not specified, generates code for all extracted objects.
  • connectorName (string): Custom name for the generated connector. Defaults to applicationName.
  • includeTests (boolean, default: true): Whether to generate test files alongside the connector code.
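The options combine into a single request body. A sketch that assembles one with jq; the values are illustrative:
```shell
# Assemble a codegen request body from the documented options
BODY=$(jq -n '{
  applicationName: "Salesforce",
  applicationVersion: "v1",
  objectClassNames: ["Account", "Contact"],
  connectorName: "salesforce-connector",
  includeTests: true
}')
echo "$BODY" | jq -r '.connectorName'
```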

Monitor Generation Progress

curl -X GET "http://localhost:8000/codegen/{session_id}/codegen?jobId={job_id}"

Stage 5: Retrieve Generated Code

Download the complete connector code:
curl -X GET "http://localhost:8000/session/{session_id}"
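The complete workflow script at the end of this page extracts the code via .data.generatedCode. A sketch against a sample payload; the shape of the generatedCode value is illustrative:
```shell
# Sample session payload; a live call would pipe curl output instead
SESSION='{"data":{"generatedCode":{"files":["connector.py"]}}}'
echo "$SESSION" | jq -r '.data.generatedCode' > generated_connector.json
cat generated_connector.json
```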

Complete Workflow Example

Here’s a complete example workflow using cURL:
#!/bin/bash

# Step 1: Create session
SESSION_RESPONSE=$(curl -s -X POST "http://localhost:8000/session")
SESSION_ID=$(echo $SESSION_RESPONSE | jq -r '.sessionId')
echo "Created session: $SESSION_ID"

# Step 2: Upload documentation
DOC_RESPONSE=$(curl -s -X POST \
  "http://localhost:8000/session/$SESSION_ID/documentation" \
  -F "[email protected]")
DOC_JOB_ID=$(echo $DOC_RESPONSE | jq -r '.jobId')
echo "Documentation processing job: $DOC_JOB_ID"

# Wait for documentation processing
while true; do
  STATUS=$(curl -s "http://localhost:8000/session/$SESSION_ID/jobs" | \
    jq -r ".jobs[] | select(.jobId==\"$DOC_JOB_ID\") | .status")
  echo "Documentation status: $STATUS"
  [[ "$STATUS" == "finished" ]] && break
  sleep 5
done

# Step 3: Extract schema
DIGEST_RESPONSE=$(curl -s -X POST \
  "http://localhost:8000/digester/$SESSION_ID/digester" \
  -H "Content-Type: application/json" \
  -d '{
    "applicationName": "MyAPI",
    "applicationVersion": "v1"
  }')
DIGEST_JOB_ID=$(echo $DIGEST_RESPONSE | jq -r '.jobId')
echo "Schema extraction job: $DIGEST_JOB_ID"

# Wait for schema extraction
while true; do
  STATUS=$(curl -s "http://localhost:8000/digester/$SESSION_ID/digester?jobId=$DIGEST_JOB_ID" | \
    jq -r '.status')
  echo "Schema extraction status: $STATUS"
  [[ "$STATUS" == "finished" ]] && break
  sleep 5
done

# Step 4: Generate code
CODEGEN_RESPONSE=$(curl -s -X POST \
  "http://localhost:8000/codegen/$SESSION_ID/codegen" \
  -H "Content-Type: application/json" \
  -d '{
    "applicationName": "MyAPI",
    "applicationVersion": "v1",
    "connectorName": "myapi-connector"
  }')
CODEGEN_JOB_ID=$(echo $CODEGEN_RESPONSE | jq -r '.jobId')
echo "Code generation job: $CODEGEN_JOB_ID"

# Wait for code generation
while true; do
  STATUS=$(curl -s "http://localhost:8000/codegen/$SESSION_ID/codegen?jobId=$CODEGEN_JOB_ID" | \
    jq -r '.status')
  echo "Code generation status: $STATUS"
  [[ "$STATUS" == "finished" ]] && break
  sleep 5
done

# Step 5: Retrieve generated code
curl -s "http://localhost:8000/session/$SESSION_ID" | \
  jq -r '.data.generatedCode' > generated_connector.json

echo "Connector generated successfully!"

Job Dependencies

Understanding job dependencies is crucial for workflow orchestration:
Important: Each stage requires the previous stage to complete successfully. Always check job status before proceeding to the next stage.

Workflow Optimization

Result Caching

The system automatically caches job results to avoid redundant processing:
{
  "usePreviousSessionData": true
}
When enabled:
  • Documentation processing: Reuses processed chunks from previous uploads of the same content
  • Schema extraction: Reuses identified object classes if documentation hasn’t changed
  • Code generation: Reuses generated code structures for unchanged schemas
Result caching can reduce processing time by 70-90% for repeated operations with similar inputs. The system checks for compatible previous jobs within a configurable time window (default: 24 hours for digester, 7 days for discovery).
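To opt in, include the flag in the request body of the relevant stage. A sketch that merges it into a digester request with jq; that each endpoint accepts the flag this way is an assumption based on the caching description above:
```shell
# Add the caching flag to an existing request body (sketch)
BASE='{"applicationName":"MyAPI","applicationVersion":"v1"}'
BODY=$(echo "$BASE" | jq '. + {usePreviousSessionData: true}')
echo "$BODY" | jq -r '.usePreviousSessionData'
```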

Parallel Processing

Some stages support parallel execution:
  • Multiple documentation files can be uploaded simultaneously
  • Schema extraction for different object classes runs in parallel
  • Code generation for multiple object classes is parallelized
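In shell, parallel uploads reduce to backgrounding the curl calls and waiting. A sketch with a stand-in upload function; replace the echo with the Stage 2 curl upload:
```shell
# Stand-in for the real curl -F upload from Stage 2
upload() { echo "uploaded $1"; }

# Run both uploads in the background, then block until both complete
upload spec-a.json & upload spec-b.json &
wait
```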

Error Recovery

If a job fails:
  1. Check error details from the job status endpoint
  2. Review session state to identify which stage failed
  3. Fix the issue (e.g., provide better instructions, adjust parameters)
  4. Retry from the failed stage; there is no need to restart the entire workflow
For example, re-check a failed digester job's status:
curl -X GET "http://localhost:8000/digester/{session_id}/digester?jobId={job_id}"
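A failed job's payload can be inspected the same way as a running one. A sketch against a sample payload; the error field name here is an assumption:
```shell
# Sample failed-job payload; the "error" field name is assumed, not confirmed
JOB='{"jobId":"abc","status":"failed","error":"schema extraction timed out"}'
if [ "$(echo "$JOB" | jq -r '.status')" = "failed" ]; then
  echo "$JOB" | jq -r '.error'
fi
```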

Best Practices

Poll job status endpoints every 5-10 seconds during processing. Jobs may take several minutes depending on documentation size and complexity.
When using the digester, provide clear instructionsForSorter and instructionsForFilter to guide the extraction process. Specific instructions yield better results.
Check the extracted schema before proceeding to code generation. Review objectClasses in the session data to ensure all expected objects were found.
For scraping, start with conservative maxIterations (10-20). Increase only if coverage is insufficient. Higher values increase processing time and costs.
Delete sessions after downloading generated code to free storage. Sessions can accumulate significant data.

Sessions

Understand session management and data structure

Job Status

Learn about job lifecycle and progress tracking
