Add research-powered enrichment columns to Extruct company tables.

Trigger Phrases

“enrich”, “add column”, “add data point”, “research column”, “enrich table”, “enrichment”, “add a field”, “run enrichment”, “monitor enrichment”

Environment

| Variable | Service |
| --- | --- |
| EXTRUCT_API_TOKEN | Extruct API |
Before making API calls, check that EXTRUCT_API_TOKEN is set by running test -n "$EXTRUCT_API_TOKEN" && echo "set" || echo "missing". If missing, ask the user to provide their Extruct API token and set it via export EXTRUCT_API_TOKEN=<value>. Do not proceed until confirmed.
Base URL: https://api.extruct.ai/v1
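The token check above can be wrapped in a small helper that fails fast and builds request headers. A minimal Python sketch; Bearer authorization is an assumption here, so confirm the scheme against the API reference:

```python
import os

BASE_URL = "https://api.extruct.ai/v1"

def auth_headers():
    """Build request headers, failing fast if EXTRUCT_API_TOKEN is unset."""
    token = os.environ.get("EXTRUCT_API_TOKEN")
    if not token:
        raise RuntimeError("EXTRUCT_API_TOKEN is missing; ask the user to export it.")
    # Bearer auth is an assumption; verify the scheme in the API reference.
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
```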

Official API Reference

Local copy: references/api_reference.md. Live docs: https://www.extruct.ai/docs.

Workflow

Step 1: Verify API reference

  1. Read local reference: references/api_reference.md
  2. Fetch live docs: https://www.extruct.ai/docs
  3. Compare endpoints, params, and response fields
  4. If discrepancies found:
    • Update the local reference file
    • Flag changes to the user before proceeding
  5. Proceed with the skill workflow

Step 2: Confirm the table

Get the table ID from the user (URL or ID). Fetch table metadata via GET /tables/{table_id}. Show the user: table name, row count, existing columns.
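This step can be sketched in Python (stdlib only). The metadata field names `name`, `row_count`, and `columns` are assumptions to confirm against the actual response:

```python
import json
import urllib.request

BASE_URL = "https://api.extruct.ai/v1"

def get_table(table_id, headers):
    """GET /tables/{table_id} and return the parsed JSON metadata."""
    req = urllib.request.Request(f"{BASE_URL}/tables/{table_id}", headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def summarize_table(meta):
    """One-line summary to show the user; field names are assumed."""
    cols = ", ".join(c["name"] for c in meta.get("columns", []))
    return f"{meta.get('name')}: {meta.get('row_count')} rows; columns: {cols}"
```
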

Step 3: Get column configs

Two paths:
  • Path A: From enrichment-design. The user already has column_configs ready. Confirm and proceed.
  • Path B: Design on the fly. Confirm with the user:
  1. What data point? - what to research (e.g. “funding stage”, “primary vertical”, “tech stack”)
  2. Output format - pick the right format:
| Format | When to use | Extra params |
| --- | --- | --- |
| text | Free-form research output | - |
| number / money | Numeric data (revenue, headcount) | - |
| select | Single choice from known categories | labels: [...] |
| multiselect | Multiple tags from known categories | labels: [...] |
| json | Structured multi-field data | output_schema: {...} |
| grade | 1-5 score | - |
| label | Single tag from list | labels: [...] |
| date | Date values | - |
| url / email / phone | Contact info | - |
  3. Agent type - default research_pro. Use llm when no web research is needed (e.g. classification from existing profile data).
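
As a sketch, Path B's output can be assembled with a small helper that produces a column_config in the shape used by the examples later in this document (here, a select column):

```python
def select_column_config(name, key, prompt, labels, agent_type="research_pro"):
    """Build one column_config entry for a select-format column."""
    return {
        "kind": "agent",
        "name": name,
        "key": key,
        "value": {
            "agent_type": agent_type,
            "prompt": prompt,
            "output_format": "select",
            "labels": list(labels),  # constrains the agent's output
        },
    }
```
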

Step 4: Write the prompt

Craft a clear prompt using {input} for the row’s domain value. Prompt guidelines:
  • Be specific about what to find
  • Specify the exact output format in the prompt (e.g. “Return ONLY a number in millions USD”)
  • Include fallback instruction (e.g. “If not found, return N/A”)
  • For select/multiselect, the labels constrain the output - the prompt should guide which label to pick
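
The guidelines above can be encoded in a small template helper. A sketch; the phrasing is illustrative, not a required format:

```python
def build_prompt(data_point, format_hint, fallback="If not found, return N/A."):
    """Compose a prompt: specific ask, explicit output format, and a fallback.
    {input} is left literal so Extruct can substitute the row's domain value."""
    return f"Find the {data_point} for {{input}}. {format_hint} {fallback}"
```
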

Step 5: Create the column(s)

Create columns via POST /tables/{table_id}/columns with the column_configs array.
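
A sketch of the request construction (stdlib only). The body key "column_configs" follows the wording above, but verify it against the API reference before running:

```python
import json
import urllib.request

BASE_URL = "https://api.extruct.ai/v1"

def build_create_columns_request(table_id, column_configs, headers):
    """Prepare POST /tables/{table_id}/columns with the column_configs array."""
    body = json.dumps({"column_configs": column_configs}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/tables/{table_id}/columns",
        data=body, headers=headers, method="POST",
    )

# Execute with: urllib.request.urlopen(build_create_columns_request(...))
```
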

Step 6: Trigger enrichment (only the new columns)

Run via POST /tables/{table_id}/run with { "mode": "new", "columns": [new_column_ids] }.
Always scope the run to the newly created column(s) only. Avoid broad or implicit run payloads when you only intend to enrich specific columns.
Report: run ID, rows queued, and table URL.
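
The run body from this step can be built with a minimal helper that keeps the scope explicit and refuses an empty column list:

```python
def scoped_run_payload(new_column_ids):
    """Build the run body scoped to the newly created columns only."""
    if not new_column_ids:
        raise ValueError("Refusing to build an unscoped run payload.")
    return {"mode": "new", "columns": list(new_column_ids)}
```
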

Step 7: Monitor progress

Poll the table data via GET /tables/{table_id}/data every 30 seconds. For each row, check the status field of the relevant column cells (done, pending, failed). Show the user:
  • Current % complete (done cells / total cells)
  • Number of failed cells (if any)
  • Estimated time remaining (based on rate so far)
Stop polling when all cells are done or failed.
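
The polling arithmetic above can be sketched as follows, where `statuses` holds the status strings of the new columns' cells (the ETA estimate is omitted since it depends on elapsed time):

```python
def enrichment_progress(statuses):
    """Summarize cell statuses ("done" / "pending" / "failed") for a run."""
    total = len(statuses)
    done = statuses.count("done")
    failed = statuses.count("failed")
    return {
        "percent_complete": round(100 * done / total) if total else 100,
        "failed": failed,
        "finished": done + failed == total,  # stop polling when True
    }
```
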

Step 8: Quality spot-check

After enrichment completes (or once 50%+ is done), fetch a sample of 5-10 enriched rows and display them for review. Present the results to the user as a table. Ask:
  • “Does the data quality look right?”
  • “Any columns returning garbage or N/A too often?”
  • “Should we adjust any prompts and re-run?”
If quality issues are found:
  1. Delete the problematic column
  2. Adjust the prompt
  3. Re-create and re-run

Output Formats Reference

| Format | Description | Use Case | Extra Params |
| --- | --- | --- | --- |
| text | Free-form text | Open-ended research, descriptions | - |
| number | Numeric value | Revenue, headcount, counts | - |
| money | Currency amount | Funding, ARR, deal size | - |
| select | Single choice | Categories, stages, maturity | labels: [...] |
| multiselect | Multiple choices | Tags, industries, use cases | labels: [...] |
| json | Structured data | Multiple related fields | output_schema: {...} |
| grade | 1-5 rating | Fit scores, quality ratings | - |
| label | Single tag | Status, tier, segment | labels: [...] |
| date | Date value | Launch dates, funding dates | - |
| url | URL | LinkedIn, website, blog | - |
| email | Email address | Contact emails | - |
| phone | Phone number | Contact phones | - |

Agent Types

| Agent Type | When to Use |
| --- | --- |
| research_pro | Needs web research (funding, news, launches, tech stack) |
| llm | Classification from existing company profile (no web research needed) |
| research_reasoning | Nuanced judgment requiring chain-of-thought (fit scoring, maturity assessment) |
| linkedin | People/org structure data from LinkedIn |

Example Column Configs

{
  "kind": "agent",
  "name": "Funding Stage",
  "key": "funding_stage",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Find the latest funding stage for {input}. Choose from: Seed, Series A, Series B, Series C+, Bootstrapped, Unknown. Return only the label.",
    "output_format": "select",
    "labels": ["Seed", "Series A", "Series B", "Series C+", "Bootstrapped", "Unknown"]
  }
}

{
  "kind": "agent",
  "name": "Recent Launch",
  "key": "recent_launch",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Find any product or feature launched by {input} in the last 6 months. Include product name, launch date, and brief description. If none found, return 'N/A'.",
    "output_format": "text"
  }
}

{
  "kind": "agent",
  "name": "Hypothesis 1 Fit",
  "key": "h1_fit",
  "value": {
    "agent_type": "research_reasoning",
    "prompt": "Grade how well {input} matches this hypothesis: 'Fragmented partner discovery - relies on events and intros to find non-indexed suppliers'. Look for: (1) fragmented supplier base, (2) limited online presence of targets, (3) reliance on manual outreach. Return 1-5 where 5 = perfect fit.",
    "output_format": "grade"
  }
}

Quality Checks

After enrichment completes, always perform a quality spot-check:
  • Sample 5-10 rows
  • Check for high N/A rates (>30% suggests prompt issues)
  • Check for irrelevant data (suggests agent type mismatch)
  • Check for format mismatches (suggests output_format mismatch)
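
The 30% guideline can be checked mechanically. A sketch over one column's sampled values; the set of strings treated as N/A is an assumption:

```python
def na_rate_check(sample_values, threshold=0.30):
    """Flag a column whose N/A rate in the sample exceeds the threshold."""
    na = sum(1 for v in sample_values if str(v).strip().upper() in {"N/A", "NA", ""})
    rate = na / len(sample_values) if sample_values else 0.0
    return {"na_rate": rate, "prompt_issue_suspected": rate > threshold}
```
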
