Domain Management Commands

Overview

Domain commands help you manage separate research projects (“domains”), each with its own entity types, extraction prompts, and data outputs. Each domain is isolated from others.

Commands

`just init`

Initialize a new domain configuration by copying the template.

just init <domain_name>

domain_name

string

required

Name for the new domain. Must be alphanumeric (letters and numbers only, no spaces or special characters).

Examples:

guantanamo ✅
vietnam_war ❌ (contains underscore)
coldwar ✅
my-project ❌ (contains hyphen)
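
The naming rule can be sketched as a small check. This is only an illustration: the actual validation in scripts/init_domain.py is not shown here, and the helper name `is_valid_domain_name` and the ASCII-only reading of "alphanumeric" are assumptions.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
_DOMAIN_NAME_RE = re.compile(r"[A-Za-z0-9]+")


def is_valid_domain_name(name: str) -> bool:
    """Return True if `name` is non-empty and contains only ASCII letters and digits."""
    return _DOMAIN_NAME_RE.fullmatch(name) is not None


for candidate in ("guantanamo", "vietnam_war", "coldwar", "my-project"):
    status = "ok" if is_valid_domain_name(candidate) else "rejected"
    print(f"{candidate}: {status}")
```

A regular expression is used instead of `str.isalnum()` because `isalnum()` also accepts non-ASCII letters, which may or may not be what the real script allows.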

Source: scripts/init_domain.py

Usage Examples

just init coldwar

What Gets Created

When you run just init <domain>, the system performs three steps:

Copy Template

Copies the entire configs/template/ directory to configs/<domain>/.

Process Template Files

Finds all .template files and:

Replaces {DOMAIN_NAME} with your domain name
Replaces {DOMAIN_DESCRIPTION} with an auto-generated description
Removes the .template extension

Success Message

Prints confirmation and next steps.
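
The three steps above could look roughly like the following. This is a minimal sketch rather than the contents of scripts/init_domain.py: the function name `init_domain`, its parameters, and the exact messages are hypothetical, and the description is passed in by the caller instead of being auto-generated.

```python
import shutil
from pathlib import Path


def init_domain(configs_dir: Path, domain: str, description: str) -> Path:
    """Copy configs/template/ to configs/<domain>/ and fill in placeholders."""
    template_dir = configs_dir / "template"
    target_dir = configs_dir / domain
    if not template_dir.is_dir():
        raise FileNotFoundError("Template directory not found")
    if target_dir.exists():
        raise FileExistsError("Domain already exists")

    # Step 1: copy the whole template tree.
    shutil.copytree(template_dir, target_dir)

    # Step 2: rewrite every *.template file and drop the extension.
    # sorted() materializes the listing before any file is renamed or removed.
    for template_file in sorted(target_dir.rglob("*.template")):
        text = template_file.read_text(encoding="utf-8")
        text = text.replace("{DOMAIN_NAME}", domain)
        text = text.replace("{DOMAIN_DESCRIPTION}", description)
        # "config.yaml.template" -> "config.yaml"
        template_file.with_suffix("").write_text(text, encoding="utf-8")
        template_file.unlink()

    # Step 3: report success (wording is illustrative).
    print(f"Created domain configuration at {target_dir}")
    return target_dir
```

Files without the .template extension are copied unchanged, which matches the description of placeholder replacement applying only to .template files.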

Directory Structure

After running just init myproject, you’ll have:

configs/myproject/
├── config.yaml              # Main domain configuration
├── prompts/
│   ├── people.txt          # Entity extraction prompts
│   ├── events.txt
│   ├── locations.txt
│   └── organizations.txt
├── types/
│   ├── people.yaml         # Entity type schemas
│   ├── events.yaml
│   ├── locations.yaml
│   └── organizations.yaml
└── tags/
    ├── people_tags.yaml    # Tag taxonomies
    ├── event_tags.yaml
    ├── location_tags.yaml
    └── organization_tags.yaml

Configuration Files

Main Config: `config.yaml`

Defines domain-level settings:

configs/myproject/config.yaml

name: myproject
description: Articles and analysis related to myproject

data_path: data/myproject/raw_sources/articles.parquet
output_dir: data/myproject/output

# Entity types to extract
entity_types:
  - people
  - events
  - locations
  - organizations

# Merge thresholds per entity type
thresholds:
  people:
    lexical_match: 0.85
    embedding_similarity: 0.82
    # ... more settings

# Processing concurrency
concurrency:
  extract_workers: 3
  extract_per_article: 4
  llm_in_flight: 10

Entity Type Schemas: `types/*.yaml`

Define fields for each entity type:

types/people.yaml

name: Person
fields:
  - name: name
    type: string
    required: true
    description: Full name of the person
  
  - name: role
    type: string
    description: Primary role or occupation
  
  - name: dates
    type: string
    description: Relevant dates (birth, death, active years)
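
To illustrate how a schema like this might be applied, the sketch below checks an extracted entity for missing required fields. It assumes the YAML has already been parsed into a plain dictionary (`PERSON_SCHEMA` is a hand-written stand-in), that fields without `required: true` are optional, and that `missing_required_fields` is a hypothetical helper, not part of the project.

```python
# Hypothetical in-memory form of types/people.yaml after parsing.
PERSON_SCHEMA = {
    "name": "Person",
    "fields": [
        {"name": "name", "type": "string", "required": True},
        {"name": "role", "type": "string"},
        {"name": "dates", "type": "string"},
    ],
}


def missing_required_fields(schema: dict, entity: dict) -> list[str]:
    """Return names of required schema fields that are absent or empty in `entity`."""
    return [
        field["name"]
        for field in schema["fields"]
        if field.get("required", False) and not entity.get(field["name"])
    ]


print(missing_required_fields(PERSON_SCHEMA, {"role": "Officer"}))   # ['name']
print(missing_required_fields(PERSON_SCHEMA, {"name": "Jane Doe"}))  # []
```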

Extraction Prompts: `prompts/*.txt`

LLM prompts for entity extraction:

prompts/people.txt

You are analyzing historical documents to extract information about people.

For each person mentioned, extract:
- Full name (as written in the document)
- Role or occupation
- Key dates mentioned
- Brief profile summarizing their involvement

Be precise and only extract information explicitly stated in the text.

Tag Taxonomies: `tags/*_tags.yaml`

Controlled vocabularies for categorization:

tags/people_tags.yaml

categories:
  - name: Military
    description: Military personnel
    values:
      - Soldier
      - Officer
      - General
  
  - name: Political
    description: Political figures
    values:
      - President
      - Senator
      - Ambassador

Listing Domains

`just domains`

List all available domain configurations with descriptions.

just domains

Source: scripts/list_domains.py

Output Example

$ just domains

Available domains:
  • guantanamo: Guantanamo Bay detention facility articles and detainee information
  • coldwar: Cold War era events, figures, and geopolitical analysis
  • vietnam: Vietnam War historical records and personnel data
  • template: Template domain for creating new configurations

Domain Information

For each domain, the command shows:

Name: Domain identifier used in commands
Description: From config.yaml in the domain directory

If a domain config file is malformed, you’ll see a warning instead of the description.
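
The listing behavior can be approximated as follows. This is a sketch under several assumptions: the real scripts/list_domains.py presumably parses config.yaml with a YAML library, while this version scans for a top-level `description:` line to stay within the standard library; the function names, the alphabetical ordering, and the warning text are all hypothetical.

```python
from pathlib import Path
from typing import Optional


def read_description(config_path: Path) -> Optional[str]:
    """Return the top-level `description:` value, or None if it cannot be read."""
    try:
        lines = config_path.read_text(encoding="utf-8").splitlines()
    except OSError:
        return None
    for line in lines:
        if line.startswith("description:"):
            value = line.split(":", 1)[1].strip()
            return value or None
    return None


def list_domains(configs_dir: Path) -> list[str]:
    """Build one display line per domain directory under `configs_dir`."""
    if not configs_dir.is_dir():
        return []
    rows = []
    for domain_dir in sorted(p for p in configs_dir.iterdir() if p.is_dir()):
        description = read_description(domain_dir / "config.yaml")
        if description is None:
            # Stand-in for the warning shown for malformed configs.
            description = "(warning: could not read description)"
        rows.append(f"  • {domain_dir.name}: {description}")
    return rows
```

Calling `list_domains(Path("configs"))` and printing each row would produce lines in the same shape as the output example, with unreadable configs replaced by the warning text.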

Working with Domains

After Creating a Domain

Customize Configuration

Edit configs/<domain>/config.yaml to set:

Data paths
Output directories
Merge thresholds
Concurrency settings

Customize Entity Types

Edit files in configs/<domain>/types/ to define the fields you need for each entity type.

Write Extraction Prompts

Customize prompts in configs/<domain>/prompts/ to guide the LLM on what information to extract and how.

Define Tag Taxonomies

Edit configs/<domain>/tags/ to create controlled vocabularies for categorizing entities.

Add Source Data

Place your articles Parquet file at the data_path specified in config.yaml.

Start Processing

Run just process-domain <domain> to begin extraction.

Switching Between Domains

Process different domains by specifying the --domain flag:

just process --domain guantanamo --limit 10

Domain Isolation

Each domain is completely isolated:

Aspect	Isolation
Config	Separate `configs/<domain>/` directory
Data	Separate `data/<domain>/` directory
Entities	Separate Parquet files per domain
Processing status	Separate `processing_status.json` per domain
Cache	Separate `extraction_cache/` per domain

Entity merging only happens within a domain. Entities from different domains are never merged together.
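
One way to picture this isolation is that every per-domain path is derived from the domain name, so two domains never share a file. The sketch below is illustrative only: `domain_paths` is a hypothetical helper, and placing `processing_status.json` and `extraction_cache/` directly under `data/<domain>/` is an assumption, since the table states only that each domain has its own copy.

```python
from pathlib import Path


def domain_paths(domain: str, root: Path = Path(".")) -> dict[str, Path]:
    """Derive the per-domain locations from the domain name."""
    config_dir = root / "configs" / domain
    data_dir = root / "data" / domain
    return {
        "config_dir": config_dir,
        "data_dir": data_dir,
        # Assumption: the status file and cache sit directly under data/<domain>/.
        "processing_status": data_dir / "processing_status.json",
        "extraction_cache": data_dir / "extraction_cache",
    }


coldwar = domain_paths("coldwar")
vietnam = domain_paths("vietnam")
# No path is shared between the two domains.
print(set(coldwar.values()).isdisjoint(vietnam.values()))  # True
```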

Advanced Configuration

Custom Data Paths

Override the default data path for a domain:

just process --domain myproject --articles-path /custom/path/articles.parquet

Per-Domain Concurrency

Tune processing speed per domain:

configs/high-volume/config.yaml

concurrency:
  extract_workers: 8        # More parallel article processors
  extract_per_article: 4
  llm_in_flight: 20        # More concurrent API calls
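
A setting like `llm_in_flight` is commonly enforced with a semaphore. How the engine actually applies it is not described here, so the following is only a sketch of that general technique: `run_with_limit` and `fake_llm_call` are hypothetical names, and the sleep stands in for a real API request.

```python
import asyncio


async def run_with_limit(n_calls: int, llm_in_flight: int) -> int:
    """Run `n_calls` fake LLM calls and return the peak number in flight."""
    semaphore = asyncio.Semaphore(llm_in_flight)
    in_flight = 0
    peak = 0

    async def fake_llm_call() -> None:
        nonlocal in_flight, peak
        async with semaphore:  # waits while `llm_in_flight` calls are active
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for a network round trip
            in_flight -= 1

    await asyncio.gather(*(fake_llm_call() for _ in range(n_calls)))
    return peak


peak = asyncio.run(run_with_limit(n_calls=50, llm_in_flight=20))
print(f"peak concurrent calls: {peak}")
```

The peak never exceeds the configured limit, no matter how many calls are queued, which is why raising `llm_in_flight` increases throughput only up to whatever rate limit the API provider imposes.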

Per-Domain Thresholds

Adjust entity merging sensitivity:

configs/strict-matching/config.yaml

thresholds:
  people:
    lexical_match: 0.90     # Higher = stricter matching
    embedding_similarity: 0.88
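
To make the effect of these thresholds concrete, here is a rough sketch of a merge decision. It is not the engine's algorithm: the scoring functions (`difflib.SequenceMatcher` for the lexical score, cosine similarity for embeddings), the rule that both scores must clear their thresholds, and the name `should_merge` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from math import sqrt


def lexical_score(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def should_merge(name_a: str, name_b: str,
                 emb_a: list[float], emb_b: list[float],
                 thresholds: dict[str, float]) -> bool:
    # Assumption: both scores must meet their thresholds for a merge.
    return (
        lexical_score(name_a, name_b) >= thresholds["lexical_match"]
        and cosine_similarity(emb_a, emb_b) >= thresholds["embedding_similarity"]
    )


strict = {"lexical_match": 0.90, "embedding_similarity": 0.88}
print(should_merge("John Smith", "john smith", [1.0, 0.0], [1.0, 0.0], strict))
print(should_merge("John Smith", "Jane Doe", [1.0, 0.0], [1.0, 0.0], strict))
```

Under this reading, raising either threshold can only reduce the number of merges, which is consistent with the "higher = stricter" comment in the config.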

Template Domain

The template domain in configs/template/ serves as the blueprint for new domains. To modify the template:

Edit files in configs/template/
Use {DOMAIN_NAME} and {DOMAIN_DESCRIPTION} placeholders
Add the .template extension to files that need placeholder replacement

Example template file:

configs/template/config.yaml.template

name: {DOMAIN_NAME}
description: {DOMAIN_DESCRIPTION}

data_path: data/{DOMAIN_NAME}/raw_sources/articles.parquet
output_dir: data/{DOMAIN_NAME}/output

Error Reference

Domain name must be alphanumeric

Cause: Domain name contains spaces, hyphens, underscores, or special characters.

Solution: Use only letters and numbers (e.g., coldwar, vietnam1965).

Domain already exists

Cause: A directory with that domain name already exists in configs/.

Solution: Choose a different name or delete the existing domain directory if you want to recreate it.

Template directory not found

Cause: configs/template/ is missing from the repository.

Solution: Ensure you have a complete clone of the repository. The template should be in configs/template/.

Failed to initialize domain

Cause: Permissions error or disk space issue.

Solution: Check that you have write permissions in the configs/ directory and sufficient disk space.

Command Reference Summary

Command	Purpose	Example
`just init <name>`	Create new domain	`just init coldwar`
`just domains`	List all domains	`just domains`
`just process --domain <name>`	Process specific domain	`just process --domain coldwar`
`just process-domain <name>`	Shortcut for domain processing	`just process-domain coldwar`
