
Overview

Domain commands help you manage separate research projects (“domains”), each with its own entity types, extraction prompts, and data outputs. Each domain is isolated from others.

Commands

just init

Initialize a new domain configuration by copying the template.
just init <domain_name>
domain_name (string, required)
Name for the new domain. Must be alphanumeric (letters and numbers only, no spaces or special characters). Examples:
  • guantanamo
  • vietnam_war ❌ (contains underscore)
  • coldwar
  • my-project ❌ (contains hyphen)
Source: scripts/init_domain.py
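
The naming rule above can be checked with a single regular expression. This is a minimal sketch that assumes ASCII-only names; the function name is hypothetical and is not taken from scripts/init_domain.py.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits, at least one character.
_DOMAIN_NAME_RE = re.compile(r"[A-Za-z0-9]+")


def is_valid_domain_name(name: str) -> bool:
    """Return True if `name` contains only ASCII letters and digits."""
    return _DOMAIN_NAME_RE.fullmatch(name) is not None


# Check the four example names from the list above.
for candidate in ["guantanamo", "vietnam_war", "coldwar", "my-project"]:
    status = "ok" if is_valid_domain_name(candidate) else "rejected"
    print(f"{candidate}: {status}")
```

A regular expression is used rather than str.isalnum() because isalnum() also accepts non-ASCII letters, which may or may not be what the real script allows.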

Usage Examples

just init coldwar

What Gets Created

When you run just init <domain>, the system:
1. Copy Template

Copies the entire configs/template/ directory to configs/<domain>/.

2. Process Template Files

Finds all .template files and:
  1. Replaces {DOMAIN_NAME} with your domain name
  2. Replaces {DOMAIN_DESCRIPTION} with an auto-generated description
  3. Removes the .template extension

3. Success Message

Prints a confirmation and next steps.
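
The three steps can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the contents of scripts/init_domain.py: the function name init_domain, its parameters, and the error types are hypothetical, and the description wording is inferred from the example config.yaml in this page.

```python
import shutil
from pathlib import Path


def init_domain(configs_dir: Path, domain: str) -> Path:
    """Copy configs/template/ to configs/<domain>/ and fill in placeholders."""
    template_dir = configs_dir / "template"
    target_dir = configs_dir / domain
    if not template_dir.is_dir():
        raise FileNotFoundError(f"template directory not found: {template_dir}")
    if target_dir.exists():
        raise FileExistsError(f"domain already exists: {target_dir}")

    # Step 1: copy the whole template tree.
    shutil.copytree(template_dir, target_dir)

    # Assumption: the auto-generated description follows this wording.
    description = f"Articles and analysis related to {domain}"

    # Step 2: substitute placeholders, then drop the .template suffix.
    # sorted() materializes the file list before any file is renamed.
    for path in sorted(target_dir.rglob("*.template")):
        text = path.read_text(encoding="utf-8")
        text = text.replace("{DOMAIN_NAME}", domain)
        text = text.replace("{DOMAIN_DESCRIPTION}", description)
        path.write_text(text, encoding="utf-8")
        path.rename(path.with_suffix(""))  # config.yaml.template -> config.yaml

    # Step 3: report success.
    print(f"Created domain '{domain}' in {target_dir}")
    return target_dir
```

Calling init_domain(Path("configs"), "coldwar") would then produce configs/coldwar/ with every placeholder filled in.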

Directory Structure

After running just init myproject, you’ll have:
configs/myproject/
├── config.yaml              # Main domain configuration
├── prompts/
│   ├── people.txt          # Entity extraction prompts
│   ├── events.txt
│   ├── locations.txt
│   └── organizations.txt
├── types/
│   ├── people.yaml         # Entity type schemas
│   ├── events.yaml
│   ├── locations.yaml
│   └── organizations.yaml
└── tags/
    ├── people_tags.yaml    # Tag taxonomies
    ├── event_tags.yaml
    ├── location_tags.yaml
    └── organization_tags.yaml

Configuration Files

Main Config: config.yaml

Defines domain-level settings:
configs/myproject/config.yaml
name: myproject
description: Articles and analysis related to myproject

data_path: data/myproject/raw_sources/articles.parquet
output_dir: data/myproject/output

# Entity types to extract
entity_types:
  - people
  - events
  - locations
  - organizations

# Merge thresholds per entity type
thresholds:
  people:
    lexical_match: 0.85
    embedding_similarity: 0.82
    # ... more settings

# Processing concurrency
concurrency:
  extract_workers: 3
  extract_per_article: 4
  llm_in_flight: 10

Entity Type Schemas: types/*.yaml

Define fields for each entity type:
types/people.yaml
name: Person
fields:
  - name: name
    type: string
    required: true
    description: Full name of the person
  
  - name: role
    type: string
    description: Primary role or occupation
  
  - name: dates
    type: string
    description: Relevant dates (birth, death, active years)
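
To show how a schema like this might be applied, the sketch below checks an extracted entity against the field list. It is a hypothetical illustration: the schema is written inline as a dict that mirrors types/people.yaml, and validate_entity and PYTHON_TYPES are names invented for this example, not part of the project.

```python
# Inline copy of the schema above; loading it from YAML is omitted
# so the sketch needs only the standard library.
PERSON_SCHEMA = {
    "name": "Person",
    "fields": [
        {"name": "name", "type": "string", "required": True},
        {"name": "role", "type": "string"},
        {"name": "dates", "type": "string"},
    ],
}

# Hypothetical mapping from schema type names to Python types.
PYTHON_TYPES = {"string": str}


def validate_entity(entity: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the entity is valid."""
    problems = []
    for field in schema["fields"]:
        name = field["name"]
        if name not in entity:
            # Only fields marked required are reported when absent.
            if field.get("required", False):
                problems.append(f"missing required field: {name}")
            continue
        expected = PYTHON_TYPES[field["type"]]
        if not isinstance(entity[name], expected):
            problems.append(f"field {name} should be of type {field['type']}")
    return problems


print(validate_entity({"name": "Jane Doe", "role": "Analyst"}, PERSON_SCHEMA))  # []
print(validate_entity({"role": "Analyst"}, PERSON_SCHEMA))  # ['missing required field: name']
```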

Extraction Prompts: prompts/*.txt

LLM prompts for entity extraction:
prompts/people.txt
You are analyzing historical documents to extract information about people.

For each person mentioned, extract:
- Full name (as written in the document)
- Role or occupation
- Key dates mentioned
- Brief profile summarizing their involvement

Be precise and only extract information explicitly stated in the text.

Tag Taxonomies: tags/*_tags.yaml

Controlled vocabularies for categorization:
tags/people_tags.yaml
categories:
  - name: Military
    description: Military personnel
    values:
      - Soldier
      - Officer
      - General
  
  - name: Political
    description: Political figures
    values:
      - President
      - Senator
      - Ambassador
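
A controlled vocabulary is useful mainly because tags outside it can be rejected. The sketch below looks up which category a tag belongs to; the taxonomy is an inline dict mirroring the YAML above, and find_category is a hypothetical helper, not a function from the project.

```python
from typing import Optional

# Inline copy of the taxonomy above (descriptions omitted for brevity).
TAXONOMY = {
    "categories": [
        {"name": "Military", "values": ["Soldier", "Officer", "General"]},
        {"name": "Political", "values": ["President", "Senator", "Ambassador"]},
    ]
}


def find_category(tag: str, taxonomy: dict) -> Optional[str]:
    """Return the category that lists `tag`, or None if the tag is not allowed."""
    for category in taxonomy["categories"]:
        if tag in category["values"]:
            return category["name"]
    return None


print(find_category("Senator", TAXONOMY))  # Political
print(find_category("Pilot", TAXONOMY))    # None
```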

Listing Domains

just domains

List all available domain configurations with descriptions.
just domains
Source: scripts/list_domains.py

Output Example

$ just domains

Available domains:
 guantanamo: Guantanamo Bay detention facility articles and detainee information
 coldwar: Cold War era events, figures, and geopolitical analysis
 vietnam: Vietnam War historical records and personnel data
 template: Template domain for creating new configurations

Domain Information

For each domain, the command shows:
  • Name: Domain identifier used in commands
  • Description: From config.yaml in the domain directory
If a domain config file is malformed, you’ll see a warning instead of the description.
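
The listing behavior described above could look roughly like the following. This is a sketch under assumptions, not scripts/list_domains.py: the function names are hypothetical, the warning text is invented, and a naive line scan stands in for a real YAML parser so that the example needs only the standard library. It also reads only config.yaml, so a directory without that file gets the warning.

```python
from pathlib import Path
from typing import Optional


def read_description(config_path: Path) -> Optional[str]:
    """Return the top-level `description:` value, or None if unavailable.

    Naive line scan, not a YAML parser.
    """
    try:
        lines = config_path.read_text(encoding="utf-8").splitlines()
    except (OSError, UnicodeDecodeError):
        return None
    for line in lines:
        if line.startswith("description:"):
            value = line.split(":", 1)[1].strip()
            return value or None
    return None


def list_domains(configs_dir: Path) -> list:
    """Build one output line per domain directory under configs/."""
    lines = []
    for domain_dir in sorted(p for p in configs_dir.iterdir() if p.is_dir()):
        description = read_description(domain_dir / "config.yaml")
        if description is None:
            # Hypothetical warning text shown in place of the description.
            description = "(warning: could not read description)"
        lines.append(f"{domain_dir.name}: {description}")
    return lines
```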

Working with Domains

After Creating a Domain

1. Customize Configuration

Edit configs/<domain>/config.yaml to set:
  • Data paths
  • Output directories
  • Merge thresholds
  • Concurrency settings

2. Customize Entity Types

Edit files in configs/<domain>/types/ to define the fields you need for each entity type.

3. Write Extraction Prompts

Customize prompts in configs/<domain>/prompts/ to guide the LLM on what information to extract and how.

4. Define Tag Taxonomies

Edit configs/<domain>/tags/ to create controlled vocabularies for categorizing entities.

5. Add Source Data

Place your articles Parquet file at the data_path specified in config.yaml.

6. Start Processing

Run just process-domain <domain> to begin extraction.

Switching Between Domains

Process different domains by specifying the --domain flag:
just process --domain guantanamo --limit 10

Domain Isolation

Each domain is completely isolated:
Aspect             Isolation
Config             Separate configs/<domain>/ directory
Data               Separate data/<domain>/ directory
Entities           Separate Parquet files per domain
Processing status  Separate processing_status.json per domain
Cache              Separate extraction_cache/ per domain
Entity merging only happens within a domain. Entities from different domains are never merged together.

Advanced Configuration

Custom Data Paths

Override the default data path for a domain:
just process --domain myproject --articles-path /custom/path/articles.parquet

Per-Domain Concurrency

Tune processing speed per domain:
configs/highvolume/config.yaml
concurrency:
  extract_workers: 8        # More parallel article processors
  extract_per_article: 4
  llm_in_flight: 20        # More concurrent API calls
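
One way to picture how three limits like these could interact is with nested semaphores. The sketch below is an assumption about their meaning, not the project's implementation: it treats extract_workers as the number of articles processed at once, extract_per_article as the number of concurrent extraction tasks within one article, and llm_in_flight as a global cap on simultaneous LLM requests. fake_llm_call and process_all are hypothetical names.

```python
import asyncio

CONCURRENCY = {"extract_workers": 8, "extract_per_article": 4, "llm_in_flight": 20}
ENTITY_TYPES = ["people", "events", "locations", "organizations"]


async def fake_llm_call(article_id: int, entity_type: str) -> str:
    """Stand-in for a real LLM request."""
    await asyncio.sleep(0.01)
    return f"{article_id}:{entity_type}"


async def process_all(article_ids: list, limits: dict) -> list:
    article_slots = asyncio.Semaphore(limits["extract_workers"])
    llm_slots = asyncio.Semaphore(limits["llm_in_flight"])

    async def extract(article_id, entity_type, per_article):
        # A call proceeds only when both the per-article cap
        # and the global LLM cap have a free slot.
        async with per_article, llm_slots:
            return await fake_llm_call(article_id, entity_type)

    async def process_article(article_id):
        async with article_slots:
            per_article = asyncio.Semaphore(limits["extract_per_article"])
            tasks = [extract(article_id, t, per_article) for t in ENTITY_TYPES]
            return await asyncio.gather(*tasks)

    nested = await asyncio.gather(*(process_article(a) for a in article_ids))
    return [item for group in nested for item in group]


results = asyncio.run(process_all(list(range(5)), CONCURRENCY))
print(len(results))  # 20 (5 articles x 4 entity types)
```

Under this reading, raising extract_workers without raising llm_in_flight would eventually stop helping, because the global cap becomes the bottleneck.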

Per-Domain Thresholds

Adjust entity merging sensitivity:
configs/strictmatching/config.yaml
thresholds:
  people:
    lexical_match: 0.90     # Higher = stricter matching
    embedding_similarity: 0.88
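
To illustrate what "stricter matching" means, the sketch below merges two person records only when both scores reach their thresholds. How the project actually computes and combines these scores is not documented here, so everything in the example is an assumption: difflib's SequenceMatcher stands in for the lexical score, cosine similarity for the embedding score, and the two checks are combined with a logical AND.

```python
import math
from difflib import SequenceMatcher

THRESHOLDS = {"lexical_match": 0.90, "embedding_similarity": 0.88}


def lexical_score(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def should_merge(name_a, name_b, emb_a, emb_b, thresholds) -> bool:
    """Assumption: both scores must meet their thresholds to merge."""
    return (
        lexical_score(name_a, name_b) >= thresholds["lexical_match"]
        and cosine_similarity(emb_a, emb_b) >= thresholds["embedding_similarity"]
    )


print(should_merge("John Smith", "john smith", [1.0, 0.0], [1.0, 0.0], THRESHOLDS))  # True
print(should_merge("John Smith", "John Smith", [1.0, 0.0], [0.0, 1.0], THRESHOLDS))  # False
```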

Template Domain

The template domain in configs/template/ serves as the blueprint for new domains. To modify the template:
  1. Edit files in configs/template/
  2. Use {DOMAIN_NAME} and {DOMAIN_DESCRIPTION} placeholders
  3. Add .template extension to files that need placeholder replacement
Example template file:
configs/template/config.yaml.template
name: {DOMAIN_NAME}
description: {DOMAIN_DESCRIPTION}

data_path: data/{DOMAIN_NAME}/raw_sources/articles.parquet
output_dir: data/{DOMAIN_NAME}/output

Error Reference

Cause: Domain name contains spaces, hyphens, underscores, or special characters.
Solution: Use only letters and numbers (e.g., coldwar, vietnam1965).

Cause: A directory with that domain name already exists in configs/.
Solution: Choose a different name or delete the existing domain directory if you want to recreate it.

Cause: configs/template/ is missing from the repository.
Solution: Ensure you have a complete clone of the repository. The template should be in configs/template/.

Cause: Permissions error or disk space issue.
Solution: Check that you have write permissions in the configs/ directory and sufficient disk space.

Command Reference Summary

Command                       Purpose                         Example
just init <name>              Create new domain               just init coldwar
just domains                  List all domains                just domains
just process --domain <name>  Process specific domain         just process --domain coldwar
just process-domain <name>    Shortcut for domain processing  just process-domain coldwar
