Quick Start
Create a new research domain in three simple steps:
Initialize the domain
Use the just init command to create a new domain from the template. This creates a new directory at configs/soviet_afghan_war/ with template configuration files.
Configure entity types
Edit the YAML files in configs/soviet_afghan_war/categories/ to define entity types relevant to your research:
- people.yaml: Types of people (military_leaders, diplomats, commanders)
- organizations.yaml: Organization types (military_units, intelligence_agencies)
- locations.yaml: Location types (provinces, military_bases, refugee_camps)
- events.yaml: Event types (battles, negotiations, refugee_movements)
Customize extraction prompts
Edit the markdown files in configs/soviet_afghan_war/prompts/ to provide domain-specific extraction instructions:
- people.md: How to identify and categorize people
- organizations.md: How to extract organizations
- locations.md: How to identify locations
- events.md: How to extract events
- relevance.md: How to determine whether a source is relevant
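To confirm that a new domain contains every file the steps above mention, a quick check like the following can help. It is a minimal sketch, not part of the tool: the function name missing_domain_files is hypothetical, and the expected file list covers only the category and prompt files named in this guide, so the real template may contain more.

```python
from pathlib import Path

# File names taken from the Quick Start steps; the real template may
# include additional files that this hypothetical check does not know about.
EXPECTED_CATEGORY_FILES = ["people.yaml", "organizations.yaml", "locations.yaml", "events.yaml"]
EXPECTED_PROMPT_FILES = ["people.md", "organizations.md", "locations.md", "events.md", "relevance.md"]


def missing_domain_files(domain_dir: Path) -> list[str]:
    """Return the expected files (as POSIX-style relative paths) that are absent."""
    expected = [Path("categories") / name for name in EXPECTED_CATEGORY_FILES]
    expected += [Path("prompts") / name for name in EXPECTED_PROMPT_FILES]
    return [p.as_posix() for p in expected if not (domain_dir / p).is_file()]


if __name__ == "__main__":
    missing = missing_domain_files(Path("configs/soviet_afghan_war"))
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected category and prompt files are present.")
```

Running it right after initializing a domain gives a quick signal that nothing was deleted or misnamed while editing.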
Domain Structure
Each domain contains the following structure:
Configuring Entity Types
Entity types are defined in YAML files under categories/. Each file defines the types and tags available for that entity category.
Example: People Categories
configs/guantanamo/categories/people.yaml:
Best practices for entity types:
- Use lowercase, underscore-separated names (e.g., military_leader)
- Provide clear descriptions that distinguish similar types
- Include 2-3 realistic examples from your domain
- Focus on types that matter for your research questions
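The naming guideline above is easy to check automatically before running an extraction. The helper below is hypothetical and not part of the tool; it assumes "lowercase, underscore-separated" means lowercase ASCII letters and digits, starting with a letter, with words joined by single underscores.

```python
import re

# Assumed reading of the guideline: starts with a lowercase letter, then
# lowercase letters/digits, with words joined by single underscores.
TYPE_NAME = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def invalid_type_names(names: list[str]) -> list[str]:
    """Return the names that do not follow the lowercase_underscore convention."""
    return [name for name in names if not TYPE_NAME.fullmatch(name)]


print(invalid_type_names(["military_leader", "Diplomat", "refugee camps", "intelligence_agency"]))
# → ['Diplomat', 'refugee camps']
```

Names with double, leading, or trailing underscores (such as a__b or _x) are also rejected under this reading.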
Template Structure
All category files follow this structure:
Customizing Extraction Prompts
Prompts are markdown files that instruct the AI model on how to extract entities from your sources. They should be specific to your research domain and source types.
Example: People Extraction Prompt
configs/guantanamo/prompts/people.md (excerpt):
Advanced Configuration
See the Configuration Reference for details on:
- Deduplication thresholds per entity type
- Name variant equivalence groups
- Performance and concurrency settings
- Caching configuration
- Embedding model selection
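To illustrate what per-type deduplication thresholds and name-variant equivalence groups control, here is a small sketch. It is not the tool's implementation: it swaps in Python's difflib.SequenceMatcher ratio as the similarity measure (the tool's own matching may differ, for example by using the configured embedding model), and the threshold values, group contents, and function names are made up for the example.

```python
from difflib import SequenceMatcher

# Hypothetical values: a stricter threshold for people than for locations.
THRESHOLDS = {"people": 0.9, "locations": 0.8}

# Hypothetical equivalence group: every variant maps to the first (canonical) name.
EQUIVALENCE_GROUPS = [["Kabul", "Kābul", "Cabul"]]
CANONICAL = {variant.lower(): group[0] for group in EQUIVALENCE_GROUPS for variant in group}


def normalize(name: str) -> str:
    """Replace a known spelling variant with its canonical form."""
    return CANONICAL.get(name.lower(), name)


def is_duplicate(a: str, b: str, entity_type: str) -> bool:
    """Treat two names as one entity if their similarity meets the type's threshold."""
    a, b = normalize(a), normalize(b)
    similarity = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return similarity >= THRESHOLDS[entity_type]


print(is_duplicate("Cabul", "Kabul", "locations"))     # → True (same equivalence group)
print(is_duplicate("Kandahar", "Kabul", "locations"))  # → False (similarity far below 0.8)
```

Equivalence groups catch variants that look dissimilar as strings, while the threshold handles near-identical spellings; raising a type's threshold makes merges for that type more conservative.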
Domain Examples
Guantánamo Bay Research
Focus: Detention, legal proceedings, human rights
Key entity types:
- People: detainee, military, lawyer, journalist
- Organizations: military, intelligence, legal, humanitarian
- Locations: detention_facility, military_base
- Events: detention, legal proceedings, policy changes
Historical Food Studies
Focus: Food history, agricultural practices, culinary traditions
Key entity types:
- People: farmers, traders, cookbook_authors, anthropologists
- Organizations: agricultural_cooperatives, food_companies, markets
- Locations: farms, markets, kitchens, trade_routes
- Events: harvests, famines, recipe_documentation, trade_agreements
Conflict Studies
Focus: Military history, geopolitical events
Key entity types:
- People: military_leaders, diplomats, commanders, journalists
- Organizations: military_units, intelligence_agencies, tribal_groups
- Locations: provinces, military_bases, refugee_camps
- Events: battles, negotiations, refugee_movements
Testing Your Configuration
After creating your domain, test it with a small number of articles:
Managing Multiple Domains
List all available domains:
Each domain keeps its own:
- Entity type definitions
- Extraction prompts
- Processing settings
- Output directory
Switch between domains with the --domain flag or by using the web interface domain selector.
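If you need the list of domains from a script rather than from the command line, a directory scan is one option. The helper below is hypothetical, not part of the tool, and assumes (as in the examples on this page) that each domain lives in its own folder under configs/ containing categories/ and prompts/ subfolders.

```python
from pathlib import Path


def list_domains(configs_dir: Path = Path("configs")) -> list[str]:
    """Return sorted names of folders under configs_dir that look like domains.

    Assumption: a domain folder contains both a categories/ and a prompts/ subfolder.
    """
    if not configs_dir.is_dir():
        return []
    return sorted(
        child.name
        for child in configs_dir.iterdir()
        if (child / "categories").is_dir() and (child / "prompts").is_dir()
    )


if __name__ == "__main__":
    for name in list_domains():
        print(name)
```

Requiring both subfolders keeps unrelated folders under configs/ (notes, backups) out of the list.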
Next Steps
Process Articles
Learn how to process your historical sources and extract entities
Configuration Reference
Complete reference for config.yaml settings
Data Format
Prepare your sources in the required Parquet format
Web Interface
Browse and explore extracted entities