generate_indexing_prompts
Generate domain-specific prompts for entity extraction, entity summarization, and community summarization.
Parameters
GraphRAG configuration object. The function uses the configured input source to load documents for prompt generation.
The maximum number of text chunks to load from the input documents. Higher values provide more context but take longer to process.
The method for selecting document chunks:
- DocSelectionType.RANDOM - Randomly select chunks
- DocSelectionType.TOP - Select the first chunks
- DocSelectionType.AUTO - Automatically select representative chunks using embeddings
The domain to map the input documents to (e.g., “medical research”, “legal documents”, “news articles”). If None, the domain will be automatically detected from the documents.
The language to use for the prompts (e.g., “English”, “Spanish”, “French”). If None, the language will be automatically detected from the documents.
The maximum number of tokens to use in entity extraction prompts. Controls the length and complexity of the generated prompts.
Whether to automatically discover entity types from the documents. When True, the system analyzes your documents to identify relevant entity types. When False, generic entity types are used.
The minimum number of examples required in entity extraction prompts. Higher values provide more guidance but make prompts longer.
The maximum number of text chunks to embed when using the DocSelectionType.AUTO selection method. Only relevant when selection_method is AUTO.
The number of documents to select when using the DocSelectionType.AUTO selection method. Only relevant when selection_method is AUTO.
Enable verbose logging output.
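The three selection methods and the two AUTO-only parameters above are easier to picture with a toy implementation. The sketch below is not GraphRAG's code: it assumes that "representative" means "closest to the centroid of the embeddings", and the names `select_top`, `select_random`, `select_auto`, `k`, and `n_subset_max` are illustrative. The library's actual AUTO strategy may differ.

```python
import math
import random


def select_top(chunks, limit):
    """Take the first `limit` chunks in document order."""
    return chunks[:limit]


def select_random(chunks, limit, seed=None):
    """Sample up to `limit` chunks uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(chunks, min(limit, len(chunks)))


def select_auto(chunks, embeddings, k, n_subset_max):
    """Pick the k chunks whose embeddings lie closest to the centroid.

    Assumption: nearest-to-centroid stands in for "representative";
    this is a stand-in, not the library's confirmed algorithm.
    """
    # Only the first n_subset_max chunks are considered for embedding.
    chunks = chunks[:n_subset_max]
    embeddings = embeddings[:n_subset_max]
    if not chunks:
        return []
    dims = len(embeddings[0])
    centroid = [
        sum(vec[d] for vec in embeddings) / len(embeddings) for d in range(dims)
    ]
    # Rank chunk indices by Euclidean distance to the centroid.
    ranked = sorted(
        range(len(chunks)), key=lambda i: math.dist(embeddings[i], centroid)
    )
    return [chunks[i] for i in ranked[:k]]


chunks = ["a", "b", "c", "d"]
# Toy 2-D vectors standing in for real embeddings.
embeddings = [[0.0, 0.0], [1.0, 1.0], [1.1, 0.9], [5.0, 5.0]]
```

With this toy data, `select_auto(chunks, embeddings, k=2, n_subset_max=4)` keeps the two chunks nearest the middle of the cloud and drops the outliers, which is the intuition behind using embeddings for selection.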
Returns
The entity extraction prompt. Use this prompt in the extract_graph section of your GraphRAG configuration to guide entity and relationship extraction.
The entity summarization prompt. Use this prompt in the entity_summarization section of your configuration to guide how entities are summarized.
The community summarization prompt. Use this prompt in the community_summarization section of your configuration to guide how community reports are generated.
DocSelectionType
Enum for document selection methods: RANDOM, TOP, and AUTO.
Example: Basic usage
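A minimal sketch of a basic call, under several assumptions that this page does not confirm: that the function is importable from `graphrag.api`, that it is a coroutine, that the configuration parameter is named `config`, and that `load_config` from `graphrag.config.load_config` accepts a project root path. Check these against your installed GraphRAG version before relying on them.

```python
import asyncio
from pathlib import Path


async def main(project_root: str = "."):
    # Imported lazily so this file loads even where GraphRAG is not installed.
    # Both import paths are assumptions about the installed version.
    from graphrag.api import generate_indexing_prompts
    from graphrag.config.load_config import load_config

    config = load_config(Path(project_root))
    # Three prompts come back, in the order listed under Returns.
    extraction, entity_summary, community_summary = await generate_indexing_prompts(
        config=config
    )
    return extraction, entity_summary, community_summary


def run(project_root: str = "."):
    """Synchronous entry point for scripts."""
    return asyncio.run(main(project_root))
```

Calling `run()` from the root of a GraphRAG project would return the three prompt strings using the default settings for every other parameter.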
Example: Specify domain and language
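The sketch below passes an explicit domain and language so that neither has to be detected. The keyword names `domain` and `language`, the import paths, and the coroutine call style are assumptions not confirmed by this page; the example values come from the parameter descriptions.

```python
import asyncio
from pathlib import Path


async def main(project_root: str = "."):
    # Lazy imports; both paths are assumptions about the installed version.
    from graphrag.api import generate_indexing_prompts
    from graphrag.config.load_config import load_config

    config = load_config(Path(project_root))
    return await generate_indexing_prompts(
        config=config,
        domain="medical research",  # assumed keyword name
        language="English",         # assumed keyword name
    )


def run(project_root: str = "."):
    """Synchronous entry point for scripts."""
    return asyncio.run(main(project_root))
```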
Example: Auto-detect domain and language
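To let the function detect both values, the parameter descriptions say to pass None. A sketch under the same assumptions as any other call here (import paths, coroutine call style, and the keyword names `domain` and `language` are not confirmed by this page):

```python
import asyncio
from pathlib import Path


async def main(project_root: str = "."):
    # Lazy imports; both paths are assumptions about the installed version.
    from graphrag.api import generate_indexing_prompts
    from graphrag.config.load_config import load_config

    config = load_config(Path(project_root))
    return await generate_indexing_prompts(
        config=config,
        domain=None,    # None requests automatic domain detection
        language=None,  # None requests automatic language detection
    )


def run(project_root: str = "."):
    """Synchronous entry point for scripts."""
    return asyncio.run(main(project_root))
```

Detection adds work compared with supplying the values, so this form suits document sets whose domain or language is not known in advance.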
Example: Fine-tune generation parameters
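A sketch of a call that sets the tuning parameters explicitly. Only `limit` and `selection_method` are named on this page; every other keyword in `TUNING_OPTIONS`, the import paths, and the coroutine call style are assumptions, and the values are illustrative rather than recommended defaults.

```python
import asyncio
from pathlib import Path

# Illustrative values only. Keys other than `limit` are assumed keyword names.
TUNING_OPTIONS = {
    "limit": 25,                    # chunks to load from the input documents
    "max_tokens": 2000,             # token budget for extraction prompts
    "discover_entity_types": True,  # analyze documents for entity types
    "min_examples_required": 3,     # examples in the extraction prompt
    "n_subset_max": 300,            # chunks to embed (AUTO only)
    "k": 15,                        # documents to select (AUTO only)
}


async def main(project_root: str = "."):
    # Lazy imports; both paths are assumptions about the installed version.
    from graphrag.api import DocSelectionType, generate_indexing_prompts
    from graphrag.config.load_config import load_config

    config = load_config(Path(project_root))
    return await generate_indexing_prompts(
        config=config,
        selection_method=DocSelectionType.AUTO,
        **TUNING_OPTIONS,
    )


def run(project_root: str = "."):
    """Synchronous entry point for scripts."""
    return asyncio.run(main(project_root))
```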
Using generated prompts
After generating prompts, update the extract_graph, entity_summarization, and community_summarization sections of your GraphRAG configuration to use them.
How it works
The prompt tuning process:
- Document sampling - Loads a sample of your documents using the specified selection method
- Domain detection - Analyzes documents to identify the domain (if not specified)
- Language detection - Detects the language used in documents (if not specified)
- Persona generation - Creates a persona suitable for the domain
- Entity type discovery - Identifies relevant entity types from your documents (if enabled)
- Example generation - Creates example extractions from your documents
- Prompt assembly - Constructs complete prompts using the generated components
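Several of the steps above are conditional. The sketch below only lists which stages would run for a given set of arguments; it does not call a model or reproduce the library's internals, and the function name `planned_stages` and its parameter names are hypothetical.

```python
def planned_stages(domain=None, language=None, discover_entity_types=True):
    """Return the tuning stages that would run, in order."""
    stages = ["document sampling"]
    if domain is None:
        # Detection is skipped when the caller supplies a domain.
        stages.append("domain detection")
    if language is None:
        # Detection is skipped when the caller supplies a language.
        stages.append("language detection")
    stages.append("persona generation")
    if discover_entity_types:
        stages.append("entity type discovery")
    stages.extend(["example generation", "prompt assembly"])
    return stages
```

For instance, `planned_stages(domain="legal documents", language="English")` omits both detection stages, which is why supplying known values saves processing time.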
Selection methods
Random selection
Randomly samples chunks from your documents.
Top selection
Selects the first chunks in document order.
Auto selection
Uses embeddings to select diverse, representative chunks.
Best practices
- Use AUTO selection for diverse document sets to ensure representative prompts
- Specify domain and language if you know them to save processing time
- Enable entity type discovery for domain-specific entity extraction
- Use more chunks (limit=20-30) for complex or diverse document sets
- Save prompts to files for version control and reproducibility
- Review generated prompts before using them in production
- Regenerate prompts when your document domain changes significantly
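The advice above about saving prompts to files can be followed with a few lines of standard-library code. This is a sketch: the helper name `save_prompts` and the file names derived from the dictionary keys are hypothetical, so use whatever names your configuration expects.

```python
from pathlib import Path


def save_prompts(prompts, output_dir):
    """Write each prompt to <output_dir>/<name>.txt and return the paths."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, text in prompts.items():
        path = out / f"{name}.txt"
        # UTF-8 keeps prompts in non-English languages intact.
        path.write_text(text, encoding="utf-8")
        paths[name] = path
    return paths
```

Committing the resulting directory alongside your configuration gives you a reviewable history of prompt changes and a way to reproduce an earlier index.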
Related
- Index API - Build indexes with custom prompts
- Configuration - Configure prompt tuning settings
- CLI prompt tune - CLI alternative for prompt generation