## Root configuration

The root `GraphRagConfig` class contains all top-level configuration settings.
- Available completion model configurations. Maps model IDs to their respective configurations.
- Available embedding model configurations. Maps model IDs to their respective configurations.
- The default number of concurrent requests to make to language models.
- The default asynchronous mode to use for language model requests. See the `AsyncType` enum.
- The input configuration for document sources.
- The input storage configuration. For file storage, `base_dir` defaults to `input`.
- The chunking configuration to use. See chunking configuration.
- The output storage configuration. For file storage, `base_dir` defaults to `output`.
- The output configuration for the updated index. For file storage, `base_dir` defaults to `update_output`.
- The table provider configuration. By default, parquet files are read from and written to disk. You can register custom output table storage.
- The cache configuration for storing LLM responses and intermediate results.
- The reporting configuration. See reporting configuration.
- The vector store configuration. Defaults to LanceDB with `db_uri` set to `output/lancedb`.
- List of workflows to run, in execution order. This always overrides any built-in workflow methods.
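To make the default storage locations above concrete, here is a small Python sketch of the defaults as plain data. The key names are illustrative assumptions for this example, not `GraphRagConfig`'s actual schema; only the `base_dir` and `db_uri` values come from the description above.

```python
# Illustrative sketch of the default storage layout described above.
# Key names are assumptions for this example, not the real config schema;
# the base_dir and db_uri values are the documented defaults.
default_layout = {
    "input_storage": {"type": "file", "base_dir": "input"},
    "output_storage": {"type": "file", "base_dir": "output"},
    "update_index_storage": {"type": "file", "base_dir": "update_output"},
    "vector_store": {"type": "lancedb", "db_uri": "output/lancedb"},
}
```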
## Indexing configuration
### Text embedding

The text embedding configuration. Fields:

- `embedding_model_id` (str): The model ID to use for text embeddings. Default: `"default_embedding_model"`
- `model_instance_name` (str): The model singleton instance name. Default: `"text_embedding"`
- `batch_size` (int): The batch size to use. Default: `16`
- `batch_max_tokens` (int): The batch max tokens to use. Default: `8191`
- `names` (list[str]): The specific embeddings to perform.
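The `batch_size` and `batch_max_tokens` limits interact: a batch is closed as soon as either the item count or the token budget would be exceeded. A minimal sketch of that batching rule, under the assumption that this is how the two limits combine (the helper below is hypothetical, not part of the library):

```python
def make_batches(texts, token_counts, batch_size=16, batch_max_tokens=8191):
    """Group texts into batches, closing a batch when either the item
    limit (batch_size) or the token budget (batch_max_tokens) is hit.
    Hypothetical helper for illustration only."""
    batches, current, current_tokens = [], [], 0
    for text, tokens in zip(texts, token_counts):
        if current and (len(current) >= batch_size
                        or current_tokens + tokens > batch_max_tokens):
            batches.append(current)
            current, current_tokens = [], 0
        current.append(text)
        current_tokens += tokens
    if current:
        batches.append(current)
    return batches
```

For example, three 4,000-token texts fit two per batch under the default 8,191-token budget, so the third spills into a second batch.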
### Graph extraction

The entity extraction configuration to use. Fields:

- `completion_model_id` (str): The model ID to use. Default: `"default_completion_model"`
- `model_instance_name` (str): The model singleton instance name. Default: `"extract_graph"`
- `prompt` (str | None): The entity extraction prompt to use. Default: `None`
- `entity_types` (list[str]): The entity types to extract. Default: `["organization", "person", "geo", "event"]`
- `max_gleanings` (int): The maximum number of entity gleanings to use. Default: `1`
The NLP-based graph extraction configuration to use. Used for fast indexing mode. Fields:

- `normalize_edge_weights` (bool): Whether to normalize edge weights. Default: `True`
- `text_analyzer` (TextAnalyzerDefaults): Text analyzer configuration.
- `concurrent_requests` (int): Number of concurrent requests. Default: `25`
- `async_mode` (AsyncType): Async mode to use. Default: `AsyncType.Threaded`
### Description summarization

The description summarization configuration to use. Fields:

- `prompt` (str | None): The summarization prompt. Default: `None`
- `max_length` (int): Maximum length in tokens. Default: `500`
- `max_input_tokens` (int): Maximum input tokens. Default: `4000`
- `completion_model_id` (str): Model ID to use. Default: `"default_completion_model"`
- `model_instance_name` (str): Model instance name. Default: `"summarize_descriptions"`
### Graph processing

The graph pruning configuration to use. Fields:

- `min_node_freq` (int): Minimum node frequency. Default: `2`
- `max_node_freq_std` (float | None): Maximum node frequency standard deviation. Default: `None`
- `min_node_degree` (int): Minimum node degree. Default: `1`
- `max_node_degree_std` (float | None): Maximum node degree standard deviation. Default: `None`
- `min_edge_weight_pct` (float): Minimum edge weight percentage. Default: `40.0`
- `remove_ego_nodes` (bool): Whether to remove ego nodes. Default: `True`
- `lcc_only` (bool): Whether to keep only the largest connected component. Default: `False`
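To make the pruning thresholds concrete, here is a small self-contained sketch (not the library's implementation) that applies `min_node_freq`, `min_node_degree`, and `lcc_only` to a toy graph represented as adjacency sets:

```python
def prune_graph(adj, freq, min_node_freq=2, min_node_degree=1, lcc_only=False):
    """Drop nodes below the frequency/degree thresholds; optionally keep
    only the largest connected component. `adj` maps node -> set of
    neighbors, `freq` maps node -> occurrence count. Illustrative sketch."""
    # Filter nodes by frequency and (pre-prune) degree.
    keep = {n for n in adj
            if freq.get(n, 0) >= min_node_freq and len(adj[n]) >= min_node_degree}
    pruned = {n: adj[n] & keep for n in keep}
    if lcc_only and pruned:
        # Find connected components by iterative DFS, keep the largest.
        seen, components = set(), []
        for start in pruned:
            if start in seen:
                continue
            comp, stack = set(), [start]
            while stack:
                node = stack.pop()
                if node in comp:
                    continue
                comp.add(node)
                stack.extend(pruned[node] - comp)
            seen |= comp
            components.append(comp)
        largest = max(components, key=len)
        pruned = {n: pruned[n] & largest for n in largest}
    return pruned
```

With `lcc_only=True`, a graph made of a triangle plus a disconnected pair keeps only the triangle.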
The cluster graph configuration to use. Fields:

- `max_cluster_size` (int): The maximum cluster size to use. Default: `10`
- `use_lcc` (bool): Whether to use the largest connected component. Default: `True`
- `seed` (int): The seed to use for the clustering. Default: `0xDEADBEEF`
### Claims extraction

The claim extraction configuration to use. Fields:

- `enabled` (bool): Whether claim extraction is enabled. Default: `False`
- `prompt` (str | None): The extraction prompt. Default: `None`
- `description` (str): Description of the claims to extract. Default: `"Any claims or facts that could be relevant to information discovery."`
- `max_gleanings` (int): Maximum number of gleanings. Default: `1`
- `completion_model_id` (str): Model ID to use. Default: `"default_completion_model"`
- `model_instance_name` (str): Model instance name. Default: `"extract_claims"`
### Community reports

The community reports configuration to use. Fields:

- `completion_model_id` (str): The model ID to use. Default: `"default_completion_model"`
- `model_instance_name` (str): The model instance name. Default: `"community_reporting"`
- `graph_prompt` (str | None): Prompt for graph-based summarization. Default: `None`
- `text_prompt` (str | None): Prompt for text-based summarization. Default: `None`
- `max_length` (int): Maximum length in tokens. Default: `2000`
- `max_input_length` (int): Maximum input length in tokens. Default: `8000`
### Snapshots

The snapshots configuration to use. Fields:

- `embeddings` (bool): Whether to save embedding snapshots. Default: `False`
- `graphml` (bool): Whether to save GraphML snapshots. Default: `False`
- `raw_graph` (bool): Whether to save raw graph snapshots. Default: `False`
## Search configuration
### Local search

The local search configuration. Fields:

- `prompt` (str | None): The local search prompt to use. Default: `None`
- `completion_model_id` (str): Model ID to use. Default: `"default_completion_model"`
- `embedding_model_id` (str): Model ID for embeddings. Default: `"default_embedding_model"`
- `text_unit_prop` (float): The text unit proportion. Default: `0.5`
- `community_prop` (float): The community proportion. Default: `0.15`
- `conversation_history_max_turns` (int): Maximum conversation turns. Default: `5`
- `top_k_entities` (int): Top k mapped entities. Default: `10`
- `top_k_relationships` (int): Top k mapped relationships. Default: `10`
- `max_context_tokens` (int): Maximum context tokens. Default: `12000`
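`text_unit_prop` and `community_prop` split the `max_context_tokens` budget: with the defaults above, roughly half the window goes to text units and 15% to community content, leaving the remainder for everything else. A quick sketch of that arithmetic (the helper and the use of the remainder are assumptions for illustration):

```python
def context_budget(max_context_tokens=12000, text_unit_prop=0.5, community_prop=0.15):
    """Split the context window into text-unit, community, and remaining
    token budgets per the proportions above. Illustrative helper only."""
    text_unit_tokens = round(max_context_tokens * text_unit_prop)
    community_tokens = round(max_context_tokens * community_prop)
    remaining = max_context_tokens - text_unit_tokens - community_tokens
    return text_unit_tokens, community_tokens, remaining
```

With the defaults this yields 6,000 tokens for text units, 1,800 for community content, and 4,200 left over.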
### Global search

The global search configuration. Fields:

- `map_prompt` (str | None): The global search mapper prompt. Default: `None`
- `reduce_prompt` (str | None): The global search reducer prompt. Default: `None`
- `completion_model_id` (str): Model ID to use. Default: `"default_completion_model"`
- `knowledge_prompt` (str | None): The global search general prompt. Default: `None`
- `max_context_tokens` (int): Maximum context size in tokens. Default: `12000`
- `data_max_tokens` (int): Data LLM maximum tokens. Default: `12000`
- `map_max_length` (int): Map LLM maximum response length in words. Default: `1000`
- `reduce_max_length` (int): Reduce LLM maximum response length in words. Default: `2000`
- `dynamic_search_threshold` (int): Rating threshold for including a community. Default: `1`
- `dynamic_search_keep_parent` (bool): Keep the parent community if any child communities are relevant. Default: `False`
- `dynamic_search_num_repeats` (int): Number of times to rate the same report. Default: `1`
- `dynamic_search_use_summary` (bool): Use the community summary instead of the full context. Default: `False`
- `dynamic_search_max_level` (int): Maximum community hierarchy level. Default: `2`
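The `dynamic_search_*` fields describe a rate-then-filter pass over community reports: each report is rated `dynamic_search_num_repeats` times, and communities whose averaged rating meets `dynamic_search_threshold` are included. A self-contained sketch of that selection rule, with a plain callable standing in for the LLM rating call (the function and averaging detail are assumptions for illustration):

```python
def select_communities(reports, rate, threshold=1, num_repeats=1):
    """Keep community reports whose average rating across num_repeats
    ratings meets threshold. `rate` stands in for an LLM rating call.
    Illustrative sketch, not the library's implementation."""
    selected = []
    for report in reports:
        ratings = [rate(report) for _ in range(num_repeats)]
        if sum(ratings) / num_repeats >= threshold:
            selected.append(report)
    return selected
```

Repeating the rating (`num_repeats > 1`) and averaging smooths out variance in individual LLM ratings before applying the threshold.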
### DRIFT search

The DRIFT search configuration. Fields:

- `prompt` (str | None): The DRIFT search prompt. Default: `None`
- `reduce_prompt` (str | None): The reduce prompt. Default: `None`
- `data_max_tokens` (int): Maximum data tokens. Default: `12000`
- `reduce_max_tokens` (int | None): Maximum reduce tokens. Default: `None`
- `reduce_temperature` (float): Reduce temperature. Default: `0`
- `reduce_max_completion_tokens` (int | None): Maximum completion tokens. Default: `None`
- `concurrency` (int): Concurrency level. Default: `32`
- `drift_k_followups` (int): Number of follow-up queries. Default: `20`
- `primer_folds` (int): Number of primer folds. Default: `5`
- `primer_llm_max_tokens` (int): Primer LLM max tokens. Default: `12000`
- `n_depth` (int): Search depth. Default: `3`

Additional local search parameters for DRIFT's local search component:

- `completion_model_id` (str): Model ID to use. Default: `"default_completion_model"`
- `embedding_model_id` (str): Embedding model ID. Default: `"default_embedding_model"`
### Basic search

The basic search configuration. Fields:

- `prompt` (str | None): The basic search prompt. Default: `None`
- `k` (int): Number of results to return. Default: `10`
- `max_context_tokens` (int): Maximum context tokens. Default: `12000`
- `completion_model_id` (str): Model ID to use. Default: `"default_completion_model"`
- `embedding_model_id` (str): Embedding model ID. Default: `"default_embedding_model"`
Helper methods
TheGraphRagConfig class provides helper methods to retrieve model configurations:
get_completion_model_config
model_id(str): The ID of the model to get. Should match an ID in thecompletion_modelslist.
ModelConfig: The model configuration if found.
ValueError: If the model ID is not found in the configuration.
get_embedding_model_config
model_id(str): The ID of the model to get. Should match an ID in theembedding_modelslist.
ModelConfig: The model configuration if found.
ValueError: If the model ID is not found in the configuration.
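The lookup contract of these helpers — return the model configuration for a known ID, raise `ValueError` otherwise — can be sketched with a plain mapping. This is an illustration of the documented behavior, not the library's actual code:

```python
class GraphRagConfigSketch:
    """Minimal stand-in illustrating the helper-method contract:
    look up a model config by ID, raise ValueError when it is missing."""

    def __init__(self, completion_models, embedding_models):
        self.completion_models = completion_models  # model ID -> config
        self.embedding_models = embedding_models    # model ID -> config

    def get_completion_model_config(self, model_id):
        if model_id not in self.completion_models:
            raise ValueError(f"Completion model not found: {model_id}")
        return self.completion_models[model_id]

    def get_embedding_model_config(self, model_id):
        if model_id not in self.embedding_models:
            raise ValueError(f"Embedding model not found: {model_id}")
        return self.embedding_models[model_id]
```

In practice this means callers should either use an ID they know appears in the configuration (such as the `"default_completion_model"` / `"default_embedding_model"` defaults used throughout the settings above) or wrap the call in a `try`/`except ValueError`.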