GraphRAG provides sensible defaults for all configuration options. This reference documents all default values.

Model defaults

Completion models

DEFAULT_COMPLETION_MODEL_ID
str
default:"default_completion_model"
Default identifier for completion model configurations.
DEFAULT_COMPLETION_MODEL
str
default:"gpt-4.1"
Default completion model name.
DEFAULT_COMPLETION_MODEL_AUTH_TYPE
AuthMethod
default:"ApiKey"
Default authentication method for completion models.
DEFAULT_MODEL_PROVIDER
str
default:"openai"
Default model provider.

Embedding models

DEFAULT_EMBEDDING_MODEL_ID
str
default:"default_embedding_model"
Default identifier for embedding model configurations.
DEFAULT_EMBEDDING_MODEL
str
default:"text-embedding-3-large"
Default embedding model name.
DEFAULT_EMBEDDING_MODEL_AUTH_TYPE
AuthMethod
default:"ApiKey"
Default authentication method for embedding models.

Encoding

ENCODING_MODEL
str
default:"o200k_base"
Default encoding model for tokenization.

Directory defaults

DEFAULT_INPUT_BASE_DIR
str
default:"input"
Default base directory for input files.
DEFAULT_OUTPUT_BASE_DIR
str
default:"output"
Default base directory for output files.
DEFAULT_CACHE_BASE_DIR
str
default:"cache"
Default base directory for cache storage.
DEFAULT_UPDATE_OUTPUT_BASE_DIR
str
default:"update_output"
Default base directory for incremental update output.

Entity types

DEFAULT_ENTITY_TYPES
list[str]
Default entity types to extract during graph construction.

Configuration class defaults

The following sections document default values for each configuration class.

BasicSearchDefaults

prompt
None
default:"None"
Basic search prompt template.
k
int
default:"10"
Number of results to return.
max_context_tokens
int
default:"12000"
Maximum context tokens.
completion_model_id
str
default:"default_completion_model"
Completion model ID.
embedding_model_id
str
default:"default_embedding_model"
Embedding model ID.

ChunkingDefaults

type
str
default:"tokens"
Chunking strategy type (from ChunkerType enum).
size
int
default:"1200"
Chunk size in tokens.
overlap
int
default:"100"
Overlap between chunks in tokens.
encoding_model
str
default:"o200k_base"
Encoding model for tokenization.
prepend_metadata
None
default:"None"
Metadata to prepend to chunks.
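
As an illustration of how size and overlap interact, the following is a minimal sketch of a sliding-window token chunker. It assumes the input is already a list of tokens and that overlap means the number of tokens shared between consecutive chunks; the function name chunk_tokens is hypothetical and GraphRAG's own chunker may behave differently at the boundaries.

```python
def chunk_tokens(tokens, size=1200, overlap=100):
    """Split a token list into windows of `size` tokens that share `overlap` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # each new window starts this many tokens later
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


# Stand-in for a tokenized document (assumption: 3000 integer token IDs).
tokens = list(range(3000))
chunks = chunk_tokens(tokens)
print([len(chunk) for chunk in chunks])
```

With the default values, each window advances by 1100 tokens, so the last 100 tokens of one chunk reappear at the start of the next.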

ClusterGraphDefaults

max_cluster_size
int
default:"10"
Maximum size of clusters.
use_lcc
bool
default:"True"
Whether to use the largest connected component.
seed
int
default:"0xDEADBEEF"
Random seed for clustering (3735928559 in decimal).
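
The hexadecimal literal and its decimal form are the same integer, and a fixed seed makes random draws repeatable. The sketch below shows both using Python's standard random module; this is only an illustration of the seed concept, and the random number generator that GraphRAG's clustering step actually uses is not assumed here.

```python
import random

SEED = 0xDEADBEEF  # same integer as 3735928559

# Two generators created with the same seed produce the same sequence.
first = random.Random(SEED)
second = random.Random(SEED)
draws_first = [first.random() for _ in range(3)]
draws_second = [second.random() for _ in range(3)]

print(SEED == 3735928559)
print(draws_first == draws_second)
```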

CommunityReportDefaults

graph_prompt
None
default:"None"
Prompt for graph-based community reports.
text_prompt
None
default:"None"
Prompt for text-based community reports.
max_length
int
default:"2000"
Maximum report length in tokens.
max_input_length
int
default:"8000"
Maximum input length in tokens.
completion_model_id
str
default:"default_completion_model"
Completion model ID.
model_instance_name
str
default:"community_reporting"
Model instance name for caching.

DriftSearchDefaults

prompt
None
default:"None"
DRIFT search prompt.
reduce_prompt
None
default:"None"
Reduce step prompt.
data_max_tokens
int
default:"12000"
Maximum data tokens.
reduce_max_tokens
None
default:"None"
Maximum reduce tokens.
reduce_temperature
float
default:"0"
Temperature for reduce step.
reduce_max_completion_tokens
None
default:"None"
Maximum completion tokens for reduce.
concurrency
int
default:"32"
Concurrency level for DRIFT operations.
drift_k_followups
int
default:"20"
Number of followup queries.
primer_folds
int
default:"5"
Number of primer folds.
primer_llm_max_tokens
int
default:"12000"
Maximum tokens for primer LLM.
n_depth
int
default:"3"
Search depth.
local_search_text_unit_prop
float
default:"0.9"
Text unit proportion for local search component.
local_search_community_prop
float
default:"0.1"
Community proportion for local search component.
local_search_top_k_mapped_entities
int
default:"10"
Top k entities for local search.
local_search_top_k_relationships
int
default:"10"
Top k relationships for local search.
local_search_max_data_tokens
int
default:"12000"
Maximum data tokens for local search.
local_search_temperature
float
default:"0"
Temperature for local search.
local_search_top_p
float
default:"1"
Top p for local search.
local_search_n
int
default:"1"
Number of completions for local search.
local_search_llm_max_gen_tokens
int | None
default:"None"
Maximum generation tokens for local search.
local_search_llm_max_gen_completion_tokens
int | None
default:"None"
Maximum completion tokens for local search.
completion_model_id
str
default:"default_completion_model"
Completion model ID.
embedding_model_id
str
default:"default_embedding_model"
Embedding model ID.

EmbedTextDefaults

embedding_model_id
str
default:"default_embedding_model"
Embedding model ID.
model_instance_name
str
default:"text_embedding"
Model instance name for caching.
batch_size
int
default:"16"
Batch size for embedding operations.
batch_max_tokens
int
default:"8191"
Maximum tokens per batch.
names
list[str]
List of embeddings to generate (uses default_embeddings).
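
One way to read batch_size and batch_max_tokens together is that a batch closes as soon as either limit would be exceeded. The sketch below groups texts by their token counts under that assumption; the helper name batch_texts is hypothetical, and GraphRAG's actual batching logic may differ.

```python
def batch_texts(token_counts, batch_size=16, batch_max_tokens=8191):
    """Group text indices into batches bounded by item count and total tokens."""
    batches = []
    current = []          # indices in the batch being built
    current_tokens = 0    # running token total for that batch

    for index, count in enumerate(token_counts):
        is_full = len(current) >= batch_size
        would_overflow = current_tokens + count > batch_max_tokens
        if current and (is_full or would_overflow):
            batches.append(current)
            current, current_tokens = [], 0
        # A single text larger than the token limit still gets its own batch.
        current.append(index)
        current_tokens += count

    if current:
        batches.append(current)
    return batches


short_texts = batch_texts([100] * 40)   # limited by batch_size
long_texts = batch_texts([1000] * 20)   # limited by batch_max_tokens
print([len(batch) for batch in short_texts])
print([len(batch) for batch in long_texts])
```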

ExtractClaimsDefaults

enabled
bool
default:"False"
Whether claim extraction is enabled.
prompt
None
default:"None"
Claim extraction prompt.
description
str
Description of claims to extract.
max_gleanings
int
default:"1"
Maximum number of gleaning iterations.
completion_model_id
str
default:"default_completion_model"
Completion model ID.
model_instance_name
str
default:"extract_claims"
Model instance name for caching.

ExtractGraphDefaults

prompt
None
default:"None"
Graph extraction prompt.
entity_types
list[str]
Entity types to extract.
max_gleanings
int
default:"1"
Maximum number of gleaning iterations.
completion_model_id
str
default:"default_completion_model"
Completion model ID.
model_instance_name
str
default:"extract_graph"
Model instance name for caching.

TextAnalyzerDefaults

Used for NLP-based graph extraction.
extractor_type
NounPhraseExtractorType
default:"RegexEnglish"
Noun phrase extractor type.
model_name
str
default:"en_core_web_md"
SpaCy model name.
max_word_length
int
default:"15"
Maximum word length to consider.
word_delimiter
str
default:"' '"
Delimiter between words.
include_named_entities
bool
default:"True"
Whether to include named entities.
exclude_nouns
list[str]
List of nouns to exclude (uses EN_STOP_WORDS).
exclude_entity_tags
list[str]
default:"[\"DATE\"]"
Entity tags to exclude.
exclude_pos_tags
list[str]
default:"[\"DET\", \"PRON\", \"INTJ\", \"X\"]"
Part-of-speech tags to exclude.
noun_phrase_tags
list[str]
default:"[\"PROPN\", \"NOUNS\"]"
Tags for noun phrases.
noun_phrase_grammars
dict[str, str]
Grammar rules for noun phrase combination. Default:
{
    "PROPN,PROPN": "PROPN",
    "NOUN,NOUN": "NOUNS",
    "NOUNS,NOUN": "NOUNS",
    "ADJ,ADJ": "ADJ",
    "ADJ,NOUN": "NOUNS"
}
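
To show how the grammar rules can be read, here is a minimal sketch that repeatedly merges adjacent tagged words whenever their tag pair matches a rule. This is an illustrative interpretation only: the function merge_tags and the tagged input are hypothetical, and the extractor GraphRAG uses may apply these rules through a different mechanism.

```python
GRAMMARS = {
    "PROPN,PROPN": "PROPN",
    "NOUN,NOUN": "NOUNS",
    "NOUNS,NOUN": "NOUNS",
    "ADJ,ADJ": "ADJ",
    "ADJ,NOUN": "NOUNS",
}


def merge_tags(tagged, grammars=GRAMMARS, word_delimiter=" "):
    """Merge adjacent (text, tag) pairs until no grammar rule applies."""
    items = list(tagged)
    changed = True
    while changed:
        changed = False
        for i in range(len(items) - 1):
            key = f"{items[i][1]},{items[i + 1][1]}"
            if key in grammars:
                text = items[i][0] + word_delimiter + items[i + 1][0]
                items[i:i + 2] = [(text, grammars[key])]
                changed = True
                break  # restart the scan after each merge
    return items


phrase = merge_tags([("large", "ADJ"), ("language", "NOUN"), ("model", "NOUN")])
print(phrase)
```

Here "ADJ,NOUN" first merges "large language" into a NOUNS phrase, and "NOUNS,NOUN" then absorbs "model".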

ExtractGraphNLPDefaults

normalize_edge_weights
bool
default:"True"
Whether to normalize edge weights.
text_analyzer
TextAnalyzerDefaults
Text analyzer configuration.
concurrent_requests
int
default:"25"
Number of concurrent requests.
async_mode
AsyncType
default:"Threaded"
Async mode to use.

GlobalSearchDefaults

map_prompt
None
default:"None"
Map step prompt.
reduce_prompt
None
default:"None"
Reduce step prompt.
knowledge_prompt
None
default:"None"
Knowledge generation prompt.
max_context_tokens
int
default:"12000"
Maximum context tokens.
data_max_tokens
int
default:"12000"
Maximum data tokens.
map_max_length
int
default:"1000"
Maximum map response length in words.
reduce_max_length
int
default:"2000"
Maximum reduce response length in words.
dynamic_search_threshold
int
default:"1"
Community rating threshold for inclusion.
dynamic_search_keep_parent
bool
default:"False"
Keep parent community if children are relevant.
dynamic_search_num_repeats
int
default:"1"
Number of times to rate each community.
dynamic_search_use_summary
bool
default:"False"
Use community summary instead of full context.
dynamic_search_max_level
int
default:"2"
Maximum community hierarchy level.
completion_model_id
str
default:"default_completion_model"
Completion model ID.

StorageDefaults

type
str
default:"file"
Storage type (from StorageType enum).
encoding
str | None
default:"None"
Text encoding.
base_dir
str | None
default:"None"
Base directory for file storage.
azure_connection_string
None
default:"None"
Azure connection string.
azure_container_name
None
default:"None"
Azure container name.
azure_account_url
None
default:"None"
Azure account URL.
azure_cosmosdb_account_url
None
default:"None"
Azure CosmosDB account URL.

InputDefaults

type
InputType
default:"Text"
Input type.
encoding
str | None
default:"None"
Text encoding.
file_pattern
None
default:"None"
File pattern for matching input files.
id_column
None
default:"None"
Column name for document IDs.
title_column
None
default:"None"
Column name for document titles.
text_column
None
default:"None"
Column name for document text.

InputStorageDefaults

Extends StorageDefaults.
base_dir
str
default:"input"
Base directory for input storage.
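
The "Extends StorageDefaults" relationship can be pictured as a subclass that overrides a single field. The sketch below uses standard-library dataclasses and copies only a few fields from the tables above; whether GraphRAG defines these classes in exactly this way is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageDefaults:
    """Subset of the StorageDefaults fields documented above."""
    type: str = "file"
    encoding: Optional[str] = None
    base_dir: Optional[str] = None


@dataclass
class InputStorageDefaults(StorageDefaults):
    """Inherits every field and changes only the base directory."""
    base_dir: Optional[str] = "input"


base = StorageDefaults()
input_storage = InputStorageDefaults()
print(base.base_dir, input_storage.base_dir, input_storage.type)
```

CacheStorageDefaults, OutputStorageDefaults, and UpdateOutputStorageDefaults follow the same pattern with their own base_dir values.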

CacheStorageDefaults

Extends StorageDefaults.
base_dir
str
default:"cache"
Base directory for cache storage.

CacheDefaults

type
CacheType
default:"Json"
Cache type.
storage
CacheStorageDefaults
Cache storage configuration.

LocalSearchDefaults

prompt
None
default:"None"
Local search prompt.
text_unit_prop
float
default:"0.5"
Text unit proportion.
community_prop
float
default:"0.15"
Community proportion.
conversation_history_max_turns
int
default:"5"
Maximum conversation history turns.
top_k_entities
int
default:"10"
Top k entities to retrieve.
top_k_relationships
int
default:"10"
Top k relationships to retrieve.
max_context_tokens
int
default:"12000"
Maximum context tokens.
completion_model_id
str
default:"default_completion_model"
Completion model ID.
embedding_model_id
str
default:"default_embedding_model"
Embedding model ID.
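
A possible reading of text_unit_prop and community_prop is that each reserves a share of max_context_tokens. The sketch below computes such a split under that assumption; the helper split_context_budget is hypothetical, and treating the remainder as available for other context (such as entities and relationships) is a guess rather than documented behavior.

```python
def split_context_budget(max_context_tokens=12000,
                         text_unit_prop=0.5,
                         community_prop=0.15):
    """Divide a token budget according to the two proportions."""
    if text_unit_prop < 0 or community_prop < 0:
        raise ValueError("proportions must be non-negative")
    if text_unit_prop + community_prop > 1:
        raise ValueError("proportions must not sum to more than 1")

    text_unit_tokens = round(max_context_tokens * text_unit_prop)
    community_tokens = round(max_context_tokens * community_prop)
    # Assumption: whatever is left over is available for other context.
    remaining_tokens = max_context_tokens - text_unit_tokens - community_tokens
    return {
        "text_units": text_unit_tokens,
        "communities": community_tokens,
        "remaining": remaining_tokens,
    }


budget = split_context_budget()
print(budget)
```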

OutputStorageDefaults

Extends StorageDefaults.
base_dir
str
default:"output"
Base directory for output storage.

PruneGraphDefaults

min_node_freq
int
default:"2"
Minimum node frequency.
max_node_freq_std
None
default:"None"
Maximum node frequency standard deviation.
min_node_degree
int
default:"1"
Minimum node degree.
max_node_degree_std
None
default:"None"
Maximum node degree standard deviation.
min_edge_weight_pct
float
default:"40.0"
Minimum edge weight percentage.
remove_ego_nodes
bool
default:"True"
Whether to remove ego nodes.
lcc_only
bool
default:"False"
Keep only largest connected component.

ReportingDefaults

type
ReportingType
default:"file"
Reporting type.
base_dir
str
default:"logs"
Base directory for reporting.
connection_string
None
default:"None"
Connection string for blob reporting.
container_name
None
default:"None"
Container name for blob reporting.
storage_account_blob_url
None
default:"None"
Storage account blob URL.

SnapshotsDefaults

embeddings
bool
default:"False"
Whether to save embedding snapshots.
graphml
bool
default:"False"
Whether to save GraphML snapshots.
raw_graph
bool
default:"False"
Whether to save raw graph snapshots.

SummarizeDescriptionsDefaults

prompt
None
default:"None"
Summarization prompt.
max_length
int
default:"500"
Maximum summary length in tokens.
max_input_tokens
int
default:"4000"
Maximum input tokens.
completion_model_id
str
default:"default_completion_model"
Completion model ID.
model_instance_name
str
default:"summarize_descriptions"
Model instance name for caching.

UpdateOutputStorageDefaults

Extends StorageDefaults.
base_dir
str
default:"update_output"
Base directory for update output storage.

VectorStoreDefaults

type
str
default:"lancedb"
Vector store type (from VectorStoreType enum).
db_uri
str
default:"output/lancedb"
Database URI for vector store.

GraphRagConfigDefaults

Root configuration defaults.
models
dict
default:"{}"
Legacy model configurations.
completion_models
dict
default:"{}"
Completion model configurations.
embedding_models
dict
default:"{}"
Embedding model configurations.
concurrent_requests
int
default:"25"
Default concurrent requests.
async_mode
AsyncType
default:"Threaded"
Default async mode.
reporting
ReportingDefaults
Reporting configuration defaults.
input_storage
InputStorageDefaults
Input storage configuration defaults.
output_storage
OutputStorageDefaults
Output storage configuration defaults.
update_output_storage
UpdateOutputStorageDefaults
Update output storage configuration defaults.
cache
CacheDefaults
Cache configuration defaults.
input
InputDefaults
Input configuration defaults.
embed_text
EmbedTextDefaults
Text embedding configuration defaults.
chunking
ChunkingDefaults
Chunking configuration defaults.
snapshots
SnapshotsDefaults
Snapshots configuration defaults.
extract_graph
ExtractGraphDefaults
Graph extraction configuration defaults.
extract_graph_nlp
ExtractGraphNLPDefaults
NLP graph extraction configuration defaults.
summarize_descriptions
SummarizeDescriptionsDefaults
Description summarization configuration defaults.
community_reports
CommunityReportDefaults
Community reports configuration defaults.
extract_claims
ExtractClaimsDefaults
Claims extraction configuration defaults.
prune_graph
PruneGraphDefaults
Graph pruning configuration defaults.
cluster_graph
ClusterGraphDefaults
Graph clustering configuration defaults.
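local_search
LocalSearchDefaults
Local search configuration defaults.
global_search
GlobalSearchDefaults
Global search configuration defaults.
drift_search
DriftSearchDefaults
DRIFT search configuration defaults.
basic_search
BasicSearchDefaults
Basic search configuration defaults.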
vector_store
VectorStoreDefaults
Vector store configuration defaults.
workflows
None
default:"None"
Workflows list.
