All output tables include embeddings written directly to your configured vector store for efficient downstream retrieval.
Shared fields
All tables have two identifier fields for global uniqueness and human readability:

| Field | Type | Description |
|---|---|---|
| id | str | Generated UUID, ensuring global uniqueness across all records |
| human_readable_id | int | Incremented short ID created per-run. Used in generated summaries with citations for easy visual cross-reference |
Communities
This table contains the final communities generated by the Leiden algorithm. Communities are strictly hierarchical, subdividing into children as cluster affinity is narrowed.

| Field | Type | Description |
|---|---|---|
| community | int | Leiden-generated cluster ID for the community. These increment with depth and are unique through all levels of the hierarchy. For this table, human_readable_id is a copy of the community ID |
| parent | int | Parent community ID |
| children | int[] | List of child community IDs |
| level | int | Depth of the community in the hierarchy |
| title | str | Friendly name of the community |
| entity_ids | str[] | List of entities that are members of the community |
| relationship_ids | str[] | List of relationships wholly within the community (source and target both in community) |
| text_unit_ids | str[] | List of text units represented within the community |
| period | str | Date of ingest in ISO8601 format, used for incremental update merges |
| size | int | Size of the community (entity count), used for incremental update merges |
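
The parent and children fields make it possible to walk the hierarchy in either direction. The following is a minimal sketch using plain Python rows rather than a real communities.parquet file. The sample values are hypothetical, and the use of -1 as the parent of a top-level community is an assumption, not something stated above.

```python
# Hypothetical rows shaped like the communities table; only hierarchy fields are shown.
# Assumption: top-level communities use -1 as their parent ID.
communities = [
    {"community": 0, "parent": -1, "children": [2, 3], "level": 0},
    {"community": 1, "parent": -1, "children": [], "level": 0},
    {"community": 2, "parent": 0, "children": [], "level": 1},
    {"community": 3, "parent": 0, "children": [4], "level": 1},
    {"community": 4, "parent": 3, "children": [], "level": 2},
]

# Index rows by community ID for constant-time lookups.
by_id = {row["community"]: row for row in communities}


def descendants(community_id):
    """Return every community ID below community_id, in breadth-first order."""
    found = []
    queue = list(by_id[community_id]["children"])
    while queue:
        current = queue.pop(0)
        found.append(current)
        queue.extend(by_id[current]["children"])
    return found


def links_are_consistent(rows):
    """Check that each listed child names this community as its parent."""
    lookup = {row["community"]: row for row in rows}
    return all(
        lookup[child]["parent"] == row["community"]
        for row in rows
        for child in row["children"]
    )


print(descendants(0))
print(links_are_consistent(communities))
```

Because community IDs are unique across all levels, a single dictionary keyed by community is enough to index the whole hierarchy.
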
Example communities.parquet
Community reports
This table contains the summarized reports for each community, generated by the LLM.

| Field | Type | Description |
|---|---|---|
| community | int | Short ID of the community this report applies to |
| parent | int | Parent community ID |
| children | int[] | List of child community IDs |
| level | int | Level of the community this report applies to |
| title | str | LLM-generated title for the report |
| summary | str | LLM-generated summary of the report |
| full_content | str | LLM-generated full report |
| rank | float | LLM-derived relevance ranking based on member entity salience |
| rating_explanation | str | LLM-derived explanation of the rank |
| findings | dict | LLM-derived list of the top 5-10 insights from the community. Contains summary and explanation values |
| full_content_json | json | Full JSON output as returned by the LLM. Most fields are extracted into columns, but this JSON is sent for query summarization to allow prompt tuning to add fields/content |
| period | str | Date of ingest in ISO8601 format, used for incremental update merges |
| size | int | Size of the community (entity count), used for incremental update merges |
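
A common way to use the level and rank fields together is to pick the highest-ranked reports at one depth of the hierarchy. The sketch below assumes in-memory rows with hypothetical titles and rank values. The helper name top_reports is illustrative and not part of any library.

```python
# Hypothetical rows shaped like the community reports table; only a few fields are shown.
reports = [
    {"community": 0, "level": 0, "title": "Report A", "rank": 8.5},
    {"community": 1, "level": 0, "title": "Report B", "rank": 6.0},
    {"community": 2, "level": 1, "title": "Report C", "rank": 9.0},
    {"community": 3, "level": 1, "title": "Report D", "rank": 4.5},
]


def top_reports(rows, level, limit=2):
    """Return up to `limit` reports at the given level, highest rank first."""
    at_level = [row for row in rows if row["level"] == level]
    return sorted(at_level, key=lambda row: row["rank"], reverse=True)[:limit]


for report in top_reports(reports, level=1):
    print(report["community"], report["title"], report["rank"])
```
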
Example community_reports.parquet
Covariates
This optional table is generated when claim extraction is enabled. Claims typically identify malicious behavior such as fraud, so they are not useful for all datasets.

| Field | Type | Description |
|---|---|---|
| covariate_type | str | Always “claim” with default covariates |
| type | str | Nature of the claim type |
| description | str | LLM-generated description of the behavior |
| subject_id | str | Name of the source entity (performing the claimed behavior) |
| object_id | str | Name of the target entity (behavior is performed on) |
| status | str | LLM-derived assessment of correctness. One of: TRUE, FALSE, SUSPECTED |
| start_date | str | LLM-derived start of the claimed activity (ISO8601) |
| end_date | str | LLM-derived end of the claimed activity (ISO8601) |
| source_text | str | Short string of text containing the claimed behavior |
| text_unit_id | str | ID of the text unit the claim was extracted from |
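
Since status takes one of three values, claims can be tallied or filtered with a few lines of code. The following sketch uses hypothetical claim rows, including invented entity names and claim types, and keeps only claims assessed as TRUE or SUSPECTED for a given subject.

```python
from collections import Counter

# Hypothetical rows shaped like the covariates table; names and claim types are invented.
claims = [
    {"subject_id": "ALPHA CORP", "object_id": "BETA LLC", "type": "FRAUD", "status": "SUSPECTED"},
    {"subject_id": "ALPHA CORP", "object_id": "JANE DOE", "type": "FRAUD", "status": "FALSE"},
    {"subject_id": "JANE DOE", "object_id": "BETA LLC", "type": "BRIBERY", "status": "TRUE"},
]


def status_counts(rows):
    """Count claims per status value (TRUE, FALSE, SUSPECTED)."""
    return Counter(row["status"] for row in rows)


def credible_claims(rows, subject):
    """Return claims made about `subject` that were not assessed as FALSE."""
    return [
        row
        for row in rows
        if row["subject_id"] == subject and row["status"] in ("TRUE", "SUSPECTED")
    ]


print(status_counts(claims))
print(credible_claims(claims, "ALPHA CORP"))
```
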
Example covariates.parquet
Documents
This table contains the list of document content after import.

| Field | Type | Description |
|---|---|---|
| title | str | Filename, unless otherwise configured during CSV/JSON import |
| text | str | Full text of the document |
| text_unit_ids | str[] | List of text units (chunks) that were parsed from the document |
| metadata | dict | If specified during CSV/JSON import, this is a dict of metadata for the document |
Example documents.parquet
Entities
This table contains all entities found in the data by the LLM.

| Field | Type | Description |
|---|---|---|
| title | str | Name of the entity |
| type | str | Type of the entity. By default: “organization”, “person”, “geo”, or “event” (unless configured differently or auto-tuning is used) |
| description | str | Textual description of the entity. Since entities may be found in many text units, this is an LLM-derived summary of all descriptions |
| text_unit_ids | str[] | List of the text units containing the entity |
| frequency | int | Count of text units the entity was found within |
| degree | int | Node degree (connectedness) in the graph |
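
To illustrate how frequency and degree can be used, the sketch below derives frequency as the length of text_unit_ids (an assumption based on the two field descriptions) and picks the most connected entity of each type. Entity names, IDs, and degree values are hypothetical. Real IDs are UUIDs rather than short strings like tu-1.

```python
# Hypothetical rows shaped like the entities table.
entities = [
    {"title": "ALPHA CORP", "type": "organization", "text_unit_ids": ["tu-1", "tu-2", "tu-3"], "degree": 3},
    {"title": "BETA LLC", "type": "organization", "text_unit_ids": ["tu-2"], "degree": 1},
    {"title": "JANE DOE", "type": "person", "text_unit_ids": ["tu-1", "tu-3"], "degree": 2},
    {"title": "SPRINGFIELD", "type": "geo", "text_unit_ids": ["tu-3"], "degree": 2},
]

# Assumption: frequency is the number of text units that contain the entity.
for entity in entities:
    entity["frequency"] = len(entity["text_unit_ids"])


def most_connected_by_type(rows):
    """Map each entity type to the title of its highest-degree entity."""
    best = {}
    for row in rows:
        current = best.get(row["type"])
        if current is None or row["degree"] > current["degree"]:
            best[row["type"]] = row
    return {entity_type: row["title"] for entity_type, row in best.items()}


print(most_connected_by_type(entities))
```
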
Example entities.parquet
Relationships
This table contains all entity-to-entity relationships found in the data by the LLM. This is also the edge list for the graph.

| Field | Type | Description |
|---|---|---|
| source | str | Name of the source entity |
| target | str | Name of the target entity |
| description | str | LLM-derived description of the relationship. Like entity descriptions, this is summarized from multiple instances |
| weight | float | Weight of the edge in the graph. Summed from an LLM-derived “strength” measure for each relationship instance |
| combined_degree | int | Sum of source and target node degrees |
| text_unit_ids | str[] | List of text units the relationship was found within |
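
The weight and combined_degree descriptions can be worked through on a small edge list. This is an illustrative sketch, not the pipeline's implementation: the strengths list is a hypothetical stand-in for the per-instance strength values, and node degree is assumed to be the number of edges touching an entity in an undirected graph.

```python
from collections import Counter

# Hypothetical edges; "strengths" holds one LLM-derived value per relationship instance.
edges = [
    {"source": "ALPHA CORP", "target": "JANE DOE", "strengths": [7.0, 5.0]},
    {"source": "ALPHA CORP", "target": "SPRINGFIELD", "strengths": [3.0]},
    {"source": "JANE DOE", "target": "SPRINGFIELD", "strengths": [2.0, 2.0, 1.0]},
    {"source": "ALPHA CORP", "target": "BETA LLC", "strengths": [4.0]},
]

# Node degree: count how many edges touch each entity (undirected graph assumed).
degree = Counter()
for edge in edges:
    degree[edge["source"]] += 1
    degree[edge["target"]] += 1

# Build rows with the two derived columns described in the table.
relationships = [
    {
        "source": edge["source"],
        "target": edge["target"],
        "weight": sum(edge["strengths"]),
        "combined_degree": degree[edge["source"]] + degree[edge["target"]],
    }
    for edge in edges
]

for row in relationships:
    print(row)
```

Here ALPHA CORP has degree 3 and JANE DOE has degree 2, so the edge between them gets a combined_degree of 5 and a weight of 12.0.
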
Example relationships.parquet
Text units
This table contains all text chunks parsed from the input documents.

| Field | Type | Description |
|---|---|---|
| text | str | Raw full text of the chunk |
| n_tokens | int | Number of tokens in the chunk. Should normally match the chunk_size config parameter, except for the last chunk which is often shorter |
| document_id | str | ID of the document the chunk came from |
| entity_ids | str[] | List of entities found in the text unit |
| relationship_ids | str[] | List of relationships found in the text unit |
| covariate_ids | str[] | Optional list of covariates found in the text unit |
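
Text units link back to documents through document_id, and documents list their chunks in text_unit_ids. The sketch below cross-checks the two directions and flags chunks shorter than the configured chunk size. The IDs, titles, and token counts are hypothetical, and the CHUNK_SIZE value of 1200 is an assumed setting that should be replaced with the value from your own configuration.

```python
CHUNK_SIZE = 1200  # assumed chunk_size setting; use the value from your own config

# Hypothetical rows; real IDs are UUIDs rather than short strings.
text_units = [
    {"id": "tu-1", "document_id": "doc-a", "n_tokens": 1200},
    {"id": "tu-2", "document_id": "doc-a", "n_tokens": 1200},
    {"id": "tu-3", "document_id": "doc-a", "n_tokens": 310},
    {"id": "tu-4", "document_id": "doc-b", "n_tokens": 845},
]
documents = [
    {"id": "doc-a", "title": "a.txt", "text_unit_ids": ["tu-1", "tu-2", "tu-3"]},
    {"id": "doc-b", "title": "b.txt", "text_unit_ids": ["tu-4"]},
]

units_by_id = {unit["id"]: unit for unit in text_units}


def summarize(document):
    """Summarize the chunks of one document and verify the back-references."""
    chunks = [units_by_id[unit_id] for unit_id in document["text_unit_ids"]]
    return {
        "title": document["title"],
        "chunks": len(chunks),
        "tokens_across_chunks": sum(chunk["n_tokens"] for chunk in chunks),
        "short_chunks": [c["id"] for c in chunks if c["n_tokens"] < CHUNK_SIZE],
        "links_ok": all(c["document_id"] == document["id"] for c in chunks),
    }


for document in documents:
    print(summarize(document))
```
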
Example text_units.parquet
Working with Parquet files
Storage locations
By default, Parquet files are written to the output directory specified in your configuration (settings.yaml). The available storage options are:
- Local filesystem
- Azure Blob Storage
- Custom storage

With an output directory named output, the files are written as:
- output/entities.parquet
- output/relationships.parquet
- output/communities.parquet
- etc.
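
For local filesystem storage, the tables can be loaded with any Parquet reader. The following is a minimal sketch using pandas.read_parquet, which requires pandas plus a Parquet engine such as pyarrow to be installed. The helper names table_path and load_table are illustrative, and the output directory name is assumed to match your configuration.

```python
from pathlib import Path


def table_path(output_dir, table):
    """Build the path of one output table, e.g. output/entities.parquet."""
    return Path(output_dir) / f"{table}.parquet"


def load_table(output_dir, table):
    """Load one output table into a DataFrame (local filesystem storage only)."""
    import pandas as pd  # imported lazily so the path helper works without pandas

    return pd.read_parquet(table_path(output_dir, table))


print(table_path("output", "entities"))
```

A call such as load_table("output", "entities") would then return a DataFrame with the columns listed in the Entities table, assuming the file exists at that location.
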
Next steps
- Custom graphs: Learn how to bring your own existing graph data
- Querying: Use the output tables for GraphRAG queries
- Configuration: Configure storage providers and output settings