List dataset types
Retrieve all available dataset types with their configuration schemas.
Request
/datasets/types/
Authentication
Requires authentication via tenant credentials.
Response
Returns an array of dataset type information:
- Unique name of the dataset type
- Description of what the dataset type does
- JSON Schema defining required and optional configuration fields
- Icon identifier for the dataset type
- Whether the dataset type is currently enabled
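As a rough illustration of calling this endpoint, the sketch below builds an authenticated request with the Python standard library. The base URL, the use of GET, and the bearer-token header are assumptions: this reference only says "tenant credentials" and does not name a host or an auth scheme. The helper names are hypothetical. The same URL builder also covers the per-type and schema paths documented further down.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: placeholder host; this reference does not give a base URL.
BASE_URL = "https://api.example.com"


def dataset_types_url(base_url, name=None, schema=False):
    """Build one of the documented paths under /datasets/types/."""
    url = base_url.rstrip("/") + "/datasets/types/"
    if name is None:
        if schema:
            raise ValueError("a schema request needs a dataset type name")
        return url
    # Percent-encode the name so it is always a single path segment.
    url += quote(name, safe="")
    if schema:
        url += "/schema"
    return url


def build_request(url, token):
    """Create the request object without sending it."""
    # Assumption: GET with a bearer token stands in for "tenant credentials".
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )


def fetch_json(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a real host and credentials, `fetch_json(build_request(dataset_types_url(BASE_URL), token))` would return the decoded array of dataset types. Building the request separately from sending it keeps the URL and header logic easy to check without network access.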
Get dataset type
Retrieve information about a specific dataset type.
Request
/datasets/types/{name}
Path parameters
The name of the dataset type (e.g., local_file, remote_weaviate)
Response
Returns a single dataset type object with the same structure as the list endpoint.
Get configuration schema
Retrieve only the configuration schema for a dataset type.
Request
/datasets/types/{name}/schema
Path parameters
The name of the dataset type
Response
Returns the JSON Schema object for the dataset type’s configuration.
Available types
local_file
Local filesystem dataset with ChromaDB for vector search. This type:
- Automatically watches specified directories for new files
- Ingests supported file types (PDF, TXT, DOCX, HTML, MD, CSV, JSON, XLSX)
- Uses ChromaDB running in a local container
- Supports semantic search using all-MiniLM-L6-v2 embeddings
- Shares a single ChromaDB provisioner across multiple datasets
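To check ahead of time which files in a directory fall under the supported types listed above, a filter along the following lines could be used. This is a client-side sketch only, not the service's ingestion logic. Matching on the file extension, case-insensitively, is an assumption, and the function names are hypothetical.

```python
from pathlib import Path

# Extensions taken from the supported file types listed for local_file.
# Assumption: the type is decided by file extension, ignoring case.
SUPPORTED_SUFFIXES = {".pdf", ".txt", ".docx", ".html", ".md", ".csv", ".json", ".xlsx"}


def is_supported(path):
    """Return True if the path has one of the supported extensions."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


def ingestable_files(directory):
    """Return supported files under a directory, sorted for stable output."""
    return sorted(
        p for p in Path(directory).rglob("*") if p.is_file() and is_supported(p)
    )
```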
remote_weaviate
Connect to an existing remote Weaviate instance. This type:
- Connects to a Weaviate server you manage
- Does not provision infrastructure
- Supports custom filters and metadata extraction
- Allows flexible content property mapping
- Supports third-party embedding API keys via headers
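Weaviate reads third-party provider keys from request headers such as X-OpenAI-Api-Key. The sketch below assembles such a header. The function name and the provider mapping are hypothetical, and this reference does not say which configuration field of remote_weaviate carries the headers, so the schema returned by /datasets/types/remote_weaviate/schema remains the authority on that.

```python
# Header names Weaviate uses for third-party model provider keys.
PROVIDER_HEADERS = {
    "openai": "X-OpenAI-Api-Key",
    "cohere": "X-Cohere-Api-Key",
    "huggingface": "X-HuggingFace-Api-Key",
}


def embedding_headers(provider, api_key):
    """Return the header dict that carries a provider API key."""
    try:
        header_name = PROVIDER_HEADERS[provider.lower()]
    except KeyError:
        raise ValueError(f"unknown embedding provider: {provider!r}") from None
    return {header_name: api_key}
```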