The Datasets API provides endpoints for creating, managing, and exporting datasets. Datasets are collections of examples used for testing, evaluation, and experimentation.

Endpoints

List Datasets

GET /v1/datasets
Retrieve a paginated list of datasets.

Query Parameters

cursor
string
Cursor for pagination (base64-encoded dataset ID)
name
string
Optional dataset name to filter by
limit
integer
default:"10"
Maximum number of datasets to return (must be greater than 0)

Response

data
array
Array of dataset objects
next_cursor
string
Cursor for the next page (null if no more results)

Example

import requests

url = "http://localhost:6006/v1/datasets"
headers = {"Authorization": "Bearer your-api-key"}
params = {"limit": 10}

response = requests.get(url, headers=headers, params=params)
data = response.json()

for dataset in data["data"]:
    print(f"{dataset['name']}: {dataset['example_count']} examples")

# Pagination
if data["next_cursor"]:
    next_response = requests.get(
        url,
        headers=headers,
        params={"cursor": data["next_cursor"], "limit": 10}
    )

Get Dataset

GET /v1/datasets/{id}
Retrieve a specific dataset by ID.

Path Parameters

id
string
required
Global ID of the dataset

Response

data
object
Dataset object with all fields including example_count

Example

import requests

dataset_id = "RGF0YXNldDoxMjM="  # Global ID
url = f"http://localhost:6006/v1/datasets/{dataset_id}"
headers = {"Authorization": "Bearer your-api-key"}

response = requests.get(url, headers=headers)
dataset = response.json()["data"]

print(f"Dataset: {dataset['name']}")
print(f"Examples: {dataset['example_count']}")
print(f"Created: {dataset['created_at']}")

Upload Dataset

POST /v1/datasets/upload
Create a new dataset or append to an existing dataset from JSON, JSONL, CSV, or PyArrow.

Query Parameters

sync
boolean
default:"false"
If true, process synchronously and return dataset ID. If false, queue for async processing.

Request Body

The request format depends on the Content-Type header.

JSON body (application/json)

name
string
required
Dataset name
action
string
default:"create"
"create" or "append"
description
string
Dataset description
inputs
array
required
Array of input objects (any JSON structure)
outputs
array
Array of output objects (same length as inputs)
metadata
array
Array of metadata objects (same length as inputs)
splits
array
Array of split assignments per example:
  • String: Single split name
  • Array of strings: Multiple splits
  • null: No splits
span_ids
array
Array of span IDs to link examples back to traces (string or null per example)
Multipart form data (multipart/form-data)

name
string
required
Dataset name
action
string
default:"create"
"create" or "append"
description
string
Dataset description
input_keys[]
array
required
Column names for input fields
output_keys[]
array
required
Column names for output fields
metadata_keys[]
array
Column names for metadata fields
split_keys[]
array
Column names containing split assignments
span_id_key
string
Column name containing span IDs
file
file
required
File to upload (CSV, JSONL, or PyArrow format)

Response (when sync=true)

data
object
Object containing the dataset_id and version_id of the new dataset version

Example

import requests

url = "http://localhost:6006/v1/datasets/upload"
headers = {
    "Authorization": "Bearer your-api-key",
    "Content-Type": "application/json"
}

data = {
    "name": "qa-dataset",
    "description": "Question answering examples",
    "action": "create",
    "inputs": [
        {"question": "What is Phoenix?"},
        {"question": "How do I trace LLMs?"}
    ],
    "outputs": [
        {"answer": "Phoenix is an observability platform"},
        {"answer": "Use OpenTelemetry instrumentation"}
    ],
    "metadata": [
        {"source": "docs"},
        {"source": "faq"}
    ],
    "splits": [
        "train",
        ["train", "validation"]
    ]
}

response = requests.post(
    url,
    json=data,
    headers=headers,
    params={"sync": True}
)

result = response.json()["data"]
print(f"Dataset ID: {result['dataset_id']}")
print(f"Version ID: {result['version_id']}")
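For the multipart form variant, a sketch of a CSV upload. The `upload_csv` helper and its column names are illustrative, not part of the API:

```python
import io

import requests


def upload_csv(name, csv_text, input_keys, output_keys, api_key,
               action="create", base_url="http://localhost:6006"):
    """Upload CSV text as a dataset via multipart form data."""
    files = {"file": ("data.csv", io.BytesIO(csv_text.encode("utf-8")), "text/csv")}
    data = {
        "name": name,
        "action": action,
        "input_keys[]": input_keys,    # requests sends lists as repeated fields
        "output_keys[]": output_keys,
    }
    response = requests.post(
        f"{base_url}/v1/datasets/upload",
        headers={"Authorization": f"Bearer {api_key}"},
        params={"sync": True},
        data=data,
        files=files,
    )
    response.raise_for_status()
    return response.json()["data"]


csv_text = "question,answer\nWhat is Phoenix?,An observability platform\n"
# result = upload_csv("qa-dataset-csv", csv_text, ["question"], ["answer"],
#                     "your-api-key")
```

With `sync=true`, the returned `data` object carries the `dataset_id` and `version_id`, as in the JSON example above.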

Get Dataset Examples

GET /v1/datasets/{id}/examples
Retrieve examples from a dataset.

Path Parameters

id
string
required
Global ID of the dataset

Query Parameters

version_id
string
ID of the dataset version (defaults to latest version)
split
array
List of split identifiers (Global IDs or names) to filter by

Response

data
object
Object containing the examples for the requested dataset version

Example

import requests

dataset_id = "RGF0YXNldDoxMjM="
url = f"http://localhost:6006/v1/datasets/{dataset_id}/examples"
headers = {"Authorization": "Bearer your-api-key"}

# Get all examples
response = requests.get(url, headers=headers)
examples = response.json()["data"]["examples"]

for example in examples:
    print(f"Input: {example['input']}")
    print(f"Output: {example['output']}")

# Filter by split
response = requests.get(
    url,
    headers=headers,
    params={"split": ["train", "validation"]}
)

Delete Dataset

DELETE /v1/datasets/{id}
Delete a dataset and all its associated data.
This operation is permanent and will delete all examples, versions, and experiments associated with the dataset.

Path Parameters

id
string
required
Global ID of the dataset

Response

Returns HTTP 204 (No Content) on success.

Example

import requests

dataset_id = "RGF0YXNldDoxMjM="
url = f"http://localhost:6006/v1/datasets/{dataset_id}"
headers = {"Authorization": "Bearer your-api-key"}

response = requests.delete(url, headers=headers)
print(response.status_code)  # 204

Export Dataset

Phoenix provides multiple export endpoints for datasets:

Export as CSV

GET /v1/datasets/{id}/csv
Download dataset examples as a CSV file.

Export as OpenAI Fine-tuning JSONL

GET /v1/datasets/{id}/jsonl/openai_ft
Export in OpenAI’s fine-tuning format with messages and tools fields.

Export as OpenAI Evals JSONL

GET /v1/datasets/{id}/jsonl/openai_evals
Export in OpenAI’s evals format with messages and ideal fields.

Query Parameters (all export endpoints)

version_id
string
ID of the dataset version (defaults to latest)

Example

import requests

dataset_id = "RGF0YXNldDoxMjM="

# Export as CSV
response = requests.get(
    f"http://localhost:6006/v1/datasets/{dataset_id}/csv",
    headers={"Authorization": "Bearer your-api-key"}
)

with open("dataset.csv", "wb") as f:
    f.write(response.content)

# Export for OpenAI fine-tuning
response = requests.get(
    f"http://localhost:6006/v1/datasets/{dataset_id}/jsonl/openai_ft",
    headers={"Authorization": "Bearer your-api-key"}
)

with open("dataset.jsonl", "wb") as f:
    f.write(response.content)

Dataset Versions

List Dataset Versions

GET /v1/datasets/{id}/versions
List all versions of a dataset.

Path Parameters

id
string
required
Global ID of the dataset

Query Parameters

cursor
string
Cursor for pagination
limit
integer
default:"10"
Maximum number of versions to return

Response

data
array
Array of dataset version objects
next_cursor
string
Pagination cursor
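This endpoint has no example above; a sketch following the same pattern as List Datasets (the `list_versions` helper is illustrative):

```python
import requests


def list_versions(dataset_id, api_key, cursor=None, limit=10,
                  base_url="http://localhost:6006"):
    """Fetch one page of versions for a dataset."""
    response = requests.get(
        f"{base_url}/v1/datasets/{dataset_id}/versions",
        headers={"Authorization": f"Bearer {api_key}"},
        params={"cursor": cursor, "limit": limit},
    )
    response.raise_for_status()
    return response.json()  # {"data": [...], "next_cursor": ...}
```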

Error Handling

404
error
Dataset, version, or examples not found
409
error
Dataset with the same name already exists (on create)
422
error
Invalid request:
  • Invalid dataset ID format
  • Missing required fields (name, inputs)
  • Invalid file format
  • Mismatched array lengths
429
error
Too many requests (async queue full)
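These statuses can be mapped to readable errors client-side. A minimal sketch; the `check_response` helper is illustrative, not part of the API:

```python
def check_response(response):
    """Raise a descriptive error for the documented Datasets API failures."""
    messages = {
        404: "Dataset, version, or examples not found",
        409: "Dataset with the same name already exists",
        422: "Invalid request (bad ID format, missing fields, "
             "invalid file, or mismatched array lengths)",
        429: "Too many requests (async queue full)",
    }
    if response.status_code in messages:
        raise RuntimeError(f"{response.status_code}: {messages[response.status_code]}")
    response.raise_for_status()  # any other non-2xx status
    return response

# Usage: check_response(requests.get(url, headers=headers))
```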

Best Practices

Version Your Data

Datasets are automatically versioned: each upload creates a new version

Use Splits

Assign examples to splits (train/test/validation) for organized experimentation

Link to Traces

Use span_ids to connect dataset examples back to production traces
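Since the API returns 422 on mismatched array lengths, it helps to align `span_ids` with `inputs` up front. A sketch; the helper and span ID values are illustrative:

```python
def with_span_links(inputs, span_ids):
    """Pair upload inputs with span IDs: one ID (or None) per example."""
    if len(span_ids) != len(inputs):
        raise ValueError("span_ids must have the same length as inputs")
    return {"inputs": inputs, "span_ids": span_ids}


payload_fields = with_span_links(
    inputs=[{"question": "What is Phoenix?"}, {"question": "How do I trace LLMs?"}],
    span_ids=["U3BhbjoxMjM=", None],  # second example not linked to a trace
)
```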

Async for Large Datasets

Use sync=false for large dataset uploads to avoid timeouts

Using Phoenix SDK

For easier dataset management, use the Phoenix Python SDK:
from phoenix.experiments import upload_dataset

# Upload from pandas DataFrame
dataset = upload_dataset(
    dataset_name="qa-dataset",
    dataframe=df,
    input_keys=["question"],
    output_keys=["answer"]
)

print(f"Dataset ID: {dataset.id}")
print(f"Examples: {len(dataset)}")
See the Datasets & Experiments documentation for complete SDK usage.
