The Datasets API provides endpoints for creating, managing, and exporting datasets. Datasets are collections of examples used for testing, evaluation, and experimentation.

Endpoints

List Datasets

GET /v1/datasets
Retrieve a paginated list of datasets.

Query Parameters

cursor
string
Cursor for pagination (base64-encoded dataset ID)
name
string
Optional dataset name to filter by
limit
integer
default:"10"
Maximum number of datasets to return (must be greater than 0)

Response

data
array
Array of dataset objects
next_cursor
string
Cursor for the next page (null if no more results)

Example

import requests

url = "http://localhost:6006/v1/datasets"
headers = {"Authorization": "Bearer your-api-key"}
params = {"limit": 10}

response = requests.get(url, headers=headers, params=params)
data = response.json()

for dataset in data["data"]:
    print(f"{dataset['name']}: {dataset['example_count']} examples")

# Pagination
if data["next_cursor"]:
    next_response = requests.get(
        url,
        headers=headers,
        params={"cursor": data["next_cursor"], "limit": 10}
    )

Get Dataset

GET /v1/datasets/{id}
Retrieve a specific dataset by ID.

Path Parameters

id
string
required
Global ID of the dataset

Response

data
object
Dataset object with all fields including example_count

Example

import requests

dataset_id = "RGF0YXNldDoxMjM="  # Global ID
url = f"http://localhost:6006/v1/datasets/{dataset_id}"
headers = {"Authorization": "Bearer your-api-key"}

response = requests.get(url, headers=headers)
dataset = response.json()["data"]

print(f"Dataset: {dataset['name']}")
print(f"Examples: {dataset['example_count']}")
print(f"Created: {dataset['created_at']}")

Upload Dataset

POST /v1/datasets/upload
Create a new dataset or append to an existing dataset from JSON, JSONL, CSV, or PyArrow.

Query Parameters

sync
boolean
default:"false"
If true, process synchronously and return dataset ID. If false, queue for async processing.

Request Body

The request format depends on the Content-Type header.

JSON body (application/json)

name
string
required
Dataset name
action
string
default:"create"
"create" or "append"
description
string
Dataset description
inputs
array
required
Array of input objects (any JSON structure)
outputs
array
Array of output objects (same length as inputs)
metadata
array
Array of metadata objects (same length as inputs)
splits
array
Array of split assignments per example:
  • String: Single split name
  • Array of strings: Multiple splits
  • null: No splits
span_ids
array
Array of span IDs to link examples back to traces (string or null per example)
Multipart form data (multipart/form-data)

name
string
required
Dataset name
action
string
default:"create"
"create" or "append"
description
string
Dataset description
input_keys[]
array
required
Column names for input fields
output_keys[]
array
required
Column names for output fields
metadata_keys[]
array
Column names for metadata fields
split_keys[]
array
Column names containing split assignments
span_id_key
string
Column name containing span IDs
file
file
required
File to upload (CSV, JSONL, or PyArrow format)

Response (when sync=true)

data
object
Object containing the dataset_id and version_id of the new dataset version

Example

import requests

url = "http://localhost:6006/v1/datasets/upload"
headers = {
    "Authorization": "Bearer your-api-key",
    "Content-Type": "application/json"
}

data = {
    "name": "qa-dataset",
    "description": "Question answering examples",
    "action": "create",
    "inputs": [
        {"question": "What is Phoenix?"},
        {"question": "How do I trace LLMs?"}
    ],
    "outputs": [
        {"answer": "Phoenix is an observability platform"},
        {"answer": "Use OpenTelemetry instrumentation"}
    ],
    "metadata": [
        {"source": "docs"},
        {"source": "faq"}
    ],
    "splits": [
        "train",
        ["train", "validation"]
    ]
}

response = requests.post(
    url,
    json=data,
    headers=headers,
    params={"sync": True}
)

result = response.json()["data"]
print(f"Dataset ID: {result['dataset_id']}")
print(f"Version ID: {result['version_id']}")
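For the multipart form variant, a sketch of a CSV upload. The `upload_csv` helper and its column names are illustrative, not part of the API:

```python
import io

import requests


def upload_csv(name, csv_text, input_keys, output_keys, api_key,
               action="create", base_url="http://localhost:6006"):
    """Upload CSV text as a dataset via multipart form data."""
    files = {"file": ("data.csv", io.BytesIO(csv_text.encode("utf-8")), "text/csv")}
    data = {
        "name": name,
        "action": action,
        "input_keys[]": input_keys,    # requests sends lists as repeated fields
        "output_keys[]": output_keys,
    }
    response = requests.post(
        f"{base_url}/v1/datasets/upload",
        headers={"Authorization": f"Bearer {api_key}"},
        params={"sync": True},
        data=data,
        files=files,
    )
    response.raise_for_status()
    return response.json()["data"]


csv_text = "question,answer\nWhat is Phoenix?,An observability platform\n"
# result = upload_csv("qa-dataset-csv", csv_text, ["question"], ["answer"],
#                     "your-api-key")
```

With `sync=true`, the returned `data` object carries the `dataset_id` and `version_id`, as in the JSON example above.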

Get Dataset Examples

GET /v1/datasets/{id}/examples
Retrieve examples from a dataset.

Path Parameters

id
string
required
Global ID of the dataset

Query Parameters

version_id
string
ID of the dataset version (defaults to latest version)
split
array
List of split identifiers (Global IDs or names) to filter by

Response

data
object
Object containing the examples for the requested dataset version

Example

import requests

dataset_id = "RGF0YXNldDoxMjM="
url = f"http://localhost:6006/v1/datasets/{dataset_id}/examples"
headers = {"Authorization": "Bearer your-api-key"}

# Get all examples
response = requests.get(url, headers=headers)
examples = response.json()["data"]["examples"]

for example in examples:
    print(f"Input: {example['input']}")
    print(f"Output: {example['output']}")

# Filter by split
response = requests.get(
    url,
    headers=headers,
    params={"split": ["train", "validation"]}
)

Delete Dataset

DELETE /v1/datasets/{id}
Delete a dataset and all its associated data.
This operation is permanent and will delete all examples, versions, and experiments associated with the dataset.

Path Parameters

id
string
required
Global ID of the dataset

Response

Returns HTTP 204 (No Content) on success.

Example

import requests

dataset_id = "RGF0YXNldDoxMjM="
url = f"http://localhost:6006/v1/datasets/{dataset_id}"
headers = {"Authorization": "Bearer your-api-key"}

response = requests.delete(url, headers=headers)
print(response.status_code)  # 204

Export Dataset

Phoenix provides multiple export endpoints for datasets:

Export as CSV

GET /v1/datasets/{id}/csv
Download dataset examples as a CSV file.

Export as OpenAI Fine-tuning JSONL

GET /v1/datasets/{id}/jsonl/openai_ft
Export in OpenAI’s fine-tuning format with messages and tools fields.

Export as OpenAI Evals JSONL

GET /v1/datasets/{id}/jsonl/openai_evals
Export in OpenAI’s evals format with messages and ideal fields.

Query Parameters (all export endpoints)

version_id
string
ID of the dataset version (defaults to latest)

Example

import requests

dataset_id = "RGF0YXNldDoxMjM="

# Export as CSV
response = requests.get(
    f"http://localhost:6006/v1/datasets/{dataset_id}/csv",
    headers={"Authorization": "Bearer your-api-key"}
)

with open("dataset.csv", "wb") as f:
    f.write(response.content)

# Export for OpenAI fine-tuning
response = requests.get(
    f"http://localhost:6006/v1/datasets/{dataset_id}/jsonl/openai_ft",
    headers={"Authorization": "Bearer your-api-key"}
)

with open("dataset.jsonl", "wb") as f:
    f.write(response.content)

Dataset Versions

List Dataset Versions

GET /v1/datasets/{id}/versions
List all versions of a dataset.

Path Parameters

id
string
required
Global ID of the dataset

Query Parameters

cursor
string
Cursor for pagination
limit
integer
default:"10"
Maximum number of versions to return

Response

data
array
Array of dataset version objects
next_cursor
string
Pagination cursor
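This endpoint has no example above; a sketch following the same pattern as List Datasets (the `list_versions` helper is illustrative):

```python
import requests


def list_versions(dataset_id, api_key, cursor=None, limit=10,
                  base_url="http://localhost:6006"):
    """Fetch one page of versions for a dataset."""
    response = requests.get(
        f"{base_url}/v1/datasets/{dataset_id}/versions",
        headers={"Authorization": f"Bearer {api_key}"},
        params={"cursor": cursor, "limit": limit},
    )
    response.raise_for_status()
    return response.json()  # {"data": [...], "next_cursor": ...}
```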

Error Handling

404
error
Dataset, version, or examples not found
409
error
Dataset with the same name already exists (on create)
422
error
Invalid request:
  • Invalid dataset ID format
  • Missing required fields (name, inputs)
  • Invalid file format
  • Mismatched array lengths
429
error
Too many requests (async queue full)
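These statuses can be mapped to readable errors client-side. A minimal sketch; the `check_response` helper is illustrative, not part of the API:

```python
def check_response(response):
    """Raise a descriptive error for the documented Datasets API failures."""
    messages = {
        404: "Dataset, version, or examples not found",
        409: "Dataset with the same name already exists",
        422: "Invalid request (bad ID format, missing fields, "
             "invalid file, or mismatched array lengths)",
        429: "Too many requests (async queue full)",
    }
    if response.status_code in messages:
        raise RuntimeError(f"{response.status_code}: {messages[response.status_code]}")
    response.raise_for_status()  # any other non-2xx status
    return response

# Usage: check_response(requests.get(url, headers=headers))
```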

Best Practices

Version Your Data

Datasets are automatically versioned: each upload creates a new version

Use Splits

Assign examples to splits (train/test/validation) for organized experimentation

Link to Traces

Use span_ids to connect dataset examples back to production traces
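Since the API returns 422 on mismatched array lengths, it helps to align `span_ids` with `inputs` up front. A sketch; the helper and span ID values are illustrative:

```python
def with_span_links(inputs, span_ids):
    """Pair upload inputs with span IDs: one ID (or None) per example."""
    if len(span_ids) != len(inputs):
        raise ValueError("span_ids must have the same length as inputs")
    return {"inputs": inputs, "span_ids": span_ids}


payload_fields = with_span_links(
    inputs=[{"question": "What is Phoenix?"}, {"question": "How do I trace LLMs?"}],
    span_ids=["U3BhbjoxMjM=", None],  # second example not linked to a trace
)
```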

Async for Large Datasets

Use sync=false for large dataset uploads to avoid timeouts

Using Phoenix SDK

For easier dataset management, use the Phoenix Python SDK:
from phoenix.experiments import upload_dataset

# Upload from pandas DataFrame
dataset = upload_dataset(
    dataset_name="qa-dataset",
    dataframe=df,
    input_keys=["question"],
    output_keys=["answer"]
)

print(f"Dataset ID: {dataset.id}")
print(f"Examples: {len(dataset)}")
See the Datasets & Experiments documentation for complete SDK usage.
