
Overview

The Databricks provider connects to models hosted on Databricks AI Gateway, including Claude models and Meta Llama models. It supports both token-based and OAuth authentication. Source: crates/goose/src/providers/databricks.rs

Configuration

Environment Variables

DATABRICKS_HOST (string, required)
  Your Databricks workspace URL (e.g., https://your-workspace.cloud.databricks.com)

DATABRICKS_TOKEN (string, optional)
  Personal access token; optional if using OAuth

DATABRICKS_MAX_RETRIES (number, default: 3)
  Maximum number of retry attempts

DATABRICKS_INITIAL_RETRY_INTERVAL_MS (number, default: 1000)
  Initial retry interval in milliseconds

DATABRICKS_BACKOFF_MULTIPLIER (number, default: 2.0)
  Multiplier for exponential backoff

DATABRICKS_MAX_RETRY_INTERVAL_MS (number, default: 60000)
  Maximum retry interval in milliseconds

Setup

# Configure using the CLI
goose configure

# Or set environment variables
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="dapi..."

Authentication

The provider supports two authentication methods:

1. Token Authentication

Use a personal access token:
export DATABRICKS_TOKEN="dapi1234567890abcdef"

2. OAuth Authentication

If no token is provided, the provider automatically uses OAuth device code flow:
# Only set the host
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"

# When you run goose, you'll be prompted to authenticate via browser
goose session start
The OAuth flow:
  1. Displays a device code and URL
  2. Opens your browser to authenticate
  3. Caches the OAuth token for future use
  4. Automatically refreshes expired tokens
Default OAuth configuration:
  • Client ID: databricks-cli
  • Redirect URL: http://localhost
  • Scopes: all-apis, offline_access
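The defaults above can be pictured by constructing the OAuth variant of the provider's auth enum (the `DatabricksAuth` type shown under Implementation Details is redeclared locally here so the sketch is self-contained; `default_oauth` is an illustrative helper, not part of the goose API):

```rust
// Sketch: building the OAuth auth variant with the default client ID,
// redirect URL, and scopes listed above.
#[derive(Debug)]
enum DatabricksAuth {
    Token(String),
    OAuth {
        host: String,
        client_id: String,
        redirect_url: String,
        scopes: Vec<String>,
    },
}

// Hypothetical helper mirroring the documented defaults.
fn default_oauth(host: &str) -> DatabricksAuth {
    DatabricksAuth::OAuth {
        host: host.to_string(),
        client_id: "databricks-cli".to_string(),
        redirect_url: "http://localhost".to_string(),
        scopes: vec!["all-apis".to_string(), "offline_access".to_string()],
    }
}

fn main() {
    let auth = default_oauth("https://your-workspace.cloud.databricks.com");
    println!("{:?}", auth);
}
```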

Supported Models

Claude Models

  • databricks-claude-sonnet-4 (default) - Claude Sonnet on Databricks
  • databricks-claude-sonnet-4-5 - Latest Sonnet
  • databricks-claude-haiku-4-5 (fast model) - Fast Claude model

Meta Llama Models

  • databricks-meta-llama-3-3-70b-instruct - Llama 3.3 70B
  • databricks-meta-llama-3-1-405b-instruct - Llama 3.1 405B
Documentation: https://docs.databricks.com/en/generative-ai/external-models/

Usage

Basic Usage

use goose::providers::create;
use goose::model::ModelConfig;
use goose::message::Message;

// Create with default model
let model_config = ModelConfig::new("databricks-claude-sonnet-4")?;
let provider = create("databricks", model_config, vec![]).await?;

// Stream a response
let messages = vec![Message::user().with_text("Hello!")];
let stream = provider.stream(
    &provider.get_model_config(),
    "session-123",
    "You are a helpful assistant.",
    &messages,
    &[],
).await?;

Custom Configuration

let model_config = ModelConfig::new("databricks-meta-llama-3-3-70b-instruct")?
    .with_temperature(0.7)
    .with_max_tokens(2048);

let provider = create("databricks", model_config, vec![]).await?;

Using Fast Models

// Automatically tries databricks-claude-haiku-4-5 with 0 retries
let (response, usage) = provider.complete_fast(
    "session-123",
    "You are a helpful assistant.",
    &messages,
    &[],
).await?;

Advanced Features

Embeddings

The Databricks provider supports text embeddings:
if provider.supports_embeddings() {
    let texts = vec![
        "Hello world".to_string(),
        "Databricks embeddings".to_string(),
    ];
    
    let embeddings = provider.create_embeddings(
        "session-123",
        texts,
    ).await?;
    
    println!("Generated {} embeddings", embeddings.len());
}
Embedding endpoint: serving-endpoints/text-embedding-3-small/invocations

Retry Configuration

Configure retry behavior:
export DATABRICKS_MAX_RETRIES="5"
export DATABRICKS_INITIAL_RETRY_INTERVAL_MS="2000"
export DATABRICKS_BACKOFF_MULTIPLIER="2.5"
export DATABRICKS_MAX_RETRY_INTERVAL_MS="120000"
Fast models use a different retry strategy:
  • Max retries: 0 (fail fast)
  • No exponential backoff
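Assuming the environment variables above map directly onto the retry fields, the resulting backoff schedule can be sketched as follows (`interval_ms` is an illustrative helper, not a goose API):

```rust
// Sketch of the exponential-backoff schedule implied by the settings above.
// Field names mirror the RetryConfig shown under Implementation Details.
struct RetryConfig {
    max_retries: usize,
    initial_interval_ms: u64,
    backoff_multiplier: f64,
    max_interval_ms: u64,
}

impl RetryConfig {
    /// Interval to wait before retry attempt `attempt` (0-based), capped at the max.
    fn interval_ms(&self, attempt: usize) -> u64 {
        let raw = self.initial_interval_ms as f64 * self.backoff_multiplier.powi(attempt as i32);
        (raw as u64).min(self.max_interval_ms)
    }
}

fn main() {
    // Defaults: initial 1000 ms, multiplier 2.0, cap 60000 ms, 3 retries.
    let cfg = RetryConfig {
        max_retries: 3,
        initial_interval_ms: 1000,
        backoff_multiplier: 2.0,
        max_interval_ms: 60_000,
    };
    let schedule: Vec<u64> = (0..cfg.max_retries).map(|n| cfg.interval_ms(n)).collect();
    println!("{:?}", schedule); // prints [1000, 2000, 4000]
}
```

With the defaults this waits 1, 2, then 4 seconds; the cap only matters for high retry counts, where the computed interval would exceed 60 seconds.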

Model Endpoints

The provider automatically routes to the correct endpoint:
// Regular models
POST /serving-endpoints/{model_name}/invocations

// Codex models (if the model name contains "codex")
POST /serving-endpoints/responses

// Embeddings
POST /serving-endpoints/text-embedding-3-small/invocations
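The routing rules above can be sketched as a small helper (`endpoint_path` is illustrative; the real logic lives in crates/goose/src/providers/databricks.rs and may differ):

```rust
// Sketch: map a model name to its serving-endpoint path per the rules above.
fn endpoint_path(model_name: &str) -> String {
    if model_name.contains("codex") {
        // Codex-style models go through the shared responses endpoint.
        "serving-endpoints/responses".to_string()
    } else {
        // Everything else invokes the endpoint named after the model.
        format!("serving-endpoints/{}/invocations", model_name)
    }
}

fn main() {
    println!("{}", endpoint_path("databricks-claude-sonnet-4"));
    // prints serving-endpoints/databricks-claude-sonnet-4/invocations
    println!("{}", endpoint_path("my-codex-model"));
    // prints serving-endpoints/responses
}
```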

Implementation Details

Provider Metadata

impl ProviderDef for DatabricksProvider {
    fn metadata() -> ProviderMetadata {
        ProviderMetadata::new(
            "databricks",
            "Databricks",
            "Models on Databricks AI Gateway",
            "databricks-claude-sonnet-4",
            DATABRICKS_KNOWN_MODELS.to_vec(),
            "https://docs.databricks.com/en/generative-ai/external-models/",
            vec![
                ConfigKey::new("DATABRICKS_HOST", true, false, None, true),
                ConfigKey::new("DATABRICKS_TOKEN", false, true, None, true),
            ],
        )
    }
}

Authentication Flow

pub enum DatabricksAuth {
    Token(String),
    OAuth {
        host: String,
        client_id: String,
        redirect_url: String,
        scopes: Vec<String>,
    },
}
The provider dynamically gets the auth token:
impl AuthProvider for DatabricksAuthProvider {
    async fn get_auth_header(&self) -> Result<(String, String)> {
        let token = match &self.auth {
            DatabricksAuth::Token(token) => token.clone(),
            DatabricksAuth::OAuth { host, client_id, redirect_url, scopes } => {
                oauth::get_oauth_token_async(host, client_id, redirect_url, scopes).await?
            }
        };
        Ok(("Authorization".to_string(), format!("Bearer {}", token)))
    }
}

API Format

Requests use an OpenAI-compatible format, but omit the model field:
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 2048,
  "stream": true
}
Note: The model is specified in the endpoint URL, not the request body.

Retry Strategy

pub struct RetryConfig {
    pub max_retries: usize,
    pub initial_interval_ms: u64,
    pub backoff_multiplier: f64,
    pub max_interval_ms: u64,
}

// Default config
RetryConfig {
    max_retries: 3,
    initial_interval_ms: 1000,
    backoff_multiplier: 2.0,
    max_interval_ms: 60000,
}

// Fast model config (fail fast)
RetryConfig {
    max_retries: 0,
    initial_interval_ms: 0,
    backoff_multiplier: 1.0,
    max_interval_ms: 0,
}

Fetching Available Models

// Get all serving endpoints
let models = provider.fetch_supported_models().await?;

// Queries: GET /api/2.0/serving-endpoints
// Returns endpoint names that can be used as model names
Example response:
{
  "endpoints": [
    {
      "name": "databricks-claude-sonnet-4-5",
      "creator": "user@example.com",
      "creation_timestamp": 1234567890,
      "config": { ... }
    }
  ]
}

Error Handling

match provider.stream(...).await {
    Ok(stream) => { /* handle stream */ },
    Err(ProviderError::Authentication(msg)) => {
        eprintln!("Auth failed: {}", msg);
        eprintln!("Try running: goose configure");
    },
    Err(ProviderError::RateLimited { retry_after }) => {
        eprintln!("Rate limited; retry after {:?}", retry_after);
    },
    Err(e) => eprintln!("Error: {}", e),
}

Programmatic Configuration

use goose::providers::databricks::DatabricksProvider;

let provider = DatabricksProvider::from_params(
    "https://your-workspace.cloud.databricks.com".to_string(),
    "dapi1234567890abcdef".to_string(),
    model_config,
)?;

OAuth Token Management

Tokens are cached in the system keyring:
  • Service: databricks_oauth
  • Account: {host}_access_token and {host}_refresh_token
Tokens are automatically refreshed when expired.
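The keyring entry names described above can be sketched as follows (the helpers are illustrative; only the service name and the `{host}_access_token` / `{host}_refresh_token` pattern come from the provider):

```rust
// Sketch: how the keyring service and per-host account names are derived.
const KEYRING_SERVICE: &str = "databricks_oauth";

fn access_token_account(host: &str) -> String {
    format!("{}_access_token", host)
}

fn refresh_token_account(host: &str) -> String {
    format!("{}_refresh_token", host)
}

fn main() {
    let host = "https://your-workspace.cloud.databricks.com";
    println!("{} / {}", KEYRING_SERVICE, access_token_account(host));
    println!("{} / {}", KEYRING_SERVICE, refresh_token_account(host));
}
```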
