This guide will help you set up the Ollama API Proxy and test it with a real request. You’ll be able to use commercial LLMs like OpenAI, Google Gemini, and OpenRouter with tools that support the Ollama API format.

Prerequisites

Before you begin, ensure you have:
  • Node.js (for npx) or Bun (for bunx) installed
  • An API key for at least one supported provider: OpenAI, Google Gemini, or OpenRouter

Quick Start with npx/bunx

The fastest way to get started is using npx (Node.js) or bunx (Bun) - no installation required!
Step 1: Create a configuration directory

Create a dedicated directory for your proxy configuration:
mkdir ollama-proxy-config
cd ollama-proxy-config
Step 2: Set up environment variables

Create a .env file with your API keys. You only need to include the providers you want to use:
OPENAI_API_KEY=sk-proj-...
GEMINI_API_KEY=AIzaSy...
OPENROUTER_API_KEY=sk-or-v1-...
At least one API key is required. The proxy will exit with an error if no API keys are found.
Optional configuration:
PORT=11434  # Default port (same as Ollama)
OPENROUTER_API_URL=https://openrouter.ai/api/v1  # Custom OpenRouter URL
NODE_ENV=production  # Set to 'production' to disable timestamps in logs
Step 3: Start the proxy server

Run the proxy using npx or bunx:
npx ollama-api-proxy
You should see output similar to:
✅ Loaded models from models.json
🚀 Ollama Proxy with Streaming running on http://localhost:11434
🔑 Providers: openai, google, openrouter
📋 Available models: gpt-4o-mini, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gemini-2.5-flash, gemini-2.5-flash-lite, deepseek-r1
Step 4: Verify the server is running

Open a new terminal and check the server status:
curl http://localhost:11434/api/version
Expected response:
{"version":"1.0.1e"}
List available models:
curl http://localhost:11434/api/tags
This will return all available models based on your configured API keys.
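If you prefer to inspect the model list programmatically, here is a minimal Python sketch. The response shape (a top-level models array whose entries carry a name field) follows the standard Ollama /api/tags format; verify it against your proxy's actual output.

```python
import json

def model_names(tags_json: str) -> list[str]:
    """Extract model names from an Ollama-style /api/tags response."""
    return [m["name"] for m in json.loads(tags_json)["models"]]

# Against a running proxy you would fetch the JSON first, e.g. with
# urllib.request.urlopen("http://localhost:11434/api/tags").
# Offline demo with a sample payload in the assumed shape:
sample = '{"models": [{"name": "gpt-4o-mini"}, {"name": "gemini-2.5-flash"}]}'
print(model_names(sample))  # ['gpt-4o-mini', 'gemini-2.5-flash']
```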
Step 5: Test with a sample request

Send a test chat completion request:
curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Say hello!"
      }
    ],
    "stream": false
  }'
Try different models like gemini-2.5-flash or deepseek-r1 if you have the corresponding API keys configured!
For streaming responses, set "stream": true:
curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Write a haiku about coding"
      }
    ],
    "stream": true
  }'
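With "stream": true, the Ollama chat API returns newline-delimited JSON: one chunk per line, the text under message.content, and a final chunk whose done field is true. A hedged Python sketch for reassembling the full reply (chunk shape assumed from the standard Ollama streaming format, not verified against this proxy):

```python
import json

def assemble_stream(ndjson: str) -> str:
    """Join the content of each streamed chat chunk into one string."""
    parts = []
    for line in ndjson.splitlines():
        if not line.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # final chunk reached
    return "".join(parts)

# Offline demo with two chunks in the assumed shape:
sample = (
    '{"message": {"role": "assistant", "content": "Hello"}, "done": false}\n'
    '{"message": {"role": "assistant", "content": " world"}, "done": true}\n'
)
print(assemble_stream(sample))  # Hello world
```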

Configure JetBrains AI Assistant

Now that your proxy is running, you can connect JetBrains AI Assistant to use commercial LLMs:
Step 1: Open JetBrains AI Assistant settings

In your JetBrains IDE:
  1. Go to Settings/Preferences → Tools → AI Assistant
  2. Select Ollama as your provider
Step 2: Configure the Ollama URL

Set the Ollama server URL to:
http://localhost:11434
Step 3: Select your model

Choose one of the available models from the dropdown:
  • gpt-4o-mini - Fast and efficient OpenAI model (recommended for most tasks)
  • gpt-4o - More capable OpenAI model
  • gemini-2.5-flash - Google’s fast model with large context
  • deepseek-r1 - Free reasoning model via OpenRouter
The gpt-4o-mini model offers the best balance of speed and quality for most coding tasks.
Step 4: Start using AI Assistant

You can now use JetBrains AI Assistant with your chosen commercial LLM! Try:
  • Code completion and suggestions
  • Explaining code
  • Generating tests
  • Refactoring assistance

Supported API Endpoints

The proxy implements the following Ollama-compatible endpoints:

  • /api/chat - Chat completion with message history support
  • /api/generate - Single-turn text generation from a prompt
  • /api/tags - List all available models
  • /api/version - Get proxy version information
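As a quick illustration of the two generation endpoints, here is a Python sketch that builds the request URL and JSON body for each. The /api/chat body mirrors the curl examples above; the single prompt field for /api/generate is assumed from the standard Ollama API and should be checked against your proxy.

```python
import json

BASE = "http://localhost:11434"  # default proxy port

def chat_request(model: str, messages: list, stream: bool = False):
    """URL and JSON body for /api/chat (multi-turn, message history)."""
    body = {"model": model, "messages": messages, "stream": stream}
    return f"{BASE}/api/chat", json.dumps(body)

def generate_request(model: str, prompt: str, stream: bool = False):
    """URL and JSON body for /api/generate (single-turn, plain prompt)."""
    body = {"model": model, "prompt": prompt, "stream": stream}
    return f"{BASE}/api/generate", json.dumps(body)

url, body = chat_request("gpt-4o-mini", [{"role": "user", "content": "Say hello!"}])
print(url)  # http://localhost:11434/api/chat
```

Send the body with any HTTP client (curl, urllib, fetch); the proxy accepts the same Content-Type: application/json header shown in the examples above.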

Next Steps

  • API Reference - Explore all available API endpoints and parameters
  • Model Configuration - Customize available models with models.json
  • Environment Variables - Learn about all configuration options
  • Installation Methods - Explore alternative installation methods (npm, Docker, etc.)

Troubleshooting

No API keys configured

Make sure you have at least one API key set in your .env file:
OPENAI_API_KEY=your_key_here
# OR
GEMINI_API_KEY=your_key_here
# OR
OPENROUTER_API_KEY=your_key_here
The proxy requires at least one valid API key to start.
Model not found

The requested model doesn’t exist in your configuration. Check the available models:
curl http://localhost:11434/api/tags
Make sure you have the required API key for the model’s provider.
Missing provider API key

The model’s provider doesn’t have a valid API key configured. For example:
  • gpt-4o-mini requires OPENAI_API_KEY
  • gemini-2.5-flash requires GEMINI_API_KEY
  • deepseek-r1 requires OPENROUTER_API_KEY
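The mapping above can double as a quick preflight check before starting the proxy. A hedged sketch: the model-to-variable table comes from this guide, while missing_keys is an illustrative helper, not part of the proxy itself.

```python
import os

# Which environment variable each model's provider requires
# (taken from the list in this guide).
REQUIRED_KEY = {
    "gpt-4o-mini": "OPENAI_API_KEY",
    "gemini-2.5-flash": "GEMINI_API_KEY",
    "deepseek-r1": "OPENROUTER_API_KEY",
}

def missing_keys(models, env=os.environ):
    """Return the env vars that are required but unset for the given models."""
    needed = {REQUIRED_KEY[m] for m in models if m in REQUIRED_KEY}
    return sorted(k for k in needed if not env.get(k))

print(missing_keys(["gpt-4o-mini", "deepseek-r1"], env={}))
# ['OPENAI_API_KEY', 'OPENROUTER_API_KEY']
```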
Port already in use

If port 11434 is already in use (e.g., by Ollama itself), change the port:
.env
PORT=11435
Then update your client configuration to use http://localhost:11435.

Connection issues
  1. Verify the proxy is running: curl http://localhost:11434/api/version
  2. Check that no firewall is blocking the connection
  3. Ensure you’re using the correct URL format: http://localhost:11434 (not https)
