
Overview

This guide will walk you through making your first API call to a free LLM provider. We’ll use OpenRouter as our example since it offers 20+ free models with a simple OpenAI-compatible API.
Step 1: Choose a Provider

For this quickstart, we’ll use OpenRouter which offers:
  • 20+ free models including Llama 3.3 70B and Gemma 3
  • OpenAI-compatible API (easy migration)
  • 20 requests/minute, 50 requests/day (1000/day with $10 lifetime top-up)
Other great options for getting started: Groq (fastest inference), Cerebras (high token limits), or GitHub Models (40+ models).
Step 2: Get Your API Key

  1. Visit OpenRouter.ai
  2. Sign up for a free account
  3. Navigate to API Keys
  4. Create a new API key and copy it
Keep your API key secure! Never commit it to version control or share it publicly.
Step 3: Install Dependencies

Install the OpenAI SDK (works with OpenRouter’s compatible endpoint):
pip install openai
Step 4: Make Your First API Call

from openai import OpenAI

# Initialize client with OpenRouter
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

# Make a chat completion request
response = client.chat.completions.create(
    model="meta-llama/llama-3.3-70b-instruct:free",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
)

print(response.choices[0].message.content)
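If you plan to make several calls, it can help to package the request arguments once and reuse them. The `build_chat_request` helper below is a hypothetical convenience, not part of the SDK - it just builds the kwargs for `client.chat.completions.create()`:

```python
def build_chat_request(prompt, model="meta-llama/llama-3.3-70b-instruct:free"):
    """Package kwargs for client.chat.completions.create()."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Usage:
# response = client.chat.completions.create(**build_chat_request("Hello!"))
```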

Available Free Models on OpenRouter

OpenRouter offers 20+ free models. Here are some popular options:

Llama 3.3 70B

General-purpose powerhouse - great for reasoning and complex tasks
meta-llama/llama-3.3-70b-instruct:free

Gemma 3 27B

Google’s efficient model - balanced performance and speed
google/gemma-3-27b-it:free

Mistral Small 24B

Fast and capable for most tasks
mistralai/mistral-small-3.1-24b-instruct:free

Qwen 3 Coder

Specialized for code generation
qwen/qwen3-coder:free
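Because every model shares the same API, switching models is just a string change. One way to organize that is a small task-to-model mapping; the categories below are an illustrative assumption based on the descriptions above, not an official classification:

```python
# Hypothetical mapping from task type to a free OpenRouter model ID.
FREE_MODELS = {
    "general": "meta-llama/llama-3.3-70b-instruct:free",
    "balanced": "google/gemma-3-27b-it:free",
    "fast": "mistralai/mistral-small-3.1-24b-instruct:free",
    "code": "qwen/qwen3-coder:free",
}

def pick_model(task):
    """Return a model ID for the task, falling back to the general model."""
    return FREE_MODELS.get(task, FREE_MODELS["general"])
```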

Try Other Providers

All providers with OpenAI-compatible APIs work similarly - just change the base_url and model name:

Groq (Ultra-Fast Inference)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key="YOUR_GROQ_API_KEY",
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Write a Python function to calculate fibonacci"}],
)

print(response.choices[0].message.content)
Groq Limits: 14,400 requests/day for Llama 3.1 8B, 1,000 requests/day for Llama 3.3 70B

Cerebras (High Token Limits)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",
    api_key="YOUR_CEREBRAS_API_KEY",
)

response = client.chat.completions.create(
    model="llama3.3-70b",
    messages=[{"role": "user", "content": "What are the benefits of edge computing?"}],
)

print(response.choices[0].message.content)
Cerebras Limits: 14,400 requests/day, 1,000,000 tokens/day - generous for most projects!

Advanced: Streaming Responses

For real-time applications like chatbots, use streaming to display responses as they’re generated:
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

stream = client.chat.completions.create(
    model="meta-llama/llama-3.3-70b-instruct:free",
    messages=[{"role": "user", "content": "Write a short story about a robot"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
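If you also need the complete text once streaming finishes, collect the deltas as they arrive. A minimal sketch - the `accumulate_stream` helper is hypothetical; in practice you would feed it the `chunk.choices[0].delta.content` values from the loop above:

```python
def accumulate_stream(deltas):
    """Print streamed text deltas as they arrive and return the full text.

    None entries (chunks with no content) are skipped.
    """
    parts = []
    for delta in deltas:
        if delta is not None:
            print(delta, end="")
            parts.append(delta)
    return "".join(parts)
```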

Environment Variables Setup

Never hardcode API keys! Use environment variables:
OPENROUTER_API_KEY=sk-or-v1-...
GROQ_API_KEY=gsk_...
CEREBRAS_API_KEY=...
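In Python, you can then read the keys at startup and fail fast if one is missing. A minimal sketch (the `get_api_key` helper is hypothetical):

```python
import os

def get_api_key(name="OPENROUTER_API_KEY"):
    """Fetch an API key from the environment, failing loudly if it's missing."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set {name} before running.")
    return key

# client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key=get_api_key())
```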

Rate Limit Management

Most free providers have rate limits. Handle them gracefully:
from openai import OpenAI, RateLimitError
import os
import time

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ.get("OPENROUTER_API_KEY"),
)

def make_request_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="meta-llama/llama-3.3-70b-instruct:free",
                messages=messages,
            )
            return response
        except RateLimitError:
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt  # Exponential backoff
                print(f"Rate limited. Waiting {wait_time} seconds...")
                time.sleep(wait_time)
            else:
                raise

response = make_request_with_retry(
    [{"role": "user", "content": "Hello!"}]
)

Next Steps

Free Providers

Browse all 13 always-free providers

Trial Credits

Explore providers offering trial credits

Best Practices

Learn tips for optimal API usage

Choosing a Provider

Find the best provider for your needs
Pro Tip: Start with multiple providers and implement fallback logic. If one hits rate limits, automatically switch to another!
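One way to sketch that fallback logic: keep an ordered list of (base_url, model) pairs and move down the chain when a provider keeps rate-limiting you. The chain below reuses the endpoints from this guide; the `next_provider` helper is a hypothetical illustration, not a library API:

```python
# Ordered fallback chain of (base_url, model) pairs from this guide.
FALLBACKS = [
    ("https://openrouter.ai/api/v1", "meta-llama/llama-3.3-70b-instruct:free"),
    ("https://api.groq.com/openai/v1", "llama-3.3-70b-versatile"),
    ("https://api.cerebras.ai/v1", "llama3.3-70b"),
]

def next_provider(failed_count):
    """Return the (base_url, model) to try after failed_count exhausted providers,
    or None when every provider in the chain is rate limited."""
    if failed_count < len(FALLBACKS):
        return FALLBACKS[failed_count]
    return None
```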
