HuggingFace provides free serverless inference for various open-source models with a monthly credit allocation.

Overview

HuggingFace Inference Providers offer free API access to thousands of open-source models through serverless inference endpoints.

Rate Limits

Monthly Credits: $0.10/month in free credits for serverless inference
View detailed pricing

Model Support

Size Limitation: HuggingFace Serverless Inference is limited to models smaller than 10GB, although some popular models above that size are also supported.

Available Models

  • Various open-source models across supported providers
  • Text generation models (Llama, Mistral, Gemma, etc.)
  • Text embedding models
  • Image generation models
  • Audio models
  • Computer vision models

Browse Models

Explore thousands of available models on HuggingFace
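You can also browse the hub programmatically. A minimal sketch using the huggingface_hub library's list_models function (the task, sort, and limit parameters are from the library's documentation; the summarize_models helper is hypothetical):

```python
import os

def summarize_models(models, limit=5):
    """Pure helper: pull id and download count from model metadata objects."""
    return [(m.id, getattr(m, "downloads", None)) for m in models[:limit]]

try:
    from huggingface_hub import list_models  # pip install huggingface_hub
    HAVE_HUB = True
except ImportError:
    HAVE_HUB = False

if HAVE_HUB and os.environ.get("HF_TOKEN"):
    # Most-downloaded text-generation models on the hub
    models = list(list_models(task="text-generation", sort="downloads", limit=5))
    for model_id, downloads in summarize_models(models):
        print(model_id, downloads)
```

The same call accepts other filters (for example a search string or an author name), so the same pattern works for embeddings, image, and audio models.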

API Usage

from huggingface_hub import InferenceClient

# Authenticate with a user access token from your HuggingFace settings
client = InferenceClient(token="YOUR_HF_TOKEN")

# Run a chat completion against a hosted open-source model
response = client.chat_completion(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
    max_tokens=500,
)

print(response.choices[0].message.content)

Getting Started

1. Create Account: Sign up at huggingface.co
2. Generate Access Token: Create a user access token from your settings
3. Choose Model: Browse the model hub and select a model
4. Start Inferencing: Use the API or Python client to run inference
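The raw HTTP API from step 4 can be sketched with only the standard library. The router endpoint URL below is an assumption based on HuggingFace's OpenAI-compatible chat completions route, and the build_chat_request helper is hypothetical; verify the endpoint against the official API documentation before relying on it:

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint for HuggingFace Inference Providers
API_URL = "https://router.huggingface.co/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 500):
    """Assemble headers and JSON payload for a chat completion call."""
    headers = {
        "Authorization": f"Bearer {os.environ.get('HF_TOKEN', '')}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return headers, payload

if os.environ.get("HF_TOKEN"):  # only reach the network when a token is configured
    headers, payload = build_chat_request(
        "meta-llama/Llama-3.3-70B-Instruct", "Hello, how are you?"
    )
    req = urllib.request.Request(
        API_URL, data=json.dumps(payload).encode(), headers=headers
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```

Because the request shape follows the OpenAI chat completions convention, most OpenAI-compatible client libraries can be pointed at the same endpoint.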

Inference Providers

HuggingFace partners with multiple inference providers:

  • AWS: Amazon Web Services infrastructure
  • Azure: Microsoft Azure cloud platform
  • Google Cloud: Google Cloud Platform
  • HuggingFace: Native HuggingFace infrastructure

Key Features

  • Access to thousands of open-source models
  • Automatic model loading and scaling
  • No infrastructure management required
  • Pay-as-you-go pricing with free monthly credits
  • Support for various model types (text, image, audio, etc.)

Use Cases

  • Prototyping: Quickly test different models
  • Research: Experiment with latest open-source models
  • Development: Build applications without infrastructure setup
  • Comparison: Test multiple models to find the best fit
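The comparison use case can be sketched as a small loop that sends one prompt to several candidate models. The model IDs below are illustrative and the compare helper is hypothetical; it assumes only the chat_completion call shown earlier:

```python
import os

# Illustrative candidate list for a quick side-by-side comparison
CANDIDATES = [
    "meta-llama/Llama-3.3-70B-Instruct",
    "mistralai/Mistral-7B-Instruct-v0.3",
    "google/gemma-2-9b-it",
]

def compare(prompt, models, ask):
    """Run the same prompt through each model via the `ask` callable."""
    results = {}
    for model in models:
        try:
            results[model] = ask(model, prompt)
        except Exception as exc:  # a model may be unavailable on the free tier
            results[model] = f"error: {exc}"
    return results

if os.environ.get("HF_TOKEN"):
    from huggingface_hub import InferenceClient  # pip install huggingface_hub

    client = InferenceClient(token=os.environ["HF_TOKEN"])

    def ask(model, prompt):
        out = client.chat_completion(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=100,
        )
        return out.choices[0].message.content

    for model, answer in compare(
        "Summarize HTTP/2 in one sentence.", CANDIDATES, ask
    ).items():
        print(f"--- {model}\n{answer}\n")
```

Passing the client call in as a callable keeps the comparison loop testable without network access, and the per-model try/except means one failing model does not abort the whole run.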

Additional Resources

  • HuggingFace Hub: Explore models and datasets
  • Documentation: API documentation
  • Python Client: Python library documentation
  • Pricing: Detailed pricing information
