This guide covers different ways to download and obtain models for use with ONNX Runtime GenAI.

Download via Foundry Local

Foundry Local provides an easy way to download pre-optimized models for ONNX Runtime GenAI.
1. Install Foundry Local

Download and install Foundry Local for your platform.
2. List Available Models

View all available models:
foundry model list
3. Download a Model

Download your chosen model. For example, to download Phi-4:
foundry model download Phi-4-generic-cpu
4. Locate the Model

Find where the model is saved on disk:
foundry cache location
The model will be in a path like:
C:\Users\<user>\.foundry\Microsoft\Phi-4-generic-cpu\cpu-int4-rtn-block-32-acc-level-4
The Foundry Local CLI is not currently available on Linux. To run on Linux, download the model on a Windows or macOS machine and copy it to your Linux machine.
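One way to move a downloaded model to Linux is `scp`. This is a sketch: the cache path shown is the macOS layout, and the host name and destination directory are placeholders you must substitute.

```shell
# Copy the cached model directory to a Linux machine over SSH.
# "user@linux-host" and "~/models/" are placeholders -- substitute your own.
scp -r "$HOME/.foundry/Microsoft/Phi-4-generic-cpu" user@linux-host:~/models/
```

On Windows, run the equivalent copy from the `C:\Users\<user>\.foundry` cache directory reported by `foundry cache location`.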

Download via Hugging Face Hub

You can download ONNX models directly from Hugging Face using the Hugging Face CLI.
1. Install the Hugging Face CLI

pip install "huggingface_hub[cli]"
2. Log in to Hugging Face

huggingface-cli login
3. Download a Model

Use the huggingface-cli download command with the model name and subfolder:
huggingface-cli download <model_name> --include <subfolder_name>/* --local-dir .
For example, to download the Phi-4 mini instruct GPU model:
huggingface-cli download microsoft/Phi-4-mini-instruct-onnx --include gpu/* --local-dir .
4. Identify the Model Path

The model will be downloaded to your specified directory. For example:
gpu/gpu-int4-rtn-block-32
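The same download can also be scripted with the `huggingface_hub` Python library instead of the CLI. A minimal sketch (requires `pip install huggingface_hub` and network access):

```python
# Sketch: download only the GPU subfolder of an ONNX model repository.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="microsoft/Phi-4-mini-instruct-onnx",
    allow_patterns=["gpu/*"],  # mirrors the CLI's --include gpu/*
    local_dir=".",
)
print(local_path)  # directory containing the downloaded files
```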

Build Your Own Model

Alternatively, you can build your own ONNX model locally using one of these tools:

Model Builder

Use the ONNX Runtime GenAI Model Builder to convert and optimize PyTorch models
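As a sketch, the Model Builder is invoked as a Python module from the `onnxruntime-genai` package. The model name, output path, and option values below are examples; check the Model Builder documentation for the options that apply to your model.

```shell
# Example: convert a Hugging Face model to an optimized int4 ONNX model for CPU.
python -m onnxruntime_genai.models.builder \
    -m microsoft/Phi-3.5-mini-instruct \
    -o ./phi35-onnx \
    -p int4 \
    -e cpu
```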

Olive

Use Microsoft Olive for advanced model optimization and conversion
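Olive also ships a CLI. The invocation below is a rough sketch only; flag names vary between Olive versions, so consult the Olive documentation before running it.

```shell
# Illustrative: optimize a model for CPU with Olive's auto-opt command.
olive auto-opt \
    --model_name_or_path microsoft/Phi-3.5-mini-instruct \
    --output_path ./phi35-olive \
    --device cpu \
    --precision int4
```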

Next Steps

After downloading or building your model, you can load and run it with ONNX Runtime GenAI.
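For example, a minimal text-generation loop with the onnxruntime-genai Python package looks roughly like this. The model path is a placeholder from the Hugging Face example above, and the API names reflect recent releases of the package, so check the ONNX Runtime GenAI API documentation for your installed version.

```python
# Minimal sketch: load an ONNX model and generate text token by token.
import onnxruntime_genai as og

model = og.Model("gpu/gpu-int4-rtn-block-32")  # path from the steps above
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=128)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("What is ONNX Runtime?"))

while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```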
