This guide covers different ways to download and obtain models for use with ONNX Runtime GenAI.

Download via Foundry Local

Foundry Local provides an easy way to download pre-optimized models for ONNX Runtime GenAI.
1. Install Foundry Local

Download and install Foundry Local for your platform.
2. List Available Models

View all available models:
foundry model list
3. Download a Model

Download your chosen model. For example, to download Phi-4:
foundry model download Phi-4-generic-cpu
4. Locate the Model

Find where the model is saved on disk:
foundry cache location
The model will be in a path like:
C:\Users\<user>\.foundry\Microsoft\Phi-4-generic-cpu\cpu-int4-rtn-block-32-acc-level-4
The Foundry Local CLI is not currently available on Linux. To run on Linux, download the model on a Windows or macOS machine and copy it to your Linux machine.
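One way to move a downloaded model to Linux is `scp`. This is a sketch: the cache path shown is the macOS layout, and the host name and destination directory are placeholders you must substitute.

```shell
# Copy the cached model directory to a Linux machine over SSH.
# "user@linux-host" and "~/models/" are placeholders -- substitute your own.
scp -r "$HOME/.foundry/Microsoft/Phi-4-generic-cpu" user@linux-host:~/models/
```

On Windows, run the equivalent copy from the `C:\Users\<user>\.foundry` cache directory reported by `foundry cache location`.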

Download via Hugging Face Hub

You can download ONNX models directly from Hugging Face using the Hugging Face CLI.
1. Install the Hugging Face CLI

pip install "huggingface_hub[cli]"
2. Log in to Hugging Face

huggingface-cli login
3. Download a Model

Use the huggingface-cli download command with the model name and subfolder:
huggingface-cli download <model_name> --include <subfolder_name>/* --local-dir .
For example, to download the Phi-4 mini instruct GPU model:
huggingface-cli download microsoft/Phi-4-mini-instruct-onnx --include gpu/* --local-dir .
4. Identify the Model Path

The model will be downloaded to your specified directory. For example:
gpu/gpu-int4-rtn-block-32
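The same download can also be scripted with the `huggingface_hub` Python library instead of the CLI. A minimal sketch (requires `pip install huggingface_hub` and network access):

```python
# Sketch: download only the GPU subfolder of an ONNX model repository.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="microsoft/Phi-4-mini-instruct-onnx",
    allow_patterns=["gpu/*"],  # mirrors the CLI's --include gpu/*
    local_dir=".",
)
print(local_path)  # directory containing the downloaded files
```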

Build Your Own Model

Alternatively, you can build your own ONNX model locally using one of these tools:

Model Builder

Use the ONNX Runtime GenAI Model Builder to convert and optimize PyTorch models
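As a sketch, the Model Builder is invoked as a Python module from the `onnxruntime-genai` package. The model name, output path, and option values below are examples; check the Model Builder documentation for the options that apply to your model.

```shell
# Example: convert a Hugging Face model to an optimized int4 ONNX model for CPU.
python -m onnxruntime_genai.models.builder \
    -m microsoft/Phi-3.5-mini-instruct \
    -o ./phi35-onnx \
    -p int4 \
    -e cpu
```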

Olive

Use Microsoft Olive for advanced model optimization and conversion
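Olive also ships a CLI. The invocation below is a rough sketch only; flag names vary between Olive versions, so consult the Olive documentation before running it.

```shell
# Illustrative: optimize a model for CPU with Olive's auto-opt command.
olive auto-opt \
    --model_name_or_path microsoft/Phi-3.5-mini-instruct \
    --output_path ./phi35-olive \
    --device cpu \
    --precision int4
```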

Next Steps

After downloading or building your model, you can load and run it with ONNX Runtime GenAI.
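For example, a minimal text-generation loop with the onnxruntime-genai Python package looks roughly like this. The model path is a placeholder from the Hugging Face example above, and the API names reflect recent releases of the package, so check the ONNX Runtime GenAI API documentation for your installed version.

```python
# Minimal sketch: load an ONNX model and generate text token by token.
import onnxruntime_genai as og

model = og.Model("gpu/gpu-int4-rtn-block-32")  # path from the steps above
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=128)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("What is ONNX Runtime?"))

while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```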
