LM Studio is a desktop application that lets you run large language models locally with an easy-to-use interface. Page Assist can connect to LM Studio through its OpenAI-compatible API server.

Prerequisites

  • LM Studio installed on your system
  • At least one model downloaded in LM Studio
  • LM Studio’s local server enabled

Installation

If you haven’t installed LM Studio yet:
1. Download LM Studio: Visit lmstudio.ai and download the installer for your operating system.

2. Install LM Studio: Run the installer and follow the installation instructions.

3. Download a Model: Open LM Studio and browse the model library. Download a model suitable for your hardware:
  • Small (3B-7B): Good for 8-16GB RAM
  • Medium (13B): Requires 16-32GB RAM
  • Large (30B+): Requires 32GB+ RAM
Popular choices:
  • Llama 3.2 (3B or 8B)
  • Mistral 7B
  • Phi-3 (3.8B)

4. Load the Model: In LM Studio, click a downloaded model to load it into memory.

Enabling the API Server

LM Studio must have its local server running for Page Assist to connect:
1. Open the Local Server Tab: In LM Studio, navigate to the “Local Server” or “Developer” tab (the location varies by version).

2. Start the Server: Click “Start Server”. The default address is:
  http://localhost:1234/v1

3. Verify the Server is Running: You should see a confirmation message that the server is running and listening on port 1234.
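Beyond the confirmation message in the UI, you can confirm the server is reachable from a script. Here is a minimal Python sketch, assuming the default port 1234, that queries the OpenAI-compatible `/v1/models` endpoint:

```python
import json
import urllib.error
import urllib.request

# Default LM Studio endpoint; change the port if you reconfigured the server.
BASE_URL = "http://localhost:1234/v1"

def list_models(base_url=BASE_URL):
    """Return the model IDs the server reports, or [] if it is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
            data = json.load(resp)
        return [m["id"] for m in data.get("data", [])]
    except (urllib.error.URLError, OSError):
        return []

if __name__ == "__main__":
    models = list_models()
    if models:
        print("Server is up; loaded models:", models)
    else:
        print("No response; is the LM Studio server running?")
```

An empty list means either the server is down or no model is loaded yet, which mirrors what Page Assist will see when it tries to fetch models.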

Connecting to Page Assist

Once LM Studio’s server is running, connect it to Page Assist:
1. Open Page Assist Settings: Click the Page Assist icon in your browser toolbar, then click the Settings icon.

2. Navigate to the OpenAI Compatible API Tab: Go to the “OpenAI Compatible API” tab.

3. Add Provider: Click the “Add Provider” button.

4. Select LM Studio: Choose “LMStudio” from the provider dropdown menu.

5. Enter the API URL: Enter the LM Studio API URL. The default is:
  http://localhost:1234/v1

6. Save the Configuration: Click “Save”. Page Assist will automatically fetch available models from LM Studio.
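Once the provider is saved, Page Assist talks to LM Studio over the standard OpenAI chat-completions API. You can exercise the same endpoint yourself; the sketch below assumes the default port, and the model name `"local-model"` is a placeholder (LM Studio versions differ in whether they ignore or validate the model field, so check yours):

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"

def build_chat_payload(prompt, model="local-model"):
    """Assemble an OpenAI-style chat-completions request body."""
    return {
        "model": model,  # placeholder; LM Studio routes to the loaded model
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def chat(prompt, base_url=BASE_URL):
    """Send the prompt to LM Studio and return the assistant's reply."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

If this call works from a terminal but Page Assist still fails, the problem is in the extension's provider settings rather than the server.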

Model Management

Automatic Model Detection

Page Assist automatically detects models that are currently loaded in LM Studio; a model that is only downloaded, but not loaded into memory, will not appear.

Switching Models

To use a different model:
  1. In LM Studio, stop the current model
  2. Load the desired model
  3. Refresh Page Assist or restart the chat
  4. The new model should appear in the model selector

Model Not Appearing

If a model doesn’t appear in Page Assist:
  1. Ensure the model is loaded in LM Studio (not just downloaded)
  2. Verify the API server is running
  3. Check the connection URL in Page Assist settings
  4. Try refreshing the Page Assist interface

Configuration Options

Custom Server Port

To change LM Studio’s server port:
  1. In LM Studio’s server settings, change the port number
  2. Update the URL in Page Assist to match:
    http://localhost:[NEW_PORT]/v1
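The URL you paste into Page Assist is just host, port, and the `/v1` suffix; a tiny helper makes the pattern explicit (a sketch, nothing LM Studio-specific beyond the default port):

```python
def lmstudio_url(port=1234, host="localhost"):
    """Build the base URL Page Assist expects for a given LM Studio port."""
    return f"http://{host}:{port}/v1"

print(lmstudio_url(1235))  # http://localhost:1235/v1
```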
    

Server Authentication

LM Studio’s local server typically doesn’t require authentication. If you’ve configured API keys:
  1. In Page Assist provider settings, expand advanced options
  2. Enter your API key in the authentication field
  3. Save the configuration

Custom Headers

For advanced configurations requiring custom headers:
  1. In provider settings, find “Custom Headers” option
  2. Add required headers as key-value pairs
  3. Save your configuration
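If your setup does use an API key or extra headers, the same key-value pairs apply to direct API calls. Here is a sketch of attaching them to a request; the bearer-token form is the common OpenAI convention, not something LM Studio requires by default, and the header names are illustrative:

```python
import json
import urllib.request

def authed_request(url, payload, api_key=None, extra_headers=None):
    """Build a JSON POST request with optional bearer auth and custom headers."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # OpenAI-style bearer token; only needed if you configured a key.
        headers["Authorization"] = f"Bearer {api_key}"
    if extra_headers:
        headers.update(extra_headers)  # arbitrary key-value pairs
    return urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers
    )
```

Page Assist sends the equivalent headers on every request once they are saved in the provider settings.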

Troubleshooting

Connection Failed

1. Verify the Server is Running: Check that LM Studio’s local server shows as “Running”.

2. Check the Port Number: Ensure the port in Page Assist matches LM Studio’s server port (default: 1234).

3. Test the URL in a Browser: Navigate to http://localhost:1234/v1/models in your browser. You should see a JSON response with model information.

4. Check Your Firewall: Ensure your firewall isn’t blocking localhost connections on port 1234.

5. Restart the LM Studio Server: Stop and restart the server in LM Studio.
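The first two checks can be scripted. This small sketch only tests whether anything is listening on the expected TCP port; it does not verify that the listener is actually LM Studio:

```python
import socket

def port_open(host="localhost", port=1234):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(2)
        return s.connect_ex((host, port)) == 0

if __name__ == "__main__":
    if port_open():
        print("Something is listening on port 1234.")
    else:
        print("Port 1234 is closed; start the server or check the port number.")
```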

Model Not Loaded

If you see error messages like “No model loaded”:
  1. Load a model in LM Studio by clicking on it
  2. Wait for the model to fully load into memory
  3. Verify “Model Loaded” appears in LM Studio
  4. Try sending a message again

Slow Response Times

If responses are slow:
  • Use a smaller model: Switch to a 3B or 7B model
  • Adjust GPU layers: In LM Studio, increase GPU layer count if you have a GPU
  • Check system resources: Close other applications to free up RAM
  • Use quantized models: GGUF Q4 or Q5 quantizations are faster

Out of Memory Errors

  1. Use a smaller model
  2. Close other applications
  3. Reduce context length in LM Studio settings
  4. Restart LM Studio to clear memory

Performance Optimization

GPU Acceleration

If you have a compatible GPU:
  1. In LM Studio, go to settings
  2. Enable GPU acceleration
  3. Set GPU layer count (higher = more GPU usage)
  4. Reload your model

Context Length

Adjust context window size based on your needs:
  • Shorter contexts (2048-4096): Faster, less memory
  • Longer contexts (8192+): Slower, more memory, better for long conversations

Batch Size

In LM Studio settings:
  • Increase batch size for faster generation (requires more memory)
  • Decrease if you encounter memory issues

Multiple LM Studio Instances

You can run multiple LM Studio instances on different ports:
  1. Start first instance on default port 1234
  2. Start second instance on a different port (e.g., 1235)
  3. Add each as a separate provider in Page Assist
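With several instances up, you can check which ports are serving models before adding them as providers. A sketch that probes a list of candidate ports via the `/v1/models` endpoint on each:

```python
import json
import urllib.error
import urllib.request

def probe_instances(ports=(1234, 1235)):
    """Map each responding port to the model IDs its server reports."""
    found = {}
    for port in ports:
        url = f"http://localhost:{port}/v1/models"
        try:
            with urllib.request.urlopen(url, timeout=3) as resp:
                data = json.load(resp)
            found[port] = [m["id"] for m in data.get("data", [])]
        except (urllib.error.URLError, OSError):
            continue  # nothing listening on this port
    return found

if __name__ == "__main__":
    print(probe_instances())
```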

Comparison with Ollama

| Feature          | LM Studio            | Ollama                     |
|------------------|----------------------|----------------------------|
| UI               | Graphical            | Command-line               |
| Model Management | Built-in browser     | CLI commands               |
| Ease of Use      | Very easy            | Moderate                   |
| Model Format     | GGUF                 | GGUF                       |
| API              | OpenAI-compatible    | Ollama & OpenAI-compatible |
| Platform         | Windows, Mac, Linux  | Windows, Mac, Linux        |

Best Practices

  1. Keep LM Studio Updated: Regular updates improve performance and compatibility
  2. Monitor Memory: Watch RAM usage and choose models accordingly
  3. One Model at a Time: Load only one model at a time to save resources
  4. Persistent Server: Keep the API server running while using Page Assist
  5. Model Selection: Test different models to find the best balance of quality and speed
