Get Started with simpE

This guide will walk you through installing simpE, running your first benchmark, and analyzing the results.
1. Clone the Repository

First, clone the simpE repository from GitHub:
git clone https://github.com/Dariton4000/simpE.git
cd simpE
2. Install Dependencies

simpE uses uv for fast, reliable Python package management. Install all dependencies:
uv sync
This will:
  • Create a virtual environment
  • Install Python 3.14+ if needed
  • Install required packages (openai, questionary)
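To confirm the environment resolved correctly, you can run a quick import check. This is a minimal sketch (the script name check_deps.py is hypothetical; both packages come from the dependency list above):
check_deps.py
# Quick sanity check that `uv sync` installed everything
import openai
import questionary  # imported only to verify it is installed

print("Dependencies OK - openai", openai.__version__)
Run it with uv run python check_deps.py; if either import fails, re-run uv sync.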
3. Configure API Endpoint

Before running benchmarks, configure the API endpoint in main.py:
main.py
# Configuration (lines 14-17)
llm = ""  # Leave empty to use currently loaded LM-Studio model
baseurl = "http://127.0.0.1:1234/v1"  # LM-Studio default endpoint
reasoning_effort = "low"  # Options: low, medium, high
Make sure LM-Studio is running with a model loaded before proceeding.
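To verify the endpoint before a full run, you can send a single request with the openai client. This is a minimal sketch, not simpE's own request code; the api_key value is a placeholder that LM-Studio ignores:
verify_endpoint.py
# Smoke-test the LM-Studio endpoint with one short request (illustrative)
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1234/v1", api_key="lm-studio")
model_id = client.models.list().data[0].id  # whichever model LM-Studio has loaded
reply = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
    max_tokens=16,
)
print(reply.choices[0].message.content)
If this prints a response, simpE should be able to reach the same endpoint.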
4. Run Your First Benchmark

Execute the benchmark suite:
uv run simpe
You’ll see real-time progress output:
Directory 'logs' created successfully.
Directory 'results' created successfully.
String Reversal 1/100  0.00%
Thinking... 2.34s
String Reversal 1/100  100.00%
Done
String Reversal 2/100  100.00%
Thinking... 1.87s
...
COMPLETE String Reversal: 100/100
Results: 94.00%

Integer Addition 1/100  0.00%
Thinking... 3.21s
...
COMPLETE Integer Addition: 100/100
Results: 87.00%

String Rehearsal 1/100  0.00%
Thinking... 4.56s
...
COMPLETE String Rehearsal: 100/100
Results: 76.00%
Each benchmark suite runs 100 tests by default. This can take 30-60 minutes depending on your model’s speed. You can adjust the tries parameter in main.py (line 19) for shorter test runs.
5. Analyze Your Results

After benchmarks complete, analyze the results:
uv run analyze
The analyzer will prompt you to select a results file:
? Which file do you want to get stats about?
> results/result_meta-llama-Llama-3.2-1B-Instruct-IQ2_M_2025-03-03_14-23-45.json
  results/result_qwen2.5-3b-instruct_2025-03-03_12-15-30.json
You’ll receive comprehensive statistics:
string_reversal:
94.0%

add_two_ints:
87.0%

string_rehearsal:
76.0%

string_reversal:
wait found 0.23 times per response
pause found 0.01 times per response
hold on found 0.05 times per response
actually found 1.34 times per response
no, found 0.87 times per response

string_reversal:
Average characters: 245.67
Average word count: 42.3

Median characters: 198.0
Median word count: 35.0

Minimum characters: 87
Minimum word count: 15

Maximum characters: 456
Maximum word count: 78

Average word length: 5.81
Median word length: 5.0
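The phrase-frequency block counts self-correction markers ("wait", "actually", "no,", and so on) in the model's responses, averaged per response; higher counts suggest the model backtracks more while reasoning. Here is a rough sketch of that calculation, assuming the phrases are counted over each result's response field (the file path is hypothetical, and the analyzer's actual implementation may differ):
phrase_stats.py
# Illustrative per-response phrase frequency (not the analyzer's own code)
import json

PHRASES = ["wait", "pause", "hold on", "actually", "no,"]

with open("results/your_result_file.json") as f:  # hypothetical path
    data = json.load(f)

results = data["benchmarkresults"]["string_reversal"]["results"]
for phrase in PHRASES:
    total = sum(r["response"].lower().count(phrase) for r in results)
    print(f"{phrase} found {total / len(results):.2f} times per response")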

Understanding the Output

Console Progress Display

During benchmark execution, simpE shows:
  • Benchmark name - Current test being run (e.g., “String Reversal”)
  • Progress counter - Tests completed vs. total (e.g., “45/100”)
  • Success rate - Real-time accuracy percentage
  • Thinking time - How long the model is reasoning (for reasoning-capable models)

Results Files

Benchmark results are saved in results/ as JSON files with this structure:
{
  "header": {
    "runstarted": "2026-03-03 14:23:45",
    "suggested_thinkinglevel": "low",
    "model_selected": "",
    "max_output_tokens": 512
  },
  "benchmarkresults": {
    "string_reversal": {
      "test_type": "String Reversal",
      "tries": 100,
      "results": [
        {
          "string": "aBc123XyZ",
          "duration_seconds": 2.34,
          "response": "ZyX321cBa",
          "model": "llama-3.2-1b-instruct",
          "status": "success"
        }
      ]
    }
  }
}
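Because results are plain JSON, you can also inspect them directly. Here is a minimal sketch that recomputes each suite's success rate from the fields shown above, assuming (as the sample suggests) that passing tests carry status: "success"; the file path is hypothetical:
success_rate.py
# Recompute per-suite success rates from a results file (illustrative)
import json

with open("results/your_result_file.json") as f:  # hypothetical path
    data = json.load(f)

for name, suite in data["benchmarkresults"].items():
    passed = sum(1 for r in suite["results"] if r["status"] == "success")
    print(f"{name}: {passed / suite['tries']:.2%}")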

Log Files

Detailed logs are stored in the logs/ directory:
  • log_YYYY-MM-DD_HH-MM-SS.txt - Timestamped log for each run
  • log_recent.txt - Always contains the most recent run’s logs

Customizing Your Benchmark

Adjust Number of Tests

Edit main.py line 19 to change the number of tests per benchmark:
tries = 100  # Change to 10 for quick tests, 1000 for thorough evaluation

Configure Token Limits

For reasoning models that generate longer outputs, adjust the token limit:
max_tokens = 512 * 1  # Increase multiplier for reasoning models

Set Reasoning Effort

If your model supports reasoning modes:
reasoning_effort = "low"  # Options: "low", "medium", "high"

Common Issues

Can't connect to the API?
Make sure LM-Studio is running and has a model loaded. The API endpoint should be accessible at http://127.0.0.1:1234/v1. You can verify this by listing the served models (a minimal sketch; the api_key placeholder is ignored by LM-Studio):
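# List the models LM-Studio is currently serving (illustrative connectivity check)
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1234/v1", api_key="lm-studio")
print([m.id for m in client.models.list().data])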
Benchmarks failing with errors?
Check the logs/log_recent.txt file for error messages. Common causes:
  • API endpoint not configured correctly
  • Model not loaded in LM-Studio
  • Timeout issues with slow models
uv command not found?
Install uv using:
curl -LsSf https://astral.sh/uv/install.sh | sh
Benchmarks running slowly?
  • Reduce the tries parameter for faster testing
  • Use a smaller, faster model
  • Check your GPU/CPU usage in LM-Studio

Next Steps

  • Detailed Installation - Learn about advanced configuration options
  • Understanding Benchmarks - Deep dive into each benchmark type
  • Analysis Tools - Get more from your benchmark data
  • API Reference - Explore the codebase structure
