Get Started with simpE

This guide will walk you through installing simpE, running your first benchmark, and analyzing the results.
1. Clone the Repository

First, clone the simpE repository from GitHub:
git clone https://github.com/Dariton4000/simpE.git
cd simpE
2. Install Dependencies

simpE uses uv for fast, reliable Python package management. Install all dependencies:
uv sync
This will:
  • Create a virtual environment
  • Install Python 3.14+ if needed
  • Install required packages (openai, questionary)
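To confirm the environment resolved correctly, you can run a quick import check. This is a minimal sketch (the script name check_deps.py is hypothetical; both packages come from the dependency list above):
check_deps.py
# Quick sanity check that `uv sync` installed everything
import openai
import questionary  # imported only to verify it is installed

print("Dependencies OK - openai", openai.__version__)
Run it with uv run python check_deps.py; if either import fails, re-run uv sync.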
3. Configure API Endpoint

Before running benchmarks, configure the API endpoint in main.py:
main.py
# Configuration (lines 14-17)
llm = ""  # Leave empty to use currently loaded LM-Studio model
baseurl = "http://127.0.0.1:1234/v1"  # LM-Studio default endpoint
reasoning_effort = "low"  # Options: low, medium, high
Make sure LM-Studio is running with a model loaded before proceeding.
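To verify the endpoint before a full run, you can send a single request with the openai client. This is a minimal sketch, not simpE's own request code; the api_key value is a placeholder that LM-Studio ignores:
verify_endpoint.py
# Smoke-test the LM-Studio endpoint with one short request (illustrative)
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1234/v1", api_key="lm-studio")
model_id = client.models.list().data[0].id  # whichever model LM-Studio has loaded
reply = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
    max_tokens=16,
)
print(reply.choices[0].message.content)
If this prints a response, simpE should be able to reach the same endpoint.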
4. Run Your First Benchmark

Execute the benchmark suite:
uv run simpe
You’ll see real-time progress output:
Directory 'logs' created successfully.
Directory 'results' created successfully.
String Reversal 1/100  0.00%
Thinking... 2.34s
String Reversal 1/100  100.00%
Done
String Reversal 2/100  100.00%
Thinking... 1.87s
...
COMPLETE String Reversal: 100/100
Results: 94.00%

Integer Addition 1/100  0.00%
Thinking... 3.21s
...
COMPLETE Integer Addition: 100/100
Results: 87.00%

String Rehearsal 1/100  0.00%
Thinking... 4.56s
...
COMPLETE String Rehearsal: 100/100
Results: 76.00%
Each benchmark suite runs 100 tests by default. This can take 30-60 minutes depending on your model’s speed. You can adjust the tries parameter in main.py (line 19) for shorter test runs.
5. Analyze Your Results

After benchmarks complete, analyze the results:
uv run analyze
The analyzer will prompt you to select a results file:
? Which file do you want to get stats about?
> results/result_meta-llama-Llama-3.2-1B-Instruct-IQ2_M_2025-03-03_14-23-45.json
  results/result_qwen2.5-3b-instruct_2025-03-03_12-15-30.json
You’ll receive comprehensive statistics:
string_reversal:
94.0%

add_two_ints:
87.0%

string_rehearsal:
76.0%

string_reversal:
wait found 0.23 times per response
pause found 0.01 times per response
hold on found 0.05 times per response
actually found 1.34 times per response
no, found 0.87 times per response

string_reversal:
Average characters: 245.67
Average word count: 42.3

Median characters: 198.0
Median word count: 35.0

Minimum characters: 87
Minimum word count: 15

Maximum characters: 456
Maximum word count: 78

Average word length: 5.81
Median word length: 5.0
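The phrase-frequency block counts self-correction markers ("wait", "actually", "no,", and so on) in the model's responses, averaged per response; higher counts suggest the model backtracks more while reasoning. Here is a rough sketch of that calculation, assuming the phrases are counted over each result's response field (the file path is hypothetical, and the analyzer's actual implementation may differ):
phrase_stats.py
# Illustrative per-response phrase frequency (not the analyzer's own code)
import json

PHRASES = ["wait", "pause", "hold on", "actually", "no,"]

with open("results/your_result_file.json") as f:  # hypothetical path
    data = json.load(f)

results = data["benchmarkresults"]["string_reversal"]["results"]
for phrase in PHRASES:
    total = sum(r["response"].lower().count(phrase) for r in results)
    print(f"{phrase} found {total / len(results):.2f} times per response")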

Understanding the Output

Console Progress Display

During benchmark execution, simpE shows:
  • Benchmark name - Current test being run (e.g., “String Reversal”)
  • Progress counter - Tests completed vs. total (e.g., “45/100”)
  • Success rate - Real-time accuracy percentage
  • Thinking time - How long the model is reasoning (for reasoning-capable models)

Results Files

Benchmark results are saved in results/ as JSON files with this structure:
{
  "header": {
    "runstarted": "2026-03-03 14:23:45",
    "suggested_thinkinglevel": "low",
    "model_selected": "",
    "max_output_tokens": 512
  },
  "benchmarkresults": {
    "string_reversal": {
      "test_type": "String Reversal",
      "tries": 100,
      "results": [
        {
          "string": "aBc123XyZ",
          "duration_seconds": 2.34,
          "response": "ZyX321cBa",
          "model": "llama-3.2-1b-instruct",
          "status": "success"
        }
      ]
    }
  }
}
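Because results are plain JSON, you can also inspect them directly. Here is a minimal sketch that recomputes each suite's success rate from the fields shown above, assuming (as the sample suggests) that passing tests carry status: "success"; the file path is hypothetical:
success_rate.py
# Recompute per-suite success rates from a results file (illustrative)
import json

with open("results/your_result_file.json") as f:  # hypothetical path
    data = json.load(f)

for name, suite in data["benchmarkresults"].items():
    passed = sum(1 for r in suite["results"] if r["status"] == "success")
    print(f"{name}: {passed / suite['tries']:.2%}")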

Log Files

Detailed logs are stored in the logs/ directory:
  • log_YYYY-MM-DD_HH-MM-SS.txt - Timestamped log for each run
  • log_recent.txt - Always contains the most recent run’s logs

Customizing Your Benchmark

Adjust Number of Tests

Edit main.py line 19 to change the number of tests per benchmark:
tries = 100  # Change to 10 for quick tests, 1000 for thorough evaluation

Configure Token Limits

For reasoning models that generate longer outputs, adjust the token limit:
max_tokens = 512 * 1  # Increase multiplier for reasoning models

Set Reasoning Effort

If your model supports reasoning modes:
reasoning_effort = "low"  # Options: "low", "medium", "high"

Common Issues

Can't connect to the API?
Make sure LM-Studio is running and has a model loaded. The API endpoint should be accessible at http://127.0.0.1:1234/v1. You can verify this by listing the served models (a minimal sketch; the api_key placeholder is ignored by LM-Studio):
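# List the models LM-Studio is currently serving (illustrative connectivity check)
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1234/v1", api_key="lm-studio")
print([m.id for m in client.models.list().data])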
Benchmarks failing with errors?
Check the logs/log_recent.txt file for error messages. Common causes:
  • API endpoint not configured correctly
  • Model not loaded in LM-Studio
  • Timeout issues with slow models
uv command not found?
Install uv using:
curl -LsSf https://astral.sh/uv/install.sh | sh
Benchmarks running slowly?
  • Reduce the tries parameter for faster testing
  • Use a smaller, faster model
  • Check your GPU/CPU usage in LM-Studio

Next Steps

  • Detailed Installation - Learn about advanced configuration options
  • Understanding Benchmarks - Deep dive into each benchmark type
  • Analysis Tools - Get more from your benchmark data
  • API Reference - Explore the codebase structure
