What is simpE?
simpE is a lightweight benchmarking tool designed to evaluate small language models on fundamental cognitive tasks. Whether you’re testing models locally with LM-Studio or evaluating reasoning capabilities, simpE provides quick, reliable metrics on model performance.
Quick Start
Get up and running with simpE in minutes
Installation Guide
Detailed setup instructions and configuration
Benchmark Types
Learn about the three core benchmark areas
Analyzing Results
Understand your benchmark data
Benchmark Types
simpE evaluates language models across three fundamental capability areas:
1. String Reversal
Evaluates basic pattern manipulation by asking the model to reverse strings of varying lengths (2-30 characters). This tests (a sample case is sketched after the list):
- Character-level attention
- Sequential processing
- Ability to follow simple transformations
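To make the task concrete, here is a minimal sketch of how such a case could be generated and graded. The prompt wording and grading rule are illustrative assumptions, not simpE’s exact internals:

```python
import random
import string

def make_reversal_case(min_len: int = 2, max_len: int = 30) -> tuple[str, str]:
    """Build one string-reversal case: returns (prompt, expected_answer)."""
    s = "".join(random.choices(string.ascii_lowercase,
                               k=random.randint(min_len, max_len)))
    prompt = f"Reverse the following string. Reply with only the result: {s}"
    return prompt, s[::-1]

prompt, expected = make_reversal_case()
# Grading: an exact match against the model's trimmed reply, e.g.
# passed = model_reply.strip() == expected
```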
2. Big Integer Addition
Challenges the model with arithmetic operations on large integers (2-30 digits each). This benchmark reveals (see the sketch after the list):
- Mathematical reasoning capabilities
- Handling of large numbers
- Ability to perform calculations without explanation
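Because Python integers have arbitrary precision, generating and grading these cases is straightforward. A minimal sketch, with assumed prompt wording:

```python
import random

def random_big_int(digits: int) -> int:
    """A random positive integer with exactly `digits` digits."""
    return random.randint(10 ** (digits - 1), 10 ** digits - 1)

a = random_big_int(random.randint(2, 30))
b = random_big_int(random.randint(2, 30))
prompt = f"What is {a} + {b}? Reply with only the number."
expected = str(a + b)  # arbitrary-precision ints make the reference answer exact
```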
3. String Rehearsal
Tests the model’s ability to reproduce longer strings (10-500 characters) exactly as provided. This measures (sketched below the list):
- Context retention
- Exact replication capabilities
- Attention to detail
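A sketch of this case type, again with assumed prompt wording; the pass criterion is byte-for-byte reproduction:

```python
import random
import string

def make_rehearsal_case(min_len: int = 10, max_len: int = 500) -> tuple[str, str]:
    """Build one rehearsal case: the model must echo the string verbatim."""
    s = "".join(random.choices(string.ascii_letters + string.digits,
                               k=random.randint(min_len, max_len)))
    prompt = f"Repeat the following string exactly, with no extra text: {s}"
    return prompt, s  # passed = model_reply.strip() == s
```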
Key Features
Real-time Progress Tracking
simpE provides live console output showing (a minimal rendering sketch follows the list):
- Current benchmark progress (e.g., “String Reversal 45/100”)
- Success rate percentage updated in real-time
- Thinking time for reasoning models
- Completion status for each benchmark
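One common way to render a live status line of this kind is to overwrite the console line with a carriage return. A minimal sketch; simpE’s actual output format may differ:

```python
import sys

def print_progress(name: str, done: int, total: int, successes: int) -> None:
    """Overwrite one console line, e.g. 'String Reversal 45/100 (91.1% success)'."""
    rate = 100.0 * successes / done if done else 0.0
    sys.stdout.write(f"\r{name} {done}/{total} ({rate:.1f}% success)")
    sys.stdout.flush()

print_progress("String Reversal", 45, 100, 41)
```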
Reasoning Model Support
Built-in support for models with reasoning capabilities (a request sketch follows the list):
- Configurable reasoning effort levels (low, medium, high)
- Automatic capture of reasoning traces
- Detailed reasoning statistics in analysis
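Since simpE targets LM-Studio, requests presumably go through LM-Studio’s OpenAI-compatible local server. The sketch below assumes that endpoint and an OpenAI-style `reasoning_effort` field; whether a given local model honours the field is server- and model-dependent, and none of these names are confirmed simpE internals:

```python
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",  # LM-Studio's default local port
    json={
        "model": "local-model",                   # placeholder model name
        "messages": [{"role": "user", "content": "Reverse the string: benchmark"}],
        "reasoning_effort": "medium",             # "low", "medium", or "high"
    },
    timeout=120,
)
reply = resp.json()["choices"][0]["message"]["content"]
```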
Comprehensive Logging
All benchmark runs generate detailed logs (an illustrative layout follows the list):
- Timestamped execution logs in the logs/ directory
- JSON results with full response data in the results/ directory
- A recent log file for quick access to the latest run
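An illustrative sketch of this layout; the exact file names and JSON schema simpE writes are assumptions here:

```python
import json
from datetime import datetime
from pathlib import Path

stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
Path("logs").mkdir(exist_ok=True)
Path("results").mkdir(exist_ok=True)

# Timestamped execution log (file name is illustrative)
Path(f"logs/run_{stamp}.log").write_text("String Reversal 100/100 complete\n")
# JSON results with full response data (schema is illustrative)
Path(f"results/run_{stamp}.json").write_text(
    json.dumps({"model": "local-model", "timestamp": stamp, "runs": []}, indent=2)
)
```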
Flexible Configuration
Easily adjust benchmark parameters in main.py:
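The parameter names below are illustrative stand-ins, since the actual contents of main.py are not reproduced here; the value ranges come from the benchmark descriptions above:

```python
# Illustrative configuration constants; the names in main.py may differ.
API_BASE_URL = "http://localhost:1234/v1"  # LM-Studio's default local endpoint
MODEL_NAME = "local-model"                 # whichever model LM-Studio has loaded
SAMPLES_PER_BENCHMARK = 100                # e.g. "String Reversal 45/100"
REASONING_EFFORT = "medium"                # "low", "medium", or "high"
REVERSAL_LENGTH_RANGE = (2, 30)            # characters
ADDITION_DIGIT_RANGE = (2, 30)             # digits per operand
REHEARSAL_LENGTH_RANGE = (10, 500)         # characters
```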
Analyzing Results
After running benchmarks, use the built-in analysis tool (a sketch follows the list):
- Accuracy metrics - Success percentage for each benchmark
- Reasoning pattern analysis - Frequency of key phrases like “wait”, “actually”, “hold on”
- Statistical insights - Average/median/min/max for reasoning trace lengths and word counts
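A sketch of how such statistics could be computed from the saved JSON, assuming each record carries a `reasoning` trace field (the real schema may differ):

```python
import json
import statistics
from pathlib import Path

records = []
for path in Path("results").glob("*.json"):
    records.extend(json.loads(path.read_text()).get("runs", []))

traces = [r["reasoning"] for r in records if r.get("reasoning")]
lengths = [len(t) for t in traces]
hesitations = sum(t.lower().count(p)
                  for t in traces
                  for p in ("wait", "actually", "hold on"))

if lengths:
    print(f"traces={len(lengths)} mean={statistics.mean(lengths):.0f} "
          f"median={statistics.median(lengths)} min={min(lengths)} max={max(lengths)}")
print(f"hesitation phrases: {hesitations}")
```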
Results are saved as JSON files in the results/ directory with timestamps and model information for easy comparison across runs.
Why simpE?
Simple Setup
No complex configuration - just install and run
Local-First
Works with LM-Studio for complete privacy
Fast Iteration
Quick benchmarks help you iterate on model selection
Detailed Insights
Rich logging and analysis for deep dives
Next Steps
Install simpE
Follow the installation guide to set up simpE and configure your API endpoint
Run Your First Benchmark
Check out the quick start guide to run your first benchmark suite