Overview

This page tracks the major updates, features, and improvements to PAS2.

Version history

Current version (2024)

Features

  • Paraphrase generation: Automatically generates semantically equivalent variations of user queries
  • Multi-model architecture: Uses Mistral Large for responses and OpenAI’s o3-mini as a judge
  • Real-time progress tracking: Visual feedback during the analysis process
  • Persistent feedback storage: User feedback and results are stored in a SQLite database
  • Interactive web interface: Clean, responsive Gradio interface with example queries
  • Detailed analysis: Provides confidence scores, reasoning, and specific conflicting facts
  • Statistics dashboard: Real-time tracking of hallucination detection statistics
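
As an illustration of the "Detailed analysis" item, the judge's verdict could be carried in a small structure like the one below. This is a minimal sketch, not the PAS2 implementation: the class name JudgeResult, the function parse_judge_output, and the JSON field names are all assumptions made for this example.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class JudgeResult:
    """Hypothetical container for one judge verdict (names are assumptions)."""
    hallucination_detected: bool
    confidence: float          # clamped to the range [0.0, 1.0]
    reasoning: str
    conflicting_facts: List[str] = field(default_factory=list)


def parse_judge_output(raw: str) -> JudgeResult:
    """Parse a JSON string from the judge model into a JudgeResult."""
    data = json.loads(raw)
    # Clamp the score so a malformed reply cannot leave the expected range.
    confidence = min(1.0, max(0.0, float(data["confidence"])))
    return JudgeResult(
        hallucination_detected=bool(data["hallucination_detected"]),
        confidence=confidence,
        reasoning=str(data.get("reasoning", "")),
        conflicting_facts=list(data.get("conflicting_facts", [])),
    )


# Example with a hand-written judge reply (not real model output).
sample = json.dumps({
    "hallucination_detected": True,
    "confidence": 1.4,
    "reasoning": "Two responses give different dates.",
    "conflicting_facts": ["1912 vs. 1921"],
})
result = parse_judge_output(sample)
```

Clamping the confidence value is a defensive choice for this sketch; the actual project may validate judge output differently.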

Core capabilities

  • Generates paraphrased versions of input queries using the Mistral API
  • Evaluates responses using a model-as-judge approach
  • Provides confidence scores and detailed reasoning
  • Identifies specific conflicting facts across responses
  • Web interface for interactive testing (Gradio)
  • Benchmarking capabilities for bulk evaluation
  • Parallel response generation for improved performance
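
The parallel response generation listed above could look roughly like the following. This is a sketch under stated assumptions: generate_responses_parallel and fake_model are hypothetical names, and fake_model stands in for a real call to the response model so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def generate_responses_parallel(
    queries: List[str],
    get_response: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Call get_response for every query concurrently.

    Results come back in the same order as the input queries,
    because Executor.map preserves input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(get_response, queries))


def fake_model(query: str) -> str:
    """Stand-in for an API call to the response model (assumption)."""
    return f"answer to: {query}"


# The original query followed by one paraphrase of it.
queries = [
    "What is the capital of France?",
    "Which city is France's capital?",
]
responses = generate_responses_parallel(queries, fake_model)
```

Threads suit this workload because each call mostly waits on network I/O; the responses for the original query and its paraphrases can then be handed to the judge together.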

Technical specifications

  • Response model: Mistral Large (mistral-large-latest)
  • Judge model: OpenAI o3-mini
  • Paraphrase-based detection approach
  • SQLite database for persistent storage
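
To make the SQLite storage item concrete, here is a minimal sketch using Python's standard sqlite3 module. The table name, column names, and helper functions are assumptions for illustration and may not match the schema PAS2 actually uses.

```python
import sqlite3


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create a hypothetical feedback table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS feedback (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            query TEXT NOT NULL,
            hallucination_detected INTEGER NOT NULL,
            confidence REAL,
            user_feedback TEXT,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    conn.commit()
    return conn


def save_feedback(conn, query, detected, confidence, user_feedback):
    """Insert one analysis result together with the user's feedback."""
    conn.execute(
        "INSERT INTO feedback "
        "(query, hallucination_detected, confidence, user_feedback) "
        "VALUES (?, ?, ?, ?)",
        (query, int(detected), confidence, user_feedback),
    )
    conn.commit()


def detection_stats(conn):
    """Return totals of the kind a statistics dashboard could display."""
    total, detected = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(hallucination_detected), 0) FROM feedback"
    ).fetchone()
    return {"total": total, "detected": detected}


# An in-memory database keeps the example self-contained.
conn = init_db()
save_feedback(conn, "Who wrote Hamlet?", False, 0.93, "correct")
save_feedback(conn, "When did Apollo 11 land?", True, 0.71, "correct")
stats = detection_stats(conn)
```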

Output formats

  • Benchmark results (CSV, TXT)
  • Feedback logs stored in a SQLite database
  • Real-time statistics dashboard
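
A benchmark export to CSV might be sketched as follows, using the standard csv module. The column names in FIELDS and the function results_to_csv are assumptions; the real benchmark files may contain different fields.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column layout for one benchmark row.
FIELDS = ["query", "hallucination_detected", "confidence"]


def results_to_csv(results: List[Dict[str, object]]) -> str:
    """Serialize benchmark rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(results)
    return buf.getvalue()


rows = [
    {"query": "Who wrote Hamlet?", "hallucination_detected": False, "confidence": 0.93},
    {"query": "When did Apollo 11 land?", "hallucination_detected": True, "confidence": 0.71},
]
csv_text = results_to_csv(rows)
```

Writing to an in-memory buffer keeps the sketch free of file paths; the returned text can be written to disk with an ordinary open(..., "w", newline="") call.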

Deployment improvements

Hugging Face Spaces integration

  • Persistent storage using the /data directory
  • Feedback data survives Space restarts
  • Statistics preserved long-term
  • No data loss during inactivity periods
  • Environment-based API key configuration
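
The environment-based configuration and /data storage described above could be wired up along these lines. This is a sketch only: the variable names MISTRAL_API_KEY and OPENAI_API_KEY, the file name feedback.db, and both helper functions are assumptions, not confirmed details of the PAS2 code.

```python
import os
from pathlib import Path
from typing import Dict, Mapping


def load_api_keys(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read API keys from the environment (variable names are assumptions)."""
    keys = {}
    for name in ("MISTRAL_API_KEY", "OPENAI_API_KEY"):
        value = env.get(name)
        if not value:
            raise RuntimeError(f"Missing environment variable: {name}")
        keys[name] = value
    return keys


def resolve_db_path(
    data_dir: str = "/data",
    fallback_dir: str = ".",
    filename: str = "feedback.db",
) -> Path:
    """Prefer the persistent data directory when it exists and is writable.

    Outside a Space with persistent storage, fall back to a local directory.
    """
    base = Path(data_dir)
    if base.is_dir() and os.access(base, os.W_OK):
        return base / filename
    return Path(fallback_dir) / filename


# Example with an explicit mapping instead of the real environment.
keys = load_api_keys({"MISTRAL_API_KEY": "m-key", "OPENAI_API_KEY": "o-key"})
db_path = resolve_db_path("/no-such-directory-for-this-example", "local")
```

Falling back to a local directory lets the same code run on a developer machine where /data does not exist.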

Upcoming features

Check the GitHub repository for planned features and ongoing development: https://github.com/serhanylmz/pas2

Contributing to the changelog

When contributing new features, please update this changelog with:
  • Clear description of the change
  • Version number or date
  • Impact on existing functionality
  • Any breaking changes
