What is Datoso?
Datoso (DAT Organizer and SOrter) is a Python command-line tool designed to download, organize, and manage ROM DAT files from various sources. It transforms the way you organize your ROM collections by merging DAT files from different sources into a unified folder structure optimized for emulators rather than traditional DAT organization.
Core Architecture
Datoso follows a modular architecture with clear separation of concerns:
Component Overview
Key Components
Commands Layer
The commands layer provides the user-facing command-line interface:
- config: Manage configuration settings
- doctor: Validate seed installations and requirements
- dat: Query and modify DAT file properties in the database
- seed: Execute seed-specific operations (fetch, process)
- import: Import existing DAT files from RomVault or other sources
- deduper: Remove duplicate ROMs between parent and child DATs
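To make the command layout above concrete, here is a minimal sketch of how such subcommands could be registered with Python's standard `argparse` module. Only the subcommand names and descriptions come from the list; the `build_parser` helper and the absence of per-command flags are assumptions for illustration, not Datoso's actual implementation.

```python
import argparse

# Subcommand names and descriptions taken from the list above.
COMMANDS = {
    "config": "Manage configuration settings",
    "doctor": "Validate seed installations and requirements",
    "dat": "Query and modify DAT file properties in the database",
    "seed": "Execute seed-specific operations (fetch, process)",
    "import": "Import existing DAT files from RomVault or other sources",
    "deduper": "Remove duplicate ROMs between parent and child DATs",
}


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: register one argparse subparser per command."""
    parser = argparse.ArgumentParser(prog="datoso")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name, help_text in COMMANDS.items():
        subparsers.add_parser(name, help=help_text)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch
# can be run anywhere without real command-line input.
args = build_parser().parse_args(["doctor"])
print(f"Selected command: {args.command}")
```

Keeping the command table in one dictionary means the help text and the dispatch names cannot drift apart, which is one reason a layout like this is common for multi-command tools.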
Located in: src/datoso/commands/
Seeds (Plugins)
Seeds are the plugin system that defines how to fetch and process DATs from different sources. Each seed implements:
- Fetch Module: Logic to download DATs from the source
- Rules: System detection and folder organization rules
- Actions: Processing pipeline configuration
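One way to picture the three parts of a seed is as a small record that bundles a fetch callable, a list of rules, and an ordered list of action names. The sketch below is purely illustrative: the `Seed` class, its field names, the rule keys, and the `fake_fetch` function are all hypothetical and do not reflect how Datoso seeds are actually declared.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Seed:
    """Hypothetical container for the three parts a seed implements."""
    name: str
    fetch: Callable[[], List[str]]  # returns the names of downloaded DATs
    rules: List[Dict[str, str]] = field(default_factory=list)  # system detection rules
    actions: List[str] = field(default_factory=list)  # ordered action names


def fake_fetch() -> List[str]:
    # Stand-in for real download logic; the file name is made up.
    return ["Example System (2024-01-01).dat"]


example_seed = Seed(
    name="example",
    fetch=fake_fetch,
    rules=[{"match": "Example System", "company": "Example Co", "system": "Example System"}],
    actions=["LoadDatFile", "DeleteOld", "Copy", "SaveToDatabase"],
)

downloaded = example_seed.fetch()
print(downloaded)
```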
Example seed package: datoso-seed-redump.
Located in: Core at src/datoso/seeds/, individual seeds as separate packages
Processing Engine
The processor executes a pipeline of actions on each DAT file:
- LoadDatFile: Parse DAT file (XML, ClrMamePro, or DOSCenter format)
- DeleteOld: Remove outdated versions based on date comparison
- Copy: Copy DAT to the organized folder structure
- Deduplicate: Remove ROMs that exist in parent DATs
- AutoMerge: Merge DATs that share the same target
- SaveToDatabase: Persist metadata to the database
- MarkMias: Flag Missing In Action ROMs
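The idea of running an ordered list of actions over a DAT can be sketched as a loop that passes a shared state from one step to the next. This is a simplified illustration, not the code in `processor.py`: the `run_pipeline` function, the dictionary-based state, and the placeholder action bodies are assumptions made for the example.

```python
from typing import Callable, Dict, List

# Each action receives the shared state dict and returns it (possibly modified).
Action = Callable[[Dict], Dict]


def load_dat_file(state: Dict) -> Dict:
    # Placeholder for real parsing; only records that the step ran.
    state["loaded"] = True
    return state


def copy(state: Dict) -> Dict:
    # Placeholder: compute where the DAT would be copied to.
    state["destination"] = f"{state['company']}/{state['system']}/{state['name']}"
    return state


def save_to_database(state: Dict) -> Dict:
    # Placeholder for persisting metadata.
    state["saved"] = True
    return state


def run_pipeline(state: Dict, actions: List[Action]) -> Dict:
    """Run each action in order, passing the evolving state along."""
    for action in actions:
        state = action(state)
        state.setdefault("history", []).append(action.__name__)
    return state


result = run_pipeline(
    {"name": "example.dat", "company": "Example Co", "system": "Example System"},
    [load_dat_file, copy, save_to_database],
)
print(result["history"])
print(result["destination"])
```

Because each step only depends on the state it receives, steps can be added, removed, or reordered per seed without touching the runner.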
Located in: src/datoso/actions/processor.py
Database Layer
The database layer uses TinyDB (JSON-based) to store:
- DAT Metadata: Name, seed, version, date, system, file paths
- System Definitions: Platform mappings, company names, folder rules
- MIA Records: Known missing or unavailable ROMs
- Seed Configuration: Per-seed settings and overrides
Located in: src/datoso/database/
DAT File Parsers
Datoso supports multiple DAT file formats:
- XMLDatFile: Standard XML format (most common)
- ClrMameProDatFile: ClrMamePro text format
- DOSCenterDatFile: DOS Center variant format
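Choosing between these parsers requires some form of format detection. A simple approach, shown here as a sketch, is to look at the first non-blank characters of the file. The `detect_dat_format` function is hypothetical, and it assumes that XML DATs begin with `<`, ClrMamePro DATs with a `clrmamepro (` header block, and DOSCenter DATs with a `DOSCenter (` header block; Datoso's own detection logic may differ.

```python
def detect_dat_format(text: str) -> str:
    """Guess which parser class name applies, based on the file's opening text.

    Assumption: the header conventions described in the lead-in hold.
    """
    # Strip a possible byte-order mark and leading whitespace before sniffing.
    head = text.lstrip("\ufeff \t\r\n").lower()
    if head.startswith("<"):
        return "XMLDatFile"
    if head.startswith("clrmamepro"):
        return "ClrMameProDatFile"
    if head.startswith("doscenter"):
        return "DOSCenterDatFile"
    raise ValueError("Unrecognized DAT format")


samples = [
    '<?xml version="1.0"?>\n<datafile></datafile>',
    'clrmamepro (\n\tname "Example"\n)',
    'DOSCenter (\n\tName: Example\n)',
]
for sample in samples:
    print(detect_dat_format(sample))
```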
Located in: src/datoso/repositories/dat_file.py
The Fetch → Process Workflow
Datoso operates in two distinct phases that work together to organize your ROM collection.
Phase 1: Fetch
The fetch phase downloads DAT files from their source repositories:
- Seed’s fetch module connects to the source (e.g., Redump website)
- Downloads available DAT files to a temporary directory
- Organizes downloads by seed name: ~/.datoso/tmp/{seed}/dats/
- Handles authentication, rate limiting, and download management
- No processing or organization occurs yet
The fetch phase is completely isolated from processing. You can fetch from multiple seeds before processing any of them.
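Because downloads are grouped by seed name, each seed's temporary folder can be derived independently of the others. The sketch below builds the `~/.datoso/tmp/{seed}/dats/` path described above; the `fetch_dir` helper and the seed name `example` are hypothetical and only illustrate the layout.

```python
from pathlib import Path
from typing import Optional


def fetch_dir(seed: str, base: Optional[Path] = None) -> Path:
    """Return the temporary download folder for a seed: <base>/tmp/<seed>/dats."""
    if base is None:
        base = Path.home() / ".datoso"
    return base / "tmp" / seed / "dats"


# Each seed gets its own folder, so several seeds can be fetched
# before any of them is processed.
for seed in ("redump", "example"):
    print(fetch_dir(seed))
```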
Phase 2: Process
The process phase organizes, deduplicates, and structures your DAT files:
- LoadDatFile: Parse the DAT file format and extract metadata
  - Detects format (XML, ClrMamePro, etc.)
  - Extracts name, description, date, version
  - Reads game and ROM entries
- DeleteOld: Check database for existing versions
  - Compares dates to determine if newer
  - Removes old physical files if updating
  - Skips if existing version is newer
- System Detection & Path Generation:
  - Uses seed rules to identify system/platform
  - Determines company (Nintendo, Sony, etc.)
  - Applies modifiers (e.g., “Source Code”, “Translated”)
  - Generates folder path: {Company}/{System}/{Modifier}/
- Copy: Copy DAT to organized structure
  - Creates destination folders as needed
  - Preserves DAT file format and name
  - Updates file path in database
- Deduplicate (if parent DAT defined):
  - Compares ROM hashes against parent DAT
  - Removes duplicate ROMs already in parent
  - Keeps unique ROMs in child DAT
- SaveToDatabase: Persist all metadata
  - Stores DAT properties in TinyDB
  - Enables querying and management via CLI
  - Tracks version history
- MarkMias (optional):
  - Flags ROMs known to be unavailable
  - Uses community-maintained MIA lists
  - Adds mia="yes" attribute to ROM entries
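Three of the steps above (path generation, deduplication against a parent, and MIA marking) can be sketched on a tiny XML DAT using only the standard library. This is an illustration under stated assumptions, not Datoso's code: the function names are hypothetical, ROMs are compared by their `sha1` attribute only, games left without ROMs are dropped, and the sample data is invented.

```python
import xml.etree.ElementTree as ET
from typing import Set

PARENT_XML = """<datafile>
  <game name="Game A"><rom name="a.bin" size="1" sha1="aaa"/></game>
</datafile>"""

CHILD_XML = """<datafile>
  <game name="Game A"><rom name="a.bin" size="1" sha1="aaa"/></game>
  <game name="Game B"><rom name="b.bin" size="2" sha1="bbb"/></game>
  <game name="Game C"><rom name="c.bin" size="3" sha1="ccc"/></game>
</datafile>"""


def target_folder(company: str, system: str, modifier: str = "") -> str:
    """Build {Company}/{System}/{Modifier}/, skipping empty parts."""
    parts = [part for part in (company, system, modifier) if part]
    return "/".join(parts) + "/"


def rom_hashes(root: ET.Element) -> Set[str]:
    """Collect the sha1 values of every ROM under the given root."""
    return {rom.get("sha1") for rom in root.iter("rom") if rom.get("sha1")}


def deduplicate(child: ET.Element, parent: ET.Element) -> None:
    """Remove child ROMs whose hash exists in the parent, then empty games."""
    known = rom_hashes(parent)
    for game in list(child.findall("game")):
        for rom in list(game.findall("rom")):
            if rom.get("sha1") in known:
                game.remove(rom)
        if game.find("rom") is None:
            child.remove(game)


def mark_mias(root: ET.Element, mia_hashes: Set[str]) -> None:
    """Add mia="yes" to every ROM whose hash is in the MIA set."""
    for rom in root.iter("rom"):
        if rom.get("sha1") in mia_hashes:
            rom.set("mia", "yes")


parent = ET.fromstring(PARENT_XML)
child = ET.fromstring(CHILD_XML)
deduplicate(child, parent)
mark_mias(child, {"ccc"})

print(target_folder("Nintendo", "Game Boy", "Translated"))
print([game.get("name") for game in child.findall("game")])
print(ET.tostring(child.find("game[@name='Game C']/rom"), encoding="unicode").strip())
```

After the run, the child DAT keeps only the games that are unique to it, and the ROM listed in the MIA set carries the `mia="yes"` attribute.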
Configuration System
Datoso uses INI-style configuration with multiple layers:
Configuration Hierarchy
- Default Config: Built into Datoso (src/datoso/datoso.ini)
- Global Config: ~/.config/datoso/datoso.config
- Local Config: .datosorc in current directory
- Command-line Flags: Override all other settings
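Layered INI configuration of this kind maps naturally onto Python's standard `configparser`, which reads a list of files in order and lets later files override earlier ones. The sketch below demonstrates that behaviour with temporary files. The `load_layered` helper and the section and option names (`PATHS`, `DatPath`, `LOG`, `Logging`) are assumptions for illustration and are not taken from Datoso's shipped configuration.

```python
import configparser
import tempfile
from pathlib import Path
from typing import List


def load_layered(paths: List[Path]) -> configparser.ConfigParser:
    """Read INI files in order; values in later files override earlier ones."""
    parser = configparser.ConfigParser()
    parser.read(paths)  # files that do not exist are silently skipped
    return parser


with tempfile.TemporaryDirectory() as tmp:
    default = Path(tmp) / "datoso.ini"
    missing_global = Path(tmp) / "datoso.config"  # intentionally not created
    local = Path(tmp) / ".datosorc"

    # Hypothetical defaults and a local override of a single option.
    default.write_text("[PATHS]\nDatPath = ~/dats\n\n[LOG]\nLogging = false\n")
    local.write_text("[LOG]\nLogging = true\n")

    config = load_layered([default, missing_global, local])
    dat_path = config.get("PATHS", "DatPath")
    logging_enabled = config.getboolean("LOG", "Logging")

print(dat_path, logging_enabled)
```

In this arrangement a command-line flag would be applied last, for example by setting the option on the parser after all files are read, so that it takes precedence over every file layer.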
Key Configuration Sections
Database Schema
The TinyDB database stores four main types of records:
DAT Records
System Records
Define how systems are organized and named:
Action Pipeline Customization
Seeds define their processing pipeline through action configurations. Here’s an example from src/datoso/seeds/:
Actions can be customized per seed or even per system using configuration overrides.
Error Handling & Recovery
Datoso includes several mechanisms for handling errors:
Doctor Command
The doctor command checks the following:
- Required Python packages are installed
- Seed modules can be loaded
- Configuration is valid
- Database is accessible
- Network connectivity for fetches
Logging
Enable detailed logging for troubleshooting. Logs are written to ~/.datoso/datoso.log when logging is enabled via config.
Performance Considerations
Parallel Fetching
Fetch multiple seeds simultaneously by running separate commands in parallel
Filter Processing
Use --filter to process only specific DATs matching a pattern
Incremental Updates
Only downloads and processes changed DATs based on date comparison
Database Flushing
Explicit flush operations ensure data consistency without performance overhead
Next Steps
Now that you understand how Datoso works, explore:
- Seeds - Deep dive into the plugin system
- DATs and ROMs - Understanding DAT file formats and organization
- Configuration Reference - Full configuration options