Overview
The deduper command removes from a child DAT file any ROM entries that already exist in a parent DAT file. This is useful for creating subset DATs that contain only the unique entries not found in a main collection.
Basic Usage
- Reads the input (child) DAT file
- Reads the parent DAT file
- Compares ROM entries based on hash values (CRC, MD5, SHA1)
- Removes matching entries from the child DAT
- Saves the deduplicated DAT file
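The steps above can be sketched in a few lines of Python. This is only an illustration, not Datoso's actual implementation: it assumes Logiqx-style XML DATs (`<game>` elements containing `<rom>` elements with `crc`, `md5` and `sha1` attributes), the names `rom_hashes` and `dedupe_xml` are hypothetical, and for brevity it treats any shared hash as a match.

```python
import xml.etree.ElementTree as ET

HASH_ATTRS = ("sha1", "md5", "crc")  # attribute names assumed for this sketch


def rom_hashes(rom):
    """Return the set of (hash_type, value) pairs present on a <rom> element."""
    return {
        (attr, rom.get(attr).lower())
        for attr in HASH_ATTRS
        if rom.get(attr)
    }


def dedupe_xml(child_xml, parent_xml):
    """Remove from the child every <rom> whose hash also appears in the parent."""
    child_root = ET.fromstring(child_xml)
    parent_root = ET.fromstring(parent_xml)

    # Collect every hash found anywhere in the parent DAT.
    parent_index = set()
    for rom in parent_root.iter("rom"):
        parent_index |= rom_hashes(rom)

    removed = 0
    for game in list(child_root.findall("game")):
        roms = game.findall("rom")
        duplicates = [rom for rom in roms if rom_hashes(rom) & parent_index]
        for rom in duplicates:
            game.remove(rom)
        removed += len(duplicates)
        # Assumption of this sketch: drop games whose ROMs were all duplicates.
        if roms and len(duplicates) == len(roms):
            child_root.remove(game)

    # The header and all non-duplicate entries are left untouched.
    return ET.tostring(child_root, encoding="unicode"), removed
```

Calling `dedupe_xml(child_text, parent_text)` returns the deduplicated XML as a string together with the number of removed ROM entries.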
Required Arguments
Input DAT File
Input DAT file to deduplicate. Can be either a database reference (seed:name) or a file path.
Parent DAT File
You must specify either --parent or --auto-merge (mutually exclusive).
Parent DAT file to compare against. Can be either a database reference (seed:name) or a file path.
When the input is a file path (.dat or .xml), the --parent argument is required unless using --auto-merge.
Auto-Merge Mode
Automatically detect and use the parent DAT from the database
- Datoso looks up the input DAT in the database
- Reads the parent field from the DAT metadata
- Uses that parent DAT for deduplication
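The argument rules in this section (exactly one of --parent or --auto-merge, with the parent read from metadata in auto-merge mode) could be modelled roughly as follows. This is a sketch under assumptions, not Datoso's own parser: the `--input` flag name, the `deduper-sketch` program name, the `resolve_parent` helper and the dictionary standing in for the database are all hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="deduper-sketch")
    # Hypothetical flag name; the text only says an input DAT is required.
    parser.add_argument("--input", required=True)
    # Exactly one of --parent / --auto-merge must be given.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--parent")
    group.add_argument("--auto-merge", action="store_true")
    return parser


def resolve_parent(args, database):
    """Return the parent reference, reading it from metadata in auto-merge mode.

    `database` is a plain dict standing in for the real DAT database:
    it maps a DAT reference to its metadata, e.g. {"parent": "seed:name"}.
    """
    if args.parent:
        return args.parent
    record = database.get(args.input)
    if record is None or not record.get("parent"):
        raise LookupError(f"no parent configured for {args.input!r}")
    return record["parent"]
```

With this layout, passing both flags or neither makes argparse exit with a usage error, which mirrors the mutually exclusive requirement described above.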
Optional Arguments
Output File
Output file path. If not specified, overwrites the input file.
Dry Run
Show what would be removed without writing the output file
- No files are written
- Duplicate entries are identified and logged
- Enables debug-level logging automatically
- Useful for previewing changes before committing
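The interaction between the output path and dry-run mode might look like the sketch below. It is illustrative only: `write_result` and the logger name are hypothetical, and the real command may structure this differently. It shows the two behaviours described above: the output defaults to the input path, and a dry run logs at debug level without writing anything.

```python
import logging
from pathlib import Path

logger = logging.getLogger("deduper_sketch")  # hypothetical logger name


def write_result(text, input_path, output_path=None, dry_run=False):
    """Write the deduplicated DAT text, or only report it when dry_run is set."""
    # If no output path is given, the input file is the target (overwrite).
    target = Path(output_path) if output_path else Path(input_path)
    if dry_run:
        # Dry run enables debug-level logging and writes no files.
        logger.setLevel(logging.DEBUG)
        logger.debug("dry run: would write %d characters to %s", len(text), target)
        return None
    target.write_text(text, encoding="utf-8")
    return target
```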
Input Format Options
Database Reference Format
Reference DAT files already imported into Datoso:
- redump:Sony - PlayStation 2
- nointro:Nintendo - Nintendo DS
- tosec:Commodore 64
File Path Format
Reference DAT files directly from the filesystem:
- /home/user/roms/ps2.dat
- ./local/datfile.dat
- /mnt/storage/collection.xml
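One way to tell the two input formats apart is sketched below. The rule used here (a .dat or .xml extension means a file path, otherwise the text before the first colon is the seed) is an assumption drawn from the examples above, and `classify_reference` is a hypothetical helper rather than part of Datoso.

```python
DAT_SUFFIXES = (".dat", ".xml")


def classify_reference(ref):
    """Return ("file", path) or ("database", (seed, name)) for a DAT reference."""
    # Assumption: anything ending in .dat or .xml is treated as a file path.
    if ref.lower().endswith(DAT_SUFFIXES):
        return "file", ref
    # Otherwise expect the seed:name form, split at the first colon.
    seed, sep, name = ref.partition(":")
    if sep and seed and name:
        return "database", (seed, name)
    raise ValueError(f"unrecognized DAT reference: {ref!r}")
```

For example, `classify_reference("redump:Sony - PlayStation 2")` yields a database reference with seed `redump`, while `classify_reference("./local/datfile.dat")` yields a file path.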
Deduplication Logic
Hash Comparison
The deduper compares ROM entries using hash values in this priority order:
- SHA1 (most reliable, if present)
- MD5 (fallback)
- CRC32 (fallback)
Matching Criteria
A ROM is considered a duplicate if:
- At least one hash value matches between input and parent
- The hash comparison is successful for the strongest available hash
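One possible reading of these criteria is that the strongest hash type present on both entries decides the outcome. The sketch below implements that reading; it is an interpretation, not a description of Datoso's code, and the dictionary keys (`sha1`, `md5`, `crc`) and the `is_duplicate` name are assumptions made for illustration.

```python
HASH_PRIORITY = ("sha1", "md5", "crc")  # strongest first


def is_duplicate(child_rom, parent_rom):
    """Compare two ROM entries (dicts) using the strongest hash both provide."""
    for hash_type in HASH_PRIORITY:
        child_value = child_rom.get(hash_type)
        parent_value = parent_rom.get(hash_type)
        if child_value and parent_value:
            # The first hash type available on both sides decides the result.
            return child_value.lower() == parent_value.lower()
    # No hash type in common: the entries cannot be compared.
    return False
```

Under this reading, two entries with the same CRC32 but different SHA1 values are not treated as duplicates, because the stronger hash takes precedence.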
Preservation
The deduper preserves:
- DAT metadata (header, description, version)
- Game/ROM structure
- Non-duplicate entries
Use Cases
Creating Subset DATs
Remove main collection ROMs from a demo/beta collection.
Avoiding Duplicate Storage
Before building a ROM set, deduplicate child DATs.
Auto-Merge Workflow
For DATs with parent relationships configured, use --auto-merge.
Batch Deduplication
Deduplicate multiple DATs in a script.
Workflow Examples
Complete Deduplication Workflow
Using with File Paths
Integration with Seed Processing
Output
Normal Mode
Dry Run Mode
Error Handling
Parent Required Error
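This error occurs when neither --parent nor --auto-merge is given. Specify a parent DAT with --parent, or use --auto-merge.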
File Not Found
If the input or parent DAT cannot be found:
- Verify the database reference: datoso dat --all --only-names
- Check file paths exist and are readable
- Ensure proper permissions on files
Invalid DAT Format
If the DAT file cannot be parsed:
- Verify it’s a valid ClrMamePro or XML format
- Check for corruption
- Try importing first: datoso import
Performance Considerations
Large DAT Files
For DAT files with thousands of entries:
- Processing may take several minutes
- Memory usage scales with DAT size
- Use --dry-run first to estimate time
Hash Comparison Speed
Hash comparison is generally fast, but:
- SHA1 comparison is more reliable than CRC32
- Multiple hash types increase comparison accuracy
- First match found is used (optimization)
Best Practices
Always Backup
Before deduplicating important DATs, back up the originals.
Use Dry Run First
Preview changes before committing.
Configure Parent Relationships
For frequently used deduplication, set the parent field on the DAT so that --auto-merge can be used.
Integrate with Processing
Enable automatic deduplication during seed processing.
Troubleshooting
No Duplicates Found
If deduplication results in no changes:
- Verify the parent DAT contains expected entries
- Check that hash values exist in both DATs
- Ensure correct parent DAT is specified
- Use --dry-run with -v for details
All Entries Removed
If all entries are removed (empty output):
- Verify you specified the correct parent
- Check that input and parent aren’t reversed
- Review with --dry-run first
Auto-Merge Fails
If --auto-merge doesn’t work:
- Verify the input DAT is in the database: datoso dat --dat-name "seed:name"
- Check the parent field is set: datoso dat --dat-name "seed:name" --fields parent
- Set the parent if missing: datoso dat --dat-name "seed:name" --set "parent=seed:parent"
Next Steps
- Learn about DAT commands for managing parent relationships
- Use seed commands for processing with auto-deduplication
- Configure auto-merge with config commands
- Import DAT files with import commands