Understanding Deduplication
Deduplication in Datoso works by:
- Comparing ROM entries between two DAT files (child and parent)
- Removing entries from the child that exist in the parent
- Preserving unique entries in the child
- Optionally saving the deduplicated result
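The steps above amount to a set difference keyed on each ROM's identity. The following is a minimal Python sketch of that idea, not Datoso's actual implementation: the `Rom` tuple, the use of SHA-1 as the comparison key, and the `dedupe_child` function are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Rom(NamedTuple):
    """Hypothetical, simplified ROM entry (not Datoso's data model)."""
    name: str
    sha1: str


def dedupe_child(child: list[Rom], parent: list[Rom]) -> list[Rom]:
    """Return the child entries whose hash does not appear in the parent."""
    # Assumption: entries are matched by hash only.
    parent_hashes = {rom.sha1 for rom in parent}
    return [rom for rom in child if rom.sha1 not in parent_hashes]


parent = [Rom("Game A (USA)", "aaa111"), Rom("Game B (Japan)", "bbb222")]
child = [Rom("Game B (Japan)", "bbb222"), Rom("Game C (Japan)", "ccc333")]

unique = dedupe_child(child, parent)
print([rom.name for rom in unique])  # ['Game C (Japan)']
```

Only the entry shared with the parent is dropped; the entry unique to the child is preserved.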
When to Use Deduplication
Common scenarios:
- Removing redundant entries when a system has both main and subset DATs
- Cleaning up clone/parent relationships
- Merging specialized DAT variants
- Removing duplicates before using ROMs with emulators
Basic Deduplication Workflow
Identify Parent and Child DATs
Determine which DAT should be the parent (reference) and which should be the child (to be deduplicated). For example:
- Parent: Sony - PlayStation.dat (main set)
- Child: Sony - PlayStation (Japan).dat (subset)
Run Deduplication Command
Use the deduper command with the input (-input) and parent (-p) DAT files. This removes all entries from the Japan DAT that already exist in the main PlayStation DAT.
Deduplication Methods
Using File Paths
Deduplicate using physical DAT files. If you don’t specify -o (output), the input file will be overwritten with the deduplicated version.
Using Database References
Deduplicate using DATs stored in Datoso’s database. Use the seed:datname format to reference DATs that have been previously processed and stored in the database.
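To make the two input styles concrete, here is a hedged Python sketch that assembles the argument list for each. It assumes the command is invoked as `datoso deduper` and takes the flag spellings (`-input`, `-p`, `-o`) from this page's own text; `build_deduper_args` and the `seed:child_dat` / `seed:parent_dat` references are hypothetical placeholders. Check the command's help output before relying on any of these.

```python
from typing import Optional


def build_deduper_args(input_ref: str, parent_ref: str,
                       output: Optional[str] = None) -> list[str]:
    """Assemble an argument list for the deduper command.

    Assumption: the subcommand is `datoso deduper` and the flags are
    spelled -input, -p and -o, as named elsewhere on this page.
    """
    args = ["datoso", "deduper", "-input", input_ref, "-p", parent_ref]
    if output is not None:
        # Without -o, the page states the input file is overwritten.
        args += ["-o", output]
    return args


# File paths, writing the result to a new file.
file_args = build_deduper_args(
    "Sony - PlayStation (Japan).dat",
    "Sony - PlayStation.dat",
    output="deduped.dat",
)

# Database references in seed:datname form (placeholder names).
db_args = build_deduper_args("seed:child_dat", "seed:parent_dat")

print(file_args)
print(db_args)
```

Building the command as a list keeps file names with spaces intact. Once the flags are confirmed, the list can be passed to `subprocess.run(args, check=True)`.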
Auto-Merge (Internal Deduplication)
Remove duplicates within a single DAT file by using -input alone with the --auto-merge flag.
Automated Deduplication During Processing
Enable automatic deduplication during the processing workflow:
Setting Up Parent Relationships
Define parent-child relationships for automatic deduplication. When ParentMergeEnabled=True, the child DAT will automatically be deduplicated against its parent.
Advanced Options
Dry Run
Run deduplication in dry-run mode to preview the result without saving changes.
Auto-Merge Mode
When PROCESS.AutoMergeEnabled=True, DATs with the automerge property set will be internally deduplicated during processing.
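Internal deduplication can be pictured as a single pass that remembers which entries it has already seen. The sketch below is illustrative only: the `automerge` function, the `(name, sha1)` tuples, and the choice to keep the first occurrence are assumptions, since this page does not state which fields Datoso compares or which copy it keeps.

```python
def automerge(roms: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep the first entry for each hash and drop later repeats."""
    seen: set[str] = set()
    merged: list[tuple[str, str]] = []
    for name, sha1 in roms:
        if sha1 in seen:
            continue  # Same hash already kept earlier in the DAT.
        seen.add(sha1)
        merged.append((name, sha1))
    return merged


dat = [
    ("Game A", "aaa111"),
    ("Game B", "bbb222"),
    ("Game A (Alt)", "aaa111"),  # Same content under another name.
]

merged = automerge(dat)
print(merged)  # [('Game A', 'aaa111'), ('Game B', 'bbb222')]
```

Unlike parent deduplication, no second DAT is involved: the file is compared only against itself.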
Understanding Deduplication Output
Status Messages
When deduplication occurs during processing:
- Deduped status indicates duplicates were removed via parent merge.
- Automerged status indicates internal duplicates were removed.
Logging
Enable verbose output with -v to see details.
Practical Examples
Example 1: Deduplicate Regional Variant
Example 2: Automated Workflow
Example 3: Batch Deduplication
Example 4: Clean Import from ROMVault
Deduplication in Processing Pipeline
When processing with deduplication enabled, the action pipeline includes:
Troubleshooting
No Duplicates Found
If deduplication reports 0 ROMs removed:
Deduplication Errors
Common errors and solutions:
“Parent dat is required when input is a dat file”
- You must specify both -input and -p parameters when using file paths
- Or use -input alone with the --auto-merge flag
If a DAT file cannot be loaded:
- The DAT file may be corrupted or in an unsupported format
- Verify the file can be opened with a DAT manager
If a database reference cannot be found:
- Ensure the DAT has been processed: datoso seed --process
- Check the DAT name: datoso dat -d seed:name
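The "Parent dat is required" error and its two remedies suggest a simple argument rule. The following Python sketch models that rule as inferred from the message text. It is not Datoso's code: `check_deduper_inputs` is a hypothetical helper, and the idea that a database reference may omit the parent is an assumption drawn only from the wording of the error.

```python
from typing import Optional


def check_deduper_inputs(input_is_file: bool, parent: Optional[str],
                         auto_merge: bool = False) -> Optional[str]:
    """Return an error message for an invalid combination, else None."""
    # Assumption: only file inputs need an explicit parent, and
    # auto-merge lifts that requirement.
    if input_is_file and parent is None and not auto_merge:
        return "Parent dat is required when input is a dat file"
    return None


# File input with no parent and no auto-merge: rejected.
print(check_deduper_inputs(True, None))
# File input with a parent: accepted.
print(check_deduper_inputs(True, "Sony - PlayStation.dat"))
# File input alone with auto-merge: accepted.
print(check_deduper_inputs(True, None, auto_merge=True))
```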
Performance Issues
For very large DAT files:
Next Steps
After deduplication:
- Process the deduplicated DATs if not already done
- Manage DAT properties to configure behavior
- Use deduplicated DATs with ROM managers like ROMVault or CLRMamePro
Deduplication modifies DAT files, not ROM files. After deduplication, use ROM management tools to rebuild your actual ROM collection based on the updated DATs.
Best Practices
- Always back up before deduplicating, using -o to save to a new file first
- Test with dry-run to preview changes before committing
- Use parent relationships in the database for consistent automatic deduplication
- Process systematically from main sets to regional variants to specialized collections
- Enable logging with -v for complex deduplication workflows