DoFi
Domain Filtering - A collection of Python and Bash scripts for validating, filtering, and checking domain lists.
Overview
DoFi provides two main tools:
Domain Filter
Python script that validates TLDs, removes overlapping domains, and filters invalid entries
Domain Check
Bash script that verifies domain existence using the host command
Download Project
Requirements
- Python: 3.12.3 or later
- Bash: 5.2.21 or later
- Tested on: Ubuntu 22.04/24.04 x64
Domain Filter (Python)
Advanced domain filtering with TLD validation and overlap removal.
Features
The Python script performs comprehensive domain filtering:
Overlap Removal
Removes overlapping domains (e.g., keeps example.com and removes sub.example.com if both exist).
Basic Usage
Output Files
- Default Output
- Custom Output
By default, the script creates:
- output.txt - Validated and filtered domains
- removed.txt - Domains that were filtered out
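The overlap rule described earlier can be illustrated with a short Python sketch. It is not DoFi's actual code, and the name `remove_overlaps` is hypothetical: a domain is kept only if none of its parent domains also appears in the list, and the dropped entries are returned separately, mirroring the output.txt / removed.txt split.

```python
def remove_overlaps(domains):
    """Hypothetical sketch of overlap removal (not DoFi's implementation).

    Drops any domain whose parent domain is also listed.
    Returns (kept, removed), both sorted.
    """
    unique = set()
    for d in domains:
        d = d.strip().lower().rstrip(".")
        if d:
            unique.add(d)

    kept, removed = [], []
    for domain in sorted(unique):
        labels = domain.split(".")
        # Candidate parents: "a.b.c" -> "b.c", "c"
        parents = (".".join(labels[i:]) for i in range(1, len(labels)))
        if any(parent in unique for parent in parents):
            removed.append(domain)
        else:
            kept.append(domain)
    return kept, removed


kept, removed = remove_overlaps(["example.com", "sub.example.com", "other.org"])
print(kept)     # ['example.com', 'other.org']
print(removed)  # ['sub.example.com']
```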
TLD Coverage
The filter validates against comprehensive TLD sources:
- ccTLDs - Country code top-level domains
- gTLDs - Generic top-level domains
- sTLDs - Sponsored top-level domains
- eTLDs - Effective top-level domains
- 4LDs - Fourth-level domains
The script downloads these TLD lists and caches them in tlds.txt during processing.
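TLD validation can be sketched the same way, assuming the downloaded TLD data has been merged into one set of lowercase suffixes such as "com" or "co.uk" (the actual layout of tlds.txt is not documented here). The hypothetical `has_valid_tld` accepts a domain when a label-aligned suffix of it is a known TLD and at least one label precedes that suffix.

```python
def has_valid_tld(domain, tlds):
    """Hypothetical TLD check (not DoFi's implementation).

    `tlds` is assumed to be a set of lowercase suffixes, e.g. {"com", "co.uk"}.
    Returns True if the domain ends in a known suffix with a label before it.
    """
    labels = domain.lower().rstrip(".").split(".")
    if len(labels) < 2 or any(not label for label in labels):
        return False  # bare TLDs and empty labels ("a..com") are rejected
    # Try suffixes from longest to shortest: "a.co.uk" -> "co.uk", "uk"
    return any(".".join(labels[i:]) in tlds for i in range(1, len(labels)))


tlds = {"com", "uk", "co.uk"}
print(has_valid_tld("example.co.uk", tlds))       # True
print(has_valid_tld("example.invalidtld", tlds))  # False
```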
Example
Domain Check (Bash)
Verifies domain existence using DNS lookups.
Features
The Bash script checks domain validity:
- Uses the host command for DNS verification
- Runs checks in parallel for speed
- Separates valid and invalid domains
- Generates a difference report
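The existence check can be approximated in Python for illustration. This sketch swaps the `host` command for the standard library's `socket.getaddrinfo`, which only asks for address records, so its results may differ from the Bash script's; the function names are hypothetical. The resolver is passed in as a parameter so the splitting logic can be tested without network access.

```python
import socket


def resolves(domain):
    """Hypothetical stand-in for `host`: True if the system resolver
    returns at least one address for `domain`."""
    try:
        return bool(socket.getaddrinfo(domain, None))
    except (socket.gaierror, UnicodeError):
        return False


def split_by_existence(domains, lookup=resolves):
    """Split domains into (hit, fault) lists, preserving input order.

    hit ~ hit.txt (resolved), fault ~ fault.txt (did not resolve).
    """
    hit, fault = [], []
    for domain in domains:
        (hit if lookup(domain) else fault).append(domain)
    return hit, fault
```

With a fake resolver, `split_by_existence(["a.test", "b.test"], lookup=lambda d: d == "a.test")` returns `(["a.test"], ["b.test"])`.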
Basic Usage
Output Files
The script generates three output files:
hit.txt
Existing domains verified via DNS
fault.txt
Non-existent domains that failed verification
outdiff.txt
Difference between input and output
Parallel Processing
Control the number of parallel checks. Adjust the parallel process count based on your system resources and network capacity: higher values speed up processing but consume more system and network resources.
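The parallel process count corresponds to the worker limit in this hypothetical Python sketch. It uses threads rather than processes because DNS lookups spend most of their time waiting on the network; `check_parallel` and its `workers` parameter are illustrative names, not options of the Bash script.

```python
from concurrent.futures import ThreadPoolExecutor


def check_parallel(domains, lookup, workers=8):
    """Hypothetical sketch: run `lookup` on each domain with up to
    `workers` concurrent threads and return (hit, fault).

    executor.map yields results in input order, so both lists keep
    the original ordering.
    """
    domains = list(domains)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lookup, domains))
    hit = [d for d, ok in zip(domains, results) if ok]
    fault = [d for d, ok in zip(domains, results) if not ok]
    return hit, fault
```

Raising `workers` shortens wall-clock time only until the resolver or the network becomes the bottleneck, which is the same trade-off described above.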
Example
Workflow Examples
Complete Domain Cleaning
Combine both tools for comprehensive cleaning.
Large List Processing
TLD Data Sources
DoFi pulls TLD data from authoritative sources:
IANA
Official IANA TLD list
Public Suffix
Mozilla’s public suffix list
WHOIS XML API
Supported gTLD database
Blackweb TLDs
Extended TLD appendix
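The IANA list and the Public Suffix List are distributed as plain text in different formats: IANA publishes one uppercase TLD per line after a `#` comment header, while the Public Suffix List uses `//` comments, blank lines, and wildcard (`*.`) and exception (`!`) rules. The hypothetical parsers below sketch how both could be reduced to one lowercase set; the PSL parser deliberately skips wildcard and exception rules, which a complete PSL matcher would have to honor.

```python
def parse_iana(text):
    """Hypothetical parser for the IANA TLD list: '#' comments,
    one uppercase TLD per line."""
    tlds = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            tlds.add(line.lower())
    return tlds


def parse_psl(text):
    """Hypothetical, simplified Public Suffix List parser.

    Skips '//' comments and blank lines; a rule ends at the first
    whitespace. Wildcard ('*.') and exception ('!') rules are ignored.
    """
    suffixes = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue
        rule = line.split()[0]
        if rule.startswith(("*", "!")):
            continue  # not handled in this sketch
        suffixes.add(rule.lower())
    return suffixes


print(sorted(parse_iana("# Version 1\nCOM\nXN--P1AI\n")))  # ['com', 'xn--p1ai']
```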
Performance Tips
Python Script Optimization
For large domain lists (millions of entries):
- Run on systems with adequate RAM (4GB+ recommended)
- Use SSD storage for faster I/O
- TLD cache is created once and reused
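The "created once and reused" behavior of the TLD cache can be sketched as a load-or-build helper. The helper name and the `build` callback are hypothetical; the only assumption about tlds.txt is that it holds one suffix per line.

```python
from pathlib import Path


def load_or_build_tlds(cache_path, build):
    """Hypothetical sketch: return the TLD set from `cache_path`,
    calling `build()` and writing the cache only if it does not exist.

    `build` is any zero-argument callable returning an iterable of
    suffixes (for example, one that downloads and parses the sources).
    """
    cache = Path(cache_path)
    if cache.exists():
        text = cache.read_text(encoding="utf-8")
        return {line.strip() for line in text.splitlines() if line.strip()}
    tlds = set(build())
    cache.write_text("\n".join(sorted(tlds)) + "\n", encoding="utf-8")
    return tlds
```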
Bash Script Optimization
For efficient domain checking:
- Increase parallel processes on powerful systems
- Use reliable DNS servers
- Consider network bandwidth limits
- Monitor system load during processing
Combined Workflow
Optimize the full pipeline by running the Domain Filter before the Domain Check. This approach minimizes network checks by filtering invalid entries first.
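The ordering argument can be made concrete: every entry rejected offline is one DNS query that never happens. In this hypothetical sketch, `clean_pipeline` deduplicates and checks only the last label against a TLD set (a simplification of the full filter) before calling `lookup`, and it reports how many lookups were actually made.

```python
def clean_pipeline(domains, tlds, lookup):
    """Hypothetical two-stage pipeline: offline filtering, then DNS checks.

    Returns (hit, fault, lookups), where `lookups` counts DNS queries made.
    """
    # Stage 1 (offline, cheap): normalize, deduplicate, require a known TLD.
    # Only the last label is checked, a simplification of the real filter.
    seen = set()
    candidates = []
    for raw in domains:
        d = raw.strip().lower().rstrip(".")
        if "." not in d or d in seen:
            continue
        seen.add(d)
        if d.rsplit(".", 1)[1] in tlds:
            candidates.append(d)

    # Stage 2 (network, expensive): only the survivors are looked up.
    hit, fault = [], []
    for d in candidates:
        (hit if lookup(d) else fault).append(d)
    return hit, fault, len(candidates)
```

For example, `clean_pipeline(["Example.com", "example.com", "dead.com", "bad.zzz"], {"com"}, lookup=lambda d: d != "dead.com")` returns `(["example.com"], ["dead.com"], 2)`: the duplicate and the unknown TLD never reach DNS.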
Use Cases
Blocklist Maintenance
Clean and validate domain blocklists before deployment
SEO Analysis
Verify domain lists for SEO tools and analysis
Security Research
Validate threat intelligence domain feeds
DNS Administration
Maintain clean DNS zone files and records
Troubleshooting
TLD Download Failures
If TLD sources are unavailable:
- Check internet connectivity
- Verify firewall/proxy settings
- Script will use cached TLDs if available
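The cached-TLD fallback can be sketched with `urllib.request`: try the download, refresh the cache on success, and read the cached copy when a network error occurs. The function name and the single-URL interface are assumptions for illustration; `URLError` and socket timeouts are both subclasses of `OSError`, so one `except` clause covers them.

```python
import urllib.request
from pathlib import Path


def fetch_tlds_with_fallback(url, cache_path, timeout=30):
    """Hypothetical sketch: download a TLD list, falling back to the
    cached copy if the download fails. Raises if there is no cache."""
    cache = Path(cache_path)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            text = resp.read().decode("utf-8")
    except OSError:
        # URLError, HTTPError, and timeouts all derive from OSError
        if cache.exists():
            return cache.read_text(encoding="utf-8")
        raise
    cache.write_text(text, encoding="utf-8")  # refresh the cache on success
    return text
```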
Domain Check Timeouts
If domain checks are slow or timing out:
- Reduce parallel processes
- Check DNS server responsiveness
- Consider using local DNS cache
Memory Issues
For very large lists:
- Split input files into smaller chunks
- Process in batches
- Increase system swap if needed
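Processing in batches does not require splitting the file on disk. This hypothetical generator reads the input lazily and yields fixed-size chunks, so only one batch is held in memory at a time. Per-batch overlap removal cannot see a parent domain and its subdomain that land in different batches, so a final pass over the merged results may still be needed.

```python
from itertools import islice


def read_in_chunks(path, chunk_size=100_000):
    """Hypothetical sketch: yield lists of up to `chunk_size` stripped,
    non-empty lines from `path`, reading the file lazily."""
    with open(path, encoding="utf-8") as fh:
        lines = (line.strip() for line in fh)
        non_empty = (line for line in lines if line)
        while chunk := list(islice(non_empty, chunk_size)):
            yield chunk
```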
Best Practices
Prepare Input
Clean input files before processing:
- Remove http://, https://, and www. prefixes
- Convert to lowercase
- Remove duplicates
- One domain per line
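The checklist above maps onto a few string operations. In this hypothetical `prepare_domains` sketch, lowercasing happens first so that `HTTPS://` and `WWW.` are also caught; it additionally drops anything after the first `/`, an assumption for inputs that are full URLs rather than bare domains.

```python
def prepare_domains(lines):
    """Hypothetical input cleaner: one bare, lowercase, deduplicated
    domain per entry, in order of first appearance."""
    seen = {}  # dict keys keep insertion order and drop duplicates
    for line in lines:
        d = line.strip().lower()
        d = d.removeprefix("https://").removeprefix("http://")
        d = d.split("/", 1)[0]  # assumption: drop any URL path
        d = d.removeprefix("www.")
        if d:
            seen.setdefault(d, None)
    return list(seen)


print(prepare_domains(["HTTPS://www.Example.com/path", "example.com", "http://sub.example.org"]))
# ['example.com', 'sub.example.org']
```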