Downloading and Running
The simplest way to run the BlackWeb update is with this one-liner:
wget -q -N https://raw.githubusercontent.com/maravento/blackweb/master/bwupdate/bwupdate.sh && chmod +x bwupdate.sh && ./bwupdate.sh
This command:
Downloads the latest bwupdate.sh script
Makes it executable
Runs the script
Do not run this script as root. The script will request elevated privileges when needed for Squid operations.
Script Initialization
User Verification
The script first checks that you’re not running as root:
# check no-root
if [ "$(id -u)" == "0" ]; then
    echo "❌ This script should not be run as root"
    exit 1
fi
System Compatibility Check
# check OS
UBUNTU_VERSION=$(lsb_release -rs)
UBUNTU_ID=$(lsb_release -is | tr '[:upper:]' '[:lower:]')
if [[ "$UBUNTU_ID" != "ubuntu" || "$UBUNTU_VERSION" != "24.04" ]]; then
    echo "This script requires Ubuntu 24.04. Use at your own risk"
    # exit 1
fi
Working Directory Setup
# VARIABLES
bwupdate="$(pwd)/bwupdate"
wgetd="wget -q -c --show-progress --no-check-certificate --retry-connrefused --timeout=10 --tries=4"
# PATH_TO_ACL (change it to the directory of your preference)
route="/etc/acl"
# CREATE PATH
if [ ! -d "$route" ]; then sudo mkdir -p "$route"; fi
The default installation path is /etc/acl. Modify the route variable to change this location.
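For example, to keep the ACL somewhere user-writable instead (the path below is purely illustrative, not one the script uses), change the variable and drop sudo from the creation step:

```shell
#!/usr/bin/env bash
# Hypothetical override: store the ACL under the user's home instead of /etc/acl
route="$HOME/acl"
# Same creation logic as the script, minus sudo since the path is user-writable
if [ ! -d "$route" ]; then mkdir -p "$route"; fi
echo "ACL path: $route"
```

Remember that squid.conf must point at whatever path you choose here.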
Update Workflow
Phase 1: Download Blackweb Repository
The script uses a Python helper to download the bwupdate folder from GitHub:
# DOWNLOAD BLACKWEB
echo "${bw02[$lang]}"
$wgetd https://raw.githubusercontent.com/maravento/vault/master/scripts/python/gitfolder.py -O gitfolder.py
chmod +x gitfolder.py
python gitfolder.py https://github.com/maravento/blackweb/bwupdate
Phase 2: Download Blocklists
The script downloads from multiple public blocklist sources:
blurls() {
    local url="$1"
    local filename target i
    local user_agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"
    filename=$(basename "${url%%\?*}" | sed 's/[^a-zA-Z0-9._-]/_/g')
    target="bwtmp/$filename"
    # incremental suffix
    i=1
    while [ -e "$target" ]; do
        target="bwtmp/${filename%.*}_$i.${filename##*.}"
        ((i++))
    done
    # check with curl
    if ! curl -k -s -f -I -L -A "$user_agent" --connect-timeout 5 --retry 1 "$url" > /dev/null 2>&1; then
        echo "❌ URL Down: $url"
        return 1
    fi
    # download with curl
    echo -n "$target ... "
    if curl -k -L -s \
        --connect-timeout 10 --retry 3 \
        --user-agent "$user_agent" \
        "$url" -o "$target"; then
        echo "✅"
    else
        echo "❌ ERROR"
        return 1
    fi
}
This function:
Sanitizes filenames from URLs
Handles duplicate filenames with incremental suffixes
Verifies URL availability before downloading
Uses retry logic for reliability
Provides visual feedback (✅/❌) for each download
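The sanitize-and-suffix logic can be exercised in isolation. The URL below is a made-up example, and the pre-created file simulates a name collision from an earlier download:

```shell
#!/usr/bin/env bash
mkdir -p bwtmp
touch bwtmp/hosts.txt   # simulate a collision from an earlier download
url="https://example.com/ads/hosts.txt?format=plain"
# Strip the query string, then replace unsafe characters with underscores
filename=$(basename "${url%%\?*}" | sed 's/[^a-zA-Z0-9._-]/_/g')
target="bwtmp/$filename"
i=1
while [ -e "$target" ]; do
    target="bwtmp/${filename%.*}_$i.${filename##*.}"
    ((i++))
done
echo "$target"   # the collision yields bwtmp/hosts_1.txt
```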
Phase 3: Download Allowlists
Downloads university domains and other allowlisted domains:
univ() {
    local url="$1"
    # check with curl
    if ! curl -k -s -f -I --connect-timeout 5 --retry 1 "$url" > /dev/null; then
        echo "❌ URL Down: $url"
        return 1
    fi
    # download
    $wgetd "$url" -O - \
        | grep -oiE "([a-zA-Z0-9][a-zA-Z0-9-]{1,61}\.){1,}(\.?[a-zA-Z]{2,}){1,}" \
        | grep -Pvi '(.htm(l)?|.the|.php(il)?)$' \
        | sed -r 's:(^\.*.?(www|ftp|xxx|wvw)[^.]*?\.|^\.\.?)::gi' \
        | awk '{if ($1 !~ /^\./) print "." $1; else print $1}' \
        | sort -u >> lst/debugwl.txt
}
univ 'https://raw.githubusercontent.com/Hipo/university-domains-list/master/world_universities_and_domains.json'
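The extraction regex can be tried on a single sample record (a fabricated JSON line, not real data from the list):

```shell
#!/usr/bin/env bash
# The same domain-matching regex, applied to one fabricated record
line='{"name": "Example University", "web_pages": ["http://www.example-univ.edu/"]}'
match=$(echo "$line" | grep -oiE "([a-zA-Z0-9][a-zA-Z0-9-]{1,61}\.){1,}(\.?[a-zA-Z]{2,}){1,}")
echo "$match"    # www.example-univ.edu
```

In the full pipeline, a later sed pass strips the www prefix before the entry is dotted and sorted.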
Phase 4: Domain Capture and Processing
Extracts and normalizes domains from all downloaded files:
# CAPTURING DOMAINS
find bwtmp -type f -not -iname "*pdf" \
-execdir grep -oiE "([a-zA-Z0-9][a-zA-Z0-9-]{1,61}\.){1,}(\.?[a-zA-Z]{2,}){1,}" {} \; \
| sed -r 's:(^\.*.?(www|ftp|ftps|ftpes|sftp|pop|pop3|smtp|imap|http|https)[^.]*?\.|^\.\.?)::gi' \
| sed -r '/[^a-zA-Z0-9.-]/d; /^[^a-zA-Z0-9.]/d; /[^a-zA-Z0-9]$/d; /^[[:space:]]*$/d; /[[:space:]]/d; /^[[:space:]]*#/d; /\.{2,}/d' \
| sort -u > stage1
Phase 5: RFC 1035 Compliance
Removes entries containing a label longer than 63 characters (the RFC 1035 limit), along with uppercase entries and labels that begin or end with a hyphen:
# RFC 1035 Partial
sed '/[^.]\{64\}/d' stage1 \
| grep -vP '[A-Z]' \
| grep -vP '(^|\.)-|-($|\.)' \
| sed 's/^\.//g' \
| sort -u > stage2
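Applied to a few fabricated entries, the filter keeps only the compliant one:

```shell
#!/usr/bin/env bash
# Sample input: one valid domain, one uppercase, one leading-hyphen label,
# and one whose first label is 64 characters long
long_label=$(printf 'a%.0s' {1..64})
printf '%s\n' "ok-domain.com" "UPPER.com" "-bad.com" "$long_label.com" > stage1_sample
sed '/[^.]\{64\}/d' stage1_sample \
    | grep -vP '[A-Z]' \
    | grep -vP '(^|\.)-|-($|\.)' \
    | sed 's/^\.//g' \
    | sort -u > stage2_sample
cat stage2_sample    # only ok-domain.com survives
```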
Phase 6: Join Lists
# JOIN AND UPDATE LIST
echo "${bw06[$lang]}"
sed '/^$/d; /#/d' lst/{debugwl,invalid}.txt | sed 's/[^[:print:]\n]//g' | sed 's/^[[:space:]]*//;s/[[:space:]]*$//' | awk '{if ($1 !~ /^\./) print "." $1; else print $1}' | sort -u > urls.txt
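The awk step that normalizes entries to Squid's dstdomain format (leading dot) behaves like this on two sample entries:

```shell
#!/usr/bin/env bash
# Prepend a dot where missing; leave already-dotted entries untouched
out=$(printf '%s\n' 'example.com' '.already.org' \
    | awk '{if ($1 !~ /^\./) print "." $1; else print $1}')
echo "$out"
```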
Phase 7: Debug Domains
Filters domains using the Python domfilter.py tool:
# DEBUGGING DOMAINS
echo "${bw07[$lang]}"
grep -Fvxf urls.txt capture.txt | sed 's/[^[:print:]\n]//g' | sed 's/^[[:space:]]*//;s/[[:space:]]*$//' | awk '{if ($1 !~ /^\./) print "." $1; else print $1}' | sort -u > cleancapture.txt
$wgetd https://raw.githubusercontent.com/maravento/vault/master/dofi/domfilter.py -O domfilter.py > /dev/null 2>&1
python domfilter.py --input cleancapture.txt
grep -Fvxf urls.txt output.txt | grep -P "^[\x00-\x7F]+$" | sort -u > finalclean
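The grep -Fvxf subtraction used in this phase can be seen with two tiny sample files (the file names here are illustrative):

```shell
#!/usr/bin/env bash
# urls_sample.txt plays the allowlist role; capture_sample.txt the raw capture
printf '%s\n' '.allowed.com' '.blocked.com' > capture_sample.txt
printf '%s\n' '.allowed.com' > urls_sample.txt
# -F fixed strings, -v invert match, -x whole-line match, -f patterns from file
grep -Fvxf urls_sample.txt capture_sample.txt    # .blocked.com
```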
Phase 8 & 9: DNS Lookup (See DNS Lookup Page)
Performs two-step DNS validation (covered in detail on the DNS Lookup page).
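As a rough sketch of the idea only, a per-domain resolution check could look like the following. getent here is an illustrative stand-in, not the script's actual lookup tooling; see the DNS Lookup page for the real parallel implementation:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: keep an entry only when its domain resolves
check_domain() {
    local d="${1#.}"    # drop the leading dot used by the list format
    getent hosts "$d" > /dev/null && echo ".$d"
}
check_domain ".localhost"    # localhost resolves via /etc/hosts
```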
Phase 10: TLD Filtering
# TLD FINAL FILTER (exclude allowed TLDs such as .gov and .mil; delete bare TLDs and non-ASCII lines)
echo "${bw11[$lang]}"
regex_ext=$(grep -v '^#' lst/allowtlds.txt | sed 's/$/\$/' | tr '\n' '|')
new_regex_ext="${regex_ext%|}"
grep -E -v "$new_regex_ext" blackweb_tmp | sort -u > blackweb_tmp2
comm -23 <(sort blackweb_tmp2) <(sort tlds.txt) > blackweb.txt
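The regex assembly from allowtlds.txt can be traced with a two-entry sample file (contents assumed for illustration):

```shell
#!/usr/bin/env bash
# Sample allowlist: comments are skipped, each TLD is anchored and OR-joined
printf '%s\n' '# allowed TLDs' '.gov' '.mil' > allowtlds_sample.txt
regex_ext=$(grep -v '^#' allowtlds_sample.txt | sed 's/$/\$/' | tr '\n' '|')
new_regex_ext="${regex_ext%|}"
echo "$new_regex_ext"    # .gov$|.mil$
```

The trailing pipe left by tr is stripped so the alternation stays valid for grep -E.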
Phase 11: Squid Integration
# RELOAD SQUID-CACHE
echo "${bw12[$lang]}"
# copy blackweb to path
sudo cp -f blackweb.txt "$route"/blackweb.txt > /dev/null 2>&1
# Squid reload
sudo bash -c 'squid -k reconfigure' 2> sqerror && sleep 20
Progress Monitoring
The script provides real-time progress feedback:
Blackweb Project
This process can take a while. Be patient...
Downloading Blackweb...
Downloading Blocklists...
bwtmp/hosts.txt ... ✅
bwtmp/is.abp.txt ... ✅
...
Joining Lists...
Debugging Domains...
1st DNS Lookup...
Processed: 2463489 / 7244989 (34.00%)
Resuming Interrupted Execution
The script can resume from the DNS lookup phase if interrupted:
# CHECK DNSLOOKUP1
if [ ! -e "$bwupdate"/dnslookup1 ]; then
    # DELETE OLD REPOSITORY
    rm -rf "$bwupdate" > /dev/null 2>&1
    # start from the beginning...
else
    cd "$bwupdate"
    # resume from the DNS lookup phase
fi
If you interrupt the script during DNS lookup (Ctrl+C), it will automatically resume from that point on the next run.
Cleanup
By default, the script removes the temporary bwupdate directory after completion:
# DELETE REPOSITORY (optional)
cd ..
rm -rf "$bwupdate" > /dev/null 2>&1
Comment out these lines if you want to preserve intermediate files for debugging.
Completion
The script logs completion to syslog:
# END
sudo bash -c 'echo "BlackWeb Done: $(date)" | tee -a /var/log/syslog'
echo "${bw13[$lang]}"
Check for errors in SquidErrors.txt after completion.
Next Steps
DNS Lookup Process: Understand the parallel DNS validation system
Debugging Features: Learn about domain validation and cleaning processes