
Overview

A collection of Python utilities for automating common tasks: Git folder downloads, email scanning, and broken link detection. All scripts are tested on Ubuntu 22.04 and 24.04.

Prerequisites

Install Python 3

sudo apt update
sudo apt install -y python3 python3-pip python-is-python3

Common Dependencies

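Most scripts require additional Python packages. Install them as needed:
pip3 install requests beautifulsoup4
On Ubuntu 24.04 the system Python is marked as externally managed, so pip3 may refuse to install packages outside a virtual environment. In that case, install the distribution packages instead:
sudo apt install -y python3-requests python3-bs4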

Available Scripts

Email Scan

Scan files and replace email addresses and URLs. Download:
wget https://raw.githubusercontent.com/maravento/vault/master/scripts/python/emailscan.py
chmod +x emailscan.py
Usage:
python emailscan.py
Features:
  • Scan files for email addresses
  • Replace BASE_URL in files
  • Replace TARGET_EMAIL in files
  • Batch processing
  • Recursive directory scanning
Configuration: Edit the script to set your base URL and target email:
BASE_URL = "https://example.com"
TARGET_EMAIL = "[email protected]"
Example:
# Scan current directory
python emailscan.py

# The script will:
# 1. Find all text files
# 2. Replace email addresses with TARGET_EMAIL
# 3. Replace URLs with BASE_URL
# 4. Create backup of modified files

# Edit configuration in script
nano emailscan.py

# Run scanner
python emailscan.py
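
The steps listed in the example (find text files, replace addresses, keep a backup) can be sketched in a few lines of Python. This is an illustration only, not the code in emailscan.py: the address `admin@example.com`, the `.bak` suffix, the file extensions, and the function names are all assumptions, and the regular expression is intentionally simple.

```python
import re
import shutil
from pathlib import Path

# Hypothetical value; the real script reads TARGET_EMAIL from its own config.
TARGET_EMAIL = "admin@example.com"

# Deliberately simple pattern; it does not cover every valid address form.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def replace_emails(path: Path, target: str = TARGET_EMAIL) -> int:
    """Replace every email address in `path`; return the number replaced."""
    text = path.read_text(encoding="utf-8")
    new_text, count = EMAIL_RE.subn(target, text)
    if count:
        # Copy the original next to the file (assumed ".bak" naming)
        # before overwriting it.
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_text(new_text, encoding="utf-8")
    return count


def scan_directory(root: Path, suffixes=(".txt", ".md")) -> dict:
    """Walk `root` recursively and process files with the given suffixes."""
    results = {}
    # sorted() materializes the listing first, so backups created during
    # the loop are not picked up by the walk.
    for file in sorted(root.rglob("*")):
        if file.is_file() and file.suffix in suffixes:
            results[file.relative_to(root).as_posix()] = replace_emails(file)
    return results
```

Calling `scan_directory(Path("."))` returns a mapping from relative file path to the number of addresses replaced. Note that an address already equal to the target is counted as a replacement in this sketch.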

Git Folder Download

Download specific folders from GitHub/GitLab repositories without cloning the entire repository. This replaces the Subversion (SVN) method previously used for the same purpose. Download:
wget https://raw.githubusercontent.com/maravento/vault/master/scripts/python/gitfolder.py
chmod +x gitfolder.py
Usage:
python gitfolder.py <GITHUB_OR_GITLAB_URL>
Features:
  • Download specific repository folders
  • Support for GitHub and GitLab
  • Replaces deprecated SVN method
  • Preserves directory structure
  • Progress indication
Examples:
# Download entire scripts folder from maravento/vault
python gitfolder.py https://github.com/maravento/vault/scripts

# Download specific subfolder
python gitfolder.py https://github.com/maravento/vault/scripts/bash
Why Use This Instead of SVN?
GitHub and GitLab have deprecated Subversion (SVN) support. This Python script provides the same functionality using the native Git API.
Previous SVN Method (Deprecated):
# This no longer works
svn export https://github.com/user/repo/trunk/folder
New Method with gitfolder.py:
# Use this instead
python gitfolder.py https://github.com/user/repo/folder
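
How gitfolder.py resolves a folder URL internally is not shown here. As one possible approach for GitHub, a URL of the form used above can be mapped to the REST "repository contents" endpoint, which lists the files in a folder. The function name below is hypothetical, and fetching the listing, recursing into subfolders, and GitLab support are left out.

```python
from urllib.parse import urlparse


def github_contents_url(folder_url: str) -> str:
    """Map https://github.com/<owner>/<repo>/<folder...> to the GitHub
    REST endpoint that lists the contents of that folder."""
    parsed = urlparse(folder_url)
    if parsed.netloc != "github.com":
        raise ValueError("this sketch only handles github.com URLs")
    # Drop empty segments produced by leading or trailing slashes.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 3:
        raise ValueError("expected /<owner>/<repo>/<folder>")
    owner, repo, *folder = parts
    return f"https://api.github.com/repos/{owner}/{repo}/contents/{'/'.join(folder)}"
```

For example, `github_contents_url("https://github.com/maravento/vault/scripts/bash")` yields the contents endpoint for `scripts/bash` in `maravento/vault`. A URL without a folder part raises `ValueError`.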

Link Check

Scan websites for broken links. Download:
wget https://raw.githubusercontent.com/maravento/vault/master/scripts/python/linkcheck.py
chmod +x linkcheck.py
Usage:
python linkcheck.py <URL>
Features:
  • Scan websites for broken links
  • HTTP/HTTPS support
  • Recursive link checking
  • Detailed error reporting
  • Export results to file
Dependencies:
pip3 install requests beautifulsoup4
Examples:
# Check single website
python linkcheck.py https://example.com
Output Format:
Scanning: https://example.com

Broken Links Found:
[404] https://example.com/missing-page.html
[500] https://example.com/server-error
[TIMEOUT] https://slow-server.com/page

Working Links: 156
Broken Links: 3
Total Links Checked: 159
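
If the report is saved to a file (for example from a cron job), the broken-link lines can be post-processed. The sketch below assumes the output looks exactly like the sample above, with one `[STATUS] URL` entry per line. `parse_report` is a hypothetical helper, not part of linkcheck.py.

```python
import re

# Matches lines such as "[404] https://example.com/missing-page.html".
BROKEN_RE = re.compile(r"^\[(?P<status>[A-Z0-9]+)\]\s+(?P<url>\S+)$")


def parse_report(text: str) -> dict:
    """Group broken URLs from a report by their bracketed status label."""
    broken = {}
    for line in text.splitlines():
        match = BROKEN_RE.match(line.strip())
        if match:
            broken.setdefault(match["status"], []).append(match["url"])
    return broken
```

Lines that do not start with a bracketed label, such as the summary counts, are ignored.
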
Configuration Options: You can modify the script to customize:
# Maximum depth for recursive scanning
MAX_DEPTH = 3

# Request timeout (seconds)
TIMEOUT = 10

# User agent string
USER_AGENT = "LinkChecker/1.0"

# Ignore external links
IGNORE_EXTERNAL = True
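
To illustrate what an option like IGNORE_EXTERNAL could control, here is a minimal sketch of link extraction that skips links pointing to other hosts. It uses only the standard library (`html.parser`) rather than BeautifulSoup so that it runs without extra packages. The class and function names are hypothetical, and the actual behavior of linkcheck.py may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

IGNORE_EXTERNAL = True  # same name as the option above; behavior is assumed


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html: str, page_url: str,
                  ignore_external: bool = IGNORE_EXTERNAL) -> list:
    """Return absolute http(s) links found in `html`, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    links = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        if ignore_external and parsed.netloc != site:
            continue  # link points to a different host
        if absolute not in links:
            links.append(absolute)
    return links
```

Each returned URL could then be requested with the configured TIMEOUT and USER_AGENT, and followed recursively up to MAX_DEPTH.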

Installation Script

Quick installation of all Python utilities:
#!/bin/bash
# Install Python utilities

mkdir -p ~/scripts/python
cd ~/scripts/python

# Download all scripts
wget https://raw.githubusercontent.com/maravento/vault/master/scripts/python/emailscan.py
wget https://raw.githubusercontent.com/maravento/vault/master/scripts/python/gitfolder.py
wget https://raw.githubusercontent.com/maravento/vault/master/scripts/python/linkcheck.py

# Make executable
chmod +x *.py

# Install dependencies
pip3 install requests beautifulsoup4

echo "Python utilities installed successfully!"

Use Cases

Use Case: Update all email addresses and URLs in documentation
# Configure your settings
nano emailscan.py
# Set BASE_URL and TARGET_EMAIL

# Run on documentation directory
cd /path/to/docs
python ~/scripts/python/emailscan.py
Use Case: Download only the scripts folder from a large repository
# Instead of cloning 500MB repository
# Download only 5MB scripts folder
python gitfolder.py https://github.com/user/large-repo/scripts
Benefits:
  • Saves bandwidth
  • Faster downloads
  • No need for full git clone
Use Case: Regular link checking for website maintenance
# Create cron job for weekly checks
crontab -e

# Add:
# 0 2 * * 1 python /path/to/linkcheck.py https://example.com > /var/log/linkcheck.log 2>&1

Advanced Usage

Combining Scripts

Use multiple utilities together:
#!/bin/bash
# Download, scan, and check links

# 1. Download documentation folder
python gitfolder.py https://github.com/project/docs/

# 2. Update email addresses (use the script's full path, since we change directory)
cd docs/
python ~/scripts/python/emailscan.py

# 3. Check for broken links
python ~/scripts/python/linkcheck.py https://your-docs-site.com

Automation Examples

# Add to crontab
crontab -e

# Daily link check at 2 AM
0 2 * * * /usr/bin/python3 /home/user/scripts/python/linkcheck.py https://example.com >> /var/log/linkcheck.log 2>&1

Troubleshooting

Common Issues

Issue: a script fails with ModuleNotFoundError: No module named 'requests'.
Solution:
pip3 install requests
# Or
sudo apt install python3-requests
Issue: running a script directly fails with "Permission denied".
Solution:
chmod +x script.py
# Or run with python directly
python3 script.py
Issue: requests fail with SSL certificate verification errors.
Solution:
# Update CA certificates
sudo apt update
sudo apt install ca-certificates

# Or use --no-verify option if available

Requirements

System Requirements

  • Ubuntu 22.04 or 24.04 (or compatible Linux distribution)
  • Python 3.8 or higher
  • Internet connection

Python Packages

pip3 install requests beautifulsoup4 lxml
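
Before running the scripts, it can help to confirm that these packages are importable. The following sketch uses `importlib.util.find_spec` from the standard library. The `REQUIRED` mapping and the function name are assumptions for illustration. Note that the pip name and the import name can differ (beautifulsoup4 is imported as `bs4`).

```python
import importlib.util

# Maps pip package name -> module name used in "import ...".
REQUIRED = {"requests": "requests", "beautifulsoup4": "bs4", "lxml": "lxml"}


def missing_packages(required: dict = REQUIRED) -> list:
    """Return pip package names whose import module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: pip3 install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```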

License

Always review script contents and test in a safe environment before running on production systems.
