Overview
This responder generates unpredictable, irregular content combining random characters, symbols, and nonsense words. The goal is to waste scraper resources and contaminate AI training data with useless information.
Configuration
The Garbage responder requires no additional parameters beyond the standard IP ranges configuration:
- ranges - IP ranges that will receive garbage data. Can be CIDR notations or predefined service keys. Default: ["aws", "azurepubliccloud", "deepseek", "gcloud", "githubcopilot", "openai"]
- whitelist - Optional list of specific IP addresses to exclude from receiving garbage. Default: []
- serve_ignore - Whether to serve a robots.txt file with a Disallow: / directive. Default: false
HTTP Response
- Status: 200 OK
- Content-Type: text/plain
- Body: 100 lines of randomly generated garbage text
Example Response Body
Each request generates unique random output.
Examples
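As a configuration sketch, the parameters on this page map to JSON like the following. The field names are taken from the parameter list above; the whitelist address is a placeholder, and how this block embeds in your overall server configuration depends on your setup:

```json
{
  "ranges": ["openai", "deepseek"],
  "whitelist": ["203.0.113.10"],
  "serve_ignore": true
}
```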
Implementation Details
The Garbage responder is implemented in responders/garbage.go:15.
Garbage Generation Algorithm
The garbage generator picks randomly between two strategies:
- Nonsense Words - random selection from a predefined list: florb, zaxor, quint, blarg, wibble, fizzle, gronk, snark, ploosh, dribble
- Random Characters - a string of random length (10-60 chars) drawn from a character set:
  - Lowercase letters: a-z
  - Uppercase letters: A-Z
  - Numbers: 0-9
  - Symbols: !@#$%^&*()_+-=[]{};':",./<>?\|~
Use Cases
AI Training Poisoning
Contaminate AI training datasets with useless data.
Scraper Resource Waste
Waste bandwidth and storage of automated scrapers.
Content Protection
Make automated content harvesting worthless.
Advantages
- AI Poisoning - Degrades quality of training data if scraped
- Unpredictable - Each response is randomly generated
- Resource Waste - Scrapers waste bandwidth, storage, and processing
- Looks Valid - Returns 200 OK so scrapers think they succeeded
- No Blocking Signal - Scrapers don’t know they’re being served garbage
Disadvantages
- Bandwidth Cost - Sends ~5-10KB per request instead of a small error response
- Processing Overhead - Generates random data for each request
- Still Allows Access - Doesn’t actually prevent scraping, just poisons it
- May Be Detected - Sophisticated scrapers might detect random patterns
Comparison with Other Responders
- vs Block: Garbage returns 200 OK with data, Block returns 403 error
- vs Drop: Garbage sends a response, Drop terminates connection
- vs Custom: Garbage returns random data, Custom returns your message
- vs Tarpit: Garbage sends data quickly, Tarpit sends slowly
When to Use Garbage
Use Garbage when:
- You want to poison AI training datasets
- Wasting scraper resources is a goal
- You want scrapers to think they succeeded
- Content protection is more important than bandwidth
Avoid Garbage when:
- Bandwidth costs are a concern
- You want to explicitly block access
- You need to conserve server resources
- Clear error messages are desired
Best Practices
- Combine with serve_ignore - Also serve robots.txt to discourage polite bots
- Target AI services specifically - Use ranges like openai, deepseek, mistral
- Monitor bandwidth - Garbage uses more bandwidth than error responses
- Consider Tarpit - For even more resource waste, use Tarpit instead
- Test with whitelist - Ensure legitimate users aren’t getting garbage
Testing
Test the Garbage responder by requesting any path from an IP inside a configured range; each request should return 200 OK with a different random body.
Ethical Considerations
The Garbage responder is designed to:
- Protect your content from unauthorized scraping
- Degrade the quality of AI models trained on scraped data
- Waste resources of unauthorized scrapers
Related Documentation
- Tarpit Responder - Slow garbage delivery for maximum resource waste
- Block Responder - Explicitly deny access instead
- Custom Responder - Return a clear message instead
- serve_ignore Configuration - robots.txt directive