The Garbage responder returns 100 lines of randomly generated nonsensical text designed to be difficult for AI models to learn from, effectively polluting training datasets.

Overview

This responder generates unpredictable, irregular content combining random characters, symbols, and nonsense words. The goal is to waste scraper resources and contaminate AI training data with useless information.

Configuration

The Garbage responder takes no parameters of its own; it supports only the standard responder options:
ranges
string[]
IP ranges that will receive garbage data. Can be CIDR notations or predefined service keys. Default: ["aws", "azurepubliccloud", "deepseek", "gcloud", "githubcopilot", "openai"]
whitelist
string[]
Optional list of specific IP addresses to exclude from receiving garbage. Default: []
serve_ignore
boolean
Whether to serve a robots.txt file with a Disallow: / directive. Default: false
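
Putting these options together, a Caddyfile block that targets a custom CIDR range, whitelists one address, and serves a blocking robots.txt might look like the following. The IP values are placeholders, and the exact serve_ignore syntax is an assumption based on the boolean option described above:

```caddyfile
localhost:8080 {
    defender garbage {
        ranges openai 203.0.113.0/24
        whitelist 203.0.113.5
        serve_ignore
    }
    respond "Legitimate content for humans"
}
```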

HTTP Response

status
number
200 OK
Content-Type
string
text/plain
body
string
100 lines of randomly generated garbage text

Example Response Body

Each request generates unique random output:
florb~
wibble^
5kJ3@xP9#mL2$nH7%fG1&dB4*aC8(eR6)iT0+
snark|
quint}
F8@aS3#jK9$xM2%nL7&pQ1*vB6(cD4)eR5+yT0-zW8/gH3<uI9>wN2?
blarg\
ploosh!
3L#9K@5J$2M&7N%1P*6Q(4R)8S+0T-5U/9V<2W>7X?1Y\3Z|4A}6B~8C
zaxor`
...

Examples

localhost:8080 {
    defender garbage {
        ranges openai aws deepseek
    }
    respond "Legitimate content for humans"
}

Implementation Details

The Garbage responder is implemented in responders/garbage.go:15:
// ServeHTTP writes 100 lines of freshly generated garbage as a 200 text/plain response.
func (g GarbageResponder) ServeHTTP(w http.ResponseWriter, _ *http.Request, _ caddyhttp.Handler) error {
    garbage := generateTerribleText(100)
    w.Header().Set("Content-Type", "text/plain")
    w.WriteHeader(http.StatusOK)
    _, err := w.Write([]byte(garbage))
    return err
}

Garbage Generation Algorithm

For each line, the garbage generator randomly picks one of two strategies:
  1. Nonsense Words - Random selection from a predefined list:
    • florb, zaxor, quint, blarg, wibble, fizzle, gronk, snark, ploosh, dribble
  2. Random Characters - Random length (10-60 chars) from character set:
    • Lowercase letters: a-z
    • Uppercase letters: A-Z
    • Numbers: 0-9
    • Symbols: !@#$%^&*()_+-=[]{};':",./<>?\|~
Whichever strategy is chosen, a random punctuation character is appended to the end of the line.

Use Cases

AI Training Poisoning

Contaminate AI training datasets with useless data:
defender garbage {
    ranges openai deepseek mistral
}

Scraper Resource Waste

Waste bandwidth and storage of automated scrapers:
defender garbage {
    ranges scrapers bots
}

Content Protection

Make automated content harvesting worthless:
defender garbage {
    ranges aws gcloud azure
}

Advantages

  1. AI Poisoning - Degrades quality of training data if scraped
  2. Unpredictable - Each response is randomly generated
  3. Resource Waste - Scrapers waste bandwidth, storage, and processing
  4. Looks Valid - Returns 200 OK so scrapers think they succeeded
  5. No Blocking Signal - Scrapers don’t know they’re being served garbage

Disadvantages

  1. Bandwidth Cost - Sends ~5-10 KB per request instead of a small error response
  2. Processing Overhead - Generates random data for each request
  3. Still Allows Access - Doesn’t actually prevent scraping, just poisons it
  4. May Be Detected - Sophisticated scrapers might detect random patterns

Comparison with Other Responders

  • vs Block: Garbage returns 200 OK with data, Block returns 403 error
  • vs Drop: Garbage sends a response, Drop terminates connection
  • vs Custom: Garbage returns random data, Custom returns your message
  • vs Tarpit: Garbage sends data quickly, Tarpit sends slowly

When to Use Garbage

Use Garbage when:
  • You want to poison AI training datasets
  • Wasting scraper resources is a goal
  • You want scrapers to think they succeeded
  • Content protection is more important than bandwidth
Don’t use Garbage when:
  • Bandwidth costs are a concern
  • You want to explicitly block access
  • You need to conserve server resources
  • Clear error messages are desired

Best Practices

  1. Combine with serve_ignore - Also serve robots.txt to discourage polite bots
  2. Target AI services specifically - Use ranges like openai, deepseek, mistral
  3. Monitor bandwidth - Garbage uses more bandwidth than error responses
  4. Consider Tarpit - For even more resource waste, use Tarpit instead
  5. Test with whitelist - Ensure legitimate users aren’t getting garbage

Testing

Test the Garbage responder:
# From a blocked IP, you should see garbage
curl http://example.com

# Each request should return different garbage
curl http://example.com
curl http://example.com

# Simulate blocked IP using X-Forwarded-For
curl -H "X-Forwarded-For: 20.202.43.67" http://example.com

Ethical Considerations

The Garbage responder is designed to:
  • Protect your content from unauthorized scraping
  • Degrade the quality of AI models trained on scraped data
  • Waste resources of unauthorized scrapers
Consider whether this aligns with your values and legal obligations. Some jurisdictions may have laws about serving misleading content.
