Overview
Crawlith implements multiple layers of security to prevent Server-Side Request Forgery (SSRF) attacks, protect internal networks, and ensure safe operation in production environments.
SSRF Protection
IP Guard System
The IPGuard class (from ipGuard.ts:9) prevents requests to internal/private IP addresses.
Blocked IPv4 Ranges:
- 127.0.0.0/8 - Loopback
- 10.0.0.0/8 - Private network
- 192.168.0.0/16 - Private network
- 172.16.0.0/12 - Private network
- 169.254.0.0/16 - Link-local
- 0.0.0.0/8 - Unspecified
Blocked IPv6 Ranges:
- ::1 - Loopback
- fc00::/7 - Unique Local Address (ULA)
- fe80::/10 - Link-local
- IPv4-mapped IPv6 (e.g., ::ffff:10.0.0.1)
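To make the range list concrete, here is a minimal sketch of how such a check could be written. The names BLOCKED_V4, ipv4ToInt, isBlockedIPv4, and isBlockedIPv6 are hypothetical and are not taken from ipGuard.ts. The IPv6 part assumes compressed textual forms and would need a full parser to be robust.

```typescript
// Hypothetical helpers for illustration; not Crawlith's actual code.
const BLOCKED_V4: Array<[string, number]> = [
  ["127.0.0.0", 8],    // loopback
  ["10.0.0.0", 8],     // private network
  ["192.168.0.0", 16], // private network
  ["172.16.0.0", 12],  // private network
  ["169.254.0.0", 16], // link-local
  ["0.0.0.0", 8],      // unspecified
];

// Convert dotted-quad text to an unsigned 32-bit number, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet; // arithmetic instead of << to stay unsigned
  }
  return value;
}

function isBlockedIPv4(ip: string): boolean {
  const addr = ipv4ToInt(ip);
  if (addr === null) return false;
  return BLOCKED_V4.some(([base, bits]) => {
    const start = ipv4ToInt(base)!;
    const size = 2 ** (32 - bits); // number of addresses in the range
    return addr >= start && addr < start + size;
  });
}

// Assumption: input is in compressed form (e.g. "::1", "fe80::1").
// Fully expanded or hex-encoded mapped addresses are not handled here.
function isBlockedIPv6(ip: string): boolean {
  const lower = ip.toLowerCase();
  const mapped = lower.match(/^::ffff:(\d+\.\d+\.\d+\.\d+)$/);
  if (mapped) return isBlockedIPv4(mapped[1]); // IPv4-mapped IPv6
  if (lower === "::1") return true;            // loopback
  const firstGroup = parseInt(lower.split(":")[0] || "0", 16);
  if ((firstGroup & 0xfe00) === 0xfc00) return true; // fc00::/7
  if ((firstGroup & 0xffc0) === 0xfe80) return true; // fe80::/10
  return false;
}
```

Comparing numeric ranges rather than string prefixes is what lets 172.16.0.0/12 be matched correctly, since that range does not end on an octet boundary.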
Two-Layer IP Validation
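IP validation runs in two layers (from fetcher.ts:88 and ipGuard.ts:93):
- Layer 1: The URL itself is checked, so a literal internal address such as http://127.0.0.1 is rejected
- Layer 2: Custom DNS lookup function validates resolved IPs before connection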
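The two layers can be sketched as a single function. This is an illustration under stated assumptions, not the code in fetcher.ts or ipGuard.ts: checkTarget is a hypothetical name, resolve stands in for a DNS lookup, and isBlocked stands in for the IP range check. Both are injected so the sketch stays synchronous and self-contained.

```typescript
type Verdict = "ok" | "blocked_internal_ip";

// Hypothetical signature; `resolve` and `isBlocked` are injected stand-ins.
function checkTarget(
  url: string,
  resolve: (hostname: string) => string[],
  isBlocked: (ip: string) => boolean,
): Verdict {
  // The WHATWG URL parser wraps IPv6 literals in brackets; strip them.
  const host = new URL(url).hostname.replace(/^\[|\]$/g, "");

  // Layer 1: the URL names an IP address directly.
  const isLiteralIp = /^\d+\.\d+\.\d+\.\d+$/.test(host) || host.includes(":");
  if (isLiteralIp) return isBlocked(host) ? "blocked_internal_ip" : "ok";

  // Layer 2: the hostname looks harmless, so check every address it resolves to.
  const addresses = resolve(host);
  return addresses.some((ip) => isBlocked(ip)) ? "blocked_internal_ip" : "ok";
}
```

In a real client the second check belongs inside the lookup used for the connection itself. Validating in a separate, earlier lookup would leave a window in which the name could resolve to a different address.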
SSRF Attack Prevention
Requests to any of the ranges above are blocked and recorded with the status blocked_internal_ip.
Domain Filtering
Whitelist (Allow List)
Restrict crawling to specific domains with --allow (from domainFilter.ts:29):
- If --allow is specified, only listed domains are crawled
- All other domains return status: blocked_by_domain_filter
- Useful for restricting crawls to trusted domains
Blacklist (Deny List)
Exclude specific domains with --deny. Typical uses:
- Skip API endpoints
- Avoid admin panels
- Exclude third-party tracking domains
Domain Normalization
Domains are normalized before filtering:
- Example.Com → example.com
- example.com. → example.com
- 例え.jp → xn--r8jz45g.jp (Punycode)
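The three normalization rules above can be reproduced with the standard WHATWG URL parser. The following is a rough sketch only: normalizeDomain is a hypothetical name, and the way Crawlith actually normalizes domains may differ.

```typescript
// Hypothetical helper; not taken from domainFilter.ts.
function normalizeDomain(input: string): string {
  // Drop a trailing root dot: "example.com." becomes "example.com".
  const trimmed = input.trim().replace(/\.$/, "");
  // The WHATWG URL parser lowercases the host and converts
  // internationalized labels to their Punycode (ASCII) form.
  return new URL(`http://${trimmed}`).hostname;
}
```

Normalizing both the configured domains and each discovered host with the same function keeps comparisons from failing on case, trailing dots, or Unicode spellings.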
Subdomain Policy
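Control whether subdomains are included in the crawl scope.
Include Subdomains
From subdomainPolicy.ts:17. With subdomains included and a root domain of example.com, the following hosts are in scope:
- example.com ✓
- www.example.com ✓
- blog.example.com ✓
- api.staging.example.com ✓
Lookalike hosts that are not true subdomains stay out of scope:
- notexample.com ✗
- exampleXcom ✗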
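The difference between a real subdomain and a lookalike comes down to matching on a label boundary. A minimal sketch, with isSameOrSubdomain as a hypothetical name rather than the function in subdomainPolicy.ts:

```typescript
// Hypothetical helper for illustration.
function isSameOrSubdomain(host: string, root: string): boolean {
  const h = host.toLowerCase();
  const r = root.toLowerCase();
  // Exact match, or a suffix match anchored on a "." so that
  // "notexample.com" does not count as a subdomain of "example.com".
  return h === r || h.endsWith(`.${r}`);
}
```

A plain suffix check without the leading dot would accept notexample.com, which is exactly the case the examples above rule out.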
Exclude Subdomains (Default)
- example.com ✓
- www.example.com → blocked_subdomain
- blog.example.com → blocked_subdomain
Subdomain + Whitelist
Combine subdomain policy with explicit whitelist:
- example.com ✓
- www.example.com ✓ (subdomain)
- cdn.example.net ✓ (explicit whitelist)
- api.example.net ✗ (not in whitelist)
Scope Manager
The ScopeManager (from scopeManager.ts:13) combines all security policies, checked in priority order:
- Denied domains (blacklist) - highest priority
- Explicit allowed domains (whitelist)
- Subdomain policy
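The priority order above could be expressed roughly as follows. This is a sketch under assumptions: checkScope and ScopeOptions are hypothetical names, the lists are assumed to be already normalized, and the status returned for a host that matches nothing is a guess rather than documented behavior.

```typescript
type ScopeStatus = "allowed" | "blocked_by_domain_filter" | "blocked_subdomain";

// Hypothetical option shape; not the ScopeManager constructor.
interface ScopeOptions {
  rootDomain: string;
  allow: string[];            // explicit whitelist
  deny: string[];             // blacklist
  includeSubdomains: boolean;
}

function checkScope(host: string, opts: ScopeOptions): ScopeStatus {
  const h = host.toLowerCase();
  // 1. Denied domains have the highest priority.
  if (opts.deny.includes(h)) return "blocked_by_domain_filter";
  // 2. Explicitly allowed domains are accepted next.
  if (opts.allow.includes(h)) return "allowed";
  // 3. Subdomain policy relative to the root domain.
  if (h === opts.rootDomain) return "allowed";
  if (h.endsWith(`.${opts.rootDomain}`)) {
    return opts.includeSubdomains ? "allowed" : "blocked_subdomain";
  }
  // Assumption: anything else is treated as filtered out.
  return "blocked_by_domain_filter";
}
```

Checking the deny list first means a denied host stays blocked even if it also appears in the allow list or falls under an included subdomain.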
Redirect Safety
Redirects are validated at each hop (from fetcher.ts:158). A redirect is blocked in any of these cases:
- Redirect loops (detected by RedirectController)
- Redirect to internal IPs (validated by IPGuard)
- Redirect to blocked domains (validated by ScopeManager)
- Redirect limit exceeded (default: 2 hops, max: 11)
Redirect Configuration
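The hop limit is set with --max-redirects. Redirect failures are recorded with these statuses (from redirectController.ts:17):
- redirect_loop - Circular redirect detected
- redirect_limit_exceeded - Too many redirect hops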
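Loop detection and the hop limit can be illustrated with a small tracker. RedirectTracker and follow are hypothetical names, not the RedirectController API. The default of 2 hops and the cap of 11 come from the limits stated above.

```typescript
type RedirectStatus = "ok" | "redirect_loop" | "redirect_limit_exceeded";

// Hypothetical class for illustration.
class RedirectTracker {
  private readonly seen = new Set<string>();
  private readonly maxHops: number;
  private hops = 0;

  constructor(startUrl: string, maxHops: number = 2) {
    this.maxHops = Math.min(maxHops, 11); // clamp to the documented maximum
    this.seen.add(startUrl);
  }

  // Call once per redirect with the URL being redirected to.
  follow(target: string): RedirectStatus {
    if (this.seen.has(target)) return "redirect_loop";
    if (this.hops >= this.maxHops) return "redirect_limit_exceeded";
    this.hops += 1;
    this.seen.add(target);
    return "ok";
  }
}
```

A tracker like this would be created per request, with each hop also passing the IP and scope checks before it is followed.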
Response Size Limiting
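Protect against memory exhaustion from large responses (from responseLimiter.ts:4). The default limit is 2000000 bytes (2 MB) and can be raised with --max-bytes. A response that exceeds the limit is recorded with the status oversized.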
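One way to enforce such a limit is to count bytes as chunks arrive and stop as soon as the total goes over. The sketch below assumes that approach. ResponseLimiter here is an illustrative class with a hypothetical accept method and may not match the one in responseLimiter.ts.

```typescript
// Illustrative class; the method name `accept` is an assumption.
class ResponseLimiter {
  private received = 0;
  private readonly maxBytes: number;

  constructor(maxBytes: number = 2000000) {
    this.maxBytes = maxBytes;
  }

  // Returns false as soon as the running total exceeds the limit,
  // signalling the caller to abort the download.
  accept(chunk: Uint8Array): boolean {
    this.received += chunk.byteLength;
    return this.received <= this.maxBytes;
  }
}
```

Counting while streaming avoids relying on the Content-Length header, which a server can omit or misreport.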
Proxy Support
Route requests through a proxy server (from proxyAdapter.ts:6). If the proxy cannot be reached, the request is recorded with the status proxy_connection_failed.
Security Best Practices
Should I use --allow or --deny?
Use --allow (whitelist) when:
- Crawling untrusted user-provided URLs
- Running in production environments
- Security is critical
Use --deny (blacklist) when:
- You control the start URL
- Need to exclude specific subdomains
- Flexibility is more important than strict security
Can SSRF protection be disabled?
No. SSRF protection is always active and cannot be disabled. This is a core security feature to prevent attacks on internal networks.
If you need to crawl local development servers, use public-facing URLs or deploy Crawlith on the same network segment.
How do I safely crawl multi-domain sites?
Use explicit whitelisting with --allow. This ensures only trusted domains are crawled, even if malicious links are discovered.
What happens when a redirect goes to a blocked domain?
The redirect is blocked and recorded. For example, the source page is recorded with a 301 status, but the target is not fetched.
Security Error Statuses
All security errors are recorded in crawl results:
| Status | Cause | Fix |
|---|---|---|
| blocked_internal_ip | SSRF protection triggered | Don't crawl internal IPs |
| blocked_by_domain_filter | Failed domain whitelist/blacklist | Update --allow or --deny |
| blocked_subdomain | Subdomain not allowed | Add --include-subdomains |
| proxy_connection_failed | Cannot connect to proxy | Verify proxy URL and credentials |
| redirect_loop | Circular redirect | Check site configuration |
| redirect_limit_exceeded | Too many redirects | Increase --max-redirects or fix site |
| oversized | Response exceeds --max-bytes | Increase limit or skip large resources |
Monitoring Security Events
Enable debug logging to see security decisions. The debug output includes:
- Blocked URLs with reasons
- Redirect chain validation
- DNS resolution and IP checks
Related Topics
- Configuration - Command-line options
- Rate Limiting - Respectful crawling
- Troubleshooting - Debugging security blocks
Technical Details
Source Files
- plugins/core/src/core/security/ipGuard.ts - SSRF protection
- plugins/core/src/core/scope/domainFilter.ts - Whitelist/blacklist
- plugins/core/src/core/scope/subdomainPolicy.ts - Subdomain control
- plugins/core/src/core/scope/scopeManager.ts - Unified scope validation
- plugins/core/src/core/network/redirectController.ts - Redirect safety
- plugins/core/src/core/network/responseLimiter.ts - Response size limiting
- plugins/core/src/core/network/proxyAdapter.ts - Proxy support