RSS Feed Connector
The RSS connector ingests content from RSS and Atom feeds, with optional full article extraction using Mozilla Readability.
Import
Basic Usage
Configuration
Feed URL
Any valid RSS or Atom feed URL:
Max Items
Limit the number of items to ingest:
Full Article Extraction
Fetch and extract full article content:
- Fetches the article URL from each feed item
- Uses Mozilla Readability to extract main content
- Falls back to RSS content if extraction fails
- Significantly slower but provides complete content
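As a rough illustration of how the three settings in this section fit together, the sketch below models them as a TypeScript options object. Only fetchFullArticles is a name taken from this page; RssConnectorOptions, feedUrl, maxItems, normalizeOptions, and limitItems are hypothetical, and the defaults shown (no item cap, extraction off) are assumptions rather than the connector's documented behavior.

```typescript
// Hypothetical option shape; only `fetchFullArticles` is named on this page.
interface RssConnectorOptions {
  feedUrl: string;             // any valid RSS or Atom feed URL
  maxItems?: number;           // cap on ingested items; omitted means no cap (assumed)
  fetchFullArticles?: boolean; // fetch each article page and extract its main content
}

// Validate the feed URL and fill in the assumed defaults.
function normalizeOptions(options: RssConnectorOptions): Required<RssConnectorOptions> {
  const url = new URL(options.feedUrl); // throws a TypeError on a malformed URL
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Unsupported feed URL protocol: ${url.protocol}`);
  }
  return {
    feedUrl: url.toString(),
    maxItems: options.maxItems ?? Number.POSITIVE_INFINITY,
    fetchFullArticles: options.fetchFullArticles ?? false,
  };
}

// Keep only the first `maxItems` entries of a parsed feed.
function limitItems<T>(items: T[], maxItems: number): T[] {
  return items.slice(0, maxItems);
}
```

Because full article extraction is the slow path, pairing fetchFullArticles with a small item cap keeps ingestion time bounded.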
Feed Parsing
The connector supports:
- RSS 2.0 - Standard RSS format
- RSS 1.0 - RDF-based RSS
- Atom - Atom Syndication Format
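The three formats can usually be told apart by the root element of the feed document. The sketch below is a simplified heuristic written for illustration, not the connector's actual parser; detectFeedFormat and FeedFormat are hypothetical names, and a production parser would use a real XML library instead of regular expressions.

```typescript
type FeedFormat = "rss2" | "rss1" | "atom" | "unknown";

// Guess the feed format from the name of the root element.
function detectFeedFormat(xml: string): FeedFormat {
  // Drop processing instructions, comments, and DOCTYPE so the first tag left is the root.
  const body = xml
    .replace(/<\?xml[\s\S]*?\?>/g, "")
    .replace(/<!--[\s\S]*?-->/g, "")
    .replace(/<!DOCTYPE[\s\S]*?>/gi, "");
  const root = body.match(/<([A-Za-z_][\w.-]*(?::[\w.-]+)?)/);
  if (!root) return "unknown";
  const name = root[1].toLowerCase();
  if (name === "rss") return "rss2";      // <rss version="2.0"> (older 0.9x feeds share this root)
  if (name === "rdf:rdf") return "rss1";  // RSS 1.0 is wrapped in <rdf:RDF>
  if (name === "feed" || name.endsWith(":feed")) return "atom";
  return "unknown";
}
```

The check assumes the conventional rdf prefix for RSS 1.0; a feed that binds the RDF namespace to another prefix would fall through to "unknown".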
Parsed Fields
Document Format
Each feed item is ingested as:
Feed Information
A special document contains feed metadata: feed-info
Full Article Extraction
When fetchFullArticles: true:
How It Works
- Fetch HTML - Download article page
- Extract Content - Use Mozilla Readability
- Validate - Ensure content is substantial (>200 chars)
- Fallback - Use RSS content if extraction fails
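The four steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the connector's implementation: FeedItem, chooseContent, and resolveItemContent are hypothetical names, the fetch and extraction steps are injected as functions so that any HTTP client and any Readability wrapper can be plugged in, and measuring the 200-character threshold on trimmed text is an assumption.

```typescript
const MIN_ARTICLE_CHARS = 200; // threshold from the Validate step

interface FeedItem {
  link: string;       // article URL taken from the feed item
  rssContent: string; // content or summary carried in the feed itself
}

type ContentSource = "article" | "rss";

// Steps 3 and 4: accept extracted text only when it is substantial, else use the RSS content.
function chooseContent(
  extracted: string | null,
  rssContent: string,
): { content: string; source: ContentSource } {
  const text = extracted?.trim() ?? "";
  return text.length > MIN_ARTICLE_CHARS
    ? { content: text, source: "article" }
    : { content: rssContent, source: "rss" };
}

// Steps 1 and 2 are supplied by the caller; any failure falls back to the RSS content.
async function resolveItemContent(
  item: FeedItem,
  fetchHtml: (url: string) => Promise<string>,
  extractText: (html: string, url: string) => string | null,
) {
  try {
    const html = await fetchHtml(item.link);
    return chooseContent(extractText(html, item.link), item.rssContent);
  } catch (error) {
    // A failed fetch or extraction is logged and the item is still ingested.
    console.warn(`Full article extraction failed for ${item.link}:`, error);
    return chooseContent(null, item.rssContent);
  }
}
```

Keeping the validation in a pure function makes the threshold easy to test without any network access.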
Example
Readability Features
- Removes navigation, ads, and clutter
- Extracts main article text
- Preserves title and structure
- Works with most news sites and blogs
Error Handling
Extraction failures are logged but don’t stop ingestion:
Source ID
Instructions
The connector includes AI agent instructions:
Examples
Hacker News Feed
Blog with Full Articles
Multiple Feeds
Search Across Multiple Feeds
Metadata
Each document includes metadata:
Performance Considerations
Without Full Articles
Fast ingestion using RSS content:
With Full Articles
Slower due to article fetching:
Timeout
Article fetches have a 10-second timeout:
User Agent
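The two request settings, the 10-second timeout and a custom user agent, might be applied together roughly as follows. This is a sketch built on the standard fetch and AbortController APIs; articleRequestInit and fetchArticleHtml are hypothetical names, and the user agent string is a placeholder because this page does not state the real value.

```typescript
const ARTICLE_TIMEOUT_MS = 10_000;            // the documented 10-second limit
const USER_AGENT = "ExampleRssConnector/1.0"; // placeholder, not the connector's real value

// Build request settings: a custom User-Agent header plus a signal that aborts on timeout.
function articleRequestInit(timeoutMs: number = ARTICLE_TIMEOUT_MS) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  return {
    init: { headers: { "User-Agent": USER_AGENT }, signal: controller.signal },
    cancel: () => clearTimeout(timer), // call once the request settles
  };
}

// Download an article page, giving up if the timeout elapses first.
async function fetchArticleHtml(url: string): Promise<string> {
  const { init, cancel } = articleRequestInit();
  try {
    const response = await fetch(url, init);
    if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
    return await response.text();
  } finally {
    cancel();
  }
}
```

When the timer fires, fetch rejects with an abort error, which a caller can treat like any other extraction failure and fall back to the RSS content.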
Article requests use a custom user agent:
Error Handling
Feed Parsing Errors
Article Extraction Errors
Article extraction errors are logged as warnings and don’t stop ingestion:
Caching Strategy
Use expiry for time-sensitive content:
Content Validation
Full articles must be longer than 200 characters:
Best Practices
Limit Items for Full Extraction
Full article extraction is slow. Limit items:
Next Steps
GitHub Connector
Ingest from GitHub
Local Files
Work with local files
Search
Search ingested content