Connectors

Connectors provide a unified interface for ingesting content from different sources. Each connector implements the Connector interface and handles source-specific details.

Available Connectors

GitHub

GitHub files, releases, and repositories

RSS Feeds

RSS/Atom feeds with article extraction

Local Files

Local files with glob patterns

PDF Documents

PDF files and documents

Linear Issues

Linear workspace issues

Connector Interface

All connectors implement this interface:
export type Connector = {
  /** Unique identifier for the source */
  sourceId: string;
  
  /** Async generator yielding documents */
  sources: () => AsyncGenerator<{
    id: string;
    content: () => Promise<string>;
    metadata?: Record<string, unknown>;
  }>;
  
  /** Ingestion strategy */
  ingestWhen?: 'never' | 'contentChanged' | 'expired';
  
  /** Expiry duration in milliseconds */
  expiresAfter?: number;
};

Source ID

Each connector has a unique sourceId that identifies the content source:
const connector = github.file('facebook/react/README.md');
console.log(connector.sourceId);
// "github:file:facebook/react/README.md"
Source IDs are used to:
  • Track which content has been ingested
  • Avoid duplicate ingestion
  • Query specific sources during search
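
To make these three roles concrete, here is a minimal sketch of how ingestion state could be keyed by sourceId. The SourceRegistry class, its methods, and the filterBySource helper are hypothetical names invented for illustration; they are not part of @deepagents/retrieval, and the real store may track sources differently.

```typescript
// Hypothetical in-memory registry keyed by sourceId (illustration only).
type SourceRecord = { sourceId: string; ingestedAt: number };

class SourceRegistry {
  private records = new Map<string, SourceRecord>();

  /** Returns true if the source was newly recorded, false if it was already known. */
  markIngested(sourceId: string, now: number = Date.now()): boolean {
    if (this.records.has(sourceId)) return false; // duplicate: nothing to do
    this.records.set(sourceId, { sourceId, ingestedAt: now });
    return true;
  }

  /** Track which content has been ingested. */
  has(sourceId: string): boolean {
    return this.records.has(sourceId);
  }
}

/** Restrict a list of search hits to a single source. */
function filterBySource<T extends { sourceId: string }>(hits: T[], sourceId: string): T[] {
  return hits.filter((hit) => hit.sourceId === sourceId);
}

const registry = new SourceRegistry();
const first = registry.markIngested('github:file:facebook/react/README.md');
const second = registry.markIngested('github:file:facebook/react/README.md');
console.log(first, second); // true false
```

The second call returns false because the same sourceId is already recorded, which is the behavior that prevents duplicate ingestion in this sketch.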

Document Generator

The sources() method returns an async generator that yields documents:
for await (const doc of connector.sources()) {
  console.log(doc.id);           // Document identifier
  const text = await doc.content(); // Document content
  console.log(doc.metadata);     // Optional metadata
}
Each document includes:
  • id - Unique document identifier
  • content - Function returning document text
  • metadata - Optional key-value metadata
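
Because content is a function rather than a string, a consumer can decide per document whether to load the text at all. The sketch below illustrates that under stated assumptions: demoSources is a hypothetical stand-in generator (so the example runs offline) and readMatching is an illustrative helper, neither of which exists in the library.

```typescript
// Local copy of the document shape from the Connector type.
type SourceDocument = {
  id: string;
  content: () => Promise<string>;
  metadata?: Record<string, unknown>;
};

// Hypothetical stand-in generator; `loads` records which documents were actually read.
async function* demoSources(loads: string[]): AsyncGenerator<SourceDocument> {
  for (const id of ['a.md', 'b.txt', 'c.md']) {
    yield {
      id,
      content: async () => {
        loads.push(id);
        return `contents of ${id}`;
      },
    };
  }
}

// Load text only for documents whose id passes the filter.
async function readMatching(
  docs: AsyncIterable<SourceDocument>,
  keep: (id: string) => boolean,
): Promise<Map<string, string>> {
  const texts = new Map<string, string>();
  for await (const doc of docs) {
    if (!keep(doc.id)) continue; // content() is never called for skipped documents
    texts.set(doc.id, await doc.content());
  }
  return texts;
}
```

Calling readMatching(demoSources(loads), (id) => id.endsWith('.md')) reads only a.md and c.md; b.txt is yielded by the generator but its content function is never invoked.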

Ingestion Strategy

Connectors can specify when to ingest:
const connector = local('**/*.md', {
  ingestWhen: 'contentChanged', // Default
});
  • contentChanged (default) - Always attempt ingestion. Skip unchanged documents via content hashing.
  • never - Only ingest if the source has never been ingested.
  • expired - Only ingest if the source doesn’t exist or has expired.
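
The three strategies can be read as a gate in front of each ingestion run. The following is a rough sketch of that logic, not the library's implementation: SourceState, shouldIngest, and hasChanged are hypothetical names, the FNV-1a-style hash is a small stand-in for whatever content hashing the library actually uses, and treating a source with no expiresAfter as still fresh is an assumption.

```typescript
type IngestWhen = 'never' | 'contentChanged' | 'expired';

// Hypothetical record of what a store might remember about a source.
type SourceState = { ingestedAt: number; expiresAfter?: number } | undefined;

/** Decide whether an ingestion run for a source should start at all. */
function shouldIngest(strategy: IngestWhen, state: SourceState, now: number): boolean {
  switch (strategy) {
    case 'contentChanged':
      return true; // always attempt; unchanged documents are skipped later by hash
    case 'never':
      return state === undefined; // only when the source has never been ingested
    case 'expired':
      if (state === undefined) return true; // source does not exist yet
      if (state.expiresAfter === undefined) return false; // assumption: no expiry means fresh
      return now - state.ingestedAt >= state.expiresAfter;
  }
}

// FNV-1a-style hash over UTF-16 code units (non-cryptographic stand-in).
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

/** Per-document check under 'contentChanged': re-ingest only if the hash differs. */
function hasChanged(previousHash: string | undefined, text: string): boolean {
  return previousHash !== fnv1a(text);
}
```

In this sketch, contentChanged does its filtering per document through hasChanged, while never and expired decide once per source before any document is read.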

Expiry

Set expiration time for cached content:
const connector = rss('https://example.com/feed', {
  ingestWhen: 'expired',
  expiresAfter: 24 * 60 * 60 * 1000, // 24 hours
});
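
Since expiresAfter is a plain millisecond count, a small helper can make the arithmetic easier to read. This is an illustrative sketch only: duration and isExpired are hypothetical helpers, not library exports, and whether the library treats the exact boundary as expired is an assumption here.

```typescript
// Hypothetical helper: milliseconds per unit.
const MS = { second: 1000, minute: 60_000, hour: 3_600_000, day: 86_400_000 } as const;

/** Convert an amount of a given unit to milliseconds. */
function duration(amount: number, unit: keyof typeof MS): number {
  return amount * MS[unit];
}

/** True once `expiresAfter` milliseconds have elapsed since `ingestedAt` (boundary inclusive by assumption). */
function isExpired(ingestedAt: number, expiresAfter: number, now: number = Date.now()): boolean {
  return now - ingestedAt >= expiresAfter;
}

console.log(duration(24, 'hour')); // 86400000
```

With such a helper, the option above could be written as expiresAfter: duration(24, 'hour').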

Using Connectors

Import Connectors

import { github, rss, local, pdf, linear } from '@deepagents/retrieval/connectors';

With Ingestion

import { ingest, fastembed, SqliteStore } from '@deepagents/retrieval';
import { github } from '@deepagents/retrieval/connectors';
import Database from 'better-sqlite3';

const db = new Database('./vectors.db');
const store = new SqliteStore(db, 384);
const embedder = fastembed();

await ingest({
  connector: github.file('facebook/react/README.md'),
  store,
  embedder,
});

With Search

Pass the same connector to similaritySearch to search that source:
import { similaritySearch } from '@deepagents/retrieval';

const results = await similaritySearch('How do I install React?', {
  connector: github.file('facebook/react/README.md'),
  store,
  embedder,
});

Connector Examples

GitHub File

import { github } from '@deepagents/retrieval/connectors';

const connector = github.file('microsoft/TypeScript/README.md');

GitHub Releases

const connector = github.release('facebook/react', {
  untilTag: 'v18.0.0',
  inclusive: true,
});

GitHub Repository

const connector = github.repo('https://github.com/vercel/next.js', {
  includes: ['**/*.md'],
  excludes: ['**/node_modules/**'],
  branch: 'canary',
});

RSS Feed

import { rss } from '@deepagents/retrieval/connectors';

const connector = rss('https://hnrss.org/frontpage', {
  maxItems: 10,
  fetchFullArticles: true,
});

Local Files

import { local } from '@deepagents/retrieval/connectors';

const connector = local('docs/**/*.md', {
  ingestWhen: 'contentChanged',
  cwd: process.cwd(),
});

PDF Files

import { pdf, pdfFile } from '@deepagents/retrieval/connectors';

// Glob pattern
const connector1 = pdf('research/**/*.pdf');

// Single file
const connector2 = pdfFile('./manual.pdf');

// From URL
const connector3 = pdfFile('https://example.com/paper.pdf');

Linear Issues

import { linear } from '@deepagents/retrieval/connectors';

const connector = linear('your-api-key');

Custom Connectors

Create your own connector by implementing the interface:
import type { Connector } from '@deepagents/retrieval/connectors';

export function customConnector(url: string): Connector {
  return {
    sourceId: `custom:${url}`,
    
    sources: async function* () {
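      // Fetch your data; fail early on a non-2xx response
      const response = await fetch(url);
      if (!response.ok) {
        throw new Error(`Request to ${url} failed with status ${response.status}`);
      }
      const data = await response.json();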
      
      // Yield documents
      for (const item of data.items) {
        yield {
          id: item.id,
          content: async () => item.text,
          metadata: { author: item.author },
        };
      }
    },
    
    ingestWhen: 'contentChanged',
  };
}

Using Custom Connectors

import { ingest } from '@deepagents/retrieval';

await ingest({
  connector: customConnector('https://api.example.com/data'),
  store,
  embedder,
});

Multiple Connectors

Ingest from multiple sources:
const connectors = [
  github.file('facebook/react/README.md'),
  local('docs/**/*.md'),
  rss('https://blog.example.com/feed'),
];

for (const connector of connectors) {
  await ingest({ connector, store, embedder });
}
Each connector maintains its own sourceId for independent tracking.
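
In the loop above, one failing source would abort the remaining ones. If that is not desired, the loop can record failures per sourceId instead. The sketch below shows one way to do this; ingestAll and IngestResult are hypothetical names, and the ingestion step is passed in as a callback so the example runs without the library.

```typescript
// Minimal shape needed here; the real Connector type has more fields.
type HasSourceId = { sourceId: string };

type IngestResult =
  | { sourceId: string; ok: true }
  | { sourceId: string; ok: false; error: string };

/**
 * Run `ingestOne` for each connector in order, recording failures
 * instead of letting one bad source abort the whole batch.
 */
async function ingestAll<C extends HasSourceId>(
  connectors: C[],
  ingestOne: (connector: C) => Promise<unknown>,
): Promise<IngestResult[]> {
  const results: IngestResult[] = [];
  for (const connector of connectors) {
    try {
      await ingestOne(connector);
      results.push({ sourceId: connector.sourceId, ok: true });
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      results.push({ sourceId: connector.sourceId, ok: false, error });
    }
  }
  return results;
}
```

With the setup from this page, the callback would be (connector) => ingest({ connector, store, embedder }), and the returned list shows which sources need a retry.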

Best Practices

Use Descriptive Source IDs

Source IDs should clearly identify the content source.

Handle Errors in Content Functions

The content() function should handle errors gracefully:
content: () => readFile(path, 'utf8').catch(() => '')

Include Useful Metadata

Add metadata that helps with filtering or context:
metadata: {
  author: 'John Doe',
  date: '2024-01-01',
  category: 'tutorial',
}

Choose Appropriate Ingestion Strategies

Use never for static content, contentChanged for dynamic content, and expired for time-sensitive content.
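
On the error-handling practice above: returning an empty string silently can hide a broken source. One possible refinement, sketched here with a hypothetical safeContent helper that is not part of the library, keeps the fallback but reports the error through a callback.

```typescript
/**
 * Wrap a content loader so a failure yields a fallback string
 * and is reported, rather than rejecting in the middle of ingestion.
 */
function safeContent(
  load: () => Promise<string>,
  onError: (err: unknown) => void = () => {},
  fallback = '',
): () => Promise<string> {
  return async () => {
    try {
      return await load();
    } catch (err) {
      onError(err); // e.g. log the failure together with the document id
      return fallback;
    }
  };
}
```

A document could then use content: safeContent(() => readFile(path, 'utf8'), (err) => console.warn(path, err)), assuming readFile comes from node:fs/promises.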

Next Steps

GitHub Connector

Detailed GitHub connector guide

RSS Connector

RSS feed ingestion

Local Files

Work with local files

API Reference

Connector API documentation
