
Connector API Reference

Complete API reference for the Connector interface and related types.

Connector Interface

All connectors implement this interface:
export type Connector = {
  sourceId: string;
  sources: () => AsyncGenerator<Document>;
  ingestWhen?: 'never' | 'contentChanged' | 'expired';
  expiresAfter?: number;
};
Location: packages/retrieval/src/lib/connectors/connector.ts (lines 1-28)
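Below is a minimal, self-contained sketch of an object that satisfies this interface. The `Connector` and `Doc` types are local copies of the shapes documented on this page. `Doc` stands in for `Document` so it does not clash with the DOM's global `Document` type. In a real project you would import `Connector` from `@deepagents/retrieval/connectors` instead. The `inMemory` helper and the `memory:` prefix are hypothetical.

```typescript
// Local mirrors of the documented types (assumption: shapes copied from this page).
type Doc = {
  id: string;
  content: () => Promise<string>;
  metadata?: Record<string, unknown>;
};

type Connector = {
  sourceId: string;
  sources: () => AsyncGenerator<Doc>;
  ingestWhen?: 'never' | 'contentChanged' | 'expired';
  expiresAfter?: number;
};

// Hypothetical helper: expose a fixed set of texts as a connector.
function inMemory(name: string, texts: Record<string, string>): Connector {
  return {
    sourceId: `memory:${name}`,
    sources: async function* () {
      for (const [id, text] of Object.entries(texts)) {
        // content is a thunk; the text is only handed over when it is called
        yield { id, content: async () => text };
      }
    },
    ingestWhen: 'contentChanged',
  };
}

const notes = inMemory('notes', { a: 'first note', b: 'second note' });
```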

Properties

sourceId

Type: string
Required: Yes

Unique identifier for the logical source (a group of documents) in the embedding store.
const connector = github.file('facebook/react/README.md');
console.log(connector.sourceId);
// "github:file:facebook/react/README.md"
Format: by convention, {source-type}:{identifier}. The built-in connectors use:
  • github:file:{path} - GitHub file
  • github:releases:{repo} - GitHub releases
  • github:repo:{url} - GitHub repository
  • rss:{url} - RSS feed
  • glob:{pattern} - Local files
  • pdf:{pattern} - PDF files
  • linear:workspace - Linear issues
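The built-in connectors follow this `{source-type}:{identifier}` convention, and a custom connector can adopt it too. The sketch below shows a small helper that builds such IDs. `sourceIdFor` is hypothetical and not part of `@deepagents/retrieval`. The `SourceType` union simply restates the list above.

```typescript
// Prefixes restated from the convention list; extend with your own types.
type SourceType =
  | 'github:file'
  | 'github:releases'
  | 'github:repo'
  | 'rss'
  | 'glob'
  | 'pdf'
  | 'linear';

// Hypothetical helper: join a source type and identifier into a sourceId.
function sourceIdFor(type: SourceType, identifier: string): string {
  if (identifier.length === 0) {
    throw new Error(`empty identifier for source type "${type}"`);
  }
  return `${type}:${identifier}`;
}

const readmeId = sourceIdFor('github:file', 'facebook/react/README.md');
const feedId = sourceIdFor('rss', 'https://hnrss.org/frontpage');
```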

sources

Type: () => AsyncGenerator<Document>
Required: Yes

Async generator function that yields the documents to ingest.
sources: async function* () {
  yield {
    id: 'doc-1',
    content: async () => 'Document content',
    metadata: { author: 'John' },
  };
  yield {
    id: 'doc-2',
    content: async () => 'Another document',
  };
}
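The sketch below illustrates the generator contract from the consumer's side. It drains `sources()` with `for await` and calls `content()` only when it needs the text. `collectDocuments` is a hypothetical consumer. The real ingestion pipeline (see `ingest()` in the Core API) does its own processing. `Doc` is a local mirror of the `Document` type.

```typescript
type Doc = {
  id: string;
  content: () => Promise<string>;
  metadata?: Record<string, unknown>;
};

type Collected = { id: string; text: string; metadata?: Record<string, unknown> };

// Hypothetical consumer: materialise every yielded document with its text.
async function collectDocuments(sources: () => AsyncGenerator<Doc>): Promise<Collected[]> {
  const out: Collected[] = [];
  for await (const doc of sources()) {
    // content() is awaited here, one document at a time
    out.push({ id: doc.id, text: await doc.content(), metadata: doc.metadata });
  }
  return out;
}

// The same generator as the example above.
const sources = async function* (): AsyncGenerator<Doc> {
  yield { id: 'doc-1', content: async () => 'Document content', metadata: { author: 'John' } };
  yield { id: 'doc-2', content: async () => 'Another document' };
};
```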

ingestWhen

Type: 'never' | 'contentChanged' | 'expired'
Required: No
Default: 'contentChanged'

Controls when ingestion runs:

'never': ingest only if the source does not exist yet. Once the source has been created, it is never re-ingested.
const connector = local('**/*.md', {
  ingestWhen: 'never',
});
'contentChanged' (default): always run ingestion. The underlying pipeline skips unchanged documents via content hashing.
const connector = local('**/*.md', {
  ingestWhen: 'contentChanged',
});
'expired': ingest only if the source does not exist yet or has expired (see expiresAfter).
const connector = local('**/*.md', {
  ingestWhen: 'expired',
  expiresAfter: 24 * 60 * 60 * 1000,
});
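The three modes reduce to a small decision rule. The sketch below restates that rule as described above. `SourceState` and `shouldIngest` are hypothetical names and are not part of the library. The real check lives inside the ingestion pipeline.

```typescript
type IngestWhen = 'never' | 'contentChanged' | 'expired';

// Hypothetical snapshot of what the store knows about a source.
type SourceState = { exists: boolean; expired: boolean };

// Restates the documented semantics of each mode.
function shouldIngest(mode: IngestWhen, state: SourceState): boolean {
  switch (mode) {
    case 'never':
      // First ingestion only; an existing source is never re-ingested.
      return !state.exists;
    case 'contentChanged':
      // Always run; unchanged documents are skipped later via content hashing.
      return true;
    case 'expired':
      // Ingest a missing source, or refresh one whose expiry has passed.
      return !state.exists || state.expired;
  }
}
```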

expiresAfter

Type: number
Required: No
Unit: milliseconds

Expiry duration in milliseconds, counted from the time the source is ingested. Once this duration has elapsed, the source is considered expired. Combine it with ingestWhen: 'expired' to refresh the source periodically.
const connector = local('news/**/*.md', {
  ingestWhen: 'expired',
  expiresAfter: 60 * 60 * 1000, // 1 hour
});
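`expiresAfter` is plain millisecond arithmetic. The sketch below assumes the store records when a source was ingested and compares that timestamp plus `expiresAfter` with the current time. `isExpired` and its parameters are hypothetical. Treating "exactly at the deadline" as expired, and "no expiresAfter" as never expiring, are choices made for this sketch only.

```typescript
const HOUR_MS = 60 * 60 * 1000; // 3,600,000 ms
const DAY_MS = 24 * HOUR_MS; // 86,400,000 ms

// Hypothetical check: a source ingested at `ingestedAt` (epoch ms) is expired
// once `expiresAfter` ms have elapsed. Without expiresAfter it never expires here.
function isExpired(
  ingestedAt: number,
  expiresAfter: number | undefined,
  now: number = Date.now(),
): boolean {
  if (expiresAfter === undefined) return false;
  return now >= ingestedAt + expiresAfter;
}

const ingestedAt = Date.UTC(2024, 0, 1, 12, 0, 0); // 2024-01-01T12:00:00Z
```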

Document Type

Documents yielded by sources():
type Document = {
  id: string;
  content: () => Promise<string>;
  metadata?: Record<string, unknown>;
};

id

Type: string
Required: Yes

Unique document identifier within the source.
{
  id: 'facebook/react/README.md',
  // ...
}
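An `id` must be unique within a source. A connector that derives IDs from external data may therefore want to fail fast on duplicates. The sketch below is a hypothetical wrapper generator that re-yields documents and throws on the first repeated `id`. This page does not say whether the library itself rejects duplicates. `Doc` mirrors the `Document` type.

```typescript
type Doc = { id: string; content: () => Promise<string>; metadata?: Record<string, unknown> };

// Hypothetical guard: pass documents through, throwing if an id repeats.
async function* uniqueIds(docs: AsyncIterable<Doc>): AsyncGenerator<Doc> {
  const seen = new Set<string>();
  for await (const doc of docs) {
    if (seen.has(doc.id)) {
      throw new Error(`duplicate document id "${doc.id}"`);
    }
    seen.add(doc.id);
    yield doc;
  }
}

// A source that accidentally yields the same id twice.
async function* raw(): AsyncGenerator<Doc> {
  yield { id: 'facebook/react/README.md', content: async () => '# React' };
  yield { id: 'facebook/react/README.md', content: async () => '# React (again)' };
}
```

In a connector, this could be wired as `sources: () => uniqueIds(raw())`.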

content

Type: () => Promise<string>
Required: Yes

Async function that returns the document's content.
{
  id: 'doc-1',
  content: async () => {
    const text = await fetch(url).then(r => r.text());
    return text;
  },
}
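This page does not say how many times the pipeline calls `content()`. If fetching is expensive, a small memoising wrapper guarantees at most one successful request per document. `once` is a hypothetical helper. The network call is replaced by a counting stand-in so the sketch runs offline.

```typescript
// Hypothetical helper: run an async loader at most once and reuse its promise.
function once<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      // On failure, forget the rejected promise so a later call can retry.
      cached = load().catch((error: unknown) => {
        cached = undefined;
        throw error;
      });
    }
    return cached;
  };
}

let requests = 0;
// Stand-in for `fetch(url).then(r => r.text())` that counts calls.
const loadText = async (): Promise<string> => {
  requests += 1;
  return 'fetched text';
};

const doc = { id: 'doc-1', content: once(loadText) };
```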

metadata

Type: Record<string, unknown>
Required: No

Arbitrary per-document metadata stored with the embeddings.
{
  id: 'article-1',
  content: async () => articleText,
  metadata: {
    author: 'John Doe',
    date: '2024-01-01',
    category: 'tech',
  },
}

Creating Custom Connectors

Basic Example

import type { Connector } from '@deepagents/retrieval/connectors';

export function customConnector(url: string): Connector {
  return {
    sourceId: `custom:${url}`,
    
    sources: async function* () {
      const response = await fetch(url);
      if (!response.ok) {
        throw new Error(`Failed to fetch ${url}: ${response.status}`);
      }
      const data = await response.json();
      
      for (const item of data.items) {
        yield {
          id: item.id,
          content: async () => item.text,
          metadata: {
            title: item.title,
            date: item.date,
          },
        };
      }
    },
    
    ingestWhen: 'contentChanged',
  };
}
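`response.json()` returns untyped data, so `item.id` and `item.text` are never checked at runtime. The sketch below shows the same pattern with a type guard that skips malformed items. The `Item` shape (`id`, `text`, optional `title` and `date`) is an assumption about the API. `validatedConnector` is hypothetical, and its loader is injected so the example runs without a network. `Doc` and `Connector` mirror the documented types.

```typescript
type Doc = { id: string; content: () => Promise<string>; metadata?: Record<string, unknown> };
type Connector = {
  sourceId: string;
  sources: () => AsyncGenerator<Doc>;
  ingestWhen?: 'never' | 'contentChanged' | 'expired';
  expiresAfter?: number;
};

// Assumed item shape; adjust to your API.
type Item = { id: string; text: string; title?: string; date?: string };

function isItem(value: unknown): value is Item {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.id === 'string' && typeof v.text === 'string';
}

// `load` returns the parsed JSON body, e.g. `() => fetch(url).then((r) => r.json())`.
function validatedConnector(name: string, load: () => Promise<unknown>): Connector {
  return {
    sourceId: `custom:${name}`,
    sources: async function* () {
      const data = await load();
      const items =
        typeof data === 'object' && data !== null ? (data as { items?: unknown }).items : undefined;
      if (!Array.isArray(items)) throw new Error('expected a body of the form { items: [...] }');
      for (const item of items) {
        if (!isItem(item)) continue; // skip malformed entries
        yield {
          id: item.id,
          content: async () => item.text,
          metadata: { title: item.title, date: item.date },
        };
      }
    },
    ingestWhen: 'contentChanged',
  };
}

const fake = validatedConnector('demo', async () => ({
  items: [{ id: '1', text: 'ok', title: 'One' }, { id: 2, text: 'bad id' }, null],
}));
```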

With Expiry

export function newsConnector(apiUrl: string): Connector {
  return {
    sourceId: `news:${apiUrl}`,
    
    sources: async function* () {
      // fetchArticles is your own API client, not part of the library
      const articles = await fetchArticles(apiUrl);
      
      for (const article of articles) {
        yield {
          id: article.slug,
          content: async () => article.content,
          metadata: {
            title: article.title,
            published: article.publishedAt,
          },
        };
      }
    },
    
    ingestWhen: 'expired',
    expiresAfter: 60 * 60 * 1000, // 1 hour
  };
}

With Error Handling

export function robustConnector(source: string): Connector {
  return {
    sourceId: `robust:${source}`,
    
    sources: async function* () {
      try {
        // fetchItems and fetchContent are your own helpers, not part of the library
        const items = await fetchItems(source);
        
        for (const item of items) {
          yield {
            id: item.id,
            content: async () => {
              try {
                return await fetchContent(item.url);
              } catch (error) {
                console.error(`Failed to fetch ${item.id}:`, error);
                return ''; // Return empty on error
              }
            },
          };
        }
      } catch (error) {
        console.error('Failed to fetch items:', error);
        // Generator completes with no items
      }
    },
  };
}
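Returning an empty string keeps ingestion going, but an empty document is stored instead. For transient failures, retrying before falling back may be preferable. The sketch below is a hypothetical `withRetry` helper with exponential backoff that could wrap the body of `content()`. The attempt count and delays are arbitrary choices, and a flaky function stands in for `fetchContent`.

```typescript
// Hypothetical helper: retry an async operation with exponential backoff.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < attempts - 1) {
        // Wait baseDelayMs, then 2x, 4x, ... between attempts
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError; // all attempts failed: surface the last error
}

// Simulated flaky source: fails twice, then succeeds.
let calls = 0;
const flaky = async (): Promise<string> => {
  calls += 1;
  if (calls < 3) throw new Error(`transient failure #${calls}`);
  return 'content';
};
```

Combined with the fallback above, this could be wired as `content: () => withRetry(() => fetchContent(item.url)).catch(() => '')`.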

Built-in Connectors

github.file()

function file(filePath: string): Connector
Fetch a single file from GitHub. filePath takes the form {owner}/{repo}/{path}. Example:
import { github } from '@deepagents/retrieval/connectors';

const connector = github.file('facebook/react/README.md');

github.release()

function release(
  repo: string,
  options?: ReleaseFetchOptions
): Connector
Fetch GitHub release notes. Example:
const connector = github.release('facebook/react', {
  untilTag: 'v18.0.0',
  inclusive: true,
});

github.repo()

function repo(
  repoUrl: string,
  opts: {
    includes: string[];
    excludes?: string[];
    branch?: string;
    includeGitignored?: boolean;
    includeSubmodules?: boolean;
    githubToken?: string;
    ingestWhen?: 'never' | 'contentChanged';
  }
): Connector
Ingest files from an entire repository using gitingest, filtered by the includes and excludes glob patterns. Example:
const connector = github.repo(
  'https://github.com/vercel/next.js',
  {
    includes: ['**/*.md'],
    branch: 'canary',
  }
);

rss()

function rss(
  feedUrl: string,
  options?: {
    maxItems?: number;
    fetchFullArticles?: boolean;
  }
): Connector
Fetch RSS/Atom feed items. Example:
import { rss } from '@deepagents/retrieval/connectors';

const connector = rss('https://hnrss.org/frontpage', {
  maxItems: 10,
  fetchFullArticles: true,
});

local()

function local(
  pattern: string,
  options?: {
    ingestWhen?: 'never' | 'contentChanged' | 'expired';
    expiresAfter?: number;
    cwd?: string;
  }
): Connector
Match local files using glob patterns. Example:
import { local } from '@deepagents/retrieval/connectors';

const connector = local('docs/**/*.md', {
  cwd: process.cwd(),
});

pdf()

function pdf(pattern: string): Connector
Match PDF files using glob patterns. Example:
import { pdf } from '@deepagents/retrieval/connectors';

const connector = pdf('research/**/*.pdf');

pdfFile()

function pdfFile(source: string): Connector
Fetch a single PDF from a local file path or a URL. Example:
import { pdfFile } from '@deepagents/retrieval/connectors';

const connector = pdfFile('./manual.pdf');
// or
const connector2 = pdfFile('https://example.com/paper.pdf');

linear()

function linear(apiKey: string): Connector
Fetch the Linear issues assigned to the user associated with the API key. Example:
import { linear } from '@deepagents/retrieval/connectors';

const connector = linear(process.env.LINEAR_API_KEY!);

Best Practices

Unique Source IDs

Ensure source IDs uniquely identify content:
// Good
sourceId: `github:file:${owner}/${repo}/${path}`

// Bad - may conflict
sourceId: 'github-file'
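When several connectors feed the same store, a quick check at startup can catch accidental collisions like the 'github-file' case above. The sketch below is a hypothetical `assertUniqueSourceIds` helper. It inspects only the `sourceId` field, so it accepts any connector objects.

```typescript
// Hypothetical startup check: throw if two connectors share a sourceId.
function assertUniqueSourceIds(connectors: Array<{ sourceId: string }>): void {
  const seen = new Set<string>();
  const dupes = new Set<string>();
  for (const { sourceId } of connectors) {
    if (seen.has(sourceId)) dupes.add(sourceId);
    seen.add(sourceId);
  }
  if (dupes.size > 0) {
    throw new Error(`duplicate sourceId(s): ${Array.from(dupes).join(', ')}`);
  }
}
```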

Error Handling in Content

Handle errors gracefully inside the content() function instead of letting them propagate:
content: async () => {
  try {
    return await fetchContent();
  } catch (error) {
    console.error('Content fetch failed:', error);
    return ''; // Return empty, don't throw
  }
}

Lazy Content Loading

Use async functions to defer content loading:
// Good - content is fetched only when content() is called
content: () => fetch(url).then((r) => r.text())

// Bad - the content was already loaded up front, whether or not it is needed
content: async () => alreadyLoadedContent
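The difference is observable. With a lazy content(), iterating sources() (for example, to list IDs) triggers no fetches at all. The sketch below counts simulated requests. `fakeFetch`, `lazySources`, and `eagerSources` are hypothetical stand-ins, and `Doc` mirrors the `Document` type.

```typescript
type Doc = { id: string; content: () => Promise<string> };

let fetches = 0;
// Stand-in for a network request; counts how often it runs.
async function fakeFetch(url: string): Promise<string> {
  fetches += 1;
  return `body of ${url}`;
}

const urls = ['https://example.com/a', 'https://example.com/b', 'https://example.com/c'];

// Lazy: the request happens only when content() is called.
async function* lazySources(): AsyncGenerator<Doc> {
  for (const url of urls) {
    yield { id: url, content: () => fakeFetch(url) };
  }
}

// Eager: every request runs while the generator runs, even if the text is never used.
async function* eagerSources(): AsyncGenerator<Doc> {
  for (const url of urls) {
    const text = await fakeFetch(url);
    yield { id: url, content: async () => text };
  }
}
```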

Meaningful Metadata

Include metadata that aids filtering or display:
metadata: {
  type: 'article',
  published: new Date().toISOString(),
  author: 'John Doe',
  tags: ['tech', 'tutorial'],
}
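Metadata pays off only when something reads it. The sketch below filters documents by a `tags` field before display. The `tags` key and the `byTag` helper are illustrative assumptions, since this page defines no metadata schema or filtering API. `Doc` mirrors the `Document` type.

```typescript
type Doc = { id: string; content: () => Promise<string>; metadata?: Record<string, unknown> };

// Hypothetical filter: keep documents whose metadata.tags array contains `tag`.
function byTag(docs: Doc[], tag: string): Doc[] {
  return docs.filter((doc) => {
    const tags = doc.metadata?.tags;
    return Array.isArray(tags) && tags.includes(tag);
  });
}

const docs: Doc[] = [
  { id: 'a', content: async () => 'A', metadata: { type: 'article', tags: ['tech', 'tutorial'] } },
  { id: 'b', content: async () => 'B', metadata: { type: 'article', tags: ['news'] } },
  { id: 'c', content: async () => 'C' }, // no metadata: never matches
];
```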

Next Steps

  • Store API - Store interface reference
  • Core API - ingest() and similaritySearch() reference
  • Connectors Guide - Learn about connectors
