Researchers

Researchers can use Syft Space to create AI-powered knowledge bases from their papers, notes, and datasets, making years of research instantly queryable without exposing underlying data.

Why Syft Space for researchers

Private by default

Your research data never leaves your control. Share insights, not raw data.

Instant recall

Query years of papers and notes in seconds with natural language.

Collaborate safely

Let collaborators query your knowledge base without sharing raw files.

Preserve context

AI responses cite specific papers and sections, preserving academic rigor.

Use cases

Personal research assistant

Create a searchable AI assistant from all your research materials.

What to index:

Published papers and preprints
Research notes and lab notebooks
Literature review collections
Conference presentations
Grant proposals and reports

Example queries:

“What methods did I use for analyzing protein structures in 2022?”
“Summarize all papers I’ve read about CRISPR gene editing”
“What were the key findings from the Smith et al. dataset?”

Benefits:

Never lose track of previous work
Quickly reference methodology from past papers
Onboard new lab members with instant knowledge access

Lab knowledge base

Make your entire lab’s research searchable for all team members.

What to index:

All lab publications
Standard operating procedures
Equipment manuals and protocols
Meeting notes and decisions
Experimental results and datasets

Example queries:

“How do we calibrate the mass spectrometer?”
“What experiments have we done with this particular compound?”
“Summarize our findings on drug resistance mechanisms”

Benefits:

Reduce repetitive questions
Preserve institutional knowledge when people leave
Speed up literature reviews
Improve research reproducibility

Literature review assistant

Index relevant papers to create a domain-specific search engine.

What to index:

Papers from your field
Key review articles
Relevant preprints
Books and chapters

Example queries:

“What are the main approaches to quantum error correction?”
“Compare the effectiveness of different vaccine platforms”
“What gaps exist in current climate models?”

Benefits:

Stay current with the literature
Find connections across papers
Speed up systematic reviews
Identify research gaps

Dataset querying

Make research datasets queryable without exposing raw data.

What to index:

Dataset documentation
Analysis reports
Methodology papers
Data dictionaries

Example queries:

“What variables are available in the genomic dataset?”
“How was the patient cohort selected?”
“What preprocessing steps were applied?”

Benefits:

Share dataset insights with collaborators
Comply with data privacy requirements
Enable meta-analyses
Reduce data access requests

Getting started

Install Syft Space

Choose your installation method: the desktop app or a lab server.

Desktop app (best for individual researchers):

Download for macOS or Linux
Install and launch
No server setup required

Lab server (for shared lab resources):

docker run -d \
  --name syft-space \
  --restart unless-stopped \
  -p 8080:8080 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v syft-space-data:/data \
  ghcr.io/openmined/syft-space:latest

See the installation guide for details.
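
Once the container is started, it can take a moment before the web interface answers. The following is a minimal sketch of a readiness check, assuming only that Syft Space serves HTTP on http://localhost:8080 as in the command above. The helper names `http_probe` and `wait_until_ready` are made up for this example and are not part of Syft Space.

```python
import time
import urllib.error
import urllib.request


def http_probe(url="http://localhost:8080", timeout=3):
    """Return True if anything answers HTTP at url, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, just not with a 2xx status
    except OSError:
        return False  # connection refused, timeout, DNS failure, and similar


def wait_until_ready(probe=http_probe, attempts=10, delay=2.0):
    """Call probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_ready()` in a setup script returns `True` as soon as the server responds and `False` if it never does within the allotted attempts.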

Create a dataset

Organize your research materials:

# Create a dataset for your papers
curl -X POST http://localhost:8080/api/v1/datasets/ \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-research",
    "dtype": "local_file",
    "configuration": {
      "httpPort": 8081,
      "grpcPort": 50051,
      "collectionName": "ResearchPapers",
      "ingestionPath": "~/research/papers"
    },
    "summary": "My research papers and notes"
  }'

Place your PDFs and documents in the ingestion path. Syft Space automatically indexes them.
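
The same request can be issued from Python, which is convenient when you create several datasets from a script. This is a minimal sketch, assuming the URL and field names from the curl example above. The helper names `build_dataset_payload` and `create_dataset` are hypothetical, and because the response format is not described here, the function simply returns whatever JSON the server sends back.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/api/v1"  # same host and port as the curl example


def build_dataset_payload(name, ingestion_path, collection_name, summary=""):
    """Build the JSON body for POST /datasets/ (fields copied from the curl example)."""
    return {
        "name": name,
        "dtype": "local_file",
        "configuration": {
            "httpPort": 8081,
            "grpcPort": 50051,
            "collectionName": collection_name,
            "ingestionPath": ingestion_path,
        },
        "summary": summary,
    }


def create_dataset(payload, base_url=BASE_URL):
    """POST the payload and return the decoded JSON response (shape not assumed)."""
    request = urllib.request.Request(
        f"{base_url}/datasets/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


payload = build_dataset_payload(
    name="my-research",
    ingestion_path="~/research/papers",
    collection_name="ResearchPapers",
    summary="My research papers and notes",
)
# create_dataset(payload)  # uncomment once Syft Space is running locally
```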

Connect an AI model

Choose a model provider. The examples below cover OpenAI, Anthropic, and Ollama (local), in that order.

# OpenAI
curl -X POST http://localhost:8080/api/v1/models/ \
  -H "Content-Type: application/json" \
  -d '{
    "name": "gpt-4",
    "dtype": "openai",
    "configuration": {
      "api_key": "sk-your-key",
      "model": "gpt-4-turbo"
    }
  }'

# Anthropic (configured through the "openai" dtype with a custom base_url)
curl -X POST http://localhost:8080/api/v1/models/ \
  -H "Content-Type: application/json" \
  -d '{
    "name": "claude",
    "dtype": "openai",
    "configuration": {
      "api_key": "sk-ant-your-key",
      "base_url": "https://api.anthropic.com/v1",
      "model": "claude-3-opus-20240229"
    }
  }'

# Ollama (local): download the model first
ollama pull llama3

# Connect to Syft Space
curl -X POST http://localhost:8080/api/v1/models/ \
  -H "Content-Type: application/json" \
  -d '{
    "name": "llama3-local",
    "dtype": "openai",
    "configuration": {
      "base_url": "http://localhost:11434/v1",
      "model": "llama3",
      "api_key": "not-needed"
    }
  }'
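
All three requests use the `openai` dtype and differ only in `model`, `api_key`, and an optional `base_url`. The sketch below builds the three request bodies from one helper, assuming exactly the fields shown in the curl examples. The function name `openai_compatible_model` and the environment variable names are choices made for this example, not Syft Space conventions.

```python
import os


def openai_compatible_model(name, model, api_key, base_url=None):
    """Build the JSON body for POST /models/ using the "openai" dtype."""
    configuration = {"api_key": api_key, "model": model}
    if base_url is not None:
        # Only the Anthropic and Ollama examples set base_url.
        configuration["base_url"] = base_url
    return {"name": name, "dtype": "openai", "configuration": configuration}


# Reading keys from the environment keeps them out of notebooks and shell history.
# OPENAI_API_KEY and ANTHROPIC_API_KEY are variable names assumed for this sketch.
models = [
    openai_compatible_model(
        "gpt-4", "gpt-4-turbo", os.environ.get("OPENAI_API_KEY", "sk-your-key")
    ),
    openai_compatible_model(
        "claude",
        "claude-3-opus-20240229",
        os.environ.get("ANTHROPIC_API_KEY", "sk-ant-your-key"),
        base_url="https://api.anthropic.com/v1",
    ),
    openai_compatible_model(
        "llama3-local", "llama3", "not-needed", base_url="http://localhost:11434/v1"
    ),
]
```

Each entry in `models` can be sent to `/api/v1/models/` as the JSON body of a POST request, just like the `-d` argument in the curl commands.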

Create a query endpoint

curl -X POST http://localhost:8080/api/v1/endpoints/ \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Research Assistant",
    "slug": "my-research",
    "dataset_id": "<dataset-id>",
    "model_id": "<model-id>",
    "response_type": "both"
  }'

Start querying

Query your research:

curl -X POST http://localhost:8080/api/v1/endpoints/my-research/query \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What were my main findings on protein folding?"}
    ],
    "similarity_threshold": 0.7,
    "limit": 5
  }'

Or use the web interface at http://localhost:8080

Best practices

Organizing research materials

Create topic-specific datasets

Organize by research area or project:

protein-folding - Papers on protein structure
lab-protocols - SOPs and procedures
lit-review-2025 - Papers for current review
grant-materials - Proposals and reports

Include document metadata

Add context to help AI understand your materials:

Add frontmatter with dates, authors, keywords
Use descriptive filenames: smith_2024_crispr_methods.pdf
Include README files explaining dataset contents
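
Naming and frontmatter conventions are easier to keep consistent when a small script produces them. The following sketch shows one way to do that. It assumes a YAML-style frontmatter block at the top of text notes; how Syft Space interprets frontmatter is not specified here, and both function names are hypothetical.

```python
import json
import re


def descriptive_filename(first_author, year, topic, extension="pdf"):
    """Return a name such as smith_2024_crispr_methods.pdf."""

    def slug(text):
        # Lowercase, then collapse anything that is not a letter or digit into "_".
        return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")

    return f"{slug(first_author)}_{year}_{slug(topic)}.{extension}"


def frontmatter(authors, date, keywords):
    """Return a YAML-style frontmatter block to place at the top of a note."""
    fields = {"authors": authors, "date": date, "keywords": keywords}
    # YAML 1.2 accepts JSON syntax, so json.dumps takes care of quoting.
    body = "\n".join(f"{key}: {json.dumps(value)}" for key, value in fields.items())
    return f"---\n{body}\n---\n"


print(descriptive_filename("Smith", 2024, "CRISPR methods"))
# smith_2024_crispr_methods.pdf
```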

Keep materials updated

Syft Space watches for new files:

Drop new papers in your ingestion folder
Automatic indexing happens in the background
Update existing files to refresh the index
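
Because indexing is driven by files appearing in the ingestion folder, adding papers can be as simple as a copy step. The sketch below copies new PDFs from a source folder into the ingestion folder and skips files that are already there. The function name `add_to_ingestion_folder` and the source folder in the usage note are assumptions for illustration.

```python
import shutil
from pathlib import Path


def add_to_ingestion_folder(source_dir, ingestion_dir, pattern="*.pdf"):
    """Copy files matching pattern into the ingestion folder, skipping existing names."""
    source = Path(source_dir).expanduser()
    target = Path(ingestion_dir).expanduser()
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for path in sorted(source.glob(pattern)):
        destination = target / path.name
        if destination.exists():
            continue  # already present; replace it by hand to refresh the index
        shutil.copy2(path, destination)
        copied.append(destination.name)
    return copied
```

For example, `add_to_ingestion_folder("~/Downloads", "~/research/papers")` would copy any PDFs that are not yet in the ingestion path used earlier and return their names.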

Privacy and sharing

Keep it private

Don’t publish endpoints if you want to keep research private. Query locally only.

Share with collaborators

Use access control policies to limit who can query your endpoint.

Public knowledge

Publish to SyftHub to share insights with the research community.

Rate limit queries

Prevent abuse by limiting queries per user or per day.
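
To make the idea of a per-user daily limit concrete, here is a small in-memory sketch. It illustrates the concept only: it is not how Syft Space implements or configures rate limits, and the class name `DailyQuota` is invented for this example.

```python
from collections import defaultdict
from datetime import date


class DailyQuota:
    """Allow each user at most max_per_day queries per calendar day."""

    def __init__(self, max_per_day):
        self.max_per_day = max_per_day
        self._counts = defaultdict(int)  # (user, day) -> queries used so far

    def allow(self, user, today=None):
        """Return True and count the query if the user is still under the limit."""
        day = today or date.today()
        key = (user, day)
        if self._counts[key] >= self.max_per_day:
            return False
        self._counts[key] += 1
        return True
```

A quota of `DailyQuota(50)` would let each collaborator ask 50 questions a day and reject the 51st until the date changes.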

Query optimization

Be specific in queries

Better: “What methods did Smith et al. use for RNA sequencing?”
Worse: “Tell me about RNA”

Adjust similarity threshold

Lower threshold (0.5-0.7): Broader search, more results
Higher threshold (0.8-0.9): Stricter matching, fewer results

Increase limit for comprehensive answers

Request more source documents for complex questions:

{
  "limit": 10,
  "similarity_threshold": 0.6
}
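
The effect of these two parameters is easiest to see on a toy example. The sketch below applies a threshold and a limit to a hand-made list of scored passages. It shows the general idea of similarity-filtered retrieval only: the function `select_sources`, the passage labels, and the scores are invented, and treating the threshold as inclusive is an assumption. Syft Space's actual ranking may differ.

```python
def select_sources(scored_passages, similarity_threshold, limit):
    """Keep passages scoring at or above the threshold, best first, capped at limit."""
    kept = [p for p in scored_passages if p[1] >= similarity_threshold]
    kept.sort(key=lambda p: p[1], reverse=True)
    return kept[:limit]


# (passage label, similarity score): made-up values for illustration
passages = [
    ("methods section", 0.91),
    ("related work", 0.74),
    ("abstract", 0.82),
    ("acknowledgements", 0.41),
    ("appendix table", 0.63),
]

strict = select_sources(passages, similarity_threshold=0.8, limit=5)
broad = select_sources(passages, similarity_threshold=0.6, limit=10)

print([label for label, _ in strict])
# ['methods section', 'abstract']
print([label for label, _ in broad])
# ['methods section', 'abstract', 'related work', 'appendix table']
```

Lowering the threshold admits more loosely related passages, and the limit then caps how many of them reach the model.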

Review source citations

Always check which papers the AI is citing to verify accuracy.

Example: PhD researcher

Setup:

200 papers from literature review
50 personal research notes
10 published papers
3 lab protocols

Workflow:

Drop all PDFs in ~/research/papers/
Create one dataset: phd-research
Query before writing:
- “What have I found about X?”
- “Which papers support this claim?”
- “What methods are commonly used?”

Benefits:

Wrote thesis 30% faster
Never lost track of sources
Easily answered reviewer questions
Shared knowledge base with advisor

Example: Research lab

Setup:

1,000+ lab papers and protocols
5 years of meeting notes
Equipment manuals
Experimental procedures

Workflow:

Lab server accessible to all members
Everyone can query: “How do we…?”
New members onboard by querying the knowledge base
Reduce time spent answering repetitive questions

Benefits:

Onboarding time reduced by 50%
Institutional knowledge preserved
More time for actual research
Better protocol compliance

Advanced features

Multiple endpoints for different audiences

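# In each request, "..." stands for the remaining fields of the request body
# (see "Create a query endpoint").

# Private endpoint for personal use
curl -X POST http://localhost:8080/api/v1/endpoints/ \
  -H "Content-Type: application/json" \
  -d '{"name": "Private Research", "slug": "private", ...}'

# Lab-only endpoint; afterwards, add an access control policy for lab emails
curl -X POST http://localhost:8080/api/v1/endpoints/ \
  -H "Content-Type: application/json" \
  -d '{"name": "Lab Knowledge", "slug": "lab", ...}'

# Public endpoint for published work; afterwards, publish it to SyftHub
curl -X POST http://localhost:8080/api/v1/endpoints/ \
  -H "Content-Type: application/json" \
  -d '{"name": "Published Papers", "slug": "public", ...}'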
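
When the three endpoints are created from a script, it helps to generate their request bodies from one place. The sketch below does that, assuming the request fields shown in "Create a query endpoint" and the query path used in "Start querying". The helper names and the idea of passing one dataset per audience are illustrative choices, not Syft Space requirements.

```python
API_PREFIX = "/api/v1/endpoints"  # path taken from the curl examples

AUDIENCES = [
    ("Private Research", "private"),
    ("Lab Knowledge", "lab"),
    ("Published Papers", "public"),
]


def endpoint_payloads(model_id, dataset_ids, response_type="both"):
    """Return one POST body per audience.

    dataset_ids maps each slug to the dataset that endpoint may search.
    """
    missing = [slug for _, slug in AUDIENCES if slug not in dataset_ids]
    if missing:
        raise KeyError(f"no dataset configured for: {', '.join(missing)}")
    return [
        {
            "name": name,
            "slug": slug,
            "dataset_id": dataset_ids[slug],
            "model_id": model_id,
            "response_type": response_type,
        }
        for name, slug in AUDIENCES
    ]


def query_path(slug):
    """Path that clients POST their questions to for a given endpoint slug."""
    return f"{API_PREFIX}/{slug}/query"
```

Giving each audience its own dataset means the public endpoint can only draw on material you chose to place in the public dataset, which is usually the safer arrangement.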

Integration with research tools

Syft Space provides a REST API that can integrate with:

Jupyter notebooks for interactive queries
Zotero or Mendeley for reference management
Notion or Obsidian for note-taking
Custom research dashboards

Example: Query from Jupyter

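import requests

def query_research(question, limit=5):
    """Send a question to the my-research endpoint and return the parsed JSON."""
    response = requests.post(
        'http://localhost:8080/api/v1/endpoints/my-research/query',
        json={
            'messages': [{'role': 'user', 'content': question}],
            'limit': limit,
        },
        timeout=60,  # avoid hanging the notebook if the server is unreachable
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()

# Use in your analysis
result = query_research("What papers mention protein X?")
print(result['response'])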

Learn more

Datasets

Learn about managing research materials

Models

Choose the right AI model for your needs

Endpoints

Create and configure query endpoints

API reference

Build custom integrations

Questions? Join our community or check the installation guide.
