Overview

The arXiv Researcher Agent is an AI research assistant that combines the OpenAI Agents SDK with the Memori memory layer. It searches for academic papers through Tavily (restricted to arXiv and related academic domains), generates research reports, and records every research session in Memori so that later sessions can build on earlier findings.

Key Features

  • arXiv Paper Search: Uses Tavily to find relevant research papers and academic sources
  • Persistent Memory: All research sessions stored in Memori for future reference
  • Memory Search: Query and build upon previous research findings
  • Dual-Agent System: Separate agents for research and memory retrieval
  • Streamlit Interface: Interactive web app for research and memory queries

Architecture

Memory Integration with Memori

from memori import Memori, create_memory_tool

# Initialize Memori memory system
memori = Memori(
    database_connect="sqlite:///research_memori.db",
    conscious_ingest=True,  # Working memory - captures important info
    auto_ingest=True,       # Dynamic search - automatic indexing
    verbose=True,
)
memori.enable()
memory_tool = create_memory_tool(memori)

Memori provides two key features:
  • Conscious Ingest: Actively captures and stores important information during conversations
  • Auto Ingest: Automatically indexes content for dynamic retrieval

Agent System

The system uses two specialized agents:
  1. Research Agent: Searches arXiv and generates reports
  2. Memory Agent: Retrieves and organizes past research

Implementation

Research Agent with Memory Tools

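import os

from agents import Agent, OpenAIChatCompletionsModel, function_tool
from memori import Memori, create_memory_tool
from pydantic import BaseModel
from tavily import TavilyClient


# Structured tool outputs; the fields are inferred from how the tools below use them
class MemorySearchResult(BaseModel):
    query: str
    results: str
    found_memories: bool


class ArxivSearchResult(BaseModel):
    query: str
    results: str
    found_papers: bool


# `memory_tool` is the tool created in the Memory Integration snippet above.
# `format_papers` is a helper that renders the paper list as text and is
# defined outside this snippet.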

# Initialize Tavily for arXiv search
tavily_client = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))

# Create memory search tool
@function_tool
def search_memory(query: str) -> MemorySearchResult:
    """Search the agent's memory for past research information."""
    result = memory_tool.execute(query=query.strip())
    found_memories = bool(
        result
        and "No relevant memories found" not in result
        and "Error" not in result
    )
    return MemorySearchResult(
        query=query,
        results=result if result else "No relevant memories found",
        found_memories=found_memories
    )

# Create arXiv search tool
@function_tool
def search_arxiv(query: str) -> ArxivSearchResult:
    """Search for research papers on arXiv."""
    search_query = f"arXiv research papers {query} latest developments academic research"
    
    # Perform Tavily search with academic domains
    search_result = tavily_client.search(
        query=search_query,
        search_depth="advanced",
        include_domains=["arxiv.org", "scholar.google.com", "researchgate.net"],
        max_results=10
    )
    
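    # Process and format results: keep the top 5 and truncate long summaries
    papers = []
    for result in search_result.get("results", [])[:5]:
        content = result.get("content") or ""
        papers.append({
            "title": result.get("title", "No title"),
            "url": result.get("url", ""),
            # Append an ellipsis only when the summary was actually cut
            "summary": content[:200] + ("..." if len(content) > 200 else ""),
        })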
    
    return ArxivSearchResult(
        query=query,
        results=format_papers(papers),
        found_papers=len(papers) > 0
    )
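
The `search_arxiv` tool calls `format_papers`, which this page does not show. A minimal sketch follows, assuming the helper turns the list of paper dicts into a Markdown-style numbered list. The output layout is an assumption, not the project's actual format, and the URL is a placeholder.

```python
def format_papers(papers: list[dict[str, str]]) -> str:
    """Render paper dicts (title, url, summary) as a numbered Markdown list.

    Hypothetical helper: the real project may format results differently.
    """
    if not papers:
        return "No papers found."

    lines = []
    for index, paper in enumerate(papers, start=1):
        title = paper.get("title", "No title")
        url = paper.get("url", "")
        lines.append(f"{index}. [{title}]({url})")
        summary = paper.get("summary", "")
        if summary:
            # Indent the summary so it renders under its list item
            lines.append(f"   {summary}")
    return "\n".join(lines)


print(format_papers([
    {"title": "Paper A", "url": "https://example.org/paper-a", "summary": "Short summary..."},
]))
```

Returning a single string keeps the `results` field of `ArxivSearchResult` easy for the model to read back verbatim in its report.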

Creating the Research Agent

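from textwrap import dedent

# `model` is assumed to be an OpenAIChatCompletionsModel configured elsewhere
# from the environment variables listed under Configuration.
research_agent = Agent(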
    name="ArXiv Research Agent",
    model=model,
    instructions=dedent(
        """
        You are Professor X-1000, a distinguished AI research scientist with MEMORY CAPABILITIES!

        🧠 Your enhanced abilities:
        - Advanced research using arXiv paper search via Tavily
        - Persistent memory of all research sessions
        - Ability to reference and build upon previous research
        - Creating comprehensive, fact-based research reports

        RESEARCH WORKFLOW:
        1. FIRST: Use search_memory to find any related previous research
        2. Use search_arxiv to find relevant research papers
        3. Analyze and cross-reference sources
        4. If you find relevant previous research, mention how this builds upon it
        5. Structure your report following academic standards
        6. Include only verifiable facts with proper citations
        7. End with actionable takeaways and future implications

        Always mention if you're building upon previous research sessions!
        """
    ),
    tools=[search_memory, search_arxiv],
)

Memory Assistant Agent

memory_agent = Agent(
    name="Research Memory Assistant",
    model=model,  # assumed: reuse the research agent's model instead of the SDK default
    instructions=dedent(
        """
        You are the Research Memory Assistant, specialized in helping users recall their research history!

        🧠 Your capabilities:
        - Search through all past research sessions
        - Summarize previous research topics and findings
        - Help users find specific research they've done before
        - Connect related research across different sessions

        When users ask about their research history:
        1. Use search_memory to find relevant past research
        2. Organize the results chronologically or by topic
        3. Provide clear summaries of each research session
        4. Highlight key findings and connections between research
        """
    ),
    tools=[search_memory],
)

Running Agents with Memory Recording

from agents import Runner
from memori import Memori, create_memory_tool


class Researcher:
    def __init__(self):
        self.memori = Memori(
            database_connect="sqlite:///research_memori.db",
            conscious_ingest=True,
            auto_ingest=True,
            verbose=True,
        )
        self.memori.enable()
        self.memory_tool = create_memory_tool(self.memori)
    
    async def run_agent_with_memory(self, agent, user_input: str):
        """Run agent and record the conversation in memory."""
        # Run the agent
        result = await Runner.run(agent, input=user_input)
        
        # Get the response
        response_content = (
            result.final_output if hasattr(result, "final_output") else str(result)
        )
        
        # Store in Memori
        self.memori.record_conversation(
            user_input=user_input,
            ai_output=response_content
        )
        
        return result
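
The run-then-record pattern used by `run_agent_with_memory` can be exercised without API keys by swapping in stand-ins for the agent runner and the memory store. The sketch below is illustrative only: `FakeResult`, `FakeRunner`, `FakeMemory`, and `run_with_memory` are hypothetical names and are not part of the Agents SDK or Memori.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeResult:
    """Hypothetical result object exposing only `final_output`."""
    final_output: str


class FakeRunner:
    """Hypothetical stand-in for an agent runner; it echoes the prompt."""

    @staticmethod
    async def run(agent: str, input: str) -> FakeResult:
        return FakeResult(final_output=f"[{agent}] report on: {input}")


@dataclass
class FakeMemory:
    """Hypothetical stand-in for the memory store; keeps records in a list."""
    records: list = field(default_factory=list)

    def record_conversation(self, user_input: str, ai_output: str) -> None:
        self.records.append({"user_input": user_input, "ai_output": ai_output})


async def run_with_memory(memory: FakeMemory, agent: str, user_input: str) -> FakeResult:
    """Run the agent, then store the exchange before returning the result."""
    result = await FakeRunner.run(agent, input=user_input)
    # Same fallback as the original method: use str(result) if there is no final_output
    output = result.final_output if hasattr(result, "final_output") else str(result)
    memory.record_conversation(user_input=user_input, ai_output=output)
    return result


memory = FakeMemory()
result = asyncio.run(run_with_memory(memory, "research", "quantum computing"))
print(result.final_output)
print(len(memory.records))
```

Recording only after the run completes means a failed agent call leaves no partial entry in memory, which is the behavior the original method has as well.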

Streamlit Interface

import streamlit as st
import asyncio

st.title("arXiv Research Agent with Memory")

# Sidebar: Chat mode selection
with st.sidebar:
    chat_mode = st.radio(
        "Select Chat Mode",
        ["Research Chat", "Memory Chat"]
    )

user_input = st.chat_input("Ask me to research a topic or query your research history")

if user_input:
    with st.spinner("Thinking..."):
        researcher = Researcher()
        
        # Choose agent based on mode
        agent = research_agent if chat_mode == "Research Chat" else memory_agent
        result = asyncio.run(researcher.run_agent_with_memory(agent, user_input))
        
        st.markdown(result.final_output)

Memory Storage

Memori stores research conversations in a SQLite database:
# Each conversation is stored with:
{
    "user_input": "Research quantum computing",
    "ai_output": "## arXiv Research Papers for: quantum computing\n\n...",
    "timestamp": "2024-03-05T10:30:00",
    "metadata": {
        "topic": "quantum computing",
        "papers_found": 5
    }
}
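
To check what was actually written to disk, the database file can be inspected with Python's built-in `sqlite3` module. The sketch below only lists whatever tables exist. It makes no assumption about Memori's table names or schema, which this page does not document. `list_tables` is a hypothetical helper name.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in an existing SQLite database file."""
    if not Path(db_path).is_file():
        # sqlite3.connect would silently create an empty file, so fail early instead
        raise FileNotFoundError(db_path)
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

After at least one research session, calling `list_tables("research_memori.db")` from the application directory shows which tables Memori created. The path matches the relative file named in the `sqlite:///research_memori.db` connection string.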

Example Research Session

# Example 1: New Research
User: "Research the latest breakthroughs in quantum computing."

Agent:
1. Searches memory for previous quantum computing research
2. Uses Tavily to find arXiv papers
3. Generates comprehensive report
4. Saves session to Memori

# Example 2: Memory Query
User: "Summarize my research history on AI ethics."

Memory Agent:
- Searches all past research sessions
- Organizes findings by topic
- Provides clear summary with key connections

Configuration

Environment Variables

Create a .env file:
NEBIUS_API_KEY=your_nebius_api_key
TAVILY_API_KEY=your_tavily_api_key
EXAMPLE_MODEL_NAME=moonshotai/Kimi-K2-Instruct
EXAMPLE_BASE_URL=https://api.studio.nebius.ai/v1
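
A missing variable usually surfaces later as an opaque authentication error, so it can help to validate the environment at startup. The sketch below assumes all four variables are required and have already been loaded into the process environment (for example by a dotenv loader). `Settings` and `load_settings` are hypothetical names, not part of the project.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: every variable listed in the .env file above is required
REQUIRED_VARS = (
    "NEBIUS_API_KEY",
    "TAVILY_API_KEY",
    "EXAMPLE_MODEL_NAME",
    "EXAMPLE_BASE_URL",
)


@dataclass(frozen=True)
class Settings:
    nebius_api_key: str
    tavily_api_key: str
    model_name: str
    base_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the required variables, raising one error that names all missing ones."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        nebius_api_key=env["NEBIUS_API_KEY"],
        tavily_api_key=env["TAVILY_API_KEY"],
        model_name=env["EXAMPLE_MODEL_NAME"],
        base_url=env["EXAMPLE_BASE_URL"],
    )
```

Passing a plain dict as `env` makes the check easy to test without touching the real environment.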

Memori Configuration Options

  • database_connect (string, required): SQLite database connection string (e.g., sqlite:///research_memori.db)
  • conscious_ingest (bool, default: false): Enables working memory, which actively captures important information
  • auto_ingest (bool, default: false): Enables dynamic search, which automatically indexes content for retrieval
  • verbose (bool, default: false): Enables detailed logging of memory operations

Installation

git clone https://github.com/Arindam200/awesome-ai-apps.git
cd awesome-ai-apps/memory_agents/arxiv_researcher_agent_with_memori
uv sync

Running the Application

streamlit run app.py

Use Cases

  • Academic Research: Track research topics over time and build comprehensive knowledge bases
  • Literature Review: Organize and recall previous literature reviews for new projects
  • Research Collaboration: Share research memory across team members for collaborative projects
  • Knowledge Management: Build institutional memory of research activities and findings

Best Practices

  1. Search Memory First: Always search existing research before conducting new searches to avoid duplication
  2. Use Descriptive Queries: Provide specific, descriptive research queries for better arXiv search results
  3. Organize by Topic: Use consistent topic naming for easier memory retrieval
  4. Review Memory Regularly: Periodically query your research history to identify patterns and gaps

  • Memori Documentation: Official Memori memory system documentation
  • Tavily API: Tavily search API for academic research
