# Vision
nanobot aims to be the simplest, most hackable AI agent framework while maintaining full functionality. We’re not trying to be the biggest or most feature-rich; we’re optimizing for clarity, simplicity, and research-readiness.

## Design Goals
- Keep it tiny: Target ~5,000 core agent lines (currently ~4,000)
- Stay readable: Every line should be understandable
- Make it hackable: Easy to modify and extend
- Remain practical: Real features, not toy examples
## Current Status (v0.1.4.post3)
### ✅ What’s Working
- 10 chat channels: Telegram, Discord, WhatsApp, Feishu, Email, Slack, QQ, DingTalk, Matrix, Mochat
- 15+ LLM providers: OpenRouter, Anthropic, OpenAI, DeepSeek, Gemini, Groq, and more
- Built-in tools: Shell, filesystem, web, spawn, cron, message, MCP
- MCP support: Model Context Protocol integration
- Multi-modal: Images, voice transcription (Groq Whisper)
- Memory system: Persistent MEMORY.md
- Subagents: Background task spawning
- Scheduled tasks: Cron-based scheduling + heartbeat
- Session isolation: Per-user/thread conversations
- Prompt caching: Anthropic/OpenRouter support
- OAuth providers: OpenAI Codex, GitHub Copilot
- Thinking mode: Experimental reasoning support
### 🔧 Current Limitations
- Manual testing only: No automated test suite
- Basic memory: Simple markdown, no vector search
- Limited multimodal: Most channels can receive images but not send them
- No streaming: Responses sent after completion
- Simple context: No advanced retrieval
## Roadmap
### Phase 1: Enhanced Multi-Modal (Q2 2026)
Goal: See, hear, and create media.

Features:
- Vision: Image understanding (GPT-4V, Claude 3)
- Image generation: DALL-E, Stable Diffusion integration
- Video support: Receive and analyze videos
- Voice output: Text-to-speech responses
- Audio analysis: Analyze audio files beyond transcription

Channel updates:
- Telegram (send images)
- Discord (send images)
- WhatsApp (send images)
- All channels (receive images for vision)

New tools:

```python
generate_image(prompt: str) -> image_path
analyze_image(image_path: str) -> description
speak(text: str) -> audio_path
```
### Phase 2: Long-Term Memory (Q3 2026)
Goal: Never forget important context.

Features:
- Vector search: Semantic memory retrieval
- Automatic summarization: Compress old conversations
- Entity tracking: Remember people, places, facts
- Memory importance scoring: Prioritize key information
- Multi-document memory: Organize by topic/project

Implementation notes:
- Use a lightweight vector DB (ChromaDB, DuckDB)
- Auto-summarize conversations > N messages
- Extract entities with LLM calls
- Store in `~/.nanobot/memory/` with indexes

New tools:

```python
remember(fact: str, importance: int)
recall(query: str) -> relevant_facts
forget(fact_id: str)
```
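Until the vector-backed version lands, the remember/recall shape can be sketched without dependencies. Bag-of-words cosine similarity stands in for real embedding search, an in-memory dict stands in for the on-disk store, and every name here follows the planned signatures but is otherwise hypothetical:

```python
import math
import re
from collections import Counter

# Hypothetical sketch of the planned memory API. Bag-of-words cosine
# similarity stands in for real embedding search (ChromaDB/DuckDB),
# and facts live in a dict instead of ~/.nanobot/memory/.

_store: dict[int, tuple[str, int]] = {}  # fact_id -> (fact, importance)
_next_id = 0

def _vec(text: str) -> Counter:
    # Tokenize to lowercase words and count occurrences.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def remember(fact: str, importance: int = 1) -> int:
    global _next_id
    _store[_next_id] = (fact, importance)
    _next_id += 1
    return _next_id - 1

def recall(query: str, top_k: int = 3) -> list[str]:
    qv = _vec(query)
    ranked = sorted(
        _store.values(),
        key=lambda fi: _cosine(qv, _vec(fi[0])) * fi[1],  # importance-weighted
        reverse=True,
    )
    return [fact for fact, _ in ranked[:top_k]]

def forget(fact_id: int) -> None:
    _store.pop(fact_id, None)
```

A real implementation would swap `_vec`/`_cosine` for embedding lookups and persist facts to disk; weighting the score by importance mirrors the planned memory importance scoring.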
### Phase 3: Better Reasoning (Q4 2026)
Goal: Multi-step planning and self-reflection.

Features:
- Chain-of-thought: Explicit reasoning steps
- Task decomposition: Break complex tasks into subtasks
- Self-critique: Evaluate and revise outputs
- Plan visualization: Show reasoning tree to user
- Alternative exploration: Consider multiple approaches

Implementation notes:
- Add `reason()` tool for internal thinking
- Multi-pass agent loop (plan → execute → reflect)
- Reasoning prompt templates
- Visualization in web UI (future)

New tools:

```python
plan(goal: str) -> steps[]
critique(output: str) -> improvements[]
reflect() -> insights
```
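The multi-pass loop can be sketched in a few lines; `run_with_reflection` and the `llm` callable are hypothetical stand-ins for a real model call, not nanobot API:

```python
from typing import Callable

# Hypothetical sketch of a multi-pass agent loop: plan, execute each
# step, then reflect and optionally retry. The `llm` callable is a
# stand-in for a real model call that returns text.

def run_with_reflection(goal: str, llm: Callable[[str], str],
                        max_passes: int = 2) -> str:
    output = ""
    for _ in range(max_passes):
        steps = llm(f"Plan steps for: {goal}").splitlines()           # plan
        results = [llm(f"Execute: {s}") for s in steps if s.strip()]  # execute
        output = "\n".join(results)
        critique = llm(f"Critique this output for {goal!r}:\n{output}")  # reflect
        if "OK" in critique:  # reflection accepts the result
            break
        goal = f"{goal} (address: {critique})"  # fold critique into next pass
    return output
```

Because the model is injected as a callable, the loop can be exercised with a scripted fake before wiring in a real provider.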
### Phase 4: More Integrations (Ongoing)
Goal: Work with more platforms and tools.

Channels:
- Twitter/X: Post tweets, reply to mentions
- LinkedIn: Messaging integration
- SMS: Twilio integration
- Mastodon: Fediverse support
- iMessage: Apple Messages (via bridge)
- Signal: Private messaging
- Zulip: Team chat
- Mattermost: Open-source Slack alternative

Providers:
- Together AI: Fast inference
- Fireworks: Model zoo
- Replicate: Run any model
- Cohere: Command models
- AI21: Jurassic models
- Mistral: Mistral AI (if not via OpenRouter)

Tools:
- Calendar: Google Calendar, Outlook integration
- Email send: Proactive email sending
- File sync: Dropbox, Google Drive
- Database: SQL query execution
- Code execution: Jupyter kernels
- Browser: Playwright/Selenium automation
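Of the planned tools, SQL query execution is easy to sketch with the standard library alone; `sql_query` and its read-only guard are illustrative assumptions, not nanobot API:

```python
import sqlite3

# Hypothetical sketch of the planned "Database: SQL query execution"
# tool. A simple read-only guard keeps the agent from mutating data
# through this tool.

def sql_query(db_path: str, query: str) -> list[tuple]:
    if not query.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT queries are allowed")
    with sqlite3.connect(db_path) as conn:
        return conn.execute(query).fetchall()
```

A production version would also want timeouts and row limits so a runaway query can’t stall the agent loop.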
### Phase 5: Self-Improvement (2027)
Goal: Learn from feedback and mistakes.

Features:
- User feedback loop: Rate responses, agent learns
- Error tracking: Log and analyze failures
- Automatic retries: Fix mistakes without user intervention
- Preference learning: Adapt to user style
- Skill discovery: Auto-install useful skills

Implementation notes:
- Feedback storage in `~/.nanobot/feedback/`
- Error pattern detection
- Preference profiles in config
- Skill marketplace integration (ClawHub)

New tools:

```python
rate_response(rating: int, feedback: str)
analyze_errors() -> patterns[]
adjust_preferences(key: str, value: any)
```
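The storage side of the feedback loop could look like this sketch: ratings appended as JSON lines, then low-rated tags surfaced as error patterns. The path, `tag` field, and function bodies are hypothetical; only the `rate_response`/`analyze_errors` names come from the roadmap:

```python
import json
from collections import Counter
from pathlib import Path

# Hypothetical sketch of Phase 5 feedback storage. The path below is a
# stand-in for the planned ~/.nanobot/feedback/ directory.

FEEDBACK_FILE = Path("/tmp/nanobot-feedback/ratings.jsonl")

def rate_response(rating: int, feedback: str, tag: str = "general") -> None:
    # Append one JSON record per rating; JSONL keeps writes cheap.
    FEEDBACK_FILE.parent.mkdir(parents=True, exist_ok=True)
    with FEEDBACK_FILE.open("a") as f:
        f.write(json.dumps({"rating": rating, "feedback": feedback, "tag": tag}) + "\n")

def analyze_errors(threshold: int = 2) -> list[str]:
    """Tags of responses rated at or below `threshold`, most frequent first."""
    if not FEEDBACK_FILE.exists():
        return []
    records = [json.loads(line) for line in FEEDBACK_FILE.read_text().splitlines()]
    counts = Counter(r["tag"] for r in records if r["rating"] <= threshold)
    return [tag for tag, _ in counts.most_common()]
```

Error pattern detection here is just frequency counting over tags; an LLM pass over the raw feedback text would be the natural next step.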
## Community Priorities
Based on GitHub discussions and Discord feedback:

### High Demand
- Web UI: Browser-based interface (like OpenWebUI)
- Streaming responses: Real-time output
- Function calling improvements: Parallel tool execution
- Better error messages: More helpful diagnostics
- RAG support: Document Q&A
### Medium Demand
- Plugin system: Third-party tool installation
- Multi-agent coordination: Agents working together
- Custom prompts: User-defined system prompts
- Voice UI: Speak to agent directly
- Mobile app: iOS/Android companion
### Low Demand (but interesting)
- Agent marketplace: Share and download agents
- Blockchain integration: Web3 tools
- IoT control: Smart home integration
- AR/VR: Spatial computing
## Non-Goals
What we’re not building (to keep nanobot simple):

- ❌ Enterprise features: SSO, multi-tenancy, admin panels
- ❌ Complex UIs: Rich web dashboards (keep it CLI-first)
- ❌ Heavy dependencies: Avoid large frameworks (Django, etc.)
- ❌ Monolithic architecture: Stay modular and hackable
- ❌ Kitchen sink: Don’t add every possible feature

If you need these, consider building on top of nanobot or using a different framework.
## How to Contribute
Want to help with the roadmap?

1. Pick an item from the roadmap above
2. Open a GitHub Discussion to discuss your approach
3. Create a PR with your implementation
4. Get feedback from maintainers
5. Iterate until it’s ready to merge
## Versioning Strategy
### Current: v0.1.x (Alpha)
- Rapid iteration
- Breaking changes allowed
- Focus on core features
### Future: v0.2.x (Beta)
- Stable API
- Deprecation warnings before breaking changes
- Focus on polish and reliability
### Long-term: v1.0.0 (Stable)
- Production-ready
- Semantic versioning
- Long-term support
### Release Cadence
- Patch releases (v0.1.4.post1): As needed (bug fixes)
- Minor releases (v0.1.5): Every 1-2 weeks (new features)
- Major releases (v0.2.0): When API changes significantly
## Feature Requests
Have an idea? Here’s how to suggest it:

1. Check existing issues/discussions: Might already be planned
2. Open a GitHub Discussion: Describe the feature and use case
3. Gauge community interest: See if others want it too
4. Estimate complexity: How many lines of code?
5. Propose implementation: How would it fit into nanobot?

Good feature requests:
- Align with nanobot’s goals (simple, hackable)
- Have clear use cases
- Don’t add excessive complexity
- Can be implemented in less than 500 lines

We’ll likely decline requests that:
- Ask to “add everything from framework X”
- Target niche features used by less than 1% of users
- Require heavy dependencies
- Violate the “keep it simple” principle
## Research Areas
For academic/research use:

- Memory architectures: Better long-term memory designs
- Multi-agent systems: Agent communication protocols
- Tool learning: Automatic tool discovery and composition
- Context optimization: Smarter prompt compression
- Reasoning methods: Novel planning and reflection techniques
## Metrics
How we measure success:

- Lines of code: Keep core agent under 5,000 lines
- Startup time: CLI mode under 1 second
- Dependencies: Minimize third-party packages
- Documentation: Every feature documented
- Community: Active Discord, GitHub discussions
- Real usage: People actually use it daily
## Timeline
| Quarter | Focus | Key Features |
|---|---|---|
| Q2 2026 | Multi-modal | Vision, image generation, voice output |
| Q3 2026 | Memory | Vector search, summarization, entity tracking |
| Q4 2026 | Reasoning | Chain-of-thought, task decomposition |
| Q1 2027 | Integrations | New channels, providers, tools |
| Q2 2027 | Self-improvement | Feedback loops, error learning |
| Q3 2027 | Polish | Web UI, streaming, better UX |
| Q4 2027 | v1.0 | Production-ready release |
## Long-Term Vision (2028+)
- Autonomous agents: Proactively help without prompting
- Agent collaboration: Multiple agents working together
- Continuous learning: Improve over time from usage
- Universal interface: Control anything via natural language
- Personal AI OS: nanobot as your digital assistant layer
## Get Involved
- Discord: Join the community
- GitHub: HKUDS/nanobot
- Discussions: Share ideas
- Issues: Report bugs