While Unmute doesn’t currently have built-in tool calling support, you can integrate function calling capabilities by wrapping the LLM server with a tool-calling layer.
## Overview

Tool calling allows the LLM to invoke external functions, such as:

- Weather API lookups
- Database queries
- Web searches
- Smart home controls
- Calendar operations
- Custom business logic
The key insight is that tool calling can be implemented transparently at the LLM layer, making it invisible to the Unmute backend.
## Architecture

The tool-calling proxy:

1. Receives text generation requests from the Unmute backend
2. Forwards them to the LLM along with tool definitions
3. Detects when the LLM wants to call a tool
4. Executes the tool and injects the results
5. Continues generation with the tool results
6. Returns the final response to Unmute
## Implementation Approach

The recommended approach is to create a FastAPI server that:

1. Exposes an OpenAI-compatible API endpoint
2. Wraps your LLM server (vLLM, Ollama, etc.)
3. Intercepts tool calls and executes them
4. Streams results back to Unmute
### Why This Works

Unmute expects streaming text responses from an OpenAI-compatible endpoint. As long as your proxy server provides this interface, Unmute doesn't need to know about the tool calling happening behind the scenes.
## Step-by-Step Implementation

### Create a Tool-Calling Proxy Server

Create a new FastAPI application that wraps your LLM:

```python
# tool_proxy.py
import json

import httpx
from fastapi import FastAPI

app = FastAPI()

# Your tool definitions (OpenAI function-calling format)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }
]


async def execute_tool(tool_name: str, arguments: dict):
    """Execute a tool and return results"""
    if tool_name == "get_weather":
        # Call weather API
        location = arguments["location"]
        # ... fetch weather data ...
        return {"temperature": 72, "condition": "sunny"}
    return None


@app.post("/v1/chat/completions")
async def chat_completions(request: dict):
    """OpenAI-compatible endpoint with tool calling"""
    # Add tools to the request
    request["tools"] = tools

    # Forward to the actual LLM (e.g., vLLM)
    llm_url = "http://localhost:8000/v1/chat/completions"
    async with httpx.AsyncClient() as client:
        response = await client.post(llm_url, json=request)
        result = response.json()

        # Check if the LLM wants to call a tool
        message = result["choices"][0]["message"]
        if message.get("tool_calls"):
            # The assistant message containing the tool calls must be
            # appended before the tool results
            request["messages"].append(message)

            # Execute each tool call
            for tool_call in message["tool_calls"]:
                tool_name = tool_call["function"]["name"]
                args = json.loads(tool_call["function"]["arguments"])
                tool_result = await execute_tool(tool_name, args)

                # Add the tool result to the conversation
                request["messages"].append({
                    "role": "tool",
                    "tool_call_id": tool_call["id"],
                    "content": json.dumps(tool_result)
                })

            # Make another LLM call with the tool results
            response = await client.post(llm_url, json=request)
            result = response.json()

    return result
```
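One subtlety of the OpenAI chat format: before any `role: "tool"` message, the conversation must contain the assistant message that issued the tool calls, with matching `tool_call_id` values, or the follow-up LLM call will be rejected. A sketch of building that follow-up message list (`append_tool_results` is an illustrative helper, not part of Unmute):

```python
import json

def append_tool_results(messages: list, assistant_message: dict, results: dict) -> list:
    """Append the assistant tool-call message, then one tool message per call."""
    messages = messages + [assistant_message]
    for tool_call in assistant_message["tool_calls"]:
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call["id"],
            "content": json.dumps(results[tool_call["id"]]),
        })
    return messages

history = [{"role": "user", "content": "Weather in Paris?"}]
assistant = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{"id": "call_1", "type": "function",
                    "function": {"name": "get_weather",
                                 "arguments": '{"location": "Paris"}'}}],
}
new_history = append_tool_results(history, assistant, {"call_1": {"temperature": 72}})
print([m["role"] for m in new_history])  # -> ['user', 'assistant', 'tool']
```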
### Add Streaming Support

For real-time voice, streaming is essential:

```python
@app.post("/v1/chat/completions")
async def chat_completions(request: dict):
    # Enable streaming
    request["stream"] = True

    async def stream_with_tools():
        # Track whether we are inside a tool call
        in_tool_call = False
        tool_buffer = ""

        async with httpx.AsyncClient() as client:
            async with client.stream(
                "POST",
                "http://localhost:8000/v1/chat/completions",
                json=request,
            ) as response:
                # aiter_lines() keeps SSE events aligned to line boundaries;
                # raw text chunks can split an event in the middle
                async for line in response.aiter_lines():
                    if not line.startswith("data: "):
                        continue
                    data = line[len("data: "):].strip()
                    if data == "[DONE]":
                        yield "data: [DONE]\n\n"
                        continue

                    parsed = json.loads(data)
                    delta = parsed["choices"][0]["delta"]

                    if "tool_calls" in delta:
                        in_tool_call = True
                        # Buffer tool call data
                        # ...
                    elif in_tool_call:
                        # Execute tool and restart generation
                        # ...
                        pass
                    else:
                        # Normal text, pass through
                        yield f"data: {data}\n\n"

    return StreamingResponse(
        stream_with_tools(),
        media_type="text/event-stream",
    )
```
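The elided buffering step has to reassemble complete tool calls from partial deltas: in the OpenAI streaming format, the function name and call `id` arrive once while the JSON arguments string arrives in fragments, all keyed by `index`. One way to accumulate them (a sketch, not the only approach):

```python
import json

def accumulate_tool_calls(deltas: list[dict]) -> list[dict]:
    """Merge streamed tool_call deltas into complete calls, keyed by index."""
    calls: dict[int, dict] = {}
    for delta in deltas:
        for fragment in delta.get("tool_calls", []):
            call = calls.setdefault(
                fragment["index"],
                {"id": None, "function": {"name": "", "arguments": ""}},
            )
            if fragment.get("id"):
                call["id"] = fragment["id"]
            fn = fragment.get("function", {})
            if fn.get("name"):
                call["function"]["name"] += fn["name"]
            if fn.get("arguments"):
                call["function"]["arguments"] += fn["arguments"]
    return [calls[i] for i in sorted(calls)]

# Synthetic deltas in the shape the streaming API produces
deltas = [
    {"tool_calls": [{"index": 0, "id": "call_1",
                     "function": {"name": "get_weather", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '{"location":'}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": ' "Paris"}'}}]},
]
call, = accumulate_tool_calls(deltas)
print(json.loads(call["function"]["arguments"]))  # -> {'location': 'Paris'}
```

Once the stream signals the end of the tool call (a `finish_reason` of `tool_calls`), the proxy can execute the accumulated calls and start the follow-up generation.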
### Configure Unmute to Use Your Proxy

Update `docker-compose.yml` to point to your proxy server:

```yaml
backend:
  environment:
    - KYUTAI_LLM_URL=http://tool-proxy:8080
    - KYUTAI_LLM_MODEL=your-model-name

# Add your proxy service
tool-proxy:
  build:
    context: ./tool-proxy
  ports:
    - "8080:8080"
  environment:
    - VLLM_URL=http://llm:8000
```
### Test Tool Calling

Start a conversation and ask the assistant to use a tool:

```text
User: "What's the weather in Paris?"
Assistant: "Let me check that for you... It's currently 72°F and sunny in Paris!"
```

The tool call happens transparently behind the scenes.
## Example Tool Implementations

### Weather Lookup

```python
import os

import httpx

async def get_weather(location: str) -> dict:
    """Fetch weather data from an API"""
    api_key = os.environ["WEATHER_API_KEY"]
    url = f"https://api.weather.com/v1/current?location={location}&key={api_key}"
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        data = response.json()
    return {
        "location": location,
        "temperature": data["temp"],
        "condition": data["condition"],
        "humidity": data["humidity"],
    }
```
### Web Search

```python
import os

import httpx

async def web_search(query: str) -> dict:
    """Search the web and return results"""
    search_api_key = os.environ["SEARCH_API_KEY"]
    async with httpx.AsyncClient() as client:
        # Passing params lets httpx URL-encode the query safely
        response = await client.get(
            "https://api.search.com/v1/search",
            params={"q": query, "key": search_api_key},
        )
        results = response.json()
    return {
        "query": query,
        "results": results["items"][:3],  # Top 3 results
    }
```
### Database Query

```python
import asyncpg

async def query_database(sql: str) -> dict:
    """Execute a safe, read-only database query"""
    # Add SQL injection protection!
    if not sql.upper().startswith("SELECT"):
        return {"error": "Only SELECT queries allowed"}
    # asyncpg.connect() is awaited, not used as an async context manager
    conn = await asyncpg.connect(DATABASE_URL)
    try:
        rows = await conn.fetch(sql)
    finally:
        await conn.close()
    return {
        "rows": [dict(row) for row in rows],
        "count": len(rows),
    }
```
## LLM Server Compatibility

Tool calling support varies by LLM server:

### vLLM

Supports the OpenAI-compatible function calling format. Enable it with `--enable-auto-tool-choice` and `--tool-call-parser`:

```yaml
llm:
  command:
    - "--model=meta-llama/Llama-3.2-1B-Instruct"
    - "--enable-auto-tool-choice"
    - "--tool-call-parser=llama3_json"
```
### Ollama

Supports the `tools` parameter in recent versions:

```python
response = ollama.chat(
    model='llama3',
    messages=messages,
    tools=tools,
)
```
### OpenAI API

Native support for function calling. No proxy is needed; OpenAI handles tool calls directly.
## Advanced Patterns

Allow the LLM to call multiple tools in sequence:

```python
async def handle_tool_calls(messages, max_iterations=5):
    """Handle multiple rounds of tool calling"""
    for i in range(max_iterations):
        response = await call_llm(messages)
        if not has_tool_calls(response):
            return response  # Final answer
        # Execute tools and add results to messages
        tool_results = await execute_all_tools(response.tool_calls)
        messages.extend(tool_results)
    return {"error": "Max iterations reached"}
```
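The `call_llm`, `has_tool_calls`, and `execute_all_tools` helpers above are left to the implementer. For the OpenAI-style response dicts used elsewhere in this guide, `has_tool_calls` can be as simple as:

```python
def has_tool_calls(response: dict) -> bool:
    """True if the completion's first choice contains tool calls."""
    message = response.get("choices", [{}])[0].get("message", {})
    return bool(message.get("tool_calls"))

print(has_tool_calls({"choices": [{"message": {"content": "Hi"}}]}))  # -> False
print(has_tool_calls({"choices": [{"message": {"tool_calls": [{"id": "1"}]}}]}))  # -> True
```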
Show different tools based on context:

```python
def get_available_tools(user_id: str, conversation_context: str) -> list:
    """Return tools based on user permissions and context"""
    tools = [WEATHER_TOOL]  # Always available
    if is_admin(user_id):
        tools.append(DATABASE_TOOL)
    if "calendar" in conversation_context.lower():
        tools.append(CALENDAR_TOOL)
    return tools
```
## Error Handling

Wrap tool execution so a failing tool degrades gracefully instead of crashing the response:

```python
async def execute_tool_safely(tool_name: str, arguments: dict):
    """Execute a tool with error handling"""
    try:
        result = await execute_tool(tool_name, arguments)
        return {"success": True, "data": result}
    except TimeoutError:
        return {"success": False, "error": "Tool timed out"}
    except Exception as e:
        logger.error(f"Tool {tool_name} failed: {e}")
        return {"success": False, "error": str(e)}
```
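Note that `except TimeoutError` only fires if the tool itself raises one. To enforce a deadline regardless of what the tool does, you can wrap the call with `asyncio.wait_for` (a sketch; the timeout value is an arbitrary choice):

```python
import asyncio

async def run_with_deadline(coro, timeout_s: float = 5.0):
    """Cancel a tool coroutine that exceeds the deadline."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        return {"success": False, "error": f"Tool timed out after {timeout_s}s"}

async def slow_tool():
    await asyncio.sleep(10)
    return {"success": True}

result = asyncio.run(run_with_deadline(slow_tool(), timeout_s=0.01))
print(result["error"])  # -> Tool timed out after 0.01s
```

`wait_for` cancels the underlying coroutine on timeout, so a stalled API call can't hold the voice loop open.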
## Considerations

**Latency Impact**: Tool calls add latency to responses. For voice conversations, this can be noticeable. Consider:

- Using fast APIs
- Caching tool results
- Setting reasonable timeouts
- Limiting tool call depth
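Caching tool results can be as simple as memoizing on the tool name and arguments for a short TTL. A minimal sketch (the 60-second TTL and the `get_weather` stub are illustrative):

```python
import json
import time

_cache: dict[str, tuple[float, dict]] = {}

def cached(ttl_s: float = 60.0):
    """Memoize an async tool's result for ttl_s seconds."""
    def decorator(tool):
        async def wrapper(**kwargs):
            key = f"{tool.__name__}:{json.dumps(kwargs, sort_keys=True)}"
            hit = _cache.get(key)
            if hit and time.monotonic() - hit[0] < ttl_s:
                return hit[1]  # Fresh enough: skip the API call
            result = await tool(**kwargs)
            _cache[key] = (time.monotonic(), result)
            return result
        return wrapper
    return decorator

@cached(ttl_s=60.0)
async def get_weather(location: str) -> dict:
    # Stand-in for a real API call
    return {"location": location, "temperature": 72}
```

Repeated calls with the same arguments inside the TTL return the cached result without re-hitting the API, which matters when a user asks follow-up questions about the same data.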
**Security**: Validate all tool inputs and outputs. Never execute arbitrary code or SQL without sanitization.

**Streaming**: Continue streaming text to Unmute while executing tools in the background to maintain responsiveness.
## Contributing

Tool calling support would make a great contribution to the Unmute project! If you build a robust tool-calling proxy server, consider:

- Opening a pull request to add it to the main repository
- Documenting your approach for others
- Sharing example tool implementations

See the GitHub discussion for more details and community input.
## Reference Implementation

For a complete working example, check out these resources:
## Next Steps

- **External LLM**: Configure Unmute to use different LLM providers
- **Custom Frontend**: Build your own client using the WebSocket protocol