While Unmute doesn’t currently have built-in tool calling support, you can integrate function calling capabilities by wrapping the LLM server with a tool-calling layer.

Overview

Tool calling allows the LLM to invoke external functions, such as:
  • Weather API lookups
  • Database queries
  • Web searches
  • Smart home controls
  • Calendar operations
  • Custom business logic
The key insight is that tool calling can be implemented transparently at the LLM layer, making it invisible to the Unmute backend.

Architecture

The tool-calling proxy:
  1. Receives text generation requests from Unmute backend
  2. Forwards to the LLM with tool definitions
  3. Detects when the LLM wants to call a tool
  4. Executes the tool and injects results
  5. Continues generation with tool results
  6. Returns final response to Unmute

Implementation Approach

The recommended approach is to create a FastAPI server that:
  1. Exposes an OpenAI-compatible API endpoint
  2. Wraps your LLM server (VLLM, Ollama, etc.)
  3. Intercepts tool calls and executes them
  4. Streams results back to Unmute

Why This Works

Unmute expects streaming text responses from an OpenAI-compatible endpoint. As long as your proxy server provides this interface, Unmute doesn’t need to know about the tool calling happening behind the scenes.
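Concretely, Unmute consumes the standard OpenAI streaming format: server-sent events where each `data:` line carries a JSON chunk whose `delta` holds the next piece of text. A minimal sketch of producing one such event (field names follow the OpenAI Chat Completions streaming format; the model name is a placeholder):

```python
import json

def sse_chunk(text: str, model: str = "placeholder-model") -> str:
    """Format one piece of text as an OpenAI-style streaming SSE event."""
    payload = {
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0, "delta": {"content": text}, "finish_reason": None}],
    }
    return f"data: {json.dumps(payload)}\n\n"

# Streams terminate with a literal sentinel event:
DONE_EVENT = "data: [DONE]\n\n"
```

As long as your proxy emits events in this shape and ends with the `[DONE]` sentinel, Unmute treats it like any other OpenAI-compatible server.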

Step-by-Step Implementation

Step 1: Create a Tool-Calling Proxy Server

Create a new FastAPI application that wraps your LLM:
# tool_proxy.py
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import httpx
import json

app = FastAPI()

# Your tool definitions
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }
]

async def execute_tool(tool_name: str, arguments: dict):
    """Execute a tool and return results"""
    if tool_name == "get_weather":
        # Call weather API
        location = arguments["location"]
        # ... fetch weather data ...
        return {"temperature": 72, "condition": "sunny"}
    return None

@app.post("/v1/chat/completions")
async def chat_completions(request: dict):
    """OpenAI-compatible endpoint with tool calling"""
    
    # Add tools to the request
    request["tools"] = tools
    
    # Forward to actual LLM (e.g., VLLM)
    llm_url = "http://localhost:8000/v1/chat/completions"
    
    async with httpx.AsyncClient() as client:
        response = await client.post(llm_url, json=request)
        
        # Check if the LLM wants to call a tool
        result = response.json()
        message = result.get("choices", [{}])[0].get("message", {})
        
        if message.get("tool_calls"):
            # The assistant message that requested the tools must be
            # appended to the history before the tool results
            request["messages"].append(message)
            
            # Execute each tool call
            for tool_call in message["tool_calls"]:
                tool_name = tool_call["function"]["name"]
                args = json.loads(tool_call["function"]["arguments"])
                tool_result = await execute_tool(tool_name, args)
                
                # Add tool result to conversation
                request["messages"].append({
                    "role": "tool",
                    "tool_call_id": tool_call["id"],
                    "content": json.dumps(tool_result)
                })
            
            # Make another LLM call with tool results
            response = await client.post(llm_url, json=request)
        
        return response.json()
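Before pointing Unmute at the proxy, it is worth sanity-checking the dispatcher in isolation. The snippet redefines the stubbed `get_weather` dispatcher from above so it runs standalone:

```python
import asyncio

# Stand-in for the execute_tool dispatcher defined in tool_proxy.py above
async def execute_tool(tool_name: str, arguments: dict):
    if tool_name == "get_weather":
        return {"temperature": 72, "condition": "sunny"}
    return None

result = asyncio.run(execute_tool("get_weather", {"location": "Paris"}))
# result == {"temperature": 72, "condition": "sunny"}
```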

Step 2: Add Streaming Support

For real-time voice, streaming is essential:
@app.post("/v1/chat/completions")
async def chat_completions(request: dict):
    # Enable streaming
    request["stream"] = True
    
    async def stream_with_tools():
        # Track if we're in a tool call
        in_tool_call = False
        tool_buffer = ""
        
        async with httpx.AsyncClient() as client:
            async with client.stream(
                "POST",
                "http://localhost:8000/v1/chat/completions",
                json=request
            ) as response:
                async for line in response.aiter_lines():
                    # Parse SSE format (aiter_lines avoids raw chunks
                    # that split an event mid-line)
                    if line.startswith("data: "):
                        data = line[6:].strip()
                        if data == "[DONE]":
                            yield line + "\n\n"
                            continue
                        
                        parsed = json.loads(data)
                        delta = parsed["choices"][0]["delta"]
                        
                        # Check for tool call
                        if "tool_calls" in delta:
                            in_tool_call = True
                            # Buffer tool call data
                            # ...
                        elif in_tool_call:
                            # Execute tool and restart generation
                            # ...
                        else:
                            # Normal text, pass through
                            yield line + "\n\n"
    
    return StreamingResponse(
        stream_with_tools(),
        media_type="text/event-stream"
    )
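The buffering logic elided above has to reassemble tool calls from streamed deltas: OpenAI-style servers split a single call's `arguments` JSON across many chunks, keyed by the call's `index`. A sketch of such an accumulator (the delta structure follows the OpenAI streaming format; nothing Unmute-specific is assumed):

```python
import json

def accumulate_tool_calls(deltas: list[dict]) -> list[dict]:
    """Merge streamed tool_call deltas into complete calls.

    Each delta looks like {"index": 0, "id": ..., "function":
    {"name": ..., "arguments": "<json fragment>"}}; the
    "arguments" string arrives in pieces across chunks.
    """
    calls: dict[int, dict] = {}
    for delta in deltas:
        idx = delta["index"]
        call = calls.setdefault(
            idx, {"id": None, "function": {"name": "", "arguments": ""}}
        )
        if delta.get("id"):
            call["id"] = delta["id"]
        fn = delta.get("function", {})
        if fn.get("name"):
            call["function"]["name"] = fn["name"]
        call["function"]["arguments"] += fn.get("arguments", "")
    return [calls[i] for i in sorted(calls)]

# Example: one call whose arguments arrive in three fragments
deltas = [
    {"index": 0, "id": "call_1", "function": {"name": "get_weather", "arguments": ""}},
    {"index": 0, "function": {"arguments": '{"location"'}},
    {"index": 0, "function": {"arguments": ': "Paris"}'}},
]
calls = accumulate_tool_calls(deltas)
args = json.loads(calls[0]["function"]["arguments"])  # {"location": "Paris"}
```

Only once the `arguments` string parses as complete JSON is the call ready to execute.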

Step 3: Configure Unmute to Use Your Proxy

Update docker-compose.yml to point to your proxy server:
backend:
  environment:
    - KYUTAI_LLM_URL=http://tool-proxy:8080
    - KYUTAI_LLM_MODEL=your-model-name

# Add your proxy service
tool-proxy:
  build:
    context: ./tool-proxy
  ports:
    - "8080:8080"
  environment:
    - VLLM_URL=http://llm:8000

Step 4: Test Tool Calling

Start a conversation and ask the assistant to use a tool:
User: "What's the weather in Paris?"
Assistant: "Let me check that for you... It's currently 72°F and sunny in Paris!"
The tool call happens transparently behind the scenes.
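You can also exercise the proxy outside of a voice session with a plain OpenAI-style HTTP request. A sketch using only the standard library (the port and model name mirror the docker-compose example in step 3 and are assumptions):

```python
import json
import urllib.request

payload = {
    "model": "your-model-name",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "stream": False,  # easier to inspect than SSE when testing by hand
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",  # the proxy from step 1
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# with urllib.request.urlopen(req) as resp:  # uncomment with the proxy running
#     print(json.load(resp))
```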

Example Tools

Weather Lookup

import os

import httpx

async def get_weather(location: str) -> dict:
    """Fetch weather data from an API"""
    api_key = os.environ["WEATHER_API_KEY"]
    url = f"https://api.weather.com/v1/current?location={location}&key={api_key}"
    
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        data = response.json()
        
    return {
        "location": location,
        "temperature": data["temp"],
        "condition": data["condition"],
        "humidity": data["humidity"]
    }

Web Search

async def web_search(query: str) -> dict:
    """Search the web and return results"""
    search_api_key = os.environ["SEARCH_API_KEY"]
    url = f"https://api.search.com/v1/search?q={query}&key={search_api_key}"
    
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        results = response.json()
    
    return {
        "query": query,
        "results": results["items"][:3]  # Top 3 results
    }

Database Query

import asyncpg

async def query_database(sql: str) -> dict:
    """Execute a safe, read-only database query"""
    # Add SQL injection protection!
    if not sql.strip().upper().startswith("SELECT"):
        return {"error": "Only SELECT queries allowed"}
    
    # asyncpg connections are not async context managers;
    # connect and close explicitly
    conn = await asyncpg.connect(DATABASE_URL)
    try:
        rows = await conn.fetch(sql)
    finally:
        await conn.close()
    
    return {
        "rows": [dict(row) for row in rows],
        "count": len(rows)
    }
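The SELECT-prefix check above is easy to bypass, for example by smuggling a second statement in after a semicolon. A slightly stronger, still purely illustrative, guard rejects multi-statement input and mutating keywords; a read-only database role remains the real defense:

```python
# Keywords that should never appear in a read-only query
FORBIDDEN = {"INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "GRANT", "TRUNCATE"}

def is_safe_select(sql: str) -> bool:
    """Heuristic guard for read-only queries; not a substitute for
    parameterized queries and a read-only database role."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # reject multi-statement input
        return False
    upper = stripped.upper()
    if not upper.startswith("SELECT"):
        return False
    return not any(word in upper.split() for word in FORBIDDEN)
```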

LLM Server Compatibility

Tool calling support varies by LLM server:

vLLM

vLLM supports the OpenAI-compatible function-calling format. Enable it with the --enable-auto-tool-choice and --tool-call-parser flags:
llm:
  command:
    - "--model=meta-llama/Llama-3.2-1B-Instruct"
    - "--enable-auto-tool-choice"
    - "--tool-call-parser=llama3_json"

Ollama

Ollama supports the tools parameter in recent versions:
import ollama

response = ollama.chat(
    model='llama3',
    messages=messages,
    tools=tools
)

OpenAI API

The OpenAI API has native support for function calling. No proxy is needed, since OpenAI handles tool calls directly.

Advanced Patterns

Multi-Step Tool Chains

Allow the LLM to call multiple tools in sequence:
async def handle_tool_calls(messages, max_iterations=5):
    """Handle multiple rounds of tool calling"""
    # call_llm, has_tool_calls, and execute_all_tools are
    # application-specific helpers, not library functions
    for i in range(max_iterations):
        response = await call_llm(messages)
        
        if not has_tool_calls(response):
            return response  # Final answer
        
        # Execute tools and add results to messages
        tool_results = await execute_all_tools(response.tool_calls)
        messages.extend(tool_results)
    
    return {"error": "Max iterations reached"}

Conditional Tool Availability

Show different tools based on context:
def get_available_tools(user_id: str, conversation_context: str) -> list:
    """Return tools based on user permissions and context"""
    tools = [WEATHER_TOOL]  # Always available
    
    if is_admin(user_id):
        tools.append(DATABASE_TOOL)
    
    if "calendar" in conversation_context.lower():
        tools.append(CALENDAR_TOOL)
    
    return tools

Error Handling

import logging

logger = logging.getLogger(__name__)

async def execute_tool_safely(tool_name: str, arguments: dict):
    """Execute tool with error handling"""
    try:
        result = await execute_tool(tool_name, arguments)
        return {"success": True, "data": result}
    except TimeoutError:
        return {"success": False, "error": "Tool timed out"}
    except Exception as e:
        logger.error(f"Tool {tool_name} failed: {e}")
        return {"success": False, "error": str(e)}

Considerations

Latency Impact: Tool calls add latency to responses. For voice conversations, this can be noticeable. Consider:
  • Using fast APIs
  • Caching tool results
  • Setting reasonable timeouts
  • Limiting tool call depth
Security: Validate all tool inputs and outputs. Never execute arbitrary code or SQL without sanitization.
Streaming: Continue streaming text to Unmute while executing tools in the background to maintain responsiveness.

Community Contributions

Tool calling support would make a great contribution to the Unmute project! If you build a robust tool-calling proxy server, consider:
  1. Opening a pull request to add it to the main repository
  2. Documenting your approach for others
  3. Sharing example tool implementations
See the GitHub discussion for more details and community input.


Next Steps

External LLM

Configure Unmute to use different LLM providers

Custom Frontend

Build your own client using the WebSocket protocol
