POST /api/suggestions
Generate 2-3 contextual follow-up questions based on the current conversation with a historical figure. This endpoint helps users keep conversations going by suggesting relevant next questions.

Request
- The unique identifier of the figure in the conversation
- Array of previous conversation messages for context (only the last 4 messages are used)
- The most recent response from the historical figure
- Optional AI model override (defaults to the fast model "meta-llama/llama-3.3-70b-instruct:free")
Example Request
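A request body might look like the following sketch. The field names (`figureId`, `history`, `lastResponse`, `model`) and the figure identifier are illustrative assumptions; this document describes the fields but does not name them.

```json
{
  "figureId": "marie-curie",
  "history": [
    { "role": "user", "content": "What drew you to study radioactivity?" },
    { "role": "assistant", "content": "It began with Becquerel's curious uranium rays..." }
  ],
  "lastResponse": "It began with Becquerel's curious uranium rays...",
  "model": "meta-llama/llama-3.3-70b-instruct:free"
}
```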
Response
Array of 2-3 suggested follow-up questions (strings, max 50 characters each)
Example Response
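A response body might look like the following sketch. The `suggestions` field name is an assumption; the document specifies only that the response contains an array of 2-3 short question strings.

```json
{
  "suggestions": [
    "What was your greatest discovery?",
    "Did you face much opposition early on?",
    "How did you meet Pierre?"
  ]
}
```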
Error Responses
- 400 Bad Request: Missing Figure ID
- 404 Not Found: Figure Not Found
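An error body might look like the following sketch. Only the status codes above are given by this document; the `error` field name and message text are assumptions.

```json
{ "error": "Figure not found" }
```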
Fallback Behavior
If the AI fails to generate suggestions or returns invalid data, the API returns generic fallback suggestions.

Implementation Details
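The validity check described above can be sketched as follows. The function name and the fallback strings are illustrative assumptions; the document does not list the actual fallback suggestions.

```python
# Illustrative placeholders -- the actual fallback strings are not
# specified in this document.
FALLBACK_SUGGESTIONS = [
    "Tell me more about that.",
    "What happened next?",
    "How did that affect you?",
]

def suggestions_or_fallback(parsed) -> list[str]:
    """Return the parsed suggestions if they look valid, else generic fallbacks."""
    if (
        isinstance(parsed, list)
        and parsed
        and all(isinstance(s, str) and s for s in parsed)
    ):
        return parsed[:3]
    return FALLBACK_SUGGESTIONS  # AI failed or returned invalid data
```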
Optimization
The suggestions endpoint is optimized for speed:
- Uses a dedicated fast function call_llm_suggestions() instead of the full call_llm() with retry logic
- Default timeout: 10 seconds (vs 30 seconds for regular chat)
- Uses a fast, efficient model by default (Llama 3.3 70B)
- Max tokens limited to 150 (vs 500 for chat)
- Temperature set to 0.7 for balanced creativity
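The fast path above can be sketched as follows. Only `call_llm_suggestions()`, `call_llm()`, and the default values come from this document; the parameter names, payload shape, and injected HTTP helper are assumptions that keep the sketch self-contained.

```python
from typing import Callable

# Default fast model named in this document.
FAST_MODEL = "meta-llama/llama-3.3-70b-instruct:free"

def call_llm_suggestions(
    prompt: str,
    post_fn: Callable[[dict, float], str],  # thin wrapper over an HTTP POST (assumed)
    model: str = FAST_MODEL,
    timeout: float = 10.0,       # vs 30 seconds for regular chat
    max_tokens: int = 150,       # vs 500 for chat
    temperature: float = 0.7,    # balanced creativity
) -> str:
    """Single fast LLM call -- no retry logic, unlike the full call_llm()."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return post_fn(payload, timeout)
```

Injecting `post_fn` is a sketch convenience: it makes the speed-related defaults visible without tying the example to a particular HTTP client.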
Context Handling
- History limit: Only the last 4 messages from history are used for context
- Response truncation: Last response is truncated to 500 characters to fit in the prompt
- Conversation context: The prompt includes both the user and assistant messages to understand the flow
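The three context rules above can be sketched as a single helper. The function name and the exact line format are assumptions; the 4-message limit and 500-character truncation come from this document.

```python
def build_suggestion_context(history: list[dict], last_response: str) -> str:
    """Assemble the prompt context per the rules above (format is illustrative)."""
    recent = history[-4:]              # history limit: only the last 4 messages
    truncated = last_response[:500]    # response truncation: 500 characters
    # Include both user and assistant messages so the model sees the flow.
    lines = [f"{m['role']}: {m['content']}" for m in recent]
    lines.append(f"assistant: {truncated}")
    return "\n".join(lines)
```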
Response Parsing
The API extracts questions from the AI response:
- Splits the response by newlines
- Strips numbering/bullets (1., -, •, *)
- Filters out very short lines (< 10 characters)
- Returns the first 3 valid suggestions
- Falls back to generic suggestions if parsing fails
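The parsing steps above can be sketched as follows. The function name, the regex, and the fallback strings are illustrative assumptions; the split/strip/filter/limit steps come from this document.

```python
import re

# Illustrative fallbacks -- the actual strings are not given in this document.
GENERIC_FALLBACKS = [
    "Tell me more about that.",
    "What happened next?",
    "How did that shape you?",
]

def parse_suggestions(raw: str) -> list[str]:
    """Extract up to 3 questions from the raw AI response, per the steps above."""
    suggestions = []
    for line in raw.splitlines():                                   # split by newlines
        line = re.sub(r"^\s*(?:\d+\.|[-•*])\s*", "", line).strip()  # strip numbering/bullets
        if len(line) < 10:                                          # drop very short lines
            continue
        suggestions.append(line)
        if len(suggestions) == 3:                                   # first 3 valid suggestions
            break
    return suggestions or GENERIC_FALLBACKS                         # fallback if parsing fails
```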
Prompt Format
The AI receives a system prompt and a user prompt.

Usage Tips
When to Call
- After receiving a response from the figure
- To help users who are unsure what to ask next
- To suggest deeper or related topics
Display
- Show as clickable buttons or chips in your UI
- Allow users to click to auto-fill the message input
- Optionally hide after the user sends their own message
Performance
- Call this endpoint in parallel with displaying the figure’s response
- Cache suggestions briefly if the conversation continues rapidly
- Don’t block the UI waiting for suggestions; show them when ready
Example Implementation
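A minimal client sketch, using only the Python standard library. The `/api/suggestions` path comes from this document; the base URL, field names, and function names are assumptions, and `fetch_suggestions()` needs a live server to run.

```python
import json
import urllib.request

API_BASE = "https://example.com"  # assumed base URL


def build_payload(figure_id: str, history: list[dict], last_response: str) -> dict:
    # Field names are assumptions; the document describes but does not name them.
    return {
        "figureId": figure_id,
        "history": history,
        "lastResponse": last_response,
    }


def fetch_suggestions(figure_id, history, last_response, timeout=10):
    """POST to /api/suggestions and return the suggested questions."""
    payload = build_payload(figure_id, history, last_response)
    req = urllib.request.Request(
        f"{API_BASE}/api/suggestions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp).get("suggestions", [])
```

Per the performance tips above, call `fetch_suggestions()` in the background while the figure's response is being displayed, and render the results as clickable chips whenever they arrive.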
Related Endpoints
- POST /api/chat - Send messages and receive responses
- POST /api/dinner-party/suggestions - Get suggestions for multi-guest conversations