Groq provides blazingly fast inference for large language models, powering MedMitra’s AI-driven medical analysis and vision capabilities.

Overview

MedMitra uses Groq for:
  • Medical Insights Generation: Case summaries, SOAP notes, and diagnoses
  • Radiology Image Analysis: Vision AI using the Llama 4 Scout multimodal model
  • Structured Data Extraction: JSON-formatted medical information
  • Real-time Processing: Fast inference for responsive user experience

Why Groq?

  • Speed: 10-100x faster than traditional GPU inference
  • Cost-effective: Competitive pricing per token
  • Multiple Models: Access to Llama 3.3, Llama 4 Scout, and other open models
  • Reliability: High uptime and consistent performance

Prerequisites

  • A Groq account (sign up at console.groq.com)
  • Python 3.9+ for backend integration

Setup Instructions

1. Get a Groq API Key

  1. Visit console.groq.com
  2. Sign up or log in to your account
  3. Navigate to API Keys section
  4. Click Create API Key
  5. Copy your API key (keep it secure!)
Groq API keys have access to your account. Never commit them to version control or share them publicly.

2. Configure Environment Variables

Add to backend/.env:
GROQ_API_KEY="gsk_your_groq_api_key_here"
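
With the key in backend/.env, the backend can validate it at startup instead of failing on the first API call. A minimal sketch, assuming the gsk_ key prefix shown above; load_groq_api_key is an illustrative helper, not part of MedMitra:

```python
import os

def load_groq_api_key() -> str:
    """Read the Groq API key from the environment, failing fast if missing."""
    key = os.environ.get("GROQ_API_KEY", "").strip()
    if not key.startswith("gsk_"):
        # Groq keys use the gsk_ prefix, as in the .env example above
        raise RuntimeError("GROQ_API_KEY is missing or malformed")
    return key
```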

3. Install Groq SDK

The Groq SDK is already included in the project dependencies. If you need to install it manually:
pip install groq
pip install langchain-groq

Models Used in MedMitra

Llama 3.3 70B Versatile

Purpose: Medical insights generation, SOAP notes, diagnosis
from langchain_groq import ChatGroq

llm = ChatGroq(
    model_name="llama-3.3-70b-versatile",
    temperature=0.7
)
Characteristics:
  • 70 billion parameters
  • Excellent at structured output
  • Strong medical knowledge
  • Fast inference on Groq

Llama 4 Scout Vision Model

Purpose: Radiology image analysis (X-rays, MRIs, CT scans)
from groq import Groq

client = Groq(api_key=GROQ_API_KEY)

completion = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Analyze this radiology image"},
                {"type": "image_url", "image_url": {"url": image_url}}
            ]
        }
    ]
)
Characteristics:
  • Multimodal (text + images)
  • Specialized for vision tasks
  • Can identify medical findings
  • Returns structured JSON

Implementation Examples

LLM Manager

Location: backend/utils/llm_utils.py

The LLMManager class provides a reusable interface for Groq LLM operations:
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate

class LLMManager:
    def __init__(self, model_name="llama-3.3-70b-versatile", temperature=0.7):
        self.llm = ChatGroq(model_name=model_name, temperature=temperature)

    async def generate_response(
        self,
        system_prompt: str,
        user_input: str,
        prompt_variables: dict | None = None
    ) -> dict:
        """Generate a response with optional prompt variable substitution"""

        if prompt_variables:
            formatted_system_prompt = system_prompt.format(**prompt_variables)
        else:
            formatted_system_prompt = system_prompt

        prompt = ChatPromptTemplate.from_messages([
            ("system", formatted_system_prompt),
            ("user", "{input}")
        ])

        chain = prompt | self.llm
        # Use ainvoke so the async method doesn't block the event loop
        result = await chain.ainvoke({"input": user_input})

        # Extract JSON from the response (project utility in backend/utils)
        return extract_json_from_string(result.content)
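
extract_json_from_string is a project utility referenced throughout this guide; its implementation isn't shown here, but a minimal version might look like this (an illustrative sketch, not the project's actual code):

```python
import json
import re

def extract_json_from_string(text: str) -> dict:
    """Pull the first JSON object out of an LLM response.

    Models often wrap JSON in markdown fences or surrounding prose,
    so locate the outermost braces before parsing.
    """
    # Strip markdown code fences if present
    text = re.sub(r"```(?:json)?", "", text)
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError("No JSON object found in response")
    return json.loads(text[start : end + 1])
```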

Vision Agent

Location: backend/agents/vision_agent.py

The Vision Agent analyzes radiology images using Groq’s multimodal capabilities:
from groq import AsyncGroq
from config import GROQ_API_KEY

# AsyncGroq keeps the async function below from blocking the event loop
client = AsyncGroq(api_key=GROQ_API_KEY)

async def image_extraction(image_url: str):
    """Analyze a radiology image using Groq's vision model"""

    completion = await client.chat.completions.create(
        model="meta-llama/llama-4-scout-17b-16e-instruct",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": RADIOLOGY_ANALYSIS_PROMPT
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": image_url}
                    }
                ]
            }
        ],
        temperature=1,
        max_completion_tokens=1024,
        top_p=1,
        stream=False
    )

    # Extract structured JSON from the response (project utility in backend/utils)
    result = extract_json_from_string(completion.choices[0].message.content)
    return result
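
image_extraction expects a hosted image URL. For files that aren't publicly reachable, Groq's OpenAI-compatible vision endpoint also accepts base64 data URLs in the same image_url field; a sketch (image_to_data_url is an illustrative helper, not part of MedMitra):

```python
import base64
import mimetypes

def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL for the image_url field."""
    mime, _ = mimetypes.guess_type(path)
    if mime not in ("image/jpeg", "image/png"):
        # Matches the supported formats noted in Troubleshooting below
        raise ValueError(f"Unsupported image type for {path}: {mime}")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be passed directly as {"type": "image_url", "image_url": {"url": image_to_data_url("scan.png")}}.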

Medical Insights Agent

The Medical Insights Agent uses LangGraph with Groq to generate comprehensive medical analysis:
from langchain_groq import ChatGroq
from langgraph.graph import StateGraph, END

class MedicalInsightsAgent:
    def __init__(self):
        self.llm = ChatGroq(
            model_name="llama-3.3-70b-versatile",
            temperature=0.7
        )
        self.graph = self._build_graph()

    def _build_graph(self):
        # MedicalState is the shared state schema defined in the agent module
        workflow = StateGraph(MedicalState)

        workflow.add_node("aggregate_data", self.aggregate_data)
        workflow.add_node("generate_summary", self.generate_summary)
        workflow.add_node("generate_soap", self.generate_soap)
        workflow.add_node("generate_diagnosis", self.generate_diagnosis)
        workflow.add_node("save_results", self.save_results)

        # Define the linear pipeline
        workflow.set_entry_point("aggregate_data")
        workflow.add_edge("aggregate_data", "generate_summary")
        workflow.add_edge("generate_summary", "generate_soap")
        workflow.add_edge("generate_soap", "generate_diagnosis")
        workflow.add_edge("generate_diagnosis", "save_results")
        workflow.add_edge("save_results", END)

        return workflow.compile()

Prompt Engineering

MedMitra uses specialized prompts for medical analysis:

Radiology Analysis Prompt

RADIOLOGY_ANALYSIS_PROMPT = """
You are an expert radiologist AI assistant. Analyze the provided medical image 
and extract key findings.

Provide your analysis in the following JSON format:
{
    "findings": ["finding 1", "finding 2", ...],
    "impression": "Brief impression of the image",
    "summary": "Detailed summary of observations",
    "abnormalities": ["abnormality 1", "abnormality 2", ...],
    "confidence": 0.95
}

Focus on:
- Anatomical structures visible
- Any abnormalities or pathologies
- Image quality and clarity
- Clinical relevance of findings
"""

Case Summary Prompt

CASE_SUMMARY_PROMPT = """
You are a medical AI assistant helping doctors synthesize patient information.

Given:
- Doctor's notes: {doctor_notes}
- Lab report data: {lab_data}
- Radiology findings: {radiology_data}

Generate a comprehensive case summary including:
1. Patient context (age, gender, presenting complaints)
2. Key findings from all sources
3. Relevant medical history
4. Summary of diagnostic results

Return as JSON:
{
    "comprehensive_summary": "...",
    "key_findings": [...],
    "patient_context": {...},
    "confidence_score": 0.92
}
"""

Configuration Options

Temperature Settings

# Creative/varied output (case summaries)
llm = ChatGroq(model_name="llama-3.3-70b-versatile", temperature=0.7)

# Focused/deterministic output (diagnoses)
llm = ChatGroq(model_name="llama-3.3-70b-versatile", temperature=0.3)

# Highly creative (not recommended for medical use)
llm = ChatGroq(model_name="llama-3.3-70b-versatile", temperature=1.0)

Token Limits

completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=messages,
    max_completion_tokens=2048,  # Limit response length
    top_p=1,
    stream=False
)

Streaming Responses

# For real-time UI updates
stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=messages,
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Best Practices

Prompting
  • Be specific about output format (JSON preferred)
  • Include examples in prompts for better results
  • Use system prompts to set context and constraints
  • Test prompts with various inputs

Error Handling
  • Wrap API calls in try/except blocks
  • Handle rate limits gracefully (429 errors)
  • Implement retry logic with exponential backoff
  • Validate JSON responses before using them

Cost & Performance
  • Monitor token usage in the Groq console
  • Cache responses when appropriate
  • Set appropriate max_tokens limits
  • Batch similar requests when possible

Medical Safety
  • Always have a clinician review AI-generated diagnoses
  • Include confidence scores in outputs
  • Log all AI interactions for an audit trail
  • Never use AI as the sole diagnostic tool
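
For validating JSON responses before use, a schema check against the keys the radiology prompt requests catches malformed model output early. A sketch (validate_radiology_response is illustrative; the key set mirrors RADIOLOGY_ANALYSIS_PROMPT above):

```python
# Keys the radiology analysis prompt asks the model to return
REQUIRED_KEYS = {"findings", "impression", "summary", "abnormalities", "confidence"}

def validate_radiology_response(payload: dict) -> dict:
    """Raise if the model's JSON is missing keys or has an out-of-range confidence."""
    missing = REQUIRED_KEYS - payload.keys()
    if missing:
        raise ValueError(f"Response missing keys: {sorted(missing)}")
    if not 0.0 <= float(payload["confidence"]) <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return payload
```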

Rate Limits & Quotas

Groq has the following limits (as of 2024):
  • Free Tier: 30 requests/minute, 14,400 requests/day
  • Paid Tier: Higher limits based on plan
  • Token Limits: Varies by model
Monitor your usage in the Groq console to avoid hitting rate limits. Implement exponential backoff for production deployments.
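
A retry wrapper along these lines implements the backoff recommendation. This is a sketch: with_backoff and its string-based 429 check are illustrative, and in practice you could catch the groq SDK's typed rate-limit exception instead.

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0,
                 is_rate_limit=lambda exc: "429" in str(exc)):
    """Retry a zero-argument callable on rate-limit errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            if not is_rate_limit(exc) or attempt == max_retries - 1:
                raise
            # 1s, 2s, 4s, ... plus jitter to avoid synchronized retries
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Usage: result = with_backoff(lambda: client.chat.completions.create(model="llama-3.3-70b-versatile", messages=messages)).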

Troubleshooting

Error: Invalid API key
  • Verify GROQ_API_KEY in .env file
  • Check key hasn’t been revoked
  • Ensure no extra spaces in key
  • Regenerate key if necessary
Error: 429 Too Many Requests
  • Implement exponential backoff
  • Add delays between requests
  • Upgrade to paid tier for higher limits
  • Cache responses to reduce API calls
Error: Image analysis failed
  • Ensure image URL is publicly accessible
  • Check image format is supported (JPG, PNG)
  • Verify image size is within limits
  • Check internet connectivity
Error: Invalid JSON in response
  • Use extract_json_from_string utility
  • Adjust prompt to emphasize JSON format
  • Lower temperature for more structured output
  • Add validation in prompt

Performance Metrics

Typical Groq performance for MedMitra:
  • Case Summary Generation: 1-2 seconds
  • SOAP Note Generation: 1-3 seconds
  • Diagnosis Generation: 2-4 seconds
  • Radiology Image Analysis: 2-5 seconds
Groq’s LPU (Language Processing Unit) technology provides significantly faster inference than traditional GPU-based solutions.

Next Steps

LlamaParse Integration

Set up PDF document parsing

Medical Prompts

Explore medical prompt templates
