Input Formats

ClinicalPilot accepts clinical data in four input formats. Each one is parsed automatically into a unified schema for agent processing.

Supported Formats

FHIR R4 JSON

HL7 FHIR R4 Bundles for standardized EHR integration

EHR Files

Upload PDF or CSV documents from electronic health records

Free Text

Natural language clinical notes and case descriptions

Voice Transcription

Speech-to-text clinical dictation (future: Whisper STT)

FHIR R4 JSON

Submit HL7 FHIR R4 Bundles via the /api/upload/fhir endpoint. The parser extracts:

Patient → Demographics (age, gender)
Condition → Diagnoses & medical history
MedicationRequest / MedicationStatement → Current medications
Observation → Vitals & lab results
AllergyIntolerance → Drug allergies

curl -X POST https://api.clinicalpilot.ai/api/upload/fhir \
  -H "Content-Type: application/json" \
  -d @patient_bundle.json
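
The resource mapping listed above can be pictured as a lookup from resourceType to the part of the patient context it feeds. The following is a minimal sketch, not the actual parser: the names RESOURCE_TARGETS and group_bundle_resources, and the target labels, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# resourceType -> area of the patient context, following the mapping listed above.
# The target labels are illustrative, not field names from the real parser.
RESOURCE_TARGETS = {
    "Patient": "demographics",
    "Condition": "conditions",
    "MedicationRequest": "medications",
    "MedicationStatement": "medications",
    "Observation": "vitals_and_labs",
    "AllergyIntolerance": "allergies",
}


def group_bundle_resources(bundle: dict) -> dict:
    """Group the resources in a FHIR Bundle by the area they feed."""
    grouped = defaultdict(list)
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        target = RESOURCE_TARGETS.get(resource.get("resourceType"))
        if target is not None:  # resource types outside the mapping are skipped
            grouped[target].append(resource)
    return dict(grouped)


bundle = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"resource": {"resourceType": "Patient", "gender": "male"}},
        {"resource": {"resourceType": "Condition", "code": {"text": "Hypertension"}}},
        {"resource": {"resourceType": "Encounter", "status": "finished"}},
    ],
}
grouped = group_bundle_resources(bundle)
print(sorted(grouped))
```

In this sketch the Encounter entry is ignored because it has no entry in the lookup, which mirrors the idea of extracting only the listed resource types.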

FHIR Parser Implementation

The FHIR parser (backend/input_layer/fhir_parser.py) uses code-based extraction:

# Vital codes (LOINC)
VITAL_CODES = {
    "8310-5": "body_temperature",
    "8867-4": "heart_rate",
    "9279-1": "respiratory_rate",
    "85354-9": "blood_pressure",
    "2708-6": "spo2",
}

# Parser extracts observations and routes to vitals vs labs
if code in VITAL_CODES:
    ctx.vitals.append(Vital(name=VITAL_CODES[code], value=value_str, unit=unit))
else:
    ctx.labs.append(LabResult(name=display, value=value_str, unit=unit))

FHIR Compliance: Supports FHIR R4 only. The parser is lightweight and extracts only clinically relevant resources. Custom extensions are preserved in raw_text.
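
To make the vitals-versus-labs routing concrete, here is a self-contained sketch that applies the same LOINC lookup to two sample Observation resources. It assumes the code sits in code.coding[0] and the value in valueQuantity; the Reading class is a hypothetical stand-in for the Vital and LabResult models, and panel observations that carry components (such as blood pressure) are not handled here.

```python
from dataclasses import dataclass

VITAL_CODES = {
    "8310-5": "body_temperature",
    "8867-4": "heart_rate",
    "9279-1": "respiratory_rate",
    "85354-9": "blood_pressure",
    "2708-6": "spo2",
}


@dataclass
class Reading:
    """Hypothetical stand-in for the Vital / LabResult models."""
    name: str
    value: str
    unit: str


def route_observation(obs: dict, vitals: list, labs: list) -> None:
    """Append the observation to vitals if its LOINC code is known, else to labs."""
    coding = (obs.get("code", {}).get("coding") or [{}])[0]
    code = coding.get("code", "")
    display = coding.get("display", "")
    quantity = obs.get("valueQuantity", {})
    value_str = str(quantity.get("value", ""))
    unit = quantity.get("unit", "")
    if code in VITAL_CODES:
        vitals.append(Reading(VITAL_CODES[code], value_str, unit))
    else:
        labs.append(Reading(display, value_str, unit))


observations = [
    {"resourceType": "Observation",
     "code": {"coding": [{"system": "http://loinc.org", "code": "8867-4", "display": "Heart rate"}]},
     "valueQuantity": {"value": 88, "unit": "beats/minute"}},
    {"resourceType": "Observation",
     "code": {"coding": [{"system": "http://loinc.org", "code": "2160-0", "display": "Creatinine"}]},
     "valueQuantity": {"value": 1.8, "unit": "mg/dL"}},
]

vitals, labs = [], []
for obs in observations:
    route_observation(obs, vitals, labs)
print(vitals, labs)
```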

EHR Files (PDF, CSV)

Upload PDF or CSV documents from your EHR system via /api/upload/ehr.

Upload Document

Send the file as a multipart/form-data request.

curl -X POST https://api.clinicalpilot.ai/api/upload/ehr \
  -F "file=@patient_chart.pdf"

Parser Extracts Text

PDF: Uses PyPDF2 (fallback: Unstructured.io)
CSV: Column-based extraction (auto-detects headers)

backend/input_layer/ehr_parser.py:88

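# PDF extraction
import io

from PyPDF2 import PdfReader

# file_bytes holds the raw bytes of the uploaded PDF
reader = PdfReader(io.BytesIO(file_bytes))
# Guard against pages that yield no text (for example, image-only pages)
pages = [page.extract_text() or "" for page in reader.pages]
text = "\n\n".join(pages)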

Entity Extraction

Regex-based extraction identifies:

Medications (drug name + dosage patterns)
Conditions (ICD codes, common abbreviations like HTN, DM2)
Vitals (BP, HR, temp, SpO2)
Labs (troponin, WBC, creatinine, etc.)

backend/input_layer/text_parser.py:60

# Medication regex pattern
med_patterns = [
    r"(\b[a-zA-Z]{4,20})\s+(\d+\s*(?:mg|mcg|g|ml|units?))\s*((?:BID|TID|QID|daily)?)",
]
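
As an illustration of what this pattern captures, the sketch below runs it over a short medication list and turns each match into a name, dose, and frequency record. The record keys follow the medications entries in the PatientContext example; the variable names are assumptions made for this example.

```python
import re

# Same pattern as in med_patterns above: drug name, dose with unit, optional frequency.
MED_PATTERN = re.compile(
    r"(\b[a-zA-Z]{4,20})\s+(\d+\s*(?:mg|mcg|g|ml|units?))\s*((?:BID|TID|QID|daily)?)"
)

text = "Lisinopril 20mg daily, Metformin 1000mg BID"

# findall returns one (name, dose, frequency) tuple per match.
meds = [
    {"name": name, "dose": dose, "frequency": freq}
    for name, dose, freq in MED_PATTERN.findall(text)
]
print(meds)
```

The frequency group is optional, so a drug written without one of the listed frequencies still matches and yields an empty frequency string.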

Returns PatientContext

All parsers output the same unified schema:

{
  "patient_context": {
    "age": 68,
    "gender": "male",
    "conditions": [{"display": "Hypertension"}],
    "medications": [{"name": "Lisinopril", "dose": "20mg", "frequency": "daily"}],
    "labs": [{"name": "Creatinine", "value": "1.8", "unit": "mg/dL"}]
  },
  "summary": "68-year-old male patient\nPMH: Hypertension\nMedications: Lisinopril 20mg daily..."
}

CSV Format Requirements

The CSV parser auto-detects common column names:

Column Name	Maps To	Example Value
`age`	Demographics	`68`
`gender` or `sex`	Demographics	`male`
`diagnosis` or `condition`	Conditions	`Type 2 Diabetes Mellitus`
`medication` or `drug`	Medications	`Metformin`
`dose` or `dosage`	Medication dose	`1000mg`
`lab` or `test`	Lab results	`HbA1c`
`value` or `result`	Lab value	`7.2`
`unit`	Lab unit	`%`
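
One way to picture the header auto-detection is an alias table applied to each column name. The sketch below is an assumption about how such a mapping could work, not the code in ehr_parser.py; COLUMN_ALIASES, normalize_rows, and the canonical key names are hypothetical.

```python
import csv
import io

# Accepted header -> canonical key, following the table above.
COLUMN_ALIASES = {
    "age": "age",
    "gender": "gender", "sex": "gender",
    "diagnosis": "condition", "condition": "condition",
    "medication": "medication", "drug": "medication",
    "dose": "dose", "dosage": "dose",
    "lab": "lab", "test": "lab",
    "value": "value", "result": "value",
    "unit": "unit",
}


def normalize_rows(csv_text: str) -> list:
    """Read CSV text and rename recognized columns to canonical keys."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for raw in reader:
        row = {}
        for header, cell in raw.items():
            key = COLUMN_ALIASES.get((header or "").strip().lower())
            # Skip unrecognized columns and empty cells.
            if key and cell and cell.strip():
                row[key] = cell.strip()
        rows.append(row)
    return rows


sample = "Age,Sex,Drug,Dosage\n68,male,Metformin,1000mg\n"
rows = normalize_rows(sample)
print(rows)
```

Lowercasing and stripping the header before the lookup lets exports with headers such as "Sex" or " dosage " resolve to the same keys.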

Free Text Input

Submit natural language clinical notes directly. The text parser uses regex + heuristics to extract structured data.

Examples in cURL, Python, and JavaScript:

curl -X POST https://api.clinicalpilot.ai/api/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "text": "68 yo male with HTN, DM2. Medications: Lisinopril 20mg daily, Metformin 1000mg BID. BP 145/92, HR 88, glucose 180 mg/dL. Presenting with dizziness and fatigue for 3 days."
  }'

import requests

response = requests.post(
    "https://api.clinicalpilot.ai/api/analyze",
    json={
        "text": "68 yo male with HTN, DM2. Medications: Lisinopril 20mg daily, Metformin 1000mg BID. BP 145/92, HR 88, glucose 180 mg/dL. Presenting with dizziness and fatigue for 3 days."
    }
)

soap = response.json()["soap"]
print(soap["assessment"])

const response = await fetch("https://api.clinicalpilot.ai/api/analyze", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    text: "68 yo male with HTN, DM2. Medications: Lisinopril 20mg daily, Metformin 1000mg BID. BP 145/92, HR 88, glucose 180 mg/dL. Presenting with dizziness and fatigue for 3 days."
  })
});

const { soap } = await response.json();
console.log(soap.assessment);

Extraction Rules

The text parser recognizes:

Demographics

Age: 68 yo, 68-year-old, 68 y.o.
Gender: Keywords like male, female, man, woman, he, she
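
A rough sketch of how these demographic rules could be written as regular expressions follows. The patterns and the function name extract_demographics are assumptions for illustration; in particular, checking female keywords before male keywords is a simplification chosen here.

```python
import re

# Matches "68 yo", "68-year-old", and "68 y.o."; the lookahead avoids words like "young".
AGE_RE = re.compile(r"\b(\d{1,3})[\s-]*(?:years?[\s-]*old|y\.?o\.?)(?![a-z])", re.IGNORECASE)
FEMALE_RE = re.compile(r"\b(?:female|woman|she)\b", re.IGNORECASE)
MALE_RE = re.compile(r"\b(?:male|man|he)\b", re.IGNORECASE)


def extract_demographics(text: str) -> tuple:
    """Return (age, gender); age is None and gender is 'unknown' when not found."""
    age_match = AGE_RE.search(text)
    age = int(age_match.group(1)) if age_match else None
    if FEMALE_RE.search(text):
        gender = "female"
    elif MALE_RE.search(text):
        gender = "male"
    else:
        gender = "unknown"
    return age, gender


print(extract_demographics("68 yo male with HTN"))
print(extract_demographics("A 54-year-old woman"))
print(extract_demographics("Pt is 7 y.o., no distress"))
```

The word boundaries matter: without them, "male" would match inside "female" and "he" inside "the".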

Medications

Pattern: drug_name dose frequency
Examples: Lisinopril 20mg daily, Metformin 1000mg BID, Aspirin 81mg PRN
Section headers: “Medications:”, “Current Meds:”

Conditions

Common abbreviations: HTN → Hypertension, DM2 → Type 2 Diabetes, CAD → Coronary Artery Disease
History patterns: “history of”, “h/o”, “diagnosed with”
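
The abbreviation expansion can be sketched as a dictionary lookup driven by a single regex. This example covers only the three abbreviations listed above and does not attempt the history phrases; ABBREVIATIONS and expand_conditions are hypothetical names.

```python
import re

ABBREVIATIONS = {
    "HTN": "Hypertension",
    "DM2": "Type 2 Diabetes",
    "CAD": "Coronary Artery Disease",
}
# Case-sensitive on purpose, so ordinary lowercase words are not mistaken for abbreviations.
ABBREV_RE = re.compile(r"\b(" + "|".join(ABBREVIATIONS) + r")\b")


def expand_conditions(text: str) -> list:
    """Return expanded condition names in order of first appearance, without repeats."""
    found = []
    for abbr in ABBREV_RE.findall(text):
        name = ABBREVIATIONS[abbr]
        if name not in found:
            found.append(name)
    return found


conditions = expand_conditions("68 yo male with HTN, DM2. h/o CAD, HTN.")
print(conditions)
```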

Vitals

BP: BP 145/92, BP: 120/80 mmHg
HR: HR 88, heart rate 72 bpm
Temp: temp 98.6°F, temperature 37.2°C
SpO2: SpO2 95%, O2 sat 98%
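
The vital-sign formats above lend themselves to one pattern per vital. The following is a sketch under that assumption, reusing the vital names from VITAL_CODES as keys; the patterns themselves are illustrative and are not taken from text_parser.py.

```python
import re

VITAL_PATTERNS = {
    "blood_pressure": re.compile(r"\bBP:?\s*(\d{2,3}/\d{2,3})", re.IGNORECASE),
    "heart_rate": re.compile(r"\b(?:HR|heart rate):?\s*(\d{2,3})", re.IGNORECASE),
    "body_temperature": re.compile(r"\btemp(?:erature)?:?\s*(\d{2,3}(?:\.\d+)?)", re.IGNORECASE),
    "spo2": re.compile(r"\b(?:SpO2|O2 sat):?\s*(\d{2,3})\s*%", re.IGNORECASE),
}


def extract_vitals(text: str) -> dict:
    """Return the first value found for each vital; missing vitals are omitted."""
    vitals = {}
    for name, pattern in VITAL_PATTERNS.items():
        match = pattern.search(text)
        if match:
            vitals[name] = match.group(1)
    return vitals


first = extract_vitals("BP 145/92, HR 88, temp 98.6°F, SpO2 95%")
second = extract_vitals("heart rate 72 bpm, BP: 120/80 mmHg, O2 sat 98%")
print(first)
print(second)
```

Values are kept as strings here, matching the string-valued vitals and labs shown elsewhere on this page.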

Labs

Common labs: troponin, WBC, Hgb, creatinine, BUN, eGFR, glucose, HbA1c, Na, K, D-dimer, BNP, lactate
Pattern: lab_name: value unit (e.g., glucose: 180 mg/dL)
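
The lab_name: value unit pattern might be expressed as a regex built from the list of known labs. This is a sketch only: the unit list KNOWN_UNITS is an illustrative subset chosen for the example, and short names such as Na and K would need extra care in real notes.

```python
import re

KNOWN_LABS = ["troponin", "WBC", "Hgb", "creatinine", "BUN", "eGFR", "glucose",
              "HbA1c", "Na", "K", "D-dimer", "BNP", "lactate"]
# Illustrative subset of units; an assumption, not the parser's actual list.
KNOWN_UNITS = ["mg/dL", "g/dL", "ng/mL", "mmol/L", "mEq/L", "%"]

LAB_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, KNOWN_LABS)) + r")\b:?\s*"
    r"(\d+(?:\.\d+)?)\s*"
    r"(" + "|".join(map(re.escape, KNOWN_UNITS)) + r")?",
    re.IGNORECASE,
)


def extract_labs(text: str) -> list:
    """Return one record per lab mention; unit is '' when none is recognized."""
    return [
        {"name": name, "value": value, "unit": unit}
        for name, value, unit in LAB_RE.findall(text)
    ]


labs = extract_labs("glucose: 180 mg/dL, creatinine 1.8 mg/dL, HbA1c 7.2%")
print(labs)
```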

Allergies

Keywords: “allergic to”, “allergy:”, “allergies:”
NKDA detection: “NKDA”, “no known drug allergies”
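
Allergy detection has two cases worth separating: an explicit statement that there are no known drug allergies, and a list of allergens after a keyword. The sketch below shows one possible way to distinguish them; the return shape and the function name extract_allergies are assumptions for illustration.

```python
import re

NKDA_RE = re.compile(r"\bNKDA\b|\bno known drug allergies\b", re.IGNORECASE)
# Capture everything after the keyword up to the end of the sentence or line.
ALLERGY_RE = re.compile(r"(?:allergic to|allerg(?:y|ies):)\s*([^.;\n]+)", re.IGNORECASE)


def extract_allergies(text: str) -> dict:
    """Return {'nkda': bool, 'allergies': [...]} for a clinical note."""
    if NKDA_RE.search(text):
        return {"nkda": True, "allergies": []}
    match = ALLERGY_RE.search(text)
    if not match:
        return {"nkda": False, "allergies": []}
    # Split the captured list on commas and the word "and".
    items = re.split(r",|\band\b", match.group(1))
    return {"nkda": False, "allergies": [item.strip() for item in items if item.strip()]}


print(extract_allergies("Allergies: penicillin, sulfa. HTN."))
print(extract_allergies("Patient is allergic to aspirin and codeine"))
print(extract_allergies("Allergies: NKDA"))
```

Checking for NKDA first keeps "Allergies: NKDA" from being read as an allergy to something called NKDA.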

PHI Anonymization: All text inputs are automatically scrubbed by Microsoft Presidio before parsing. See Safety System for details.

Voice Input

Voice transcription uses the same text parser as free text input.

Future Feature: Whisper STT integration is planned. Currently, you must transcribe audio separately and submit the transcript as text.

Workflow

Transcribe Audio

Use the Whisper API or your EHR’s built-in dictation:

import openai

# Note: openai.Audio.transcribe is the interface of the pre-1.0 openai SDK.
with open("clinical_note.mp3", "rb") as audio:
    transcript = openai.Audio.transcribe("whisper-1", audio)

Submit Transcript

Send the transcribed text to /api/analyze:

import requests

response = requests.post(
    "https://api.clinicalpilot.ai/api/analyze",
    json={"text": transcript["text"]}
)

Unified PatientContext Schema

All parsers output this Pydantic model (backend/models/patient.py):

from typing import Optional

from pydantic import BaseModel

# Gender, Condition, Medication, LabResult, Vital, and Allergy are companion models not shown here.

class PatientContext(BaseModel):
    # Demographics
    age: Optional[int] = None
    gender: Gender = Gender.UNKNOWN
    weight_kg: Optional[float] = None
    height_cm: Optional[float] = None

    # Clinical data
    conditions: list[Condition] = []
    medications: list[Medication] = []
    labs: list[LabResult] = []
    vitals: list[Vital] = []
    allergies: list[Allergy] = []

    # Prompts & raw text
    current_prompt: str = ""  # The clinical question
    raw_text: str = ""        # Anonymized full text

    # Metadata
    source_type: str = "text"  # "fhir", "ehr_pdf", "ehr_csv", "text"
    timestamp: str = "2026-03-03T10:15:00Z"  # ISO 8601, UTC

This schema is the single source of truth consumed by all agents (Clinical, Literature, Safety).

Best Practices

Use FHIR for Integration

If your EHR supports FHIR R4, use it for the most accurate data mapping.

Include Context in Free Text

Add patient history, vitals, and current symptoms for better analysis quality.

Check PHI Before Upload

Presidio auto-scrubs PHI, but review sensitive documents before submission.

Combine Formats

You can submit a FHIR bundle together with a free-text addendum by merging patient_context with text.
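
The page does not spell out the request shape for a combined submission, so the sketch below is an assumption: it pairs a previously parsed patient_context with a text addendum in a single JSON body. The helper build_combined_request and the idea of sending both keys together are hypothetical and should be checked against the API reference.

```python
import json


def build_combined_request(patient_context: dict, addendum: str) -> dict:
    """Pair structured context with a free-text addendum (hypothetical request shape)."""
    return {"patient_context": patient_context, "text": addendum}


# Structured context as it might come back from a FHIR upload.
parsed = {"age": 68, "gender": "male", "conditions": [{"display": "Hypertension"}]}
body = build_combined_request(parsed, "Presenting with dizziness and fatigue for 3 days.")
print(json.dumps(body, indent=2))
```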

Next Steps

Emergency Mode

AI Chat
