ClinicalPilot accepts clinical data in four input formats. Each is parsed automatically into a unified schema for agent processing.

Supported Formats

FHIR R4 JSON

HL7 FHIR R4 Bundles for standardized EHR integration

EHR Files

Upload PDF or CSV documents from electronic health records

Free Text

Natural language clinical notes and case descriptions

Voice Transcription

Speech-to-text clinical dictation (future: Whisper STT)

FHIR R4 JSON

Submit HL7 FHIR R4 Bundles via the /api/upload/fhir endpoint. The parser extracts:
  • Patient → Demographics (age, gender)
  • Condition → Diagnoses & medical history
  • MedicationRequest / MedicationStatement → Current medications
  • Observation → Vitals & lab results
  • AllergyIntolerance → Drug allergies
curl -X POST https://api.clinicalpilot.ai/api/upload/fhir \
  -H "Content-Type: application/json" \
  -d @patient_bundle.json

FHIR Parser Implementation

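The FHIR parser (backend/input_layer/fhir_parser.py) routes each Observation by its LOINC code: codes listed in VITAL_CODES become vitals, and all other observations are stored as labs: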
# Vital codes (LOINC)
VITAL_CODES = {
    "8310-5": "body_temperature",
    "8867-4": "heart_rate",
    "9279-1": "respiratory_rate",
    "85354-9": "blood_pressure",
    "2708-6": "spo2",
}

# Parser extracts observations and routes to vitals vs labs
if code in VITAL_CODES:
    ctx.vitals.append(Vital(name=VITAL_CODES[code], value=value_str, unit=unit))
else:
    ctx.labs.append(LabResult(name=display, value=value_str, unit=unit))
FHIR Compliance: Supports FHIR R4 only. The parser is lightweight and extracts only clinically relevant resources. Custom extensions are preserved in raw_text.
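The routing step above can be sketched end to end on a tiny Bundle. This is a minimal sketch under stated assumptions, not the actual fhir_parser.py: the Vital and LabResult dataclasses, the route_observations helper, and the sample Bundle are hypothetical stand-ins. Only VITAL_CODES and the vitals-versus-labs branch come from the snippet above.

```python
from dataclasses import dataclass

# LOINC codes copied from the VITAL_CODES snippet above
VITAL_CODES = {
    "8310-5": "body_temperature",
    "8867-4": "heart_rate",
    "9279-1": "respiratory_rate",
    "85354-9": "blood_pressure",
    "2708-6": "spo2",
}


@dataclass
class Vital:  # hypothetical stand-in for the project's model
    name: str
    value: str
    unit: str


@dataclass
class LabResult:  # hypothetical stand-in for the project's model
    name: str
    value: str
    unit: str


def route_observations(bundle: dict) -> tuple[list[Vital], list[LabResult]]:
    """Split the Observation resources of a FHIR R4 Bundle into vitals and labs."""
    vitals, labs = [], []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Observation":
            continue
        codings = resource.get("code", {}).get("coding", [])
        if not codings:
            continue  # no code to route on
        code = codings[0].get("code", "")
        display = codings[0].get("display", code)
        quantity = resource.get("valueQuantity", {})
        value_str = str(quantity.get("value", ""))
        unit = quantity.get("unit", "")
        if code in VITAL_CODES:
            vitals.append(Vital(VITAL_CODES[code], value_str, unit))
        else:
            labs.append(LabResult(display, value_str, unit))
    return vitals, labs


# Sample Bundle invented for illustration
bundle = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"resource": {
            "resourceType": "Observation",
            "code": {"coding": [{"system": "http://loinc.org",
                                 "code": "8867-4", "display": "Heart rate"}]},
            "valueQuantity": {"value": 88, "unit": "beats/minute"}}},
        {"resource": {
            "resourceType": "Observation",
            "code": {"coding": [{"system": "http://loinc.org",
                                 "code": "2160-0", "display": "Creatinine"}]},
            "valueQuantity": {"value": 1.8, "unit": "mg/dL"}}},
    ],
}

vitals, labs = route_observations(bundle)
print(vitals)
print(labs)
```

Blood pressure (85354-9) is typically sent as a panel whose systolic and diastolic values sit under component rather than in a single valueQuantity, so a real parser needs extra handling that this sketch omits.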

EHR Files (PDF, CSV)

Upload PDF or CSV documents from your EHR system via /api/upload/ehr.
1. Upload Document

Send the file as a multipart/form-data request.
curl -X POST https://api.clinicalpilot.ai/api/upload/ehr \
  -F "file=@patient_chart.pdf"
2. Parser Extracts Text

  • PDF: Uses PyPDF2 (fallback: Unstructured.io)
  • CSV: Column-based extraction (auto-detects headers)
backend/input_layer/ehr_parser.py:88
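# PDF extraction
import io

from PyPDF2 import PdfReader

# file_bytes holds the raw bytes of the uploaded PDF
reader = PdfReader(io.BytesIO(file_bytes))
# Guard against pages that yield no text (e.g. scanned images)
pages = [page.extract_text() or "" for page in reader.pages]
text = "\n\n".join(pages)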
3. Entity Extraction

Regex-based extraction identifies:
  • Medications (drug name + dosage patterns)
  • Conditions (ICD codes, common abbreviations like HTN, DM2)
  • Vitals (BP, HR, temp, SpO2)
  • Labs (troponin, WBC, creatinine, etc.)
backend/input_layer/text_parser.py:60
# Medication regex pattern
med_patterns = [
    r"(\b[a-zA-Z]{4,20})\s+(\d+\s*(?:mg|mcg|g|ml|units?))\s*((?:BID|TID|QID|daily)?)",
]
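To see what the medication pattern above captures, it can be run against a short note. This is a minimal sketch: the pattern is copied from the snippet above, while the find_medications helper and the sample note are hypothetical and only illustrate the three capture groups (name, dose, frequency).

```python
import re

# Pattern copied from the med_patterns snippet above
MED_PATTERN = re.compile(
    r"(\b[a-zA-Z]{4,20})\s+(\d+\s*(?:mg|mcg|g|ml|units?))\s*((?:BID|TID|QID|daily)?)"
)


def find_medications(text: str) -> list[dict]:
    """Return one dict per (name, dose, frequency) match; hypothetical helper."""
    return [
        {"name": name, "dose": dose, "frequency": freq}
        for name, dose, freq in MED_PATTERN.findall(text)
    ]


note = "Medications: Lisinopril 20mg daily, Metformin 1000mg BID."
meds = find_medications(note)
for med in meds:
    print(med)
```

Taken on its own, this pattern also matches lab phrases such as "glucose 180 mg/dL" (name "glucose", dose "180 mg"), so a parser built on it presumably needs a separate step to keep lab names out of the medication list.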
4. Returns PatientContext

All parsers output the same unified schema:
{
  "patient_context": {
    "age": 68,
    "gender": "male",
    "conditions": [{"display": "Hypertension"}],
    "medications": [{"name": "Lisinopril", "dose": "20mg", "frequency": "daily"}],
    "labs": [{"name": "Creatinine", "value": "1.8", "unit": "mg/dL"}]
  },
  "summary": "68-year-old male patient\nPMH: Hypertension\nMedications: Lisinopril 20mg daily..."
}

CSV Format Requirements

The CSV parser auto-detects common column names:
| Column Name | Maps To | Example Value |
| --- | --- | --- |
| age | Demographics | 68 |
| gender or sex | Demographics | male |
| diagnosis or condition | Conditions | Type 2 Diabetes Mellitus |
| medication or drug | Medications | Metformin |
| dose or dosage | Medication dose | 1000mg |
| lab or test | Lab results | HbA1c |
| value or result | Lab value | 7.2 |
| unit | Lab unit | % |
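The header auto-detection described above could look roughly like the following. This is a sketch under assumptions, not the code in ehr_parser.py: the COLUMN_ALIASES table, the canonical field names, the normalize_rows helper, and the case-insensitive header matching are all hypothetical. Only the accepted column names come from the table.

```python
import csv
import io

# Accepted header names from the table above, mapped to hypothetical
# canonical field names (not the project's actual schema).
COLUMN_ALIASES = {
    "age": "age",
    "gender": "gender", "sex": "gender",
    "diagnosis": "condition", "condition": "condition",
    "medication": "medication", "drug": "medication",
    "dose": "dose", "dosage": "dose",
    "lab": "lab", "test": "lab",
    "value": "value", "result": "value",
    "unit": "unit",
}


def normalize_rows(csv_text: str) -> list[dict]:
    """Rename recognized columns to canonical names and drop everything else."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for raw in reader:
        row = {}
        for header, cell in raw.items():
            if header is None:
                continue  # extra cells beyond the header row
            key = COLUMN_ALIASES.get(header.strip().lower())
            if key and cell and cell.strip():
                row[key] = cell.strip()
        rows.append(row)
    return rows


sample = (
    "Age,Sex,Drug,Dosage,Test,Result,Unit\n"
    "68,male,Metformin,1000mg,HbA1c,7.2,%\n"
)
rows = normalize_rows(sample)
print(rows)
```

Mapping aliases through one dictionary keeps the accepted header names in a single place, so adding a new synonym is a one-line change.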

Free Text Input

Submit natural language clinical notes directly. The text parser uses regex + heuristics to extract structured data.
curl -X POST https://api.clinicalpilot.ai/api/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "text": "68 yo male with HTN, DM2. Medications: Lisinopril 20mg daily, Metformin 1000mg BID. BP 145/92, HR 88, glucose 180 mg/dL. Presenting with dizziness and fatigue for 3 days."
  }'

Extraction Rules

The text parser recognizes:
Demographics
  • Age: 68 yo, 68-year-old, 68 y.o.
  • Gender: Keywords like male, female, man, woman, he, she

Medications
  • Pattern: drug_name dose frequency
  • Examples: Lisinopril 20mg daily, Metformin 1000mg BID, Aspirin 81mg PRN
  • Section headers: “Medications:”, “Current Meds:”

Conditions
  • Common abbreviations: HTN → Hypertension, DM2 → Type 2 Diabetes, CAD → Coronary Artery Disease
  • History patterns: “history of”, “h/o”, “diagnosed with”

Vitals
  • BP: BP 145/92, BP: 120/80 mmHg
  • HR: HR 88, heart rate 72 bpm
  • Temp: temp 98.6°F, temperature 37.2°C
  • SpO2: SpO2 95%, O2 sat 98%

Labs
  • Common labs: troponin, WBC, Hgb, creatinine, BUN, eGFR, glucose, HbA1c, Na, K, D-dimer, BNP, lactate
  • Pattern: lab_name: value unit (e.g., glucose: 180 mg/dL)

Allergies
  • Keywords: “allergic to”, “allergy:”, “allergies:”
  • NKDA detection: “NKDA”, “no known drug allergies”
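A few of the rules listed above can be approximated with plain regular expressions. The patterns and the extract helper below are illustrative assumptions, not the rules used in text_parser.py. They cover only age, blood pressure, heart rate, the three listed abbreviations, and NKDA detection.

```python
import re

# Illustrative patterns only; the real parser's rules may differ.
AGE_RE = re.compile(r"\b(\d{1,3})\s*(?:-\s*year\s*-\s*old|yo\b|y\.o\.)", re.IGNORECASE)
BP_RE = re.compile(r"\bBP:?\s*(\d{2,3})\s*/\s*(\d{2,3})", re.IGNORECASE)
HR_RE = re.compile(r"\b(?:HR|heart rate):?\s*(\d{2,3})", re.IGNORECASE)
NKDA_RE = re.compile(r"\bNKDA\b|no known drug allergies", re.IGNORECASE)

# Abbreviation expansions taken from the list above
ABBREVIATIONS = {
    "HTN": "Hypertension",
    "DM2": "Type 2 Diabetes",
    "CAD": "Coronary Artery Disease",
}


def extract(text: str) -> dict:
    """Pull a handful of structured fields out of a free-text note."""
    result = {}
    if (m := AGE_RE.search(text)):
        result["age"] = int(m.group(1))
    if (m := BP_RE.search(text)):
        result["bp"] = f"{m.group(1)}/{m.group(2)}"
    if (m := HR_RE.search(text)):
        result["hr"] = int(m.group(1))
    # Expand each abbreviation that appears as a whole word
    result["conditions"] = [
        full for abbr, full in ABBREVIATIONS.items()
        if re.search(rf"\b{abbr}\b", text)
    ]
    result["nkda"] = bool(NKDA_RE.search(text))
    return result


note = "68 yo male with HTN, DM2. NKDA. BP 145/92, HR 88."
print(extract(note))
```

Each field gets its own small pattern, so a note that lacks one field (for example, no blood pressure) still yields the others.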
PHI Anonymization: All text inputs are automatically scrubbed by Microsoft Presidio before parsing. See Safety System for details.

Voice Input

Voice transcription uses the same text parser as free text input.
Future Feature: Whisper STT integration is planned. Currently, you must transcribe audio separately and submit as text.

Workflow

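1. Transcribe Audio

Use the Whisper API or your EHR’s built-in dictation:
# Requires openai>=1.0; the client reads OPENAI_API_KEY from the environment
from openai import OpenAI

client = OpenAI()
with open("clinical_note.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio)
2. Submit Transcript

Send the transcribed text to /api/analyze:
import requests

response = requests.post(
    "https://api.clinicalpilot.ai/api/analyze",
    json={"text": transcript.text},
    timeout=30,
)
response.raise_for_status()  # fail loudly on HTTP errors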

Unified PatientContext Schema

All parsers output this Pydantic model (backend/models/patient.py):
class PatientContext(BaseModel):
    # Demographics
    age: Optional[int] = None
    gender: Gender = Gender.UNKNOWN
    weight_kg: Optional[float] = None
    height_cm: Optional[float] = None

    # Clinical data
    conditions: list[Condition] = []
    medications: list[Medication] = []
    labs: list[LabResult] = []
    vitals: list[Vital] = []
    allergies: list[Allergy] = []

    # Prompts & raw text
    current_prompt: str = ""  # The clinical question
    raw_text: str = ""        # Anonymized full text

    # Metadata
    source_type: str = "text"  # "fhir", "ehr_pdf", "ehr_csv", "text"
    timestamp: str = "2026-03-03T10:15:00Z"
This schema is the single source of truth consumed by all agents (Clinical, Literature, Safety).
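As an illustration of how a consumer might read this schema, the sketch below renders a plain-text summary similar to the summary field shown in the earlier example response. It is an assumption-laden sketch: PatientContextSketch is a reduced dataclass stand-in (the real model is Pydantic and has more fields), and build_summary is a hypothetical helper, not a function from the codebase.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Condition:  # reduced stand-in
    display: str


@dataclass
class Medication:  # reduced stand-in
    name: str
    dose: str = ""
    frequency: str = ""


@dataclass
class PatientContextSketch:
    """Reduced stand-in for PatientContext with only the fields used below."""
    age: Optional[int] = None
    gender: str = "unknown"
    conditions: list[Condition] = field(default_factory=list)
    medications: list[Medication] = field(default_factory=list)


def build_summary(ctx: PatientContextSketch) -> str:
    """Render a short plain-text summary; hypothetical helper."""
    lines = []
    if ctx.age is not None:
        lines.append(f"{ctx.age}-year-old {ctx.gender} patient")
    if ctx.conditions:
        lines.append("PMH: " + ", ".join(c.display for c in ctx.conditions))
    if ctx.medications:
        meds = (
            " ".join(part for part in (m.name, m.dose, m.frequency) if part)
            for m in ctx.medications
        )
        lines.append("Medications: " + ", ".join(meds))
    return "\n".join(lines)


ctx = PatientContextSketch(
    age=68,
    gender="male",
    conditions=[Condition("Hypertension")],
    medications=[Medication("Lisinopril", "20mg", "daily")],
)
summary = build_summary(ctx)
print(summary)
```

Because every parser fills the same fields, a helper like this works unchanged whether the data arrived as FHIR, an EHR file, or free text.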

Best Practices

Use FHIR for Integration

If your EHR supports FHIR R4, use it for the most accurate data mapping.

Include Context in Free Text

Add patient history, vitals, and current symptoms for better analysis quality.

Check PHI Before Upload

Presidio auto-scrubs PHI, but review sensitive documents before submission.

Combine Formats

You can submit a FHIR bundle together with a free-text addendum by merging patient_context with text.

Next Steps

Emergency Mode

Fast-path triage for time-critical cases

AI Chat

Ask follow-up questions about cases
