ClinicalPilot processes Protected Health Information (PHI). If deployed in a healthcare setting, it must comply with HIPAA (Health Insurance Portability and Accountability Act) regulations.
This guide provides technical recommendations, not legal advice. Consult with a compliance officer or healthcare attorney to ensure full HIPAA compliance.

HIPAA Requirements Overview

HIPAA requires:
  1. Administrative Safeguards — Policies, procedures, training
  2. Physical Safeguards — Facility access controls, workstation security
  3. Technical Safeguards — Access controls, encryption, audit logging
  4. Business Associate Agreements (BAAs) — With all vendors handling PHI
ClinicalPilot primarily addresses the Technical Safeguards.

PHI Handling in ClinicalPilot

Where PHI Flows

User Input (PHI)
        ↓
Microsoft Presidio (Anonymization)
        ↓
Anonymized Data
        ↓
┌────────────────────────────────────┐
│  Agent Layer (LLM API calls)       │
│  - OpenAI API (if enabled)         │
│  - Groq API (if enabled)           │
│  - Local LLM (if enabled)          │
└────────────────────────────────────┘
        ↓
LanceDB (Vector Store, anonymized)
        ↓
SOAP Note Output (anonymized)

ClinicalPilot uses Microsoft Presidio to scrub PHI before data reaches any LLM. This includes names, dates, MRNs, addresses, phone numbers, and SSNs.

Technical Safeguards

1. Access Control (§164.312(a)(1))

Implement User Authentication

ClinicalPilot does not include built-in authentication. You must add authentication before production deployment.

Recommended: OAuth2 with JWT
# backend/auth.py
import os

from fastapi import Depends, HTTPException, status
from fastapi.security import OAuth2PasswordBearer
from jose import JWTError, jwt

# Signing key and algorithm. Load the key from the environment, never from source control.
SECRET_KEY = os.environ["JWT_SECRET_KEY"]  # assumed variable name
ALGORITHM = "HS256"

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")

async def get_current_user(token: str = Depends(oauth2_scheme)):
    """Return the username from a valid bearer token, or raise 401."""
    credentials_exception = HTTPException(
        status_code=status.HTTP_401_UNAUTHORIZED,
        detail="Could not validate credentials",
        headers={"WWW-Authenticate": "Bearer"},
    )
    try:
        # Verifies the signature (and expiry, if the token carries an exp claim)
        payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
        username = payload.get("sub")
        if username is None:
            raise credentials_exception
    except JWTError:
        raise credentials_exception
    return username
Protect all endpoints:
@app.post("/api/analyze")
async def analyze(request: AnalyzeRequest, user: str = Depends(get_current_user)):
    # Only authenticated users can analyze cases
    ...

Unique User Identification

Each user must have a unique ID. Log all actions:
logger.info(f"User {user_id} analyzed case {case_id}")

Emergency Access Procedure

Define break-glass access for emergencies:
  • Super admin account (disabled by default)
  • Audit log of all emergency access
  • Time-limited emergency sessions
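
The break-glass requirements above can be sketched as a session object that expires on its own and writes an audit entry when it is granted. This is a minimal illustration, not part of ClinicalPilot: `EmergencySession`, `grant_emergency_access`, and the 15-minute default are all hypothetical choices.

```python
import logging
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

audit_logger = logging.getLogger("audit")


@dataclass(frozen=True)
class EmergencySession:
    """A time-limited break-glass grant (hypothetical structure)."""
    user_id: str
    reason: str
    granted_at: datetime
    expires_at: datetime

    def is_active(self, now: Optional[datetime] = None) -> bool:
        # Active from the grant time up to, but not including, the expiry time.
        now = now or datetime.now(timezone.utc)
        return self.granted_at <= now < self.expires_at


def grant_emergency_access(user_id: str, reason: str, minutes: int = 15,
                           now: Optional[datetime] = None) -> EmergencySession:
    """Create an emergency session and record the grant in the audit log."""
    if not reason.strip():
        raise ValueError("a reason is required for emergency access")
    now = now or datetime.now(timezone.utc)
    session = EmergencySession(user_id, reason, now, now + timedelta(minutes=minutes))
    audit_logger.warning(
        "[AUDIT] break-glass user=%s reason=%r expires=%s",
        user_id, reason, session.expires_at.isoformat(),
    )
    return session
```

Requiring a free-text reason and logging at warning level makes emergency grants easy to find during the periodic audit review.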

2. Audit Controls (§164.312(b))

Log All PHI Access

# backend/audit_log.py
import logging
from datetime import datetime, timezone

audit_logger = logging.getLogger("audit")

def log_phi_access(user_id: str, action: str, resource: str, success: bool):
    # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated as of Python 3.12
    audit_logger.info(
        f"[AUDIT] user={user_id} action={action} resource={resource} "
        f"success={success} timestamp={datetime.now(timezone.utc).isoformat()}"
    )
Log events:
  • Access: User viewed a SOAP note
  • Modify: User edited patient data
  • Delete: User deleted a record
  • Export: User exported data to PDF
  • Failed access: Authentication failures
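
One way to keep the event types above consistent across the codebase is to define them once and emit each audit entry as a structured JSON line. The sketch below is illustrative only; `AuditEvent` and `build_audit_record` are hypothetical names, not existing ClinicalPilot APIs.

```python
import json
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class AuditEvent(str, Enum):
    """The event types listed above, as a closed set."""
    ACCESS = "access"
    MODIFY = "modify"
    DELETE = "delete"
    EXPORT = "export"
    FAILED_ACCESS = "failed_access"


def build_audit_record(user_id: str, event, resource: str, success: bool,
                       now: Optional[datetime] = None) -> str:
    """Return one audit entry as a JSON string with a UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    record = {
        "timestamp": now.isoformat(),
        "user_id": user_id,
        # AuditEvent(...) raises ValueError for an unknown event name.
        "event": AuditEvent(event).value,
        "resource": resource,
        "success": success,
    }
    return json.dumps(record, sort_keys=True)
```

The `resource` field should carry an internal identifier such as a record ID rather than patient details, so the audit log does not itself become a store of PHI.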

Audit Log Retention

HIPAA's six-year documentation retention rule (§164.316(b)(2)(i)) is commonly applied to audit logs. Plan for:
  • Minimum 6 years retention
  • Tamper-evident storage (write-once, append-only)
  • Regular review (quarterly recommended)
Recommended: AWS CloudWatch Logs, Azure Monitor, or self-hosted ELK stack.
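
Write-once storage prevents edits; a hash chain additionally makes any edit detectable. The following is a minimal sketch of that idea using only the standard library. The function names and the in-memory list are assumptions for illustration, and a real deployment would persist entries to the append-only store.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, payload: dict) -> str:
    # Hash the previous entry's hash together with a canonical form of the payload.
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: list, payload: dict) -> dict:
    """Append a payload, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    entry = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _entry_hash(prev_hash, payload),
    }
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Return True only if every entry still matches its recorded hash and link."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash covers the one before it, altering or removing an earlier entry invalidates every later one. The chain detects tampering but does not prevent it, so it complements rather than replaces access controls on the log store.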

3. Integrity (§164.312(c)(1))

Data Integrity Controls

  • Checksums: Verify data has not been altered
  • Digital signatures: Sign SOAP notes with cryptographic signatures
  • Version control: Track all edits to patient records
Example:
import hashlib

def compute_checksum(data: str) -> str:
    return hashlib.sha256(data.encode()).hexdigest()

# Store checksum with each SOAP note
soap_note.checksum = compute_checksum(soap_note.to_json())
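
A plain SHA-256 checksum catches accidental corruption, but anyone who can edit a note can also recompute its checksum. A keyed hash (HMAC) closes that gap. Note that HMAC is a symmetric message authentication code, not an asymmetric digital signature. The sketch below assumes a secret key held outside the database; `sign_note` and `verify_note` are hypothetical helper names.

```python
import hashlib
import hmac


def sign_note(note_json: str, key: bytes) -> str:
    """Return an HMAC-SHA256 tag (hex) over the serialized note."""
    return hmac.new(key, note_json.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_note(note_json: str, key: bytes, expected_tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_note(note_json, key), expected_tag)
```

For example, store `sign_note(soap_note.to_json(), key)` next to the note and call `verify_note` whenever the note is read back. The serialization must be deterministic, otherwise an unchanged note could fail verification.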

4. Transmission Security (§164.312(e)(1))

Encryption in Transit

TLS 1.2+ is required for all network communication. Nginx configuration:
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers HIGH:!aNULL:!MD5;
ssl_prefer_server_ciphers on;
Verify:
openssl s_client -connect clinicalpilot.example.com:443 -tls1_2
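
The Nginx settings cover inbound traffic. Outbound calls made by the Python backend can enforce the same TLS floor with the standard library's `ssl` module. This is a minimal sketch; `make_client_context` is a hypothetical helper, and how the context is passed on depends on the HTTP client in use.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context that refuses anything below TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```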

API Communication

OpenAI and Groq APIs transmit data to third-party servers. Even though ClinicalPilot anonymizes data first, you must:
  1. Sign a Business Associate Agreement (BAA) with OpenAI/Groq
  2. Verify they are HIPAA-compliant
  3. Use zero data retention where the provider offers it. This is typically arranged with the provider at the account or contract level rather than set per API call.
Alternatives:
  • Use local LLMs (MedGemma via Ollama) — no data leaves your network
  • Use Azure OpenAI (has BAA available) instead of public OpenAI API
  • Self-host open-source LLMs (Llama 3.1, Mistral) on your infrastructure

5. Encryption at Rest (§164.312(a)(2)(iv))

All stored PHI must be encrypted.

Database Encryption (LanceDB)

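LanceDB stores data as files. Encrypt the entire volume.

Linux (LUKS):
# WARNING: luksFormat erases all existing data on the device. /dev/sdb is an example device.
sudo cryptsetup luksFormat /dev/sdb
sudo cryptsetup open /dev/sdb lancedb_encrypted
sudo mkfs.ext4 /dev/mapper/lancedb_encrypted
# Create the mount point if it does not exist yet
sudo mkdir -p /var/lib/clinicalpilot/lancedb
sudo mount /dev/mapper/lancedb_encrypted /var/lib/clinicalpilot/lancedb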
Cloud Providers:
  • AWS: Use EBS encryption (turn on encryption by default for each Region, or encrypt each volume)
  • Azure: Use Azure Disk Encryption
  • GCP: Use persistent disk encryption

Log Encryption

Encrypt audit logs:
# Encrypt with GPG
gpg --symmetric --cipher-algo AES256 audit.log
Or use encrypted log aggregation (AWS CloudWatch, Splunk Enterprise).

6. Anonymization (§164.514)

Data de-identified under §164.514 is no longer treated as PHI and is exempt from most HIPAA requirements. ClinicalPilot uses Microsoft Presidio to remove the 18 Safe Harbor identifiers:
  1. Names
  2. Dates (except year)
  3. Phone numbers
  4. Addresses
  5. Email addresses
  6. SSNs
  7. MRNs
  8. Device IDs
  9. URLs
  10. IP addresses
  11. Biometric IDs
  12. Photos
  13. Account numbers
  14. Certificate/license numbers
  15. Vehicle IDs
  16. Health plan beneficiary numbers
  17. Fax numbers
  18. Any unique identifying number

Verify Anonymization

Test Presidio output:
from backend.input_layer.anonymizer import anonymize_text

text = "Patient John Smith, DOB 01/15/1980, MRN 123456, phone 555-1234"
result = anonymize_text(text)

print(result)
# Expected: "Patient [NAME], DOB [DATE], MRN [ID], phone [PHONE]"

assert "John Smith" not in result
assert "123456" not in result
Anonymization is not perfect. Review edge cases:
  • Medical device model numbers (may be unique identifiers)
  • Rare diseases (may indirectly identify patient)
  • Combination of age + zip code + diagnosis (quasi-identifiers)
For maximum safety, use Expert Determination or Safe Harbor de-identification methods (consult a compliance expert).
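
The quasi-identifier risk mentioned above can be screened for with a k-anonymity check: count how many records share each combination of quasi-identifiers and flag combinations that occur fewer than k times. k-anonymity is a general technique brought in here for illustration, not something ClinicalPilot is stated to implement. The field names and the default threshold are assumptions, and the check does not replace Expert Determination.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=5):
    """Return quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical, already-generalized records (age bands, 3-digit zip prefixes)
records = [
    {"age_band": "40-49", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "70-79", "zip3": "994", "diagnosis": "fabry disease"},
]
risky = small_groups(records, ["age_band", "zip3", "diagnosis"], k=2)
```

Here `risky` contains only the single rare-disease record, which is the kind of entry that warrants further generalization or suppression before the data is treated as de-identified.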

Business Associate Agreements (BAAs)

You must sign BAAs with all vendors who access PHI:
| Vendor | PHI Access? | BAA Required? | Status |
|---|---|---|---|
| OpenAI | Yes (even if anonymized) | Yes | Available (Enterprise only) |
| Groq | Yes | Yes | Contact Groq sales |
| AWS / Azure / GCP | Yes (if hosting) | Yes | Available |
| LangSmith | Yes (trace data) | Yes | Available (Enterprise plan) |
| Local LLM (Ollama) | No (on-premises) | No | N/A |
If you use local LLMs only (MedGemma via Ollama), you can avoid BAAs with LLM providers entirely, as no data leaves your network.

Risk Assessment

HIPAA requires periodic Security Risk Assessments.

Checklist

  • Identify all systems that handle PHI
  • Document data flows (where PHI travels)
  • Assess risks (unauthorized access, data breaches, etc.)
  • Implement safeguards (encryption, access controls, etc.)
  • Test incident response plan (ransomware, data breach)
  • Review annually or when systems change

Tools

  • NIST Cybersecurity Framework
  • HHS Security Risk Assessment Tool
  • Third-party auditors (healthcare IT compliance firms)

Incident Response Plan

Prepare for breaches:
  1. Detection: Monitor for unauthorized access (SIEM, IDS)
  2. Containment: Isolate affected systems
  3. Notification: Notify affected individuals without unreasonable delay and no later than 60 days after the breach is discovered (HIPAA Breach Notification Rule)
  4. Remediation: Patch vulnerabilities, reset credentials
  5. Documentation: Record all actions taken
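
The 60-day limit in the Notification step is simple date arithmetic, and an incident tracker can compute it from the discovery date. The sketch below is a hypothetical helper. It treats 60 days as the outer bound only, and the exact counting convention should be confirmed with your compliance officer.

```python
from datetime import date, timedelta

NOTIFICATION_WINDOW_DAYS = 60  # outer bound from the Breach Notification Rule


def notification_deadline(discovered_on: date) -> date:
    """Latest date by which affected individuals must be notified."""
    return discovered_on + timedelta(days=NOTIFICATION_WINDOW_DAYS)


def days_remaining(discovered_on: date, today: date) -> int:
    """Days left before the deadline (negative once it has passed)."""
    return (notification_deadline(discovered_on) - today).days
```

For a breach discovered on 1 March 2024, `notification_deadline` returns 30 April 2024.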

Example Breach Scenarios

| Scenario | Response |
|---|---|
| Laptop with PHI stolen | Encrypt all endpoints; notify affected patients |
| Ransomware attack | Restore from backup; notify HHS (within 60 days if 500 or more individuals are affected, otherwise in the annual report) |
| Unauthorized API access | Revoke API keys; audit access logs; notify users |
| Presidio anonymization failure | Stop all processing; review anonymization logic; re-anonymize data |

Self-Assessment

Use this checklist before production deployment:

Administrative Safeguards

  • Security officer designated
  • Workforce trained on HIPAA
  • Policies documented (access control, incident response, etc.)
  • BAAs signed with all vendors
  • Annual risk assessment scheduled

Physical Safeguards

  • Servers in locked facility (or cloud with physical security)
  • Workstations have automatic screen locks
  • Access logs reviewed (who entered server room)

Technical Safeguards

  • User authentication enabled (OAuth2, SAML)
  • Unique user IDs assigned
  • Audit logging configured (all PHI access)
  • Encryption in transit (TLS 1.2+)
  • Encryption at rest (disk encryption)
  • PHI anonymization verified (Presidio tests)
  • LLM provider BAAs signed (if using cloud APIs)
  • API rate limiting enabled (DDoS protection)
  • Regular backups (encrypted, tested)
  • Intrusion detection system (IDS) deployed

HIPAA-Compliant Deployment Example

Scenario: Hospital deploys ClinicalPilot for clinical decision support.
Setup:
  1. Infrastructure: AWS GovCloud (HIPAA-eligible)
  2. Compute: EC2 instances with encrypted EBS volumes
  3. LLM: Azure OpenAI (BAA signed) instead of public OpenAI
  4. Vector DB: LanceDB on encrypted EFS
  5. Authentication: Okta SSO with MFA
  6. Audit Logs: AWS CloudWatch Logs (6-year retention)
  7. Network: VPC with private subnets; no inbound internet access, outbound traffic limited to the Azure OpenAI endpoint
  8. Monitoring: AWS GuardDuty for threat detection
  9. Backups: Daily encrypted snapshots to S3 (with versioning)
  10. Incident Response: PagerDuty alerts for suspicious activity
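Result: A deployment that covers the technical safeguards in this guide, with no PHI leaving the organization's AWS account except calls to Azure OpenAI, which are covered by the BAA. Full HIPAA compliance also depends on the administrative and physical safeguards.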

Alternative: Air-Gapped Deployment

For maximum security, deploy ClinicalPilot completely offline:
  • Local LLM (MedGemma via Ollama)
  • No external APIs (disable PubMed, FDA, RxNorm)
  • Local drug database (DrugBank CSV, RxNorm offline files)
  • Self-hosted observability (Langfuse on-premises)
  • No internet access for application servers
This eliminates all third-party data transmission, making HIPAA compliance significantly easier.
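
An air-gapped configuration can be sanity-checked at startup by confirming that every configured endpoint points at localhost or a private address. This is a minimal sketch under stated assumptions: the config keys and the helper names are hypothetical, and DNS names are treated as external because they cannot be classified without a lookup.

```python
import ipaddress
from urllib.parse import urlparse


def is_internal_url(url: str) -> bool:
    """True if the URL's host is localhost or a private/loopback IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: treat as external unless explicitly allow-listed elsewhere.
        return False
    return addr.is_private or addr.is_loopback


def external_endpoints(config: dict) -> list:
    """Return the names of configured endpoints that are not internal."""
    return sorted(name for name, url in config.items() if not is_internal_url(url))


# Hypothetical endpoint configuration
config = {
    "llm": "http://localhost:11434",
    "vector_db": "http://10.0.0.5:8080",
    "pubmed": "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/",
}
leaks = external_endpoints(config)
```

With this sample configuration `leaks` contains only `"pubmed"`, which would need to be disabled or replaced with a local data source before the deployment counts as offline. A check like this complements, but does not replace, network-level egress blocking.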

Next Steps

Production Deployment

Deploy ClinicalPilot with HIPAA safeguards

Observability

Set up audit logging with LangSmith
