
Medical Prompt Engineering: Healthcare Domain Guide with Claude

By Learnia AI Research Team


Medical prompt engineering has nothing in common with classifying support tickets or generating marketing copy. Here, every word matters: a hallucination can become a misdiagnosis, an omission can delay treatment, and an incorrect format can render a clinical note unusable. This guide shows you how to design reliable prompts for the healthcare domain, with guardrails tailored to clinical stakes.

Why Medical Prompting Is Different

Healthcare imposes constraints that simply don't exist elsewhere. Understanding these differences is a prerequisite before you write a single line of any prompt.

The 4 Domain-Specific Risks

  1. Factual hallucination — The model invents a dosage, an ICD-10 code, or a drug interaction
  2. Clinical omission — Critical information from the patient record is ignored
  3. False certainty — The model presents a hypothesis as an established fact
  4. Demographic bias — Recommendations vary by age, sex, or ethnicity in unjustified ways

For a deep dive into hallucination and bias detection, see our AI hallucinations and bias detection guide.


Anatomy of a Medical Prompt

A robust medical prompt contains 5 essential sections, each playing a specific role in output reliability.

Base Structure

<role>
You are a specialized clinical documentation assistant.
You help structure and summarize medical notes.
You are NOT a medical diagnostic device.
All output requires validation by a qualified healthcare professional.
</role>

<clinical_context>
Department: {department}
Document type: {document_type}
Patient: [ANONYMIZED - ID: {patient_id}]
</clinical_context>

<terminology_rules>
- Use ICD-10-CM terminology for diagnoses
- Use CPT nomenclature for procedures
- Use generic names (INN) for medications
- Flag any ambiguous abbreviation with its full meaning
</terminology_rules>

<uncertainty_protocol>
If information is ambiguous or missing:
- Mark with [UNCERTAIN] and explain why
- NEVER invent or infer a diagnosis not mentioned
- Prefer "not documented" over any assumption
</uncertainty_protocol>

<output_format>
{format_instructions}
</output_format>
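The template above can be assembled at call time with ordinary string formatting. Here's a minimal sketch — the helper name `build_medical_prompt` is illustrative, and the `<terminology_rules>` and `<uncertainty_protocol>` sections are elided for brevity (in practice they would be included verbatim):

```python
# Abbreviated version of the base structure; the static sections
# (terminology rules, uncertainty protocol) would appear verbatim here too.
BASE_TEMPLATE = """<role>
You are a specialized clinical documentation assistant.
You help structure and summarize medical notes.
You are NOT a medical diagnostic device.
All output requires validation by a qualified healthcare professional.
</role>

<clinical_context>
Department: {department}
Document type: {document_type}
Patient: [ANONYMIZED - ID: {patient_id}]
</clinical_context>

<output_format>
{format_instructions}
</output_format>"""


def build_medical_prompt(department: str, document_type: str,
                         patient_id: str, format_instructions: str) -> str:
    """Fill the base template's placeholders for one request."""
    return BASE_TEMPLATE.format(
        department=department,
        document_type=document_type,
        patient_id=patient_id,
        format_instructions=format_instructions,
    )
```

Keeping the safety-critical sections static (rather than interpolated) means a bad variable can never silently weaken the guardrails.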

SOAP Notes: Clinical Structuring with Claude

SOAP notes (Subjective, Objective, Assessment, Plan) are the standard clinical documentation format. Here's how to build a prompt that generates them reliably.

SOAP Generation Prompt

import anthropic

client = anthropic.Anthropic()

def generate_soap_note(transcript: str, patient_context: str) -> str:
    """Generate a structured SOAP note from a clinical transcript."""

    system_prompt = """You are a clinical documentation assistant.
Your role: structure consultation information into a SOAP note.

CRITICAL RULES:
1. Include ONLY information present in the transcript
2. Mark [UNCERTAIN] any ambiguous information
3. Mark [NOT DOCUMENTED] sections with no data
4. NEVER invent a diagnosis or clinical finding
5. Use ICD-10-CM terminology for diagnoses
6. This document requires validation by the attending physician

SOAP FORMAT:
## Subjective (S)
Chief complaint, patient-reported symptoms,
history of present illness, relevant past history.

## Objective (O)
Vital signs, physical exam, lab results,
imaging — only measured/observed data.

## Assessment (A)
Differential diagnoses, clinical impression,
ICD-10-CM codes in parentheses.

## Plan (P)
Prescribed treatments with dosage, follow-up tests,
referrals, patient education."""

    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        temperature=0,
        system=system_prompt,
        messages=[
            {
                "role": "user",
                "content": f"""<patient_context>
{patient_context}
</patient_context>

<consultation_transcript>
{transcript}
</consultation_transcript>

Generate the structured SOAP note for this consultation.
Flag any uncertainty with [UNCERTAIN]."""
            }
        ]
    )
    return message.content[0].text

Example Output

## Subjective (S)
58-year-old male presenting with retrosternal chest pain
onset 2 hours ago at rest. Describes pressure-like sensation
radiating to left arm. Intensity 7/10. Associated nausea.
PMH: HTN on amlodipine 5mg, T2DM on metformin 1000mg BID,
active smoker 30 pack-years.

## Objective (O)
BP: 165/95 mmHg, HR: 92 bpm, SpO2: 96% RA
ECG: ST elevation in V1-V4
Troponin I: 0.8 ng/mL (normal < 0.04)
Lung exam: bilateral basal crackles

## Assessment (A)
- ST-elevation myocardial infarction (STEMI),
  anterior wall (I21.0)
- Essential hypertension, uncontrolled (I10)
- Type 2 diabetes mellitus (E11.9)
- [UNCERTAIN] Possible early left heart failure
  (bilateral crackles — correlate with imaging)

## Plan (P)
1. Activate STEMI protocol — emergent cardiac catheterization
2. Aspirin 250mg IV + Ticagrelor 180mg loading dose
3. IV Heparin per protocol
4. Continuous ECG + SpO2 monitoring
5. Repeat troponin at H+6
6. Post-procedure echocardiography
7. Complete lipid panel at discharge

⚠️ DISCLAIMER: This note is AI-generated and requires
validation by the attending physician before integration
into the patient's medical record.

Clinical Summary Pipeline

A concrete use case: transforming lengthy clinical notes into structured summaries for care transitions (handoffs, discharge summaries).
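Such a pipeline is naturally a map-reduce over the source notes: summarize chunks, then merge the partial summaries. A minimal sketch of the two pure steps — the chunk size, helper names, and merge-prompt wording are illustrative; the actual Claude calls would use `client.messages.create` exactly as in the SOAP example above:

```python
def chunk_note(note: str, max_chars: int = 6000) -> list:
    """Split a long clinical note on paragraph boundaries so each
    chunk fits comfortably in one summarization request."""
    chunks, current = [], ""
    for para in note.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = current + "\n\n" + para if current else para
    if current:
        chunks.append(current)
    return chunks


def build_merge_prompt(partial_summaries: list) -> str:
    """Build the reduce-step prompt that merges per-chunk summaries
    into one handoff summary while keeping the uncertainty protocol."""
    joined = "\n---\n".join(partial_summaries)
    return f"""<partial_summaries>
{joined}
</partial_summaries>

Merge these partial summaries into a single handoff summary.
Preserve every [UNCERTAIN] and [NOT DOCUMENTED] marker as-is.
Do not add any clinical fact absent from the partial summaries."""
```

Splitting on paragraph boundaries (rather than a hard character cut) avoids severing a finding from its context mid-sentence.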


Safety Guardrails

Guardrails are not optional in medical AI — they are a prerequisite for deployment.

1. Systematic Disclaimers

MEDICAL_DISCLAIMER = """
⚠️ WARNING — AI-GENERATED DOCUMENT
This content is produced by a language model (Claude, Anthropic).
It does NOT constitute medical advice, a diagnosis, or a prescription.
All information must be validated by a qualified healthcare professional
before clinical use. Do not make any medical decisions based
solely on this document.
"""

def add_safety_wrapper(llm_output: str) -> str:
    """Wrap any medical output with mandatory guardrails."""
    return f"{MEDICAL_DISCLAIMER}\n\n{llm_output}\n\n{MEDICAL_DISCLAIMER}"

2. Uncertainty Detection

UNCERTAINTY_MARKERS = ["[UNCERTAIN]", "[NOT DOCUMENTED]", "[VERIFY]"]

def assess_confidence(output: str) -> dict:
    """Assess the confidence level of a medical output."""
    markers_found = [m for m in UNCERTAINTY_MARKERS if m in output]
    uncertainty_count = sum(output.count(m) for m in UNCERTAINTY_MARKERS)

    return {
        "has_uncertainty": len(markers_found) > 0,
        "uncertainty_count": uncertainty_count,
        "markers": markers_found,
        "risk_level": "HIGH" if uncertainty_count > 3 else
                      "MEDIUM" if uncertainty_count > 0 else "LOW",
        "requires_priority_review": uncertainty_count > 3
    }
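The confidence report can then drive routing, so high-uncertainty outputs reach a clinician first. A minimal sketch — the queue names are illustrative, and the thresholds mirror `assess_confidence` above:

```python
UNCERTAINTY_MARKERS = ["[UNCERTAIN]", "[NOT DOCUMENTED]", "[VERIFY]"]


def route_for_review(output: str) -> str:
    """Route an output to a review queue based on how many
    uncertainty markers it contains (same thresholds as the
    risk_level field of assess_confidence)."""
    count = sum(output.count(m) for m in UNCERTAINTY_MARKERS)
    if count > 3:
        return "priority_clinician_review"
    if count > 0:
        return "standard_clinician_review"
    return "spot_check_queue"
```

Note that even the "LOW" path still lands in a human queue — no output skips review entirely.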

3. Hallucination Prevention

The most effective strategy against medical hallucinations combines three layers of defense:

def build_anti_hallucination_prompt(clinical_data: str) -> str:
    """Build a prompt with triple anti-hallucination protection."""
    return f"""<grounding_rules>
ABSOLUTE RULE: You may ONLY use information present in
<clinical_data>. For every assertion:
1. It must be directly supported by the provided data
2. If you are unsure → [UNCERTAIN]
3. If the information does not exist in the data → [NOT DOCUMENTED]
4. NEVER invent a numerical value (dosage, lab result, code)
5. NEVER make a diagnosis not mentioned in the data
</grounding_rules>

<clinical_data>
{clinical_data}
</clinical_data>

<verification_instruction>
After your response, list each clinical fact mentioned
and indicate the source sentence in <clinical_data> that supports it.
Format: [FACT] → [SOURCE LINE]
</verification_instruction>"""
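The verification section can then be checked programmatically: parse each FACT → SOURCE pair and flag any fact whose claimed source sentence does not actually appear in the clinical data. A minimal sketch — the arrow format follows the instruction above, but real model output may vary, so treat the regex as a starting point:

```python
import re


def parse_fact_sources(verification_block: str) -> list:
    """Parse '[FACT] → [SOURCE LINE]' lines into (fact, source) tuples.
    Accepts both the unicode arrow and a plain '->'."""
    pairs = []
    for line in verification_block.splitlines():
        m = re.match(r'\s*\[?(.+?)\]?\s*(?:→|->)\s*\[?(.+?)\]?\s*$', line)
        if m:
            pairs.append((m.group(1).strip(), m.group(2).strip()))
    return pairs


def unsupported_facts(pairs: list, source_text: str) -> list:
    """Return facts whose cited source is absent from the clinical data —
    candidates for hallucination review."""
    return [fact for fact, src in pairs if src not in source_text]
```

A non-empty `unsupported_facts` result doesn't prove a hallucination (the model may have paraphrased its citation), but it is a cheap, deterministic trigger for priority human review.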

For a comprehensive approach to hallucination prevention, see our dedicated guide.


Medical Coding: ICD-10 and CPT

Automated medical code extraction is a high-potential but high-risk use case. LLMs frequently "hallucinate" plausible but incorrect codes.

Structured Coding Prompt

def extract_medical_codes(clinical_note: str) -> str:
    """Extract ICD-10-CM and CPT codes from a clinical note."""

    system = """You are a medical coding specialist.
Extract ICD-10-CM (diagnoses) and CPT (procedures) codes
from the provided clinical note.

CODING RULES:
1. Code to the highest level of specificity available
2. Distinguish primary from secondary diagnoses
3. Include laterality codes when applicable
4. If the diagnosis is imprecise → use the unspecified code
   and mark [VERIFY - coding to be confirmed]
5. NEVER invent a code — if uncertain, provide the
   description and mark [CODE NOT CONFIRMED]

OUTPUT FORMAT (JSON):
{
  "primary_diagnosis": {
    "description": "...",
    "icd10_code": "...",
    "confidence": "high|medium|low"
  },
  "secondary_diagnoses": [...],
  "procedures": [
    {"description": "...", "cpt_code": "...", "confidence": "..."}
  ],
  "flags": ["uncertainty messages if applicable"]
}"""

    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        temperature=0,
        system=system,
        messages=[{"role": "user", "content": clinical_note}]
    )
    return message.content[0].text
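The returned JSON should be parsed and format-checked before it reaches a human coder. This is a shape check only, not validation against the official ICD-10-CM or CPT code sets — the regexes below are approximations, and a production system would look codes up in a licensed reference database:

```python
import json
import re

# Approximate shapes only — real validation requires the official code sets.
ICD10_FORMAT = re.compile(r'^[A-Z]\d[0-9A-Z](?:\.[0-9A-Z]{1,4})?$')
CPT_FORMAT = re.compile(r'^\d{4}[0-9A-Z]$')


def validate_extraction(raw_json: str) -> dict:
    """Parse the coding output and flag structurally malformed codes."""
    data = json.loads(raw_json)
    issues = []
    code = data.get("primary_diagnosis", {}).get("icd10_code", "")
    if code and not ICD10_FORMAT.match(code):
        issues.append(f"Malformed ICD-10 code: {code}")
    for proc in data.get("procedures", []):
        cpt = proc.get("cpt_code", "")
        if cpt and not CPT_FORMAT.match(cpt):
            issues.append(f"Malformed CPT code: {cpt}")
    return {"valid": not issues, "issues": issues}
```

A code that passes the format check can still be a hallucinated-but-plausible code, which is exactly why the prompt demands confidence levels and [CODE NOT CONFIRMED] flags.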

For more techniques on reliable JSON output with Claude, see our structured JSON output guide.


Patient-Facing vs. Clinician-Facing Outputs

The same clinical content must be formulated very differently depending on the audience. This is an often-overlooked aspect of medical prompting.

Example: Same Information, Two Prompts

# Prompt for clinician summary
clinician_prompt = """Write a technical clinical summary
for the attending physician. SOAP format, ICD-10 terminology,
include all lab results with normal ranges."""

# Prompt for patient instructions
patient_prompt = """Write discharge instructions for
the patient in plain, accessible language.
- No medical jargon (explain every technical term)
- Bullet lists for medications (name, what it's for, when to take it)
- Clear "When to return to the ER" section
- Warm, encouraging tone
- Reading level: general public"""
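Patient-facing outputs can also be screened automatically for plain language before delivery. A crude sketch — the jargon list and the 20-word average-sentence threshold are illustrative heuristics, not clinical or readability standards:

```python
import re

# Illustrative sample — a real deployment would use a curated jargon lexicon.
MEDICAL_JARGON = {"myocardial", "infarction", "hypertension", "stenosis", "troponin"}


def check_patient_readability(text: str, max_avg_sentence_len: int = 20) -> dict:
    """Heuristic plain-language check: short sentences, no unexplained jargon."""
    sentences = [s for s in re.split(r'[.!?]+', text) if s.strip()]
    avg_len = len(text.split()) / max(len(sentences), 1)
    jargon_hits = sorted(
        w for w in MEDICAL_JARGON
        if re.search(rf'\b{w}\b', text, re.IGNORECASE)
    )
    return {
        "avg_sentence_length": round(avg_len, 1),
        "jargon_found": jargon_hits,
        "passes": avg_len <= max_avg_sentence_len and not jargon_hits,
    }
```

A failing check can trigger an automatic rewrite pass with the patient prompt before the text ever reaches the patient.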

Evaluation Metrics for Medical AI

Evaluating a medical AI pipeline requires domain-specific metrics that classic BLEU or ROUGE scores don't capture.


Automated Evaluation Script

def evaluate_medical_output(output: str, reference: dict) -> dict:
    """Evaluate a medical output's quality against a reference."""

    scores = {}

    # 1. Section completeness
    required = ["Subjective", "Objective", "Assessment", "Plan"]
    present = [s for s in required if s.lower() in output.lower()]
    scores["section_completeness"] = len(present) / len(required)

    # 2. Uncertainty rate (should be > 0 for ambiguous cases)
    uncertainty_markers = output.count("[UNCERTAIN]") + output.count("[NOT DOCUMENTED]")
    scores["uncertainty_flagging"] = uncertainty_markers

    # 3. Disclaimer presence
    scores["has_disclaimer"] = any(
        term in output.lower()
        for term in ["validation", "healthcare professional", "warning"]
    )

    # 4. ICD-10 code validation (regex format check)
    import re
    icd_codes = re.findall(r'\b[A-Z]\d{2}\.?\d{0,2}\b', output)
    scores["icd10_codes_found"] = len(icd_codes)

    return scores
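A single-case score is not enough; run the evaluator over the whole regression set and aggregate. A minimal sketch operating on dicts shaped like `evaluate_medical_output`'s return value (the aggregate field names are illustrative):

```python
def aggregate_scores(case_scores: list) -> dict:
    """Summarize per-case metrics across a regression test set."""
    n = len(case_scores)
    if n == 0:
        return {}
    return {
        "cases": n,
        "mean_section_completeness": sum(s["section_completeness"] for s in case_scores) / n,
        "disclaimer_rate": sum(1 for s in case_scores if s["has_disclaimer"]) / n,
        "cases_flagging_uncertainty": sum(1 for s in case_scores if s["uncertainty_flagging"] > 0),
    }
```

Tracking these aggregates per prompt version makes regressions visible: a drop in `disclaimer_rate` after a prompt edit is a release blocker, not a curiosity.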

For automating your evaluations at scale, see our Promptfoo evaluation guide for Claude.


Regulatory Considerations

HIPAA and Patient Data

Using LLMs with health data imposes strict precautions:

import re

def anonymize_phi(text: str) -> str:
    """Redact common protected health information (PHI) patterns before
    sending text to the API. Covers a subset of the 18 HIPAA identifiers."""

    patterns = {
        "SSN": r'\b\d{3}-\d{2}-\d{4}\b',
        "Phone": r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b',
        "Email": r'\b[\w.-]+@[\w.-]+\.\w+\b',
        "MRN": r'\bMRN[:\s]*\d+\b',
        "Date_of_birth": r'\b\d{2}/\d{2}/\d{4}\b',
        "Name_pattern": r'\b(?:Mr|Mrs|Ms|Dr)\.?\s+[A-Z][a-z]+\s+[A-Z][a-z]+\b',
    }

    anonymized = text
    for phi_type, pattern in patterns.items():
        anonymized = re.sub(pattern, f'[{phi_type}_REDACTED]', anonymized)

    return anonymized

# Always anonymize BEFORE the API call
safe_text = anonymize_phi(raw_clinical_note)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": safe_text}]
)

Key Compliance Points

  • PHI anonymization — Pre-processing before the API call
  • Audit logging — Log every request/response (without PHI)
  • Consent — Patient must be informed about AI usage
  • Right to explanation — Ability to justify every AI output
  • Data retention — Retention policy compliant with GDPR/HIPAA
  • BAA — Business Associate Agreement with the API provider
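The audit-logging requirement can be met with content hashes: the trail proves which (already-anonymized) prompt produced which response without retaining any text. A minimal sketch — the logger name and record fields are illustrative:

```python
import hashlib
import json
import logging
import time

audit_log = logging.getLogger("medical_ai.audit")


def log_request(anonymized_prompt: str, response_text: str, model: str) -> dict:
    """Emit a PHI-free audit record: hashes and sizes, never raw text.
    Assumes the prompt was already passed through anonymize_phi."""
    record = {
        "timestamp": time.time(),
        "model": model,
        "prompt_sha256": hashlib.sha256(anonymized_prompt.encode()).hexdigest(),
        "response_sha256": hashlib.sha256(response_text.encode()).hexdigest(),
        "response_chars": len(response_text),
    }
    audit_log.info(json.dumps(record))
    return record
```

If a dispute arises, the stored hash can be matched against a clinician's retained copy of the document, without the log itself ever becoming a PHI store.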

For more on structured outputs with Claude, see the structured outputs in strict mode guide.


Deployment Checklist

Before putting a medical AI pipeline into production, validate every item:

  • Anonymization — No PHI leaves the authorized perimeter
  • Disclaimers — Present on every output, visible and non-bypassable
  • Uncertainty protocol — The model actively flags what it doesn't know
  • Terminology validation — ICD-10/CPT codes verified against reference databases
  • Clinician review — Human validation workflow in place
  • Test suite — At least 200 cases covering common conditions and edge cases
  • Monitoring — Drift detection, error rate, clinician time tracking
  • Rollback plan — Procedure to revert to manual workflow if needed
  • Documentation — Versioned prompt, changelog, baseline metrics
  • Training — Clinicians know how to interpret and correct AI outputs

For a structured approach to the prompt engineering process (versioning, testing, iteration), see our prompt engineering process guide.


FAQ

Why is medical prompt engineering different from other domains?

Healthcare imposes constraints around legal liability, standardized terminology (ICD-10, SNOMED CT), patient safety, and regulatory compliance. A hallucination in a marketing summary is annoying; in a clinical summary, it can lead to a medical error.

How do you structure a prompt for reliable SOAP notes?

Use clear XML sections (<role>, <format_soap>, <uncertainty_protocol>), provide few-shot examples from clinician-validated notes, enforce a strict format with required sections, and add uncertainty markers ([UNCERTAIN], [NOT DOCUMENTED]) for any ambiguous information.

Can Claude replace a doctor for diagnosis?

No. Claude is a tool for documentation assistance, structuring, and clinical data analysis. It is not a medical diagnostic device. All output must be reviewed and validated by a healthcare professional before clinical use.

How do you evaluate the quality of a medical AI pipeline?

Combine three types of metrics: automated (section completeness, code validity, format compliance), clinician (factual accuracy, relevance, omissions), and safety (hallucination rate, uncertainty flagging, disclaimer presence). Purely automated evaluation is insufficient — clinician review is essential.


