![AI Doctors](https://emed.news/wp-content/uploads/2025/01/Why-AI-Doctors-Struggle-in-Real-World-Patient-Interactions.png)
A groundbreaking study from Harvard Medical School and Stanford University, published in Nature Medicine, reveals that while AI doctors excel at standardized medical exams, they often struggle in real-world clinical conversations. The research introduces CRAFT-MD (Conversational Reasoning Assessment Framework for Testing in Medicine), a new evaluation tool designed to measure how well medical AI tools handle real-life patient interactions.
Large language models (LLMs) such as ChatGPT have shown promise in supporting clinicians by triaging patients, collecting medical histories, and offering preliminary diagnoses. However, the study highlights a critical issue: models that perform well on structured, multiple-choice medical exams often falter in unstructured, back-and-forth conversations with patients.
CRAFT-MD evaluates AI performance in realistic patient scenarios by simulating clinical conversations where AI agents act as patients and graders. The assessment focuses on key factors such as information gathering, diagnostic reasoning, and conversational accuracy across 2,000 clinical vignettes spanning 12 medical specialties.
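For readers who want a sense of the mechanics, here is a minimal sketch of what such an agent-based conversational evaluation could look like in code. This is an illustration only, assuming a generic chat-completion backend; the function names (`chat_model`, `evaluate_vignette`), the message format, and the turn limit are hypothetical and are not the study's actual implementation.

```python
# Sketch of an agent-based conversational evaluation loop in the spirit of
# CRAFT-MD. All names and prompts here are illustrative assumptions.

def chat_model(system: str, messages: list[dict]) -> str:
    """Stand-in for any chat-completion API call (plug in your LLM client)."""
    raise NotImplementedError("connect this to an actual model")

def evaluate_vignette(vignette: dict, max_turns: int = 10) -> bool:
    """Run one simulated doctor-patient dialogue, then grade the diagnosis."""
    patient_sys = (
        "You are a patient. Reveal only what the doctor asks about, "
        f"using these case facts: {vignette['case_facts']}"
    )
    doctor_sys = (
        "You are a physician. Ask focused questions one at a time; "
        "when confident, reply 'DIAGNOSIS: <condition>'."
    )
    transcript: list[dict] = []
    diagnosis = ""
    for _ in range(max_turns):
        # The model under test must elicit information through dialogue,
        # rather than receiving the full case up front as in an exam item.
        doctor_msg = chat_model(doctor_sys, transcript)
        transcript.append({"role": "doctor", "content": doctor_msg})
        if doctor_msg.startswith("DIAGNOSIS:"):
            diagnosis = doctor_msg.removeprefix("DIAGNOSIS:").strip()
            break
        patient_msg = chat_model(patient_sys, transcript)
        transcript.append({"role": "patient", "content": patient_msg})
    # A separate grader agent scores the final answer against ground truth.
    verdict = chat_model(
        "You are a grader. Answer 'yes' or 'no': is the proposed diagnosis "
        f"clinically equivalent to '{vignette['true_diagnosis']}'?",
        [{"role": "user", "content": diagnosis}],
    )
    return verdict.strip().lower().startswith("yes")
```

The key idea mirrors the article's description: the information-gathering burden falls on the model under test, and grading happens only after the open-ended exchange, which is exactly where the study found performance degrades.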
The results showed a clear pattern—AI tools often fail to ask the right follow-up questions, overlook essential details in patient history, and struggle to synthesize scattered information from open-ended exchanges. Their diagnostic accuracy dropped significantly compared to their performance on structured exams.
The research team, led by Pranav Rajpurkar, PhD, of Harvard Medical School, emphasized that dynamic doctor-patient interactions require AI to adapt, ask critical questions, and piece together fragmented information, abilities that current AI tools struggle to master.
To address these limitations, the study recommends:
- Designing AI models with better conversational reasoning capabilities.
- Using open-ended testing frameworks instead of rigid multiple-choice formats.
- Training AI to process verbal and non-verbal cues, including tone and body language.
- Integrating textual and non-textual data, such as images and lab results, into diagnostics.
The findings advocate for a hybrid evaluation approach, where AI agents complement human experts for more efficient, scalable testing.
As noted by Roxana Daneshjou, M.D., from Stanford University, “CRAFT-MD represents a major step forward, offering a real-world evaluation framework to ensure AI tools meet clinical standards before being deployed in healthcare settings.”
More information: An evaluation framework for clinical use of large language models in patient interaction tasks, *Nature Medicine* (2024). DOI: 10.1038/s41591-024-03328-5