Kenny Lin, MD, MPH
Posted on March 30
A 2025 AFP editorial by Dr. Joel Selanikio discussed how artificial intelligence (AI) tools had accelerated an existing trend of “patients bypassing physicians to diagnose and treat themselves,” which began with over-the-counter drugs and online search engines. This direct-to-consumer health care approach received a boost in January with OpenAI’s launch of ChatGPT Health, which invites users to upload their medical records and health data from apps for personalized recommendations.
AI chatbots can provide helpful responses to health questions in several low-stakes contexts, as outlined in this handout from Dewey Labs: translating medical jargon, brainstorming possible causes of symptoms, summarizing research or test results, and preparing questions for an upcoming doctor’s visit. However, a recent study in Nature Medicine highlighted ChatGPT Health’s significant limitations in triaging patients with acute problems to appropriate levels of care.
Dr. Ashwin Ramaswamy and colleagues compared the chatbot’s responses to “60 clinician-authored vignettes across 21 clinical domains under 16 factorial conditions (960 total responses)” with the triage levels assigned independently by three physicians: non-urgent, semi-urgent, urgent, and emergency. ChatGPT Health performed well in triaging semi-urgent and urgent clinical situations, but it over-triaged 65% of non-urgent situations and under-triaged 52% of true emergencies. For example, it recommended evaluation in 24 to 48 hours for patients with diabetic ketoacidosis and impending respiratory failure rather than sending them directly to the emergency department. Just as concerning, patients with suicidal ideation were less likely to receive crisis interventions when they had identified a method of self-harm than when they had no identified method:
The crisis guardrail finding may be the most consequential failure mode exhibited in the entire study. … A guardrail that fires for ‘haven’t thought through how I would do it’ but not for ‘thought about taking a lot of pills’ is not calibrated to clinical risk and users have no basis to anticipate when it will or will not fire. The capability to recognize mental health crises and connect users with crisis resources is a basic prerequisite for any consumer health platform. Our data show this prerequisite has not been reliably met.
In another study, researchers provided three AI chatbots with 10 detailed medical scenarios and tested their ability to diagnose the condition and recommend appropriate management. In the United Kingdom, 1,298 adults were given the scenarios and randomized to use one of the chatbots or a usual source of their choice (typically an online search engine). When researchers entered the full scenarios directly, the chatbots identified the correct diagnosis 95% of the time and the appropriate management 56% of the time. However, when intervention participants relayed elements of the scenarios in live conversations, the chatbots performed much worse, diagnosing correctly 34% of the time and recommending appropriate management 44% of the time, results no better than those of control participants using a search engine. Researchers observed that participants often failed to provide enough information to make the diagnosis, and that slight changes in symptom emphasis or question wording frequently led to dramatic differences in advice.
Bottom line: For patient-facing chatbots such as ChatGPT Health to diagnose and triage problems appropriately and safely, it isn’t enough for them to passively process the incomplete clinical data they are given. They will need to get much better at asking the right questions to elicit information that patients may not realize is relevant.