Written by Seth Walsh-Blackmore
The newest-generation GPT chatbot performed similarly to, if not better than, current differential diagnosis generators explicitly designed for this purpose.
I need your clothes, your boots, and your history of present illness.
Though differential diagnosis generators have existed for decades, the excitement (or existential dread) surrounding AI chatbots since ChatGPT became widely available has created speculation about their medical impact.
The authors graded the performance of GPT-4 in generating ranked differential diagnoses of New England Journal of Medicine clinicopathologic conferences. These cases are often used to test differential diagnosis generators.
A chat prompt (see below) instructed the model to generate diagnoses ranked from most to least likely. There was no limit to the size of the differential. This prompt was followed by the case as presented in the publication. The authors used 70 cases from 2021-2022.
The model’s top diagnosis matched the final diagnosis in 39% of cases, and the differential included the correct diagnosis in 64%, with a mean differential length of 9.0 (SD 1.4). In another 29% of cases, the authors scored the differential as potentially helpful despite its not containing the actual diagnosis; only 7% were scored as not useful.
Performance was on par with current differential diagnosis generators. Per their protocol, the authors withheld certain information from the cases, and each case was run in an independent chat so the model could not learn from prior cases; these constraints may mean the study underestimates its ability.
How will this change my practice?
Clinical application of AI for diagnosis faces multiple hurdles, including understanding where bias and blind spots exist in its algorithm, how to handle PHI, and the need for more robust testing with actual patient data from an EHR.
The hype machine is how tech companies generate investment, so take sweeping statements about how this will affect your job with appropriate skepticism. The only way I see this leading to fewer open jobs is if it mitigates the documentation burden that contributes to provider burnout.
The chat prompt used by the authors:
- How is JournalFeed adapting to AI and ChatGPT?
- Can ChatGPT answer a patient message better than a physician?
Accuracy of a Generative Artificial Intelligence Model in a Complex Diagnostic Challenge. JAMA. 2023 Jul 3;330(1):78-80. doi: 10.1001/jama.2023.8288.