Pop Quiz for Dr. Chatbot – How Did AI Do?

November 16, 2023

Written by Doug Wallace

Spoon Feed
A popular artificial intelligence (AI) chatbot, ChatGPT, was fed hundreds of medical questions of varying difficulty. Responses generally demonstrated high accuracy and completeness, but inaccuracies were still noted. AI chatbot use in routine clinical practice simply isn’t ready for primetime.

The future is now
It seems AI tools are being applied to nigh every discipline, and medicine is no exception (ChatGPT vs Physician Message Responses, ChatGPT Journal Articles, ChatGPT Tough Cases). Results have been mixed, and the tendency of chatbots to generate wholly inaccurate responses, or “hallucinations,” is well described. Investigation into how we might safely fold these tools into clinical practice is ongoing.

This cross-sectional study asked 33 physicians from 17 specialties to generate 284 medical questions, which ChatGPT (v3.5/4.0) then answered. Questions were intentionally open-ended and varied in difficulty and in format, from binary to descriptive. Chatbot prompts were standardized for consistency. Table 1 and the eAppendix in Supplement 1 of the paper provide some interesting example Q & As.

Answers were judged by the question authors and rated on Likert scales for accuracy (1-6, with 6 being completely correct) and completeness (1-3, with 3 being comprehensive). The median accuracy score was 5.0, and the median completeness score was 3.0. Around 20% of responses were felt to be inaccurate. The authors note that while these tools may be useful in the future, there are many hurdles to overcome. Those mentioned include accuracy, ethical, educational, medicolegal, privacy, and regulatory concerns.

How will this change my practice?
This won’t change my practice, for now. However, interest in AI utilization in medicine or otherwise isn’t going anywhere, and routine use of these tools may come to fruition in the all-too-near future. As AI tools become increasingly advanced, healthcare providers will have to reckon with how to integrate them safely into patient care and would be wise to maintain an active voice in this process. Lots of work remains to be done in this space, with studies like this blazing a path forward. 

Accuracy and Reliability of Chatbot Responses to Physician Questions. JAMA Netw Open. 2023 Oct 2;6(10):e2336483. doi: 10.1001/jamanetworkopen.2023.36483.