Medical AI – You Reap What You Sow
July 23, 2024
Written by Doug Wallace
Spoon Feed
This thoughtful review illustrates how human biases and ethical assumptions shape medical AI models, since outputs depend on the training data ingested, the human fine-tuning used to optimize responses, and the nature and perspective of the prompts used. As such, the authors recommend caution in applying AI to medical decision making.
Input ≠ output
The article first breaks down how the “human values” embedded in any predictive model can significantly shape its final output. The basics of large language model (LLM) creation (e.g., GPT-4) are then elucidated. An initial “pre-training” phase feeds the base algorithm enormous amounts of text, training the LLM to predict the next word in a sequence. This is followed by “fine-tuning,” in which humans rank candidate outputs and reinforcement learning uses those rankings to optimize model responses. The output of any model is therefore inherently dependent on both the data it is fed and the biases of the people training it. Perhaps even more interesting, this process produces a system whose responses cannot be fully traced back to its inputs, so-called “emergent properties.” In short, input doesn’t always equal output.
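To make the pre-training idea concrete, here is a minimal toy sketch of next-word prediction. This is not how GPT-4 works internally (a real LLM trains a neural network on billions of tokens), but the objective is the same, and it makes the article’s point vividly: the “prediction” is entirely a product of the corpus you feed in.

```python
from collections import Counter, defaultdict

# Toy "pre-training": count which word follows which in a tiny corpus.
# A real LLM learns a neural network over billions of tokens, but the
# training objective -- predict the next word -- is conceptually the same.
corpus = (
    "short stature may warrant growth hormone treatment . "
    "growth hormone treatment carries cost and risk ."
).split()

next_word_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_word_counts[current][nxt] += 1

def predict_next(word):
    """Return the most frequent follower of `word` in the training data."""
    followers = next_word_counts[word]
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("growth"))   # -> "hormone", determined entirely by the corpus
print(predict_next("hormone"))  # -> "treatment"
```

Swap in a different corpus and the same code “predicts” something different, which is the garbage-in, garbage-out concern in miniature.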
To illustrate the potential consequences, the article posits a fascinating thought experiment in which GPT-4 was prompted to respond to questions about an identical case: a 14-year-old boy under consideration for growth hormone treatment for short stature. The AI was asked to adopt three distinct perspectives: the treating clinician, the insurance company, and the boy’s parents. The answers are worth a read and demonstrate how drastically a change in prompt and perspective can alter the model’s output.
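For readers who want to try a version of this themselves, a sketch along these lines would reproduce the setup, assuming the openai Python client and an API key in the environment. The persona and case text below are illustrative placeholders, not the article’s actual prompts.

```python
from openai import OpenAI  # assumes the openai package is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical case summary standing in for the article's vignette.
case = (
    "A 14-year-old boy with idiopathic short stature is under "
    "consideration for growth hormone treatment. Should he receive it?"
)

# The same case, posed from three different perspectives.
personas = {
    "clinician": "You are the treating pediatric endocrinologist.",
    "insurer": "You are a medical reviewer for the insurance company.",
    "parents": "You are the boy's parents.",
}

for label, persona in personas.items():
    response = client.chat.completions.create(
        model="gpt-4",  # any chat-capable model would do here
        messages=[
            {"role": "system", "content": persona},
            {"role": "user", "content": case},
        ],
    )
    print(f"--- {label} ---")
    print(response.choices[0].message.content)
```

Only the system message changes between runs; the case is held constant, which isolates the effect of perspective on the output.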
The authors then describe the science of “medical decision analysis,” a systematic approach to medical decision making that contrasts a model’s relatively objective probabilistic inputs (e.g., the measured growth hormone level in the example above) with the subjective “utilities” at stake (e.g., the personal value a patient places on additional height). The authors argue that the best decisions come from balancing these two domains.
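The decision-analytic framing boils down to a familiar formula: the expected utility of an option is the sum, across possible outcomes, of each outcome’s probability times its utility. A tiny sketch with made-up numbers shows the mechanics; every probability and utility below is hypothetical, chosen only for illustration.

```python
def expected_utility(outcomes):
    """Sum of probability * utility across possible outcomes."""
    return sum(p * u for p, u in outcomes)

# Hypothetical (probability, utility) pairs; utility runs from
# 0 (worst outcome) to 1 (best outcome) as one family might assign it.
treat = [
    (0.70, 0.90),  # meaningful height gain
    (0.25, 0.50),  # little gain despite years of injections
    (0.05, 0.30),  # adverse effects
]
no_treat = [
    (1.00, 0.60),  # accept short stature as-is
]

print(f"Treat:    {expected_utility(treat):.2f}")     # 0.77
print(f"No treat: {expected_utility(no_treat):.2f}")  # 0.60
```

The probabilities are the relatively objective half of the analysis; the utilities are where identical numbers can justify opposite decisions for different families, which is exactly the subjective half the authors argue must come from the patient rather than the model.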
Lastly, the authors touch on unresolved issues for contemporary AI models in medicine, including training data integrity, human feedback bias, AI governance, and liability. They call for dedicated research into how AI use in real clinical settings will affect human decision making.
How will this change my practice?
This article went deep on this important subject and left me with a lot to think about as we enter the age of medical AI. To echo the authors, we clinicians have a shared responsibility to ensure AI models are carefully deployed in a way that explicitly reflects patient values and goals. “Rather than replacing physicians, AI has made…the guidance of a thoughtful physician, more essential than ever.”
Editor’s note: If you have a chance to read this, the different perspectives the LLM took – arguing for growth hormone when prompted to respond as the pediatric endocrinologist and against it when prompted to respond as the insurance company – were eye-opening. ~Clay Smith
Source
Medical Artificial Intelligence and Human Values. N Engl J Med. 2024 May 30;390(20):1895-1904. doi: 10.1056/NEJMra2214183. PMID: 38810186.