
The damage AI hallucinations can do – and how to avoid them

'Even if these systems are right 80% of the time, that still means they're wrong 20% of the time,' says tech CMO Dr. Jay Anders, who describes the risks of artificial intelligence errors and outlines some protection strategies for providers.
By Bill Siwicki
Zeb (left) and his best friend, Dr. Jay Anders, chief medical officer at Medicomp Systems
Photo: Dr. Jay Anders

Health systems are embracing artificial intelligence tools that help their clinicians simplify the creation of chart notes and care plans, saving them precious time every day. 

But what's the impact on patient safety if AI gets the facts wrong?

Even the most casual users of ChatGPT and other large language model-based generative AI tools have experienced errors – often called "hallucinations."

An AI hallucination occurs when an LLM doesn't know the correct answer or can't locate the appropriate information and, rather than admitting uncertainty, simply fabricates a response.

These fabricated responses are particularly problematic because they're often very convincing. The hallucinations can be very difficult to distinguish from factual information, depending on what's being asked. If an LLM can't find the right medical code for a particular condition or procedure, for example, it might invent a number.
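Because diagnosis and procedure codes come from finite, published code sets, the invented-code failure mode is one of the easier ones to guard against. The following is a minimal, hypothetical guardrail sketch in Python: it refuses to trust any model-suggested code until that code appears in a locally maintained reference list. The file format, the load_valid_codes helper and the pattern check are illustrative assumptions, not any particular product's API.

```python
import re

# Rough shape of an ICD-10-CM code (letter, digit, alphanumeric, optional
# subcategory). The authoritative check is membership in the official code set.
ICD10_SHAPE = re.compile(r"^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$")

def load_valid_codes(path: str) -> set[str]:
    # Illustrative assumption: one valid code per line; adapt to however
    # your organization distributes its reference code set.
    with open(path, encoding="utf-8") as f:
        return {line.split()[0] for line in f if line.strip()}

def check_suggested_code(code: str, valid_codes: set[str]) -> str:
    """Disposition for a code an LLM suggested for a condition or procedure."""
    if not ICD10_SHAPE.match(code):
        return "reject: not even shaped like a real code"
    if code not in valid_codes:
        return "reject: plausible-looking but absent from the code set (likely invented)"
    return "hold for human coder review"
```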

The core issue is that LLMs are designed to predict the next word and provide responses, not to acknowledge when they don't have sufficient information. That creates a fundamental tension between the technology's drive to be helpful and its tendency to generate plausible-sounding but inaccurate content when faced with uncertainty.

For some further perspective on AI hallucinations and their potential impact on healthcare, we spoke recently with Dr. Jay Anders, chief medical officer at Medicomp Systems, a vendor of evidence-based, clinical AI-powered systems designed to make data usable for connected care and enhanced decision-making. He plays a key role in product development and acts as a liaison to the healthcare community.

Q. What does AI's ability to generate hallucinations mean for clinical and administrative staff in healthcare wanting to use AI?

A. The implications are significantly different for clinical versus administrative applications. In clinical medicine, hallucinations create serious problems because accuracy isn't negotiable. I recently read a study showing AI summarization gets things right about 80% of the time. That might earn you a B-minus in college, but B-minus doesn't work in healthcare. Nobody wants B-minus healthcare – they want A-level care.

Let me give you specific examples from clinical record summarization, which many healthcare IT companies are rushing to implement. When AI summarizes clinical records, it can make two critical errors. First, it may fabricate information that simply isn't there. Second, it can misattribute diseases – taking a family member's condition and assigning it to the patient. So, if I mention "my mother has diabetes," the AI might document that I have diabetes instead.

AI also struggles with context. If I'm discussing a physical exam, it might introduce elements that have nothing to do with physical examinations. It loses track of what we're actually talking about.
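For illustration, a post-summarization consistency check for the family-history failure mode might look like the following minimal sketch. The data shapes – sets of condition names pulled from the AI summary, the patient's own problem list and the structured family history – are assumptions for the example, not any vendor's actual API.

```python
# Hypothetical check: flag any condition the AI attributes to the patient
# that appears only in the structured family history, never on the
# patient's own problem list.

def flag_possible_misattributions(
    summary_conditions: set[str],    # conditions the AI summary assigns to the patient
    patient_problem_list: set[str],  # conditions documented for the patient
    family_history: set[str],        # conditions documented only for relatives
) -> set[str]:
    """Return conditions that look borrowed from a relative's history."""
    not_on_chart = summary_conditions - patient_problem_list
    return not_on_chart & family_history

# Example: the note says "my mother has diabetes" and the AI summary
# documents diabetes for the patient.
suspect = flag_possible_misattributions(
    summary_conditions={"type 2 diabetes", "hypertension"},
    patient_problem_list={"hypertension"},
    family_history={"type 2 diabetes"},
)
print(suspect)  # {'type 2 diabetes'} -> route to a clinician before filing the note
```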

For administrative tasks, the risks are generally lower. If AI makes errors in equipment inventory, pharmacy supplies or scheduling, those mistakes are problematic, but they won't directly harm patients. The stakes are fundamentally different when we're dealing with clinical documentation versus operational logistics.

Q. What are potential negative outcomes of hallucinations in healthcare AI, and how can they propagate through processes and systems?

A. The negative outcomes operate on multiple levels and create cascading effects that are extremely difficult to reverse. When AI assigns incorrect diseases, lab results or medications to a patient's record, these errors become nearly impossible to correct and can have devastating long-term consequences.

Consider this scenario. If AI incorrectly documents that I have leukemia based on my mother's medical history, how will I get life insurance? Will employers want to hire someone they believe has active leukemia? These errors create immediate and long-term impacts that extend far beyond the healthcare setting.

The propagation problem is particularly insidious. Once incorrect information enters a medical record, it gets copied and shared across multiple systems and providers.

Even if I, as a physician, catch the error and document a correction, that original record already has been sent to numerous other healthcare providers who won't receive my correction. It becomes like a dangerous game of telephone – the error spreads throughout the healthcare network, and each iteration makes it more difficult to trace and correct.

This creates two types of propagation: the spread of actual errors and the erosion of trust in the system. I've seen AI-generated summaries that can't even maintain consistency about a patient's gender within a single document – calling someone "he," then "she," then "he" again.

When lawyers encounter this kind of inconsistency in legal proceedings, they'll question everything: "If it can't determine whether someone is male or female, how can we trust any of the information?"

The trust issue is crucial because once confidence in AI-generated content erodes, even accurate information may be dismissed as unreliable.

Q. What are actions hospitals and health systems can take when using AI tools to avoid negative consequences of hallucinations?

A. Healthcare organizations need to approach AI implementation strategically rather than throwing technology at every problem like "mud on a wall." The key is focused, purposeful deployment with strong human oversight.

First, clearly define what problem you're trying to solve with AI. Are you addressing clinical diagnostics, or are you managing pharmacy inventory? Don't jump straight into high-risk clinical applications without understanding what the technology can and cannot do reliably.

I know of a vendor that implemented an AI sepsis detection system that was wrong 50% of the time. The hospital CEO, who's a friend of mine, simply turned it off because they realized they didn't even have a significant sepsis problem to begin with.

Second, choose your AI tools carefully. Different models excel at different tasks. What GPT-4 does well, Claude might not, and vice versa. Validate the technology with your own data and patient populations. Vendors should provide confidence levels for their systems, whether they're accurate 90%, 95% or only 20% of the time for your specific use case.
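In practice, "validate the technology with your own data" can start very simply: run the vendor's tool over a sample of charts your clinicians have already reviewed and count how often its output agrees with the verified facts. The sketch below assumes a hypothetical summarize() function and a hand-built review set; the crude substring match is only a stand-in for whatever agreement criterion your clinicians define.

```python
def local_accuracy(reviewed_cases, summarize) -> float:
    """Fraction of clinician-verified facts the AI output reproduces.

    reviewed_cases: list of {"chart_text": str, "verified_facts": list[str]}
    summarize: the vendor's summarization call (placeholder here).
    """
    correct = total = 0
    for case in reviewed_cases:
        summary = summarize(case["chart_text"]).lower()
        for fact in case["verified_facts"]:   # e.g., problems, meds, allergies
            total += 1
            if fact.lower() in summary:       # crude agreement check for the sketch
                correct += 1
    return correct / total if total else 0.0

# A score around 0.80 is the "B-minus" described earlier -- not good enough
# to run without a human reviewing every output.
```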

Most important, maintain human oversight at all times. AI should augment human processes, not replace them. Always keep a human in the loop to validate AI outputs before they're implemented or documented. This applies whether you're dealing with billing, coding or clinical decisions. When humans catch AI mistakes, that feedback can help improve the system over time.
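A human-in-the-loop gate does not have to be elaborate. The sketch below assumes a hypothetical draft-note object and filing step; the point is simply that nothing the model produces reaches the chart, the bill or the code set until a person signs off, and that rejections are captured as feedback.

```python
from dataclasses import dataclass, field

# Hypothetical review gate: nothing the AI drafts is filed until a human
# approves it. All names here are illustrative, not a real EHR API.

@dataclass
class DraftNote:
    patient_id: str
    ai_text: str
    status: str = "pending_review"
    reviewer_feedback: list[str] = field(default_factory=list)

def clinician_review(draft: DraftNote, approved: bool, feedback: str = "") -> DraftNote:
    """Record the human decision; only approved drafts may be filed."""
    if approved:
        draft.status = "approved"
    else:
        draft.status = "rejected"
        draft.reviewer_feedback.append(feedback)  # corrections become vendor feedback
    return draft

def file_to_chart(draft: DraftNote) -> None:
    if draft.status != "approved":
        raise ValueError("AI output cannot be filed without clinician approval")
    # ...hand off to the EHR integration here (omitted)...
```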

The current environment feels like Dodge City. Everyone is using AI for everything without proper validation or safeguards. This "AI for AI's sake" mentality is dangerous. Not every process requires artificial intelligence.

If a patient comes to my office with a low-grade fever, sore throat and runny nose, I don't need AI to tell me it's likely viral. Some things are straightforward enough that adding AI complexity only increases cost and potential for error.

Q. What should healthcare CIOs, CAIOs and other IT leaders ask vendors with AI in their tools about how they are protecting against hallucinations?

A. IT leaders need to ask direct, specific questions about validation and performance. Start with the fundamentals: What confidence levels can you provide for your system's accuracy? Can you demonstrate how your AI performs with real healthcare data similar to ours? Don't accept vague promises – demand concrete evidence of performance metrics.

Ask about the training data and validation process. How was the AI model trained, and with what type of medical information? Has the system been tested specifically for the clinical scenarios you plan to implement? Different AI models have varying strengths, so ensure the vendor's system aligns with your intended use cases.

Inquire about human oversight mechanisms. How does the vendor recommend integrating human validation into their workflow? What safeguards are built into the system to flag potentially problematic outputs? The vendor should have clear recommendations for maintaining human oversight rather than encouraging full automation.

Request information about error detection and correction processes. When hallucinations occur – and they will – how quickly can they be identified and corrected? What mechanisms are in place to prevent the propagation of errors across systems? How does the vendor handle feedback to improve their models over time?

Finally, be wary of vendors promising revolutionary capabilities that seem too good to be true. Some companies are developing "doctor replacement" chatbots or complex multi-LLM systems that claim to outperform clinicians. Even if these systems are right 80% of the time, that still means they're wrong 20% of the time. Would you be comfortable being in that 20%?

The goal isn't to avoid AI entirely. The technology offers genuine benefits when used appropriately. But we need to implement it thoughtfully, with proper safeguards, and always with human oversight. The stakes in healthcare are simply too high for anything less than the most careful, validated approach to AI deployment.

Follow Bill's HIT coverage on LinkedIn: Bill Siwicki
Email him: bsiwicki@himss.org
Healthcare IT News is a HIMSS Media publication.
