
One of the biggest gaps in healthcare AI is that there's still no standard way to validate these tools before or after they're deployed, said Pelu Tran, CEO and cofounder of Ferrum Health, which assists healthcare organizations with comprehensive AI governance that addresses barriers like fragmented implementations, bias, regulatory challenges and lack of accountability.
Unlike drugs or traditional medical devices, AI hits the market with almost no transparency – and once it's out there, performance can swing depending on race, scanner type and even how data is formatted. The need for post-market monitoring is not new – yet in practice it's just not happening. In this discussion with Healthcare IT News, Tran explains why, and what hospital and health system AI and IT leaders should be doing about it.
Q. So, there still is no standard way to validate AI tools for healthcare. Why do you believe this creates such a big gap?
A. Pre- and post-deployment validation is one of the most overlooked problems in healthcare AI – and it's a big one. Clinical environments are messy, high-stakes, and way more complicated than any lab or trial setting. Even when an AI model gets FDA clearance, that doesn't mean it'll perform the same way in the real world.
Actually, most don't. An RSNA study found that 81% of AI models dropped in performance when tested on external datasets. For nearly half, it was a noticeable drop, and for a quarter, it was significant. After these tools are approved, there's no standard way to keep an eye on how they work across different scanners, hospitals or patient groups.
So, the burden then shifts to hospitals, which are left to figure it out themselves. That puts clinicians in a tough spot, relying on AI that might not generalize, with no real way to track performance or course correct. For a tool that's supposed to support patient safety, that's an unacceptable blind spot.
Q. Why is this kind of validation not happening today?
A. Right now, the FDA treats AI like a traditional medical device – once it's approved, it's essentially locked in place. Any updates, even small ones, can trigger a whole new approval cycle. That model works for devices like pacemakers, but it doesn't make sense for AI, which is supposed to evolve and get smarter over time.
It's a bit like the early days of electronic health records, when there were major gaps around privacy, compatibility and standards. AI is running into the same kinds of growing pains. Without a regulatory framework that supports continuous validation, we risk stalling progress and eroding trust before these tools even get a chance to prove themselves.
On top of that, states are starting to roll out their own AI rules – especially around fairness and bias – which creates a patchwork of requirements for hospitals to navigate. It's confusing, it's inconsistent and it makes already-slow adoption even slower.
The reality is, even before deployment, it's hard to tell what works. A lot of vendors look the same on paper, but without clear, independent performance data, decisions often come down to who gives the best demo – not who has the best model. That's not exactly a recipe for safe or effective care.
Q. What would a healthcare AI tool validation process look like? What, in your opinion, must be examined?
A. A solid validation process for healthcare AI has to tackle two big things: diversity and change.
First, before any tool goes live, it needs to be tested across a wide range of real-world conditions – not just in one hospital or on one dataset. That means different scanner types, clinical workflows and patient populations.
We're talking about making sure the model works fairly across races, ages and demographics, and that it can handle all the messy variability that shows up in real clinical practice.
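To make that concrete, here's a minimal sketch of what a pre-deployment subgroup check could look like. The column names, the 0.80 AUC threshold and the minimum group size are illustrative assumptions, not a published standard:

```python
# Hypothetical pre-deployment check: stratify a model's performance by
# subgroup on a local validation set. Column names, the 0.80 AUC floor
# and the minimum group size are illustrative assumptions.
import pandas as pd
from sklearn.metrics import roc_auc_score

df = pd.read_csv("local_validation_set.csv")  # y_true, y_score, plus metadata columns

MIN_AUC = 0.80   # locally agreed acceptance threshold
MIN_N = 100      # don't trust an AUC computed on a tiny subgroup

for column in ["scanner_model", "race", "age_band", "site"]:
    for group, subset in df.groupby(column):
        if len(subset) < MIN_N or subset["y_true"].nunique() < 2:
            print(f"{column}={group}: too few cases to evaluate (n={len(subset)})")
            continue
        auc = roc_auc_score(subset["y_true"], subset["y_score"])
        flag = "REVIEW" if auc < MIN_AUC else "ok"
        print(f"{column}={group}: n={len(subset)}, AUC={auc:.3f} [{flag}]")
```

Even a simple report like this makes it obvious when a model that looks strong overall is quietly underperforming for one scanner fleet or one patient population.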
But the work doesn't stop once the tool is deployed. AI models change, whether through formal updates or unintended drift, so ongoing monitoring isn't optional; it's essential. Hospitals need a way to continuously track how these tools are performing, flag issues early and feed that information back into shared registries. That kind of collective oversight helps everyone make smarter decisions and reduces the chance of surprises at the bedside.
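As a rough illustration of what that ongoing tracking might look like, here is a stripped-down sketch. The baseline agreement rate, window size and tolerance are assumptions made for the example; a real deployment would feed alerts into the shared registries Tran describes rather than printing them:

```python
# Hypothetical post-deployment drift check: compare the model's recent
# agreement with radiologists against the rate measured at go-live.
# Baseline, window size and tolerance are illustrative assumptions.
from collections import deque

BASELINE_AGREEMENT = 0.92  # agreement rate measured during local validation
WINDOW = 500               # number of most recent studies to consider
TOLERANCE = 0.05           # alert if agreement drops more than 5 points

recent = deque(maxlen=WINDOW)

def record_case(ai_positive: bool, radiologist_positive: bool) -> None:
    """Log one study; flag drift once a full window is available."""
    recent.append(ai_positive == radiologist_positive)
    if len(recent) == WINDOW:
        agreement = sum(recent) / WINDOW
        if agreement < BASELINE_AGREEMENT - TOLERANCE:
            # A real system would page the AI governance team and push
            # the result to a shared registry rather than just printing.
            print(f"Drift alert: agreement {agreement:.1%} vs baseline {BASELINE_AGREEMENT:.0%}")
```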
Groups like the Coalition for Health AI and the American College of Radiology are already building the infrastructure to support this – from common standards to national registries. The goal is to move beyond guesswork and glossy marketing claims toward a shared, transparent view of what actually works, for whom and under what conditions.
Q. What do you believe hospital and health system CIOs, CAIOs and other IT leaders should be looking out for when they are purchasing AI systems?
A. For CIOs and CAIOs evaluating AI tools, there are a few big things to watch out for – and they're easy to miss if you're just focused on the demo or the FDA stamp.
First, don't assume FDA clearance means a model will work well in your environment. Most tools are validated on narrow datasets, often pulled from academic centers in places like New York, Boston or the Bay Area. A patient in rural Oregon might have different comorbidities, imaging protocols or follow-up patterns than one in Manhattan.
These differences can dramatically impact AI performance. At the end of the day, it's on the health system, not the vendor or regulator, to make sure these tools actually work where they're being used.
Second, think long term about infrastructure. Rolling out just one AI tool can take months and cost six figures. Now imagine doing that across radiology, cardiology, pathology and beyond – it's not sustainable to handle each one in a silo.
Instead, push for centralized AI infrastructure – a shared platform where multiple models can be deployed, integrated and monitored together. It reduces redundancy, speeds up deployment, and makes it possible to scale AI system-wide without burning out your IT team.
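One way to picture that shared platform: every model, regardless of vendor or specialty, registers through the same interface and inherits the same monitoring hook. The sketch below is purely illustrative; the class and function names are assumptions, not an actual Ferrum Health or vendor API:

```python
# Hypothetical sketch of centralized AI infrastructure: one registry,
# one integration path, one monitoring hook shared by every model.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class DeployedModel:
    name: str
    vendor: str
    specialty: str                    # e.g. radiology, cardiology, pathology
    version: str
    monitor: Callable[[dict], None]   # shared performance/drift check

registry: Dict[str, DeployedModel] = {}

def register(model: DeployedModel) -> None:
    """Onboard a model once; integration and monitoring come with it."""
    registry[model.name] = model

def route_result(model_name: str, result: dict) -> None:
    """Every inference result flows through the same oversight path."""
    registry[model_name].monitor(result)
```

The point isn't the code; it's that onboarding the tenth model should look like onboarding the first, not ten separate integration projects.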
Third, demand more than buzzwords. Too many purchasing decisions still come down to who has the slickest pitch or the most recognizable logo. What's really needed is standardized, transparent performance reporting – real-world data that shows how a tool performs across different populations, not just the ones that look good in a slide deck. This will help you avoid tools that don't provide value.
Finally, keep the broader ecosystem in mind. If the bar for validation is too high, only the biggest vendors will be able to play, and that could crush innovation from smaller startups that are often building the most exciting tools. The sweet spot is lightweight, post-market performance reporting that holds tools accountable without blocking them from getting to patients in the first place.