
Epic's AI validation 'cookbook' helps health systems review models' performance

While the open-source Seismometer tool does not provide ROI metrics, it can surface insights on an algorithm's quality and utility, say early users at Michigan Medicine.
By Andrea Fox, Senior Editor
Photo: elCasanelles/Getty Images

There is often a disconnect between the intended functionality of an artificial intelligence model and its actual functionality once it's integrated into healthcare workflows.

Epic has developed an open-source tool that helps health systems evaluate their AI models' performance, facilitating side-by-side comparisons to help providers make informed usage decisions based on their local data and workflows.

Called the Seismometer, the technology can be used locally to ingest an organization's data and answer many questions about a given model – such as whether it improves patient outcomes, results in faster treatment or is widely accepted by users. It can also compare AI models to each other.

In the year since the tool was first introduced, Epic and its collaborators have refined the free validation approach, which was designed for healthcare organizations that often lack the resources to properly validate their AI and machine learning models.

When is AI 'worth the squeeze'?

Epic worked with the Health AI Partnership and other data scientists at the University of Wisconsin and elsewhere to test the tool during development.

Brian Patterson, UW Health's medical informatics director for predictive analytics and AI, said the organization recently decided to make the tool standard practice going forward.

"It's more full-featured than when they rolled it out, in terms of some of the ways that it can calculate uncertainty and some of the ways that it calculates the ability to do subgroup analysis, which are really important for ensuring fairness," he said in an interview.

A recent review of large language models used in healthcare and medical applications outlined more than 50 areas of application. Finding out whether an AI model is worth the implementation lift and cost – a basic chatbot may cost $20,000, but advanced AI systems could cost an organization millions of dollars – is critical for healthcare organizations.

Michigan Medicine has been an early adopter of Epic's Seismometer, implementing it earlier this year and using it to validate use of the Epic Sepsis version 2 model and identify different use thresholds for different groups, or cohorts, within its organization.

Dr. Michael Burns, Michigan Medicine's associate chief medical informatics officer for AI, called the tool something of a "game-changer" for its ability to automate and visualize painstaking AI validation with its intuitive GUI. 

So far, the Seismometer has enabled Michigan Medicine to make ad hoc sepsis model adjustments that ease certain workflow challenges in reacting to alerts, said Burns' colleague, Sean Meyer, Michigan Medicine's data science and AI engineering lead.

The AI validation tool aided in identifying appropriate sepsis warning thresholds for different clinical groups and workflows, like the emergency department and intensive care unit.

Many of the health system's hospital departments, including general care floors, have patients at risk for sepsis, but each has a different workflow for responding to a trigger or notification, Meyer told Healthcare IT News.
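To make the department-level question concrete, here is a minimal sketch of the kind of per-cohort threshold analysis such a tool automates: sweep candidate alert cutoffs within each department and read off the sensitivity/PPV trade-off at each one. The DataFrame and column names (dept, sepsis_score, sepsis_label) are illustrative assumptions, not Seismometer's actual schema.

```python
import numpy as np
import pandas as pd

def threshold_table(df: pd.DataFrame, thresholds) -> pd.DataFrame:
    """For each department and candidate cutoff, compute sensitivity and PPV."""
    rows = []
    for dept, grp in df.groupby("dept"):
        y = grp["sepsis_label"].to_numpy() == 1
        for t in thresholds:
            pred = grp["sepsis_score"].to_numpy() >= t
            tp = int(np.sum(pred & y))
            fp = int(np.sum(pred & ~y))
            fn = int(np.sum(~pred & y))
            rows.append({
                "dept": dept,
                "threshold": t,
                "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
                "ppv": tp / (tp + fp) if tp + fp else float("nan"),
            })
    return pd.DataFrame(rows)

# e.g. see how the same 0.6 cutoff behaves in the ED versus the ICU:
# table = threshold_table(scored_encounters, [0.3, 0.4, 0.5, 0.6, 0.7])
# print(table[table["threshold"] == 0.6])
```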

"Looking at specific departments and seeing what the performance is, and in some cases, I think the question is, can we adopt this in other departments?" he said.

The tool's graphical user interface automates performance plotting and enables filtering, significantly reducing the need for manual coding.

It can pull the hospital's historical data, including predictions for at-risk patients and various events – death, ICU transfer, antibiotic administration, etc. – instead of "having to write more Python code, and then validate the Python code," Meyer explained.
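For contrast, here is a minimal sketch of the hand-written validation code such a tool replaces: join a predictions extract to outcome events and compute a single AUROC. The file names, column names and event types are hypothetical placeholders, not Epic's data model.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical extracts: one row per scored encounter, one row per outcome event.
preds = pd.read_csv("sepsis_predictions.csv")  # encounter_id, sepsis_score
events = pd.read_csv("outcome_events.csv")     # encounter_id, event_type

# Label an encounter positive if any qualifying event occurred.
qualifying = {"death", "icu_transfer", "antibiotic_administration"}
labels = (
    events.assign(label=events["event_type"].isin(qualifying).astype(int))
    .groupby("encounter_id", as_index=False)["label"]
    .max()
)

# An inner join quietly drops encounters that lack either a prediction or an
# event record -- one of the "behind the scenes" choices a tool has to surface.
joined = preds.merge(labels, on="encounter_id", how="inner")
print("AUROC:", roc_auc_score(joined["label"], joined["sepsis_score"]))
```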

The process is more efficient and accessible for communicating with leadership and stakeholders, such as the health system's sepsis committee.

With the Seismometer, Meyer and his team can also make "valid comparisons or equal comparisons between models," such as a custom AI model and a vendor's model, without duplicative efforts and with assurance.
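For illustration, a hedged sketch of what such an "equal comparison" can look like: one shared table with one label column and two score columns, so both models are evaluated by the same code on the same encounters. The file and column names are assumptions, not Michigan Medicine's actual data model.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

df = pd.read_csv("scored_encounters.csv")  # label, custom_score, vendor_score

# Keep only encounters scored by both models, so neither gets an easier population.
shared = df.dropna(subset=["custom_score", "vendor_score"])
for model in ("custom_score", "vendor_score"):
    print(model, round(roc_auc_score(shared["label"], shared[model]), 3))
```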

"A lot of times what we have is a custom model that a researcher has built, and we have an Epic model or another vendor model that we want to compare the results of, and this kind of a tool makes that effort a lot easier," he said. "We don't have two different groups writing an evaluation."

Meanwhile, Patterson at UW Health noted that the tool equips technologists with AI performance information – positive and negative predictive values, broken down by subgroups such as age – to share with decision-makers.
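A minimal sketch of that subgroup breakdown – AUROC, sensitivity, PPV and NPV per age band at one operating point – again with illustrative column names (age_band, label, score) rather than any tool's real schema:

```python
import pandas as pd
from sklearn.metrics import confusion_matrix, roc_auc_score

def by_subgroup(df: pd.DataFrame, group_col: str = "age_band",
                cutoff: float = 0.5) -> pd.DataFrame:
    """AUROC, sensitivity, PPV and NPV for each subgroup at one cutoff."""
    rows = []
    for band, grp in df.groupby(group_col):
        # Assumes every subgroup contains both outcome classes; tiny or
        # single-class subgroups must be filtered or flagged in practice.
        tn, fp, fn, tp = confusion_matrix(
            grp["label"], (grp["score"] >= cutoff).astype(int), labels=[0, 1]
        ).ravel()
        rows.append({
            group_col: band,
            "auroc": roc_auc_score(grp["label"], grp["score"]),
            "sensitivity": tp / (tp + fn),
            "ppv": tp / (tp + fp),
            "npv": tn / (tn + fn),
        })
    return pd.DataFrame(rows)
```

A fairness review then reduces to reading one small table: if PPV drops sharply for one age band, the alert means something different for those patients.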

"One of the things that we really like about the tool that we've done is when you look at a classifier, it spits out how well the classifier works and all the statistics that we're used to seeing in terms of areas under the curve sensitivity," said Patterson. "These are sort of the currency of these tools that you use to talk to other folks in your organization about whether to turn it on or not."

One challenge Meyer identified with Michigan Medicine's use of the open-source validation tool was seeing what it does "behind the scenes," especially how it handles data for patients with incomplete prediction or event records.

A spokesperson for Epic told Healthcare IT News this week that the company has since made that processing visible, so users can better understand how the tool's technical mapping compares with their own analyses and with its AI trust and assurance suite.

"During the initial collaboration period in spring 2025, we made an enhancement that added more clarity around the performance output," the spokesperson said.

Evaluating quality, not ROI

Burns, from Michigan Medicine, said there are no immediate return on investment metrics and the tool does not necessarily spotlight immediate opportunities for cost reductions. But he stressed the importance of evaluating quality, such as improved patient outcomes and improved workflow integrations.

"Are we actually treating the people with [sepsis] better, faster or more efficiently? Sepsis is nebulous. Are we actually getting less sepsis? Well, people come to the hospital because they are sick, right? So some of that cadence will stay at a certain current level," he explained.

"I don't think that ROI is ever really going to be met."

Both Burns and Meyer expressed optimism for the future. Meyer said he believes the Seismometer will standardize Michigan Medicine's evaluation process, making it less opaque and more accessible, while Burns said it will help the health system understand why certain AI tools are adopted faster at some facilities than others. They are still running the validation tool locally, but in the future it could be server-based.

Overall, despite an initial learning curve and some minor challenges along the way, they said their close collaboration with Epic's research and development team has given them hands-on development opportunities and a chance to recommend new tool features.

Justifying AI investments

For a more financially challenged Federally Qualified Health Center that wants to implement AI into clinical workflows, getting Epic's Seismometer tool up and running can be more of a journey.

Community-University Healthcare Center, an FQHC in South Minneapolis, is navigating the complexities of AI implementation, despite a lack of in-house technical expertise.

As the center works to set up and operate tools like the Seismometer, its affiliation with the University of Minnesota has given it access to some resources not typically available to smaller centers, said Eric Maurer, the center's chief innovation and strategy officer.

Maurer explained that the "safety-net story" is learning how to "safely and equitably" use AI to improve care delivery. With pro bono help, the FQHC is working with OCHIN – the national network that helps community health centers procure IT and services – and with Epic to integrate and validate its no-show predictive analytics model.

The first goal is to operationalize predictive analytics for patient outreach and reduce communication burdens on staff. While Epic is helping with software setup, the FQHC relies on OCHIN to validate the model nationally and deliver the risk scores to its instance.

"We need the risk scores from OCHIN to appear in our instance," Maurer explained. "It's not something we ourselves can turn on."

Once his teams receive the model performance data for the no-show predictor, Maurer added, he believes the health center will be able to assess performance at its organization well enough to justify investing limited resources.

In the future, they could potentially use Epic's open-source AI validation tool to better understand risks associated with specific conditions such as asthma and hypertension, monitor for inequities and health disparities, and ensure fairness in AI.

"That's the path," said Maurer. "Can we get some experience with one analytic so that we can start to understand how to implement and operationalize successfully?"

Andrea Fox is senior editor of Healthcare IT News.
Email: afox@himss.org
Healthcare IT News is a HIMSS Media publication.
