As AI explodes, the need for synthetic data is more crucial than ever

With synthetic data, innovation doesn't have to come at the cost of privacy – synthetic data enables hospitals, health systems and others to unlock new capabilities while keeping patient trust intact, a healthcare analytics expert says.
By Bill Siwicki
Bunk (right) and his best friend Daniel Blumenthal, vice president of strategy at MDClone
Photo: Daniel Blumenthal

At a time when health systems are struggling to gain meaningful insights from their data while also safeguarding patient privacy, synthetic data offers considerable potential.

Synthetic data refers to artificially generated data that replicates the statistical patterns and relationships found in real-world datasets, such as patient records, without containing any actual personal or identifiable information. It's created by learning from real data to extract population-level statistics, then generating an entirely new dataset that looks and behaves like the original.

Crucially, there is no one-to-one link between individuals in the synthetic dataset and those in the source data. This means synthetic data can't be traced back to real people, which sets it apart from traditional anonymization or de-identification techniques and the re-identification risks they carry.
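
To make the two-step idea concrete (learn population-level statistics, then generate new records from them), here is a minimal Python sketch. It is an illustration only and not MDClone's method: the article does not describe the vendor's algorithm. Everything in it is an assumption made for the example, including the function names, the two fields (age and systolic blood pressure), the invented toy "source" records and the choice of a simple two-variable Gaussian model. Matching a handful of summary statistics like this does not, by itself, amount to a formal privacy guarantee.

```python
import math
import random
import statistics


def make_toy_source(n, seed=1):
    """Invented stand-in for a real dataset: (age, systolic_bp) pairs. Not patient data."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        age = rng.uniform(30, 80)
        bp = 100 + 0.5 * age + rng.gauss(0, 8)  # made-up relationship, for illustration only
        records.append((age, bp))
    return records


def fit_population_stats(records):
    """Step 1: reduce the records to population-level statistics only."""
    ages = [a for a, _ in records]
    bps = [b for _, b in records]
    mean_age, mean_bp = statistics.fmean(ages), statistics.fmean(bps)
    sd_age, sd_bp = statistics.stdev(ages), statistics.stdev(bps)
    cov = sum((a - mean_age) * (b - mean_bp) for a, b in records) / (len(records) - 1)
    return {
        "mean_age": mean_age, "sd_age": sd_age,
        "mean_bp": mean_bp, "sd_bp": sd_bp,
        "corr": cov / (sd_age * sd_bp),
    }


def generate_synthetic(stats, n, seed=0):
    """Step 2: draw brand-new records from the fitted statistics (assumed Gaussian)."""
    rng = random.Random(seed)
    rho = stats["corr"]
    # Guard against tiny floating-point overshoot when |rho| is close to 1.
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        age = stats["mean_age"] + stats["sd_age"] * z1
        bp = stats["mean_bp"] + stats["sd_bp"] * (rho * z1 + residual * z2)
        out.append((age, bp))
    return out


source = make_toy_source(2000)
stats = fit_population_stats(source)
synthetic = generate_synthetic(stats, 5000)
```

The generator in this sketch only ever sees the five summary numbers in `stats`, never an individual source row, so no synthetic record corresponds to a specific source record. Production systems model many more variables and relationships than this toy does.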

Access for a broader group of users

Synthetic data is widely used in healthcare for research, education, innovation and product development. Because it mimics real data while preserving privacy, it opens access to a broader group of users – including students, developers, researchers and entrepreneurs – who otherwise would be restricted by privacy regulations.

These users can safely explore healthcare trends, test hypotheses, develop AI models, improve care pathways and build new technologies. The applications span hospitals, pharmaceutical companies, academic research and government agencies. Essentially, anywhere healthcare data holds potential to improve outcomes, synthetic data enables secure and scalable discovery, said Daniel Blumenthal, vice president of strategy at MDClone, a healthcare analytics platform vendor.

"As healthcare data becomes increasingly valuable, especially in the era of AI and machine learning, the demand for safe and meaningful access to that data has skyrocketed," he stated. "Traditional privacy-preserving methods like de-identification or data masking often fall short.

"These traditional methods may either fail to fully protect against re-identification risks or strip away important details – like geographic location, dates or ages – that are critical for meaningful analysis," he continued. "These limitations not only compromise patient privacy but also reduce the usefulness of the data, leading organizations to either avoid sharing it or share a version so limited it becomes ineffective for research."

Synthetic data addresses these issues with a different approach: It retains the analytical utility of real data while removing the risk of identifying individual people, which makes it both safer and more useful, he added.

Maximizing privacy, preserving utility

"For example, during public health crises like the COVID-19 pandemic, being able to analyze time-specific or location-based trends was crucial – and synthetic data enabled that without compromising privacy," he recalled. "This dual capability of maximizing privacy and preserving utility is essential in today's data-driven healthcare ecosystem."

On another front, democratizing data access means making meaningful healthcare data available to a much broader audience, not just a select few insiders.

"Currently, access to patient-level data is heavily restricted to individuals within large organizations who can navigate complex approval processes and comply with regulatory oversight," Blumenthal explained. "This system excludes a vast majority of capable, motivated researchers, students, developers and startups who may have ideas, tools or hypotheses to test but simply can't access the data they need to act on them.

"Synthetic data removes many of those access barriers," he continued. "Because it doesn't contain real patient information, it often doesn't require institutional review board approval and can be shared more freely. This opens the door for a more diverse and inclusive group of people to contribute to healthcare innovation across academia, industry and government."

Whether it's a medical student testing an algorithm or a public health analyst exploring regional care gaps, democratized access to realistic, usable data empowers a broader community to participate in solving some of healthcare's biggest challenges, he added.

When large, high-quality datasets are critical

Furthermore, synthetic data is a catalyst for privacy-first innovation – especially in the AI-driven landscape where large, high-quality datasets are essential, Blumenthal said.

"The explosion of computational power and the rise of advanced machine learning models have made it possible to analyze and process vast amounts of data in ways that weren't feasible even a few years ago," he explained. "However, real patient data remains tightly protected, and rightly so, due to privacy concerns. This creates a tension: The technology is ready to transform healthcare, but the data needed to fuel it is out of reach.

"This is where synthetic data plays a transformative role," he continued. "It can be generated in large volumes while preserving privacy, enabling AI models to be trained, tested and validated without exposing sensitive information."

AI tools that once relied solely on published studies now can be built and refined using synthetic datasets that reflect real-world populations.

"This creates opportunities to accelerate breakthroughs in process automation, diagnostics, predictive analytics and personalized care," he concluded. "With synthetic data, innovation doesn't have to come at the cost of privacy. Synthetic data enables us to unlock new capabilities while keeping patient trust intact."

Follow Bill's health IT coverage on LinkedIn: Bill Siwicki
Email him: bsiwicki@himss.org
Healthcare IT News is a HIMSS Media publication.