
Data variety bigger hurdle than volume

'Only by addressing the challenge of utilizing diverse types of data will we be able to unlock the enormous potential of analytics.'
By Mike Miliard, Executive Editor

Much has been made of the exploding volume of patient data, and the challenges and opportunities that poses for healthcare. But a new analysis finds that it's actually the variety of those data types that's truly giving researchers headaches.


Electronic health records, connected mobile devices, genomic sequencers and sensors are churning out massive troves of data that grow with each passing day. But the new survey from computational database company Paradigm4 finds that it's the sheer diversity of all that data, rather than the size of the data sets, that poses the bigger challenge to data scientists.

That's causing researchers to "leave data on the table," according to the report, which finds that 71 percent of data scientists polled say big data has made their analytics tasks more difficult.


Still, size matters. More than one-third (36 percent) of data scientists say it takes too long to arrive at insights because the data is too big to move to their analytics software.

But the variety of data sources complicates things. Many researchers report omitting data from their analyses as they try to figure out how to incorporate new sources such as time-stamped sensor, location, image and behavioral data, as well as network data.

"The increasing variety of data sources is forcing data scientists into shortcuts that leave data and money on the table," said Marilyn Matz, CEO of Paradigm4, in a press statement. "The focus on the volume of data hides the real challenge of data analytics today. Only by addressing the challenge of utilizing diverse types of data will we be able to unlock the enormous potential of analytics."

Other findings from the survey, which polled more than 100 data scientists:

  • 91 percent said they're using complex analytics on their big data now or plan to within the next two years.
  • 49 percent said they're finding it more difficult to fit their data into relational database tables.
  • 39 percent said their job had become more stressful with the growth of big data.

Incorporating the diverse data types into analytical workflows is a big pain point for data scientists using traditional relational database software, the report shows.

Indeed, many researchers report having to move large volumes of data from existing data stores to dedicated mathematical and statistical computing software -- a "time-consuming and coding-intensive step [that] adds no analytical value" and reduces efficiency, according to Paradigm4.

"Precision medicine providers are gaining a more refined understanding of what works for whom by integrating molecular data with clinical, behavioral, electronic health records and environmental data," according to the study. "But the ability to use diverse data types poses a serious challenge."
