Algorithms and open-source machine-learning tools are as good as, or even better than, human reviewers at detecting cancer cases from free-text pathology reports, according to a new study from the Regenstrief Institute and the Indiana University School of Informatics and Computing at Indiana University–Purdue University Indianapolis. The computerized approach was also faster and less resource-intensive.
Researchers sampled 7,000 free-text pathology reports from more than 30 hospitals that participate in the Indiana Health Information Exchange. They used open-source tools, classification algorithms, and varying feature selection approaches to predict whether a report was positive or negative for cancer. The results indicated that a fully automated review yielded results similar to, or better than, those of trained human reviewers, saving both time and money, Indiana University said.
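To make the general approach concrete, the snippet below sketches free-text classification in miniature. It is an illustration only, not the study's actual pipeline: it trains a small multinomial Naive Bayes classifier (one common open-source technique for text classification) to label report text as cancer-positive or negative. The toy report snippets, the word-level tokenizer, and the choice of Naive Bayes are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Toy labeled pathology-report snippets, invented for illustration only
# (label 1 = cancer-positive, 0 = negative).
TRAIN = [
    ("invasive ductal carcinoma identified in biopsy specimen", 1),
    ("malignant cells present consistent with adenocarcinoma", 1),
    ("high grade squamous cell carcinoma with necrosis", 1),
    ("benign fibrous tissue no evidence of malignancy", 0),
    ("normal mucosa negative for tumor cells", 0),
    ("chronic inflammation no malignant cells identified", 0),
]

def tokenize(text):
    """Lowercase word tokens; a stand-in for real report preprocessing."""
    return re.findall(r"[a-z]+", text.lower())

def train_nb(examples, alpha=1.0):
    """Fit a multinomial Naive Bayes model with Laplace smoothing."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    model = {"prior": {}, "loglik": {0: {}, 1: {}}, "vocab": vocab}
    total_docs = sum(doc_counts.values())
    for label in (0, 1):
        model["prior"][label] = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for w in vocab:
            model["loglik"][label][w] = math.log(
                (word_counts[label][w] + alpha) / denom
            )
    return model

def predict(model, text):
    """Return 1 (cancer-positive) or 0 (negative) for a report snippet."""
    scores = {}
    for label in (0, 1):
        score = model["prior"][label]
        for w in tokenize(text):
            if w in model["vocab"]:  # ignore words unseen in training
                score += model["loglik"][label][w]
        scores[label] = score
    return max(scores, key=scores.get)

model = train_nb(TRAIN)
print(predict(model, "specimen shows malignant carcinoma cells"))  # 1
print(predict(model, "benign tissue negative for malignancy"))     # 0
```

A production system of the kind the study compared against human reviewers would add richer feature selection and evaluation on thousands of real reports, but the core idea, turning free text into word features and scoring them against labeled examples, is the same.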
Every state in the United States requires cancer cases to be reported to statewide cancer registries for disease tracking, identification of at-risk populations, and recognition of unusual trends or clusters. Typically, however, healthcare providers with little time on their hands submit cancer reports to harried public health departments months into the course of a patient’s treatment, rather than at the time of initial diagnosis, Indiana University said.
As a result, the information can be difficult for health officials to interpret, which further delays health department action when action is needed, the university added. In their study, the Regenstrief Institute and Indiana University researchers demonstrated that machine learning can greatly facilitate this process by automatically extracting crucial meaning from plain-text, also known as free-text, pathology reports and using that information for decision-making.
“We think it’s no longer necessary for humans to spend time reviewing text reports to determine if cancer is present or not,” said study senior author Shaun Grannis, MD, interim director of the Regenstrief Center for Biomedical Informatics. “We have come to the point in time that technology can handle this. A human’s time is better spent helping other humans by providing them with better clinical care.”
Much of the work in informatics during the next few years will be focused on how providers can benefit from machine learning and artificial intelligence, Grannis added.
“Everything – physician practices, health systems, health information exchanges, insurers, as well as public health departments – is awash in oceans of data,” he said. “How can we hope to make sense of this deluge of data? Humans can’t do it – but computers can.”
The study, “Towards Better Public Health Reporting Using Existing Off the Shelf Approaches: A Comparison of Alternative Cancer Detection Approaches Using Plaintext Medical Data and Non-dictionary Based Feature Selection,” was published in the April 2016 issue of the Journal of Biomedical Informatics. The study was conducted with support from the Centers for Disease Control and Prevention.