Engineering student researches machine learning for language processing

April 12, 2018

Using computers to comb the vast sea of biomedical literature could be key to identifying relationships among concepts

As part of Research Weeks (April 6–27), we are highlighting six undergraduates whose work was made possible by VCU’s Undergraduate Research Opportunities Program, Global Education Office and Division for Inclusive Excellence, and by guidance from faculty members.

Research Weeks takes place on both campuses and features a wide variety of projects in multiple disciplines.

Clint Cuffy has always been interested in machine learning — the area of computer science in which computers use data to learn how to perform tasks, rather than being specifically programmed to do those tasks. But he lacked the opportunity to delve into the subject.

“It seemed like a daunting task without a finite starting point, especially when it comes to neural networks,” said the senior, who is majoring in computer science at the Virginia Commonwealth University School of Engineering. “Learning the inner workings and how to implement algorithmic approaches of modeling how the human mind works seemed far-fetched at best.”

Then Bridget McInnes, Ph.D., assistant professor in the Department of Computer Science, approached him with an opportunity to research machine learning with regard to natural language processing, and the rest is history, Cuffy said.

“My research project [‘Identifying relations in biomedical text for literature-based discovery’] entails using a neural network, an algorithm modeled after how the mind functions in regards to a-cyclical connected neurons, to learn unique relationships between concepts through predications and represent them in semantic space as concept vectors,” Cuffy said. “That is to say, considering an example such as ‘aspirin treats headaches,’ we can see the relationships between all three of these terms. ‘Aspirin’ and ‘headaches’ are related through the word ‘treats.’ Semantic similarity and relatedness defines how words can be similar to each other, such as ‘liver-organ,’ or related, such as ‘aspirin-headache,’ through predicate relationships. Using the previous examples, ‘liver is an organ,’ since liver is a more specific subset of an organ they are similar, but aspirin is not a headache so they are not similar. Since aspirin is used to treat headaches, they are related by the ‘treats’ predication.

“With this knowledge, we [use] a neural network that learns how to define representations of concepts in semantic space, in terms of similarity and relatedness, by the hypothesis words that can be defined by the context which surrounds it.”
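To make the idea of concept vectors concrete, here is a minimal sketch in Python. It is not the project's code: the three-dimensional vectors are hand-picked, hypothetical values, and cosine similarity is just one common way to compare vectors in a semantic space. Note that a single cosine score measures closeness only; on its own it does not separate "similar" pairs (liver-organ) from "related" pairs (aspirin-headache).

```python
import math

# Hypothetical concept vectors, hand-picked for illustration only.
# Learned vectors would come from a trained network and have many more dimensions.
concept_vectors = {
    "aspirin": [0.9, 0.1, 0.3],
    "headache": [0.7, 0.2, 0.6],
    "liver": [0.1, 0.9, 0.2],
    "organ": [0.2, 0.8, 0.1],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Compare a few concept pairs from the examples in the text.
for left, right in [("liver", "organ"), ("aspirin", "headache"), ("liver", "aspirin")]:
    score = cosine_similarity(concept_vectors[left], concept_vectors[right])
    print(f"{left} / {right}: {score:.3f}")
```

With these made-up values, the liver-organ and aspirin-headache pairs score much higher than the unrelated liver-aspirin pair, which is the kind of structure a learned semantic space is meant to capture.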

Biomedical research is being published at an astounding rate, Cuffy noted. These publications could hold keys to potentially important relationships in literature-based discovery, but keeping up by reading a single document at a time and identifying possible relationships poses a daunting task, he said.

However, researchers have found relationships among concepts through accidental or investigative reading of these publications. Attempts have been made to automate the process, but to identify meaningful relationships that are presently unknown, Cuffy and McInnes are first building a foundation by identifying those that already exist within semantic space. They are training a neural network on data from the National Institutes of Health, whose Semantic Predication Database contains more than 91.6 million examples of predication triplets, to generate term vectors that define the concepts and predicates with high levels of accuracy.
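As a rough illustration of how predication triplets might be prepared as training data, the Python sketch below pairs each term in a triplet with the other two terms as its context. This is an assumption made for illustration, in the spirit of context-based embedding models; the article does not describe the actual input format or network design. The predicate labels and the function names `training_pairs` and `build_vocabulary` are hypothetical.

```python
from itertools import permutations

# Toy predication triplets in (subject, predicate, object) form.
# The terms and predicate labels are illustrative, not drawn from the real database.
triplets = [
    ("aspirin", "TREATS", "headache"),
    ("liver", "ISA", "organ"),
]


def training_pairs(triplet):
    """Pair every term in a triplet with each of the other two terms.

    Each (target, context) pair tells a model that the two terms occurred
    together in the same predication.
    """
    return list(permutations(triplet, 2))


def build_vocabulary(all_triplets):
    """Assign each distinct term an integer index in order of first appearance."""
    vocab = {}
    for triplet in all_triplets:
        for term in triplet:
            vocab.setdefault(term, len(vocab))
    return vocab


vocab = build_vocabulary(triplets)
for triplet in triplets:
    for target, context in training_pairs(triplet):
        # A network would receive these index pairs rather than raw strings.
        print(vocab[target], vocab[context], target, context)
```

A model trained on pairs like these would adjust each term's vector so that terms sharing predications end up close together, which is one plausible route to the term vectors described above.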

“The significance to this could be the key to unlocking more meaningful relationships,” Cuffy said. “Rather than investigative or accidental relationships being discovered, we could potentially automate this process.”

Providing undergraduate students the opportunity to conduct research is an important part of their education, McInnes said.

“Experiential learning initiatives have been shown to aid in the development of undergraduates’ professional habits and promote scientific and critical thinking.”

Cuffy said the project has further fueled his drive to see the limits of machine learning and its practicality.

“Performing research at the undergraduate level is a blessing in disguise,” Cuffy said. “You find yourself exposed to information, briefly discussed or taught at the classroom level, taken steps further into practical application. … Research can be a very challenging, but rewarding, experience. Not only are you increasing your knowledge-set and learning invaluable information, but there is always the possibility of having a positive impact on the lives of others through your research.”

VCU news