
Seminar: Semantic-Based Information Extraction of Biomedical Definitions


Seminar Presentation


Saeed Hassanpour, Ph.D.
Stanford Center for Biomedical Informatics Research


Location: HSEB 4100B
Date: Mar. 8, 2012
Time: 4:15 pm



Abstract

It is well known that the volume of biomedical literature is growing exponentially. Scientists who must sift through this unstructured knowledge to find relevant information are overwhelmed by its scope and diversity. Prior work on this problem has focused on methods for searching for relevant publications and for identifying the relevant parts of those publications. There has been much less research on methods that directly extract computer-interpretable knowledge from biomedical literature. To tackle this challenge, I present a novel method that supports the acquisition of structured knowledge from unstructured text, and I have applied it to identifying rule-based definitions of phenotypes in autism. Because background knowledge of a complex and diverse medical condition like autism is critical to information extraction, I have developed a novel semantic-based approach. Specifically, I use existing background knowledge to incorporate domain-relevant semantics, such as semantic similarity and rule structure, into a method that finds which publications and which parts of texts are most relevant to phenotype definitions, and that identifies which rule or rule format correctly captures a phenotype definition. In the autism domain, my evaluation shows that incorporating structured domain knowledge into information extraction improves the accuracy and relevance of results compared to alternative term-based approaches. My method can help scientists rapidly formalize the complex domain knowledge that is emerging in published research findings. It is also widely applicable to other information extraction challenges that require accurately capturing computer-interpretable definitions, constraints, and policies specified as text.



Bio

Saeed Hassanpour is currently a Ph.D. student in Electrical Engineering, with a minor in Biomedical Informatics, at Stanford University. He works as a Research Assistant at the Stanford Center for Biomedical Informatics Research. His research addresses the challenge of acquiring formal knowledge for clinical and health informatics applications. His work on methods that facilitate rule acquisition from domain experts and from online text has received multiple awards, including a Best Paper Award and a Best System Demonstration Award. Before coming to Stanford, he received a Master of Mathematics in Computer Science from the University of Waterloo in Waterloo, Canada.