Rodney D. Nielsen
Research Scientist, Boulder Language Technologies
Research Scientist, Institute of Cognitive Science, University of Colorado at Boulder
Assistant Professor Adjunct, Department of Computer Science, University of Colorado at Boulder
Research Interests

My research interests include Machine Learning (ML), Natural Language Processing (NLP), cognitive science, and the application of these fields to clinical informatics, health and wellbeing companion robots, educational technology, and end-user programming.

Basic Research in NLP and ML

The advancement of Natural Language Processing (NLP) and Machine Learning (ML) is central to my research. I am interested in both ML theory and application. My past research includes methods to improve concept or class probability estimates, and I am extending this research to make advances in semi-supervised and active learning from large unlabeled corpora. I am particularly interested in self-training and co-training techniques.

One of the key open questions in many applications of ML, and particularly in NLP applications, is how to learn effectively from the vast quantities of unlabeled data available from high-bandwidth input streams and from massive data sources, such as the web. This raises two broad research questions that I am investigating: the first addresses learning from massive datasets ("big data"), and the second addresses learning from unlabeled data.

A few other advances in NLP and ML algorithms I am pursuing are a new unsupervised soft-clustering algorithm, user-assisted learning, and early-stage ideas for learning natural language patterns and fuzzy rules built up from statistical learning. I believe all of these ideas have the potential to facilitate significant advancements in the NLP required for spoken-dialogue-based companion robots, clinical informatics, educational technology, end-user development, and other applications.

My primary research focus is on computational semantics models intended to facilitate machine understanding of text and spoken dialogue. This includes generating semantic representations (semantic facets, concept relations, predicate argument structure, discourse relations, etc.), extracting lexical and conceptual relations from distributional statistics of large corpora, and recognizing presupposition, implicature and entailment.

In the following sections, I describe some of the applied research where collaborators and I aim to utilize this basic research.

Spoken Dialogue Health and Wellbeing Companion Robots

The number of people over 65 in the U.S. will more than double in roughly the first quarter of this century. Many of these elderly would prefer to maintain their independence and remain in their homes. Additionally, many suffer from depression. Collaborators and I are researching means of supporting these seniors via emotive spoken-dialogue companion robots (Companionbots; Nielsen PI, NSF $1.96M total 2011-2015; CU, DU, UCD Anschutz Medical Campus, and Boulder Language Technologies). The focus of the research is on dialoguing, especially generating and answering questions, in the context of providing education and training related to depression, monitoring participants for signs of physical, mental or emotional deterioration, and being a companion. NLP will capitalize on multimodal input and output, be heavily context dependent and tightly integrated with a user model and history. ML will emphasize co-training on multimodal input and user-assisted semi-supervised learning from massive data sets and data streams. Future work will include massive-scale data mining over the information collected by the Companionbots.

Clinical Question Answering

In work with Harvard Medical School and Mayo Clinic (MiPACQ; Savova PI, NIH ARRA $1M 2009-2011, co-investigator), we are researching the use of statistical computational semantics in clinical question answering (CQA) and have achieved state-of-the-art results. Specifically, we have annotated a large corpus of clinical notes, biomedical encyclopedic text, and clinical questions with syntactic and semantic information, such as the semantic relations between predicates and their arguments, Unified Medical Language System (UMLS) entities and relations, and expected answer classifications, and we have trained classifiers to automatically parse and annotate questions and text with this information. Then, given a question, we use information retrieval tools to find relevant medical articles or clinical notes that might contain the answer, automatically annotate the question and potential answers, and extract syntactic and semantic features from these annotations. Finally, we use a machine-learned re-ranker to identify the paragraph-level results most likely to answer the question. Future work will aim to synthesize results from multiple medical resources.
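
As a rough illustration of the retrieve-then-re-rank flow described above, the following Python sketch runs two stages over toy text. It is not the MiPACQ implementation: word overlap stands in for the information retrieval tools, two hand-written features stand in for the syntactic and semantic features, and the re-ranker uses hand-set weights rather than learned ones. All function names, the weights, and the example text (including the placeholder "drugx") are assumptions made for illustration.

```python
import re

def tokens(text):
    """Lowercase word types in a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, paragraphs, k=2):
    """Stage 1: keep the k paragraphs sharing the most word types with the question."""
    q = tokens(question)
    return sorted(paragraphs, key=lambda p: len(q & tokens(p)), reverse=True)[:k]

def features(question, paragraph, answer_type_terms):
    """Toy stand-ins for the syntactic and semantic features (assumption)."""
    q, p = tokens(question), tokens(paragraph)
    overlap = len(q & p) / max(len(q), 1)  # share of question words covered
    has_answer_type = 1.0 if p & answer_type_terms else 0.0
    return [overlap, has_answer_type]

def rerank(question, candidates, answer_type_terms, weights=(1.0, 2.0)):
    """Stage 2: reorder candidates by a weighted feature sum (weights are hand-set here)."""
    def score(p):
        feats = features(question, p, answer_type_terms)
        return sum(w * f for w, f in zip(weights, feats))
    return sorted(candidates, key=score, reverse=True)

# Toy data: "drugx" is a made-up placeholder, not a real medication.
question = "What dose of drugx is recommended for adults?"
paragraphs = [
    "For adults, what dose of drugx is recommended is decided by the clinician.",
    "The recommended dose of drugx for adults is 10 mg daily.",
    "Exercise improves mood in older adults.",
]
candidates = retrieve(question, paragraphs, k=2)
ranked = rerank(question, candidates, answer_type_terms={"mg", "ml"})
```

In this toy run the first paragraph wins the retrieval stage on word overlap alone, while the re-ranking stage moves the second paragraph to the top because it also contains a term matching the expected answer type (a dose unit).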

Clinical Data Mining

Within the framework described above, we are also researching NLP and ML techniques for research cohort identification, that is, identifying patients appropriate to participate in a given clinical trial, based largely on information extracted from unstructured text in the notes of electronic medical records. Future work will investigate mining electronic health records to determine the efficacy of clinical treatments and to identify patterns of interaction and care that lead to more positive (or negative) outcomes.

Classroom Response Technology

Collaborators and I are conducting research to help instructors assess student knowledge and skills in real time (Comprehension SEEDING; Nielsen PI, IES $1.83M 2011-2014 with ASU and UCD). Students submit free-text responses to instructors' open-ended questions via mobile devices to an NLP system that clusters the answers and gives the instructor feedback on the types of misconceptions and their frequency, among other things. Unlike clicker technology, this approach requires students to articulate their understanding of a concept, which numerous cognitive science researchers have shown to be a key to deep learning. This research will benefit from aspects of my Ph.D. work, which was the first research to successfully assess elementary students' one- to two-sentence constructed response answers.
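
To make the idea of clustering free-text answers concrete, here is a minimal Python sketch. It assumes a simple greedy scheme based on Jaccard word overlap, which is a stand-in chosen for brevity and not the method used in the Comprehension SEEDING system; the function names, the threshold, and the example answers are all hypothetical.

```python
import re

def tokens(text):
    """Lowercase word types in an answer."""
    return frozenset(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Size of the intersection divided by size of the union of two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cluster_answers(answers, threshold=0.5):
    """Greedily add each answer to the first cluster whose seed answer is similar enough."""
    clusters = []  # each cluster is a list of answers; its first element is the seed
    for answer in answers:
        for cluster in clusters:
            if jaccard(tokens(answer), tokens(cluster[0])) >= threshold:
                cluster.append(answer)
                break
        else:
            clusters.append([answer])  # no similar seed found: start a new cluster
    # Largest clusters first, so the most common answer type is reported first.
    return sorted(clusters, key=len, reverse=True)

# Hypothetical student answers to a question about why ice melts.
answers = [
    "The ice melts because it gets warm",
    "Ice melts because it gets warm",
    "The ice disappears into the air",
    "The ice melts when it gets warm",
]
clusters = cluster_answers(answers)
sizes = [len(c) for c in clusters]
```

Reporting cluster sizes alongside a representative answer from each cluster is one simple way to show an instructor which answer types, including possible misconceptions, are most frequent.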

Intelligent Tutoring Systems Response Assessment

In the context of a known reference answer to a tutor's question, I extract a knowledge representation of the fine-grained facets of the reference answer and classify each according to whether the student response indicates that the student understood the facet, contradicted it, left it unaddressed, or expressed something related that is perhaps a misconception. The goal of this fine-grained analysis, which classifies the student's apparent understanding of detailed facets more precisely, is to facilitate improved pedagogical dialogue and eventually Socratic tutoring. To that end, I am also researching automatic question generation and question answering.
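
The four-way facet labeling above can be pictured with a small data structure. The Python sketch below is only an illustration: the label names, the example facets, and the hand-assigned labels are hypothetical, and the ordering used to pick follow-up facets (contradictions first, then possible misconceptions, then unaddressed facets) is an assumption, not a policy stated in this research. It does not perform the classification itself, which in the actual work is inferred from the student response.

```python
from enum import Enum

class FacetLabel(Enum):
    """The four outcomes described in the text (names are hypothetical)."""
    UNDERSTOOD = "understood"
    CONTRADICTED = "contradicted"
    UNADDRESSED = "unaddressed"
    RELATED = "related, possibly a misconception"

# Assumed follow-up order: lower number means the tutor revisits it sooner.
FOLLOW_UP_PRIORITY = {
    FacetLabel.CONTRADICTED: 0,
    FacetLabel.RELATED: 1,
    FacetLabel.UNADDRESSED: 2,
}

def follow_up_facets(assessments):
    """Return the facets not labeled UNDERSTOOD, ordered by the assumed priority.

    `assessments` is a list of (facet, FacetLabel) pairs for one student response.
    """
    pending = [(f, lab) for f, lab in assessments if lab is not FacetLabel.UNDERSTOOD]
    pending.sort(key=lambda pair: FOLLOW_UP_PRIORITY[pair[1]])  # stable sort keeps ties in input order
    return [facet for facet, _ in pending]

# Hand-labeled toy example for a reference answer such as
# "The string is tighter, so the pitch is higher."
assessments = [
    ("a string was changed", FacetLabel.UNDERSTOOD),
    ("string is tighter", FacetLabel.UNADDRESSED),
    ("pitch is higher", FacetLabel.CONTRADICTED),
    ("tighter string causes higher pitch", FacetLabel.UNADDRESSED),
]
to_revisit = follow_up_facets(assessments)
```

A dialogue manager could use a list like `to_revisit` to decide which facet to ask about next, which is one way the per-facet labels might support more targeted tutoring.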

To support this work, I had a corpus annotated to indicate elementary school students' apparent understanding of a broad spectrum of science concepts. This corpus, comprising 15,357 student responses and 142,451 facet annotations for questions from 16 different science areas, can be downloaded from my Resources page.

Cognitive Science

I have a dual Ph.D. in Computer Science and Cognitive Science and have studied psycholinguistics and human learning theory. I incorporate findings from these areas throughout my research in computational semantics, educational technologies, and machine learning.

End-User Software Development

I am also applying my ML research and software engineering experience to end-user software development.


In summary, my research includes developing new NLP and ML algorithms to facilitate learning from massive unlabeled data sources and computational semantics algorithms with applications to health and wellbeing companion robots, health informatics, education, and end-user development.