Pyramid Analysis: Quantifying Human Recall of Text for the Design and Assessment of Automatic Text Processing Applications

Dr. Rebecca Passonneau

Abstract

Different people, or the same person on different occasions, will have different understandings of the same text. What, then, should an automated system produce as its understanding of one or more texts on the same topic? Providing a precise answer to this question is important for many kinds of Natural Language Processing (NLP) applications, including automated summarization of text. I will give an overview of an approach I refer to as Pyramid analysis, which pertains to the design and assessment of computational applications. It is a new method for analyzing what humans recall from text, distinguished by two critical features: the semantics "emerges" from a sample of human-generated summaries, and the semantic units receive a weighting that reflects their perceived importance. I will compare this method with fully automated evaluation methods, and discuss our initial experiments in automating Pyramid analysis.

Biography

Rebecca Passonneau, currently a consulting Research Scientist with Columbia University's Natural Language Processing (NLP) group, received her Ph.D. in linguistics from the University of Chicago in 1985. She has two decades of research experience in computational linguistics. Her methodological focus is on the integration of observational and experimental data collection methods with the engineering of Natural Language Processing technology. She has looked at monologic and dialogic data, and spoken and written discourse (primarily in English). The computational applications she has helped develop include natural language understanding (NLU) and generation (NLG) systems, and she has played a significant role in fostering rigorous evaluation methods for NLU, NLG, dialog systems, and more specialized applications, such as extracting subject access metadata for cataloging digital image collections. Areas of research where she has made significant contributions include discourse reference, the semantics and pragmatics of tense and aspect, theories of human attentional state in online processing, and models of discourse structure and cohesion.