Pyramid Analysis: Quantifying Human Recall of Text for the Design and
Assessment of Automatic Text Processing Applications
Dr. Rebecca Passonneau
Abstract
Different people, or the same person on different occasions, will have
different understandings of the same text. What, then, should an automated
system produce as its understanding of one or more texts on the same
topic? Providing a precise answer to this question is important for many
kinds of Natural Language Processing (NLP) applications, including
automated summarization of text. I will give an overview of an approach I
refer to as Pyramid analysis, which pertains to the design and assessment
of computational applications. It is a new method for analyzing what
humans recall from text, distinguished by two critical features: the
semantics "emerges" from a sample of human-generated summaries, and the
semantic units receive a weighting that reflects their perceived
importance. I will compare this method with fully automated
evaluation methods, and discuss our initial experiments in automating
Pyramid analysis.
Biography
Rebecca Passonneau, currently a consulting Research Scientist with
Columbia University's Natural Language Processing (NLP) group, received
her Ph.D. in linguistics from the University of Chicago in 1985. She has
two decades of research experience in computational linguistics. Her
methodological focus is on the integration of observational and
experimental data collection methods with the engineering of Natural
Language Processing technology. She has looked at monologic and dialogic
data, and spoken and written discourse (primarily in English). The
computational applications she has helped develop include natural language
understanding (NLU) and generation (NLG) systems, and she has played a
significant role in fostering rigorous evaluation methods for NLU, NLG,
dialog systems, and more specialized applications, such as extracting
subject access metadata for cataloging digital image collections. Areas
of research where she has made significant contributions include discourse
reference, the semantics and pragmatics of tense and aspect, theories of
human attentional state in online processing, and models of discourse
structure and cohesion.