Generation-Heavy Hybrid Machine Translation (GHMT) is an asymmetrical
hybrid approach that addresses the issue of Machine Translation (MT)
resource poverty in source-poor/target-rich language pairs by exploiting
available symbolic and statistical target-language (TL) resources. This
talk presents a specific implementation of this approach focusing on
approximating Interlingual (IL) MT without the use of IL resources for
the source language (SL). The expected SL resources include a syntactic
parser and a simple translation dictionary. Expensive parallel resources,
such as transfer rules, complex interlingual lexicons, or even bitexts,
are not used. Rich TL symbolic resources such as word lexical semantics,
categorial variations, and subcategorization frames are used to
overgenerate multiple structural variations from a TL-glossed syntactic
dependency representation of SL sentences. This SL-independent symbolic
overgeneration accounts for possible translation divergences, that is,
cases where the underlying concept or "gist" of a sentence is distributed
differently in the two languages. The overgeneration is constrained by
multiple
statistical TL models including surface n-grams and structural n-grams.
The first implementation of this approach focused on Spanish-English MT.
An evaluation of this system will be presented, together with issues
arising in ongoing work on retargeting the system to Chinese and Arabic.
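
The overgenerate-and-rank idea described above can be illustrated with a
small sketch. This is not the GHMT system itself: the toy corpus, the
slot alternatives, and the function names are all hypothetical, and only
a surface bigram model with add-one smoothing is used (no structural
n-grams and no symbolic lexical resources). Each slot stands for a node
of a TL-glossed SL sentence with its alternative English realizations.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy target-language corpus; a real system would use a large one.
CORPUS = [
    "john entered the house",
    "mary entered the room",
    "john went into the garden",
    "the cat went home",
]


def train_bigrams(sentences):
    """Collect context counts and bigram counts with sentence boundary markers."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])          # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability of a candidate sentence."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def overgenerate(slots):
    """Expand every combination of the alternative realizations per slot."""
    for choice in product(*slots):
        yield " ".join(choice)


def rank(candidates, model):
    """Order candidates from most to least probable under the bigram model."""
    return sorted(candidates, key=lambda c: log_prob(c, model), reverse=True)


# Hypothetical gloss alternatives for Spanish "Juan entró en la casa".
SLOTS = [["john"], ["entered", "went into", "entered in"], ["the"], ["house"]]

model = train_bigrams(CORPUS)
ranked = rank(overgenerate(SLOTS), model)
print(ranked[0])
```

Because the score is an unnormalized sentence log-probability, shorter
candidates are favored; a fuller sketch would normalize for length or
combine several models, as the abstract describes.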

Biography
Nizar Habash received his PhD in 2003 from the Computer Science
Department, University of Maryland, College Park. He is currently a
postdoctoral researcher at the Center for Computational Learning Systems
at Columbia University. His research focuses on Machine Translation,
Natural Language Generation, and Lexical Semantics. He is currently
working on computational modeling of Arabic dialects and Arabic-English
Machine Translation.