Computer Sciences Seminar
Tuesday, March 19
12:30 PM, NAC 8/206

Querying XML from Mixed and Redundant Storage

Alin Deutsch
University of Pennsylvania

Abstract
XML is widely accepted as the standard for data exchange between businesses on the Internet. However, most corporations publish only selected portions of their proprietary business data as XML documents, and even then only virtually, that is, by exposing a schema (interface) against which XML queries can be posed. In order to be answered, such XML queries must then be *reformulated* as queries on the actual proprietary data.

Our work concentrates on this query reformulation problem. We solve the problem in a very general setting that allows mixed (XML, relational, LDAP, etc.) storage for the proprietary data and takes advantage of redundancies (materialized views, indexes and caches) that can enhance performance. Moreover, we are able to give a theoretical guarantee that our algorithm will always find an optimal reformulation if one exists. We discuss the MARS system that implements this technique and we present a suite of experiments that validate it.

Our general approach to query reformulation is also applicable in contexts other than XML publishing, such as information integration, evolution of schema correspondences, distributed data caching, adaptive distributed query optimization, and data security.

Bio
Alin Deutsch received his B.Sc. in Computer Engineering from the Polytechnic Institute Bucharest, Romania, his M.Sc. in Computer Science from the Technical University of Darmstadt, Germany, and will earn his Ph.D. in Computer Science from the University of Pennsylvania in Summer 2002.