[Brown CS Talks] Brown CS Seminar: Zachary Ives in Lubrano on 4/5/2002 at noon
talks-admin@list.cs.brown.edu
Wed, 27 Mar 2002 16:36:17 -0500
CS Seminar
The Department of Computer Science
BROWN UNIVERSITY
presents
Zachary Ives
University of Washington
Friday, April 5, 2002 at noon
Lubrano Conference Room (CIT 4th floor)
Refreshments will be served at 11:45 am
Efficient Query Processing for Data Integration
Abstract
Today, virtually any organization or collaboration of size
(enterprise, academic department, coalition, research lab, etc.) has a
need to inspect and query its data to get a better understanding of
its internal processes or its domain of interest. However, this data
is typically stored across a variety of heterogeneous data management
applications with different terminologies. Data integration abstracts
these sources into a single virtual database that the user queries.
The technology enabling data integration has matured in recent years,
except in one key area: providing good performance when processing
queries.
There are two key challenges posed by data integration query
processing. First, very little is known about the data sources, but
query processors rely on statistics about the data when choosing a
"plan" to use in executing a query. Second, data integration
applications need to process XML since it has become the standard
format for data interchange. Current techniques for processing XML do
not suffice for data integration because they do not produce initial
answers quickly enough, particularly for data being streamed across a
network.
To address the first problem, I have developed convergent query
processing, which establishes a feedback loop between query execution
and optimization: the system monitors actual query plan performance
and uses this new knowledge to re-estimate the cost of alternative
query plans. At any point, execution can be stopped and a more
promising plan can be started in mid-stream. Convergent query
processing not only addresses the problem of limited knowledge in data
integration, but it can also benefit traditional databases. To
address the second data integration challenge -- returning initial
answers quickly for XML queries -- I have developed an XML query
processing architecture that incrementally provides results as data is
read across the network. Combined with convergent query processing,
this XML architecture provides good performance for both initial and
final answers.
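
As a rough illustration of the incremental idea in the abstract (results are emitted while data is still arriving over the network), here is a minimal Python sketch. It is not the speaker's architecture; it only assumes Python's standard xml.etree.ElementTree.XMLPullParser. The function name stream_matches, the "book" tag, and the sample chunks are hypothetical and chosen purely for the example.

```python
from xml.etree.ElementTree import XMLPullParser


def stream_matches(chunks, tag):
    """Yield one dict per completed <tag> element while chunks are still arriving.

    `chunks` is any iterable of XML text fragments, standing in for data read
    from a network connection (an assumption made for this sketch).
    """
    parser = XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)
        # Drain every element that was closed by the data fed so far.
        for _event, elem in parser.read_events():
            if elem.tag == tag:
                # Copy out the child text before releasing the element's contents.
                record = {child.tag: child.text for child in elem}
                elem.clear()
                yield record
    parser.close()


if __name__ == "__main__":
    # Hypothetical document split into three "network" fragments.
    fragments = [
        "<books><book><title>A</title></book>",
        "<book><title>B</title></book>",
        "</books>",
    ]
    for answer in stream_matches(fragments, "book"):
        print(answer)
```

Because stream_matches is a generator, the first record is available as soon as the first closing tag for a matching element has been read, before the remaining fragments are consumed. A full query processor would of course need joins, path expressions, and plan re-optimization on top of this.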
Host: Professor Steve Reiss