[Brown CS Talks] Brown CS Seminar: Zachary Ives in Lubrano on 4/5/2002 at noon

talks-admin@list.cs.brown.edu
Wed, 27 Mar 2002 16:36:17 -0500


			      CS Seminar
		  
		  The Department of Computer Science
			   BROWN UNIVERSITY

			      
			       presents

			     Zachary Ives

		       University of Washington

				 
		    Friday, April 5, 2002 at noon
	       Lubrano Conference Room (CIT 4th floor)
	       Refreshments will be served at 11:45 am

			       

	   Efficient Query Processing for Data Integration


			       Abstract

Today, virtually any organization or collaboration of significant size
(enterprise, academic department, coalition, research lab, etc.) needs
to inspect and query its data to better understand its internal
processes or its domain of interest.  However, this data
is typically stored across a variety of heterogeneous data management
applications with different terminologies.  Data integration abstracts
these sources into a single virtual database that the user queries.
The technology enabling data integration has matured in recent years,
except in one key area: providing good performance when processing
queries.

There are two key challenges posed by data integration query
processing.  First, very little is known about the data sources, but
query processors rely on statistics about the data when choosing a
"plan" to use in executing a query.  Second, data integration
applications need to process XML since it has become the standard
format for data interchange.  Current techniques for processing XML do
not suffice for data integration because they do not produce initial
answers quickly enough, particularly for data being streamed across a
network.

To address the first problem, I have developed convergent query
processing, which establishes a feedback loop between query execution
and optimization: the system monitors actual query plan performance
and uses this new knowledge to re-estimate the cost of alternative
query plans.  At any point, execution can be stopped and a more
promising plan can be started in mid-stream.  Convergent query
processing not only addresses the problem of limited knowledge in data
integration, but it can also benefit traditional databases.  To
address the second data integration challenge -- returning initial
answers quickly for XML queries -- I have developed an XML query
processing architecture that incrementally provides results as data is
read across the network.  Combined with convergent query processing,
this XML architecture provides good performance for both initial and
final answers.
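
As a rough illustration of what "incrementally provides results as data
is read across the network" can look like, here is a minimal Python
sketch.  It is not the architecture presented in the talk: it uses the
standard library's xml.etree.ElementTree.XMLPullParser, and the function
name stream_matches, the chunk list, and the tag "title" are hypothetical
names chosen for the example.

```python
import xml.etree.ElementTree as ET


def stream_matches(chunks, tag):
    """Yield the text of each <tag> element as soon as its end tag arrives.

    `chunks` is any iterable of string fragments, standing in for data
    arriving piece by piece over a network (an assumption for this sketch).
    """
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)                      # hand over whatever has arrived
        for _event, elem in parser.read_events():
            if elem.tag == tag:
                yield elem.text                 # emit a result right away
                elem.clear()                    # drop content we no longer need


# Usage: the document is split at arbitrary points, as a network might do.
chunks = ["<books><title>A</ti", "tle><title>B</title>", "</books>"]
results = list(stream_matches(chunks, "title"))
print(results)
```

Because stream_matches is a generator, a consumer can act on the first
matching element before the rest of the document has been received.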


		     Host:  Professor Steve Reiss