T-Space at The University of Toronto Libraries >
School of Graduate Studies - Theses >
Please use this identifier to cite or link to this item:
|Title: ||Queries, Data, and Statistics: Pick Two|
|Authors: ||Mishra, Chaitanya|
|Advisor: ||Koudas, Nick|
|Department: ||Computer Science|
|Keywords: ||Database Systems|
|Issue Date: ||21-Apr-2010|
|Abstract: ||The query processor of a relational database system executes declarative queries on relational data using query evaluation plans. The cost of the query evaluation plan depends on various statistics defined by the query and data. These statistics include intermediate and base table sizes, and data
distributions on columns. In addition to being an important factor in query optimization, such statistics also influence various runtime properties of the query evaluation plan.
This thesis explores the interactions between queries, data, and statistics in the query processor of a relational database system. Specifically, we consider problems where any two of the three - queries, data, and statistics - are provided, with the
objective of instantiating the missing element in the triple such that the query, when executed on the data, satisfies the statistics on the associated subexpressions. We present multiple query processing problems that can be abstractly formulated in this manner.
The first contribution of this thesis is a monitoring framework for collecting and estimating statistics during query execution. We apply this framework to the problems of
monitoring the progress of query execution, and adaptively reoptimizing query execution plans. Our monitoring and adaptivity framework has a low overhead, while significantly reducing query execution times. This work demonstrates the feasibility and utility of overlaying statistics estimators on query evaluation plans.
Our next contribution is a framework for testing the performance of a query
processor by generating targeted test queries and databases. We present techniques for data-aware query generation, and query-aware data generation that satisfy test cases specifying statistical constraints. We formally analyze the hardness of the problems considered, and present systems that support best-effort semantics for targeted query and data generation.
The final contribution of this thesis is a set of techniques for designing queries for business intelligence applications that specify cardinality constraints on the result. We present an interactive query refinement framework that explicitly incorporates user feedback into query design, refining queries returning too many or few answers.
Each of these contributions is accompanied by a formal analysis of the problem, and a detailed experimental evaluation of an associated system.|
|Appears in Collections:||Doctoral|
Department of Computer Science - Doctoral theses
Items in T-Space are protected by copyright, with all rights reserved, unless otherwise indicated.