T-Space at The University of Toronto Libraries >
School of Graduate Studies - Theses >
Doctoral >

Please use this identifier to cite or link to this item: http://hdl.handle.net/1807/33955

Title: Data Quality Through Active Constraint Discovery and Maintenance
Authors: Chiang, Fei Yen
Advisor: Miller, Renee J.
Department: Computer Science
Keywords: data management; data quality
Issue Date: 10-Dec-2012
Abstract: Although integrity constraints are the primary means for enforcing data integrity, there are cases in which they are not defined or are not strictly enforced. This leads to inconsistencies in the data, causing poor data quality. In this thesis, we leverage the power of constraints to improve data quality. To ensure that the data conforms to the intended application domain semantics, we develop two algorithms focusing on constraint discovery. The first algorithm discovers a class of conditional constraints, which hold over a subset of the relation, under specific conditional values. The second algorithm discovers attribute domain constraints, which bind specific values to the attributes of a relation for a given domain. These two types of constraints have been shown to be useful for data cleaning. In practice, weak enforcement of constraints often occurs for performance reasons. This leads to inconsistencies between the data and the set of defined constraints. To resolve this inconsistency, we must determine whether it is the constraints or the data that is incorrect, and then make the necessary corrections. We develop a repair model that considers repairs to the data and repairs to the constraints on an equal footing. We present repair algorithms that find the necessary repairs to bring the data and the constraints back to a consistent state. Finally, we study the efficiency and quality of our techniques. We show that our constraint discovery algorithms find meaningful constraints with good precision and recall. We also show that our repair algorithms resolve many inconsistencies with high-quality repairs, and propose repairs that previous algorithms did not consider.
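Illustration (not from the thesis): the abstract describes conditional constraints that hold only over the subset of a relation matching specific attribute values. The sketch below is a minimal, hypothetical check for one such constraint. It assumes the constraint has the form "among rows matching a condition, the left-hand attributes determine the right-hand attribute". The function name `conditional_violations`, the attribute names, and the sample rows are all invented for illustration. The code only detects violations of a given constraint; it does not reproduce the discovery or repair algorithms developed in the thesis.

```python
from collections import defaultdict


def conditional_violations(rows, condition, lhs, rhs):
    """Find violations of a conditional dependency lhs -> rhs.

    Only rows matching every attribute/value pair in `condition` are checked.
    Returns a dict mapping each offending lhs key to the set of conflicting
    rhs values observed for it. (Hypothetical helper, for illustration only.)
    """
    seen = defaultdict(set)
    for row in rows:
        # Skip rows outside the conditioned subset of the relation.
        if all(row.get(attr) == value for attr, value in condition.items()):
            key = tuple(row[attr] for attr in lhs)
            seen[key].add(row[rhs])
    # A key with more than one rhs value breaks the dependency.
    return {key: values for key, values in seen.items() if len(values) > 1}


# Invented sample relation: within type "A", `code` should determine `rate`.
rows = [
    {"type": "A", "code": "x1", "rate": 10},
    {"type": "A", "code": "x1", "rate": 10},
    {"type": "A", "code": "x2", "rate": 20},
    {"type": "A", "code": "x2", "rate": 25},  # conflicts with the row above
    {"type": "B", "code": "x1", "rate": 99},  # ignored: condition not met
]

violations = conditional_violations(rows, {"type": "A"}, ["code"], "rate")
print(violations)
```

Here the type "B" row is never compared against the type "A" rows, which is what separates a conditional constraint from one enforced over the whole relation. Whether a reported conflict signals bad data or an incorrect constraint is the question the repair model in the abstract addresses.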
URI: http://hdl.handle.net/1807/33955
Appears in Collections: Doctoral

Files in This Item:

File: Chiang_Fei_Y_201209_PhD_thesis.pdf
Size: 1.54 MB
Format: Adobe PDF

Items in T-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
