Clustering Techniques for Reducing Inconsistency in Databases

Sergio Luján-Mora, Manuel Palomar
Proceedings of the 1st International Workshop on Databases, Documents, and Information Fusion (DBFusion 2001), pp. 1-12. Otto-von-Guericke-Universität Magdeburg, Gommern (Germany), May 3-4, 2001.
One of the main goals of a database is to provide consistent information. In this paper, we present an automatic method for reducing the inconsistency found in existing databases and thereby improving data quality. All the values that refer to the same term are clustered by measuring their degree of similarity. Each cluster can then be assigned a common value that, in principle, could replace the original values. We evaluate different similarity measures for clustering. The method we propose gives good results with a low error rate.
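
The idea of clustering similar values and assigning each cluster a common value can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the similarity measure (edit distance normalized by the longer string), the greedy single-pass clustering, the threshold of 0.8, the choice of the most frequent spelling as the common value, and the function names `similarity`, `cluster_values`, and `canonical_value`.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed measure: 1 - distance / longer length, ignoring case."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def cluster_values(values, threshold=0.8):
    """Greedy clustering: a value joins the first cluster whose first
    member is similar enough; otherwise it starts a new cluster."""
    clusters = []
    for value in values:
        for cluster in clusters:
            if similarity(value, cluster[0]) >= threshold:
                cluster.append(value)
                break
        else:
            clusters.append([value])
    return clusters


def canonical_value(cluster):
    """Assumed policy: the most frequent spelling represents the cluster."""
    return Counter(cluster).most_common(1)[0][0]


# Hypothetical column values containing inconsistent spellings.
values = ["Microsoft", "Microsft", "microsoft",
          "Oracle Corp", "Oracle Corp.", "Microsoft"]
clusters = cluster_values(values)
for cluster in clusters:
    print(canonical_value(cluster), "<-", cluster)
```

A greedy pass like this depends on the order of the input and on the threshold, so it should be read only as a way to make the abstract's description concrete, not as the clustering procedure evaluated in the paper.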