PUBLICATIONS

See all abstracts

See all publications (without abstracts)

Keys:


BL  Blog
CI  International conference
CL  Book chapter
CN  National conference
II  Internal report
LI  Book
RV  Journal

URL     Document     Slides


Year 2001:

Key: CI  Ref: IDEAS'2001
Sergio Luján-Mora, Manuel Palomar. Reducing Inconsistency in Integrating Data from Different Sources. Proceedings 2001 International Database Engineering and Applications Symposium (IDEAS 2001), pp. 209-218: IEEE Computer Society, Grenoble (France), July 16-18, 2001. https://doi.org/10.1109/IDEAS.2001.938087

One of the main problems in integrating databases into a common repository is the possible inconsistency of the values stored in them: the very same term may have different values due to misspellings, permuted word order, spelling variants, and so on. In this paper, we present an automatic method for reducing the inconsistency found in existing databases and thus improving data quality. All the values that refer to the same term are clustered by measuring their degree of similarity. Each cluster can then be assigned a common value that, in principle, could be substituted for the original values. We evaluate four different similarity measures for clustering, with and without expansion of abbreviations. The method we propose may work well in practice, but it is time-consuming. To mitigate this problem, we remove stop words to speed up the clustering.
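The abstract does not name the four similarity measures or the exact clustering procedure, so the following is only a hypothetical sketch of the general idea, not the authors' method. It assumes `difflib.SequenceMatcher.ratio()` as a stand-in similarity measure, a made-up stop-word list (`STOP_WORDS`), a made-up abbreviation table (`ABBREVIATIONS`), an arbitrary threshold of 0.8, and a simple greedy clustering in which each value joins the first cluster whose first member is similar enough. Sorting the tokens makes permuted word orders compare as equal.

```python
from difflib import SequenceMatcher

# Hypothetical stop-word list (assumption; the paper's list is not given here).
STOP_WORDS = {"of", "the", "and", "de", "la", "el", "y"}

# Hypothetical abbreviation table used when expansion is enabled.
ABBREVIATIONS = {"univ.": "university", "dept.": "department"}


def normalize(value: str, expand: bool = True) -> str:
    """Lowercase, optionally expand abbreviations, drop stop words,
    and sort tokens so permuted word orders become identical."""
    tokens = value.lower().split()
    if expand:
        tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return " ".join(sorted(tokens))


def similarity(a: str, b: str, expand: bool = True) -> float:
    """Stand-in similarity measure in [0, 1] based on difflib's ratio."""
    return SequenceMatcher(None, normalize(a, expand), normalize(b, expand)).ratio()


def cluster_values(values, threshold: float = 0.8, expand: bool = True):
    """Greedy clustering: each value joins the first cluster whose
    representative (its first member) is at least `threshold` similar;
    otherwise the value starts a new cluster."""
    clusters = []
    for v in values:
        for c in clusters:
            if similarity(v, c[0], expand) >= threshold:
                c.append(v)
                break
        else:
            clusters.append([v])
    return clusters


if __name__ == "__main__":
    values = [
        "University of Alicante",
        "Alicante University",      # permuted word order
        "Univeristy of Alicante",   # misspelling
        "Univ. of Alicante",        # abbreviation
        "Grenoble",
    ]
    print(cluster_values(values))
    print(cluster_values(values, expand=False))
```

With expansion enabled, the four Alicante variants fall into one cluster and "Grenoble" stays alone; the first member of each cluster could then serve as the common value substituted for the others. With `expand=False`, "Univ. of Alicante" scores just under the hypothetical 0.8 threshold and forms its own cluster, which illustrates why comparing runs with and without abbreviation expansion is informative.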
   






Page maintained by Sergio Luján Mora
Last updated: 19-Dec-2001