PUBLICATIONS
Keys:
BL: Blog | CI: International conference | CL: Book chapter | CN: National conference | II: Internal report | LI: Book | RV: Journal
URL | Document | Slides
Year 2001:
Key: CI Ref: IDEAS'2001 Sergio Luján-Mora, Manuel Palomar. Reducing Inconsistency in Integrating Data from Different Sources. Proceedings 2001 International Database Engineering and Applications Symposium (IDEAS 2001), pp. 209-218: IEEE Computer Society, Grenoble (France), July 16-18 2001. https://doi.org/10.1109/IDEAS.2001.938087
One of the main problems in integrating databases into a common
repository is the possible inconsistency of the values stored in
them, i.e., the very same term may have different values, due to
misspelling, a permuted word order, spelling variants and so on.
In this paper, we present an automatic method for reducing
inconsistency found in existing databases, and thus, improving
data quality. All the values that refer to the same term are
clustered by measuring their degree of similarity. The clustered
values can be assigned to a common value that, in principle,
could be substituted for the original values. We evaluate four
different similarity measures for clustering with and without
expansion of abbreviations. The method we propose may work well in
practice but it is time-consuming. To reduce this cost, we
remove stop words, which speeds up the clustering.
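
The clustering idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the stop-word list, the 0.85 threshold, and the helper names (normalize, similarity, cluster) are hypothetical, and Python's difflib.SequenceMatcher stands in for the four similarity measures the paper evaluates, which are not named here. Abbreviation expansion is omitted.

```python
from difflib import SequenceMatcher

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {"of", "the", "and", "de", "la"}


def normalize(value):
    """Lowercase, replace punctuation with spaces, drop stop words,
    and sort the remaining tokens so that word order is ignored."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in value.lower()
    )
    tokens = [t for t in cleaned.split() if t not in STOP_WORDS]
    return " ".join(sorted(tokens))


def similarity(a, b):
    """Similarity in [0, 1] between two values after normalization.
    SequenceMatcher is a stand-in measure, not one taken from the paper."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster(values, threshold=0.85):
    """Greedy single-pass clustering: each value joins the first cluster
    whose first member is similar enough, else it starts a new cluster."""
    clusters = []
    for value in values:
        for group in clusters:
            if similarity(value, group[0]) >= threshold:
                group.append(value)
                break
        else:
            clusters.append([value])
    return clusters


if __name__ == "__main__":
    values = [
        "University of Alicante",
        "Alicante University",
        "Universty of Alicante",
        "IEEE Computer Society",
    ]
    for group in cluster(values):
        print(group)
```

In this sketch the first member of each cluster could serve as the common value that replaces its variants. Removing stop words before comparison shortens the strings being compared, which is the speed-up the abstract refers to.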