Clustering of Similar Values, in Spanish, for the Improvement of Search Systems

Sergio Luján-Mora, Manuel Palomar
IBERAMIA-SBIA 2000 Open Discussion Track Proceedings, p. 217-226, Atibaia - Sao Paulo (Brasil), November 19-22 2000.
(Iberamia'2000) International conference


The ability to access electronically stored information correctly is becoming increasingly important as the volume of stored information keeps growing. One of the problems facing search systems is inconsistency among the stored values: the very same term may appear under different values because of misspellings, permuted word order, spelling variants and so on. Clustering the values that refer to a given term solves this problem by replacing the clustered values with a single value. In this paper, we present a clustering method that reduces the existing inconsistencies in databases and thus improves the performance of both search and information retrieval systems. The proposed method gives good results with a considerably low error rate.
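
The general idea of clustering variant values and replacing each cluster with a single value can be illustrated with a minimal Python sketch. It is not the method proposed in the paper, whose details are not given in this abstract. The sketch assumes a simple greedy clustering over normalized strings, using the standard library's `difflib.SequenceMatcher` as the similarity measure. The function names, the 0.85 threshold, and the choice of the most frequent variant as the replacement value are all hypothetical.

```python
import unicodedata
from collections import Counter
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase, strip accents and punctuation, and sort the words.

    Sorting the words makes permuted word orders compare as equal.
    """
    decomposed = unicodedata.normalize("NFKD", value.lower())
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents)
    return " ".join(sorted(cleaned.split()))


def cluster_values(values, threshold=0.85):
    """Greedily group values whose normalized forms are similar enough.

    Each value joins the first cluster whose representative key reaches
    the (assumed) similarity threshold; otherwise it starts a new cluster.
    """
    clusters = []  # list of (representative key, Counter of raw variants)
    for value in values:
        key = normalize(value)
        for rep_key, members in clusters:
            if SequenceMatcher(None, key, rep_key).ratio() >= threshold:
                members[value] += 1
                break
        else:
            clusters.append((key, Counter({value: 1})))
    return [members for _, members in clusters]


def canonical_mapping(values, threshold=0.85):
    """Map every variant to the most frequent variant of its cluster."""
    mapping = {}
    for members in cluster_values(values, threshold):
        canonical = members.most_common(1)[0][0]
        for variant in members:
            mapping[variant] = canonical
    return mapping


if __name__ == "__main__":
    stored = [
        "Universidad de Alicante",
        "Universidad de Alicante",
        "Alicante, Universidad de",   # permuted word order
        "Univesidad de Alicante",     # misspelling
        "Universidad de Murcia",      # a different term
    ]
    for variant, canonical in canonical_mapping(stored).items():
        print(f"{variant!r} -> {canonical!r}")
```

A greedy single pass like this depends on the order of the input and on the chosen threshold, so it is only meant to make the replacement step concrete. A search system would then index or query the replacement value instead of the raw variants.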