Permutation Based Algorithm Improved by Classes for Similarity Searching

Karina Figueroa, Antonio Camarena-Ibarrola, Luis Valero

Abstract


Similarity searching is the most important task in multimedia databases, It consists in retrieving the most similar elements to a given query from a database, knowing that an element identical to the query would not be found. Dissimilarity between objects is measured with a distance function (usually expensive to compute), this allows approaching this problem with a metric space. Many algorithms have been designed to address this problem, in particular, the Permutation Based index has shown an unbeatable performance. This technique uses reference objects to determine a string for each element in the database that is a permutation of the same string. However, Huge databases and the memory required for these indexes make this problem a real challenge. In this paper, we present an improvement to the first approach where classes of reference objects were used instead of single references. In this paper, a new way to choose these classes is proposed and a new way to evaluate similarity between permutations. Our experiments show that we can avoid distance evaluations up to 90% with respect to the original technique, and up to 80% to the first approach.

Keywords


Similarity searching, metric spaces, pattern recognition, nearest neighbor

Full Text: PDF