Noise Detection and Learning Based on Current Information

Damaris Pascual-González, Fernando Daniel Vázquez Mesa, Jorge Luis Toro Pozo


Methods for noise cleaning are highly significant in classification tasks and in situations where semi-supervised learning is required, because well-labeled samples (prototypes) are essential for classifying new patterns. In this work, we present a new algorithm for detecting noise in data streams that takes into account changes in concepts over time (concept drift). The algorithm is based on neighborhood criteria, and its application relies on the construction of a training set. In our experiments we used both synthetic and real databases; the latter were taken from the UCI repository. The results support our proposal for noise detection in data streams and classification processes.


Noise cleansing; data streams; semi-supervised learning; concept drift.

Full Text: PDF (Spanish)