Handling the Multi-Class Imbalance Problem using ECOC
Abstract
An imbalanced training set is one in which some classes are represented by a large number of examples while others are represented by only a few. This problem can cause a significant deterioration in classifier performance, particularly for patterns belonging to the under-represented classes. Most studies in this area address problems with only two classes; however, many real problems involve multiple classes, among which discrimination is more difficult. The success of the Mixture of Experts (ME) strategy rests on the principle of "divide and conquer": the global problem is divided into smaller sub-problems that are solved separately, so that the overall model is less affected by the individual difficulties of its members. In this paper we propose a strategy for handling the class imbalance problem in data sets with multiple classes. We build a mixture of experts whose members are each trained on a part of the general problem, thereby improving the behavior of the whole system. To divide the problem we employ Error-Correcting Output Codes (ECOC), with which the classes are coded in pairs that are then used to train the mixture of experts. Experiments with real datasets demonstrate the viability of the proposed strategy.
Keywords
Class imbalance, fusion, mixture of experts, error-correcting output codes (ECOC).
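To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of the kind of decomposition the abstract describes: the multi-class problem is split into pairwise binary dichotomies in an ECOC-style coding, one "expert" is trained per pair, and the experts' decisions are fused. The choice of base classifier, the iris dataset, and the majority-vote fusion rule are assumptions made only for illustration.

```python
# Sketch of a pairwise (ECOC-style) decomposition with a mixture of experts.
# Base classifier, dataset, and voting rule are illustrative assumptions,
# not the method proposed in the paper.
from itertools import combinations

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_pairwise_experts(X, y):
    """Train one binary expert for every pair of classes (one dichotomy per pair)."""
    experts = {}
    for a, b in combinations(np.unique(y), 2):
        mask = np.isin(y, [a, b])          # keep only the two classes of this dichotomy
        clf = DecisionTreeClassifier(random_state=0)
        experts[(a, b)] = clf.fit(X[mask], y[mask])
    return experts


def predict_by_voting(experts, X):
    """Fuse the experts' decisions with a simple majority vote per sample."""
    votes = np.array([clf.predict(X) for clf in experts.values()])
    return np.array([np.bincount(col).argmax() for col in votes.T])


if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    experts = train_pairwise_experts(X_tr, y_tr)
    y_pred = predict_by_voting(experts, X_te)
    print("accuracy:", (y_pred == y_te).mean())
```

Because each expert sees only the examples of its two classes, the sub-problems are smaller and their class proportions differ from those of the full problem, which is the property the proposed strategy exploits when dealing with imbalanced multi-class data.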