Novel Dynamic Decomposition-Based Multi-objective Evolutionary Algorithm Using Reinforcement Learning Adaptive Operator Selection (DMOEA/D-SL)

Authors

  • José Alfredo Brambila-Hernández Tecnológico Nacional de México
  • Miguel Ángel García-Morales Tecnológico Nacional de México
  • Héctor Joaquín Fraire-Huacuja Tecnológico Nacional de México
  • Laura Cruz-Reyes Tecnológico Nacional de México
  • Claudia G. Gómez-Santillán Tecnológico Nacional de México
  • Nelson Rangel-Valdez Tecnológico Nacional de México
  • Héctor José Puga-Soberanes Tecnológico Nacional de México
  • Fausto Balderas Tecnológico Nacional de México

DOI:

https://doi.org/10.13053/cys-28-2-5018

Keywords:

Adaptive operator selection, dynamic multi-objective optimization

Abstract

Within the field of static multi-objective optimization, various works address the adaptive selection of genetic operators. These include multi-armed bandit-based methods and probability-based methods. For dynamic multi-objective optimization, such work is very hard to find. The main characteristic of dynamic multi-objective optimization is that its problems do not remain static: their objective functions and constraints change over time. Adaptive operator selection is responsible for choosing the best variation operator at a given time within a multi-objective evolutionary algorithm. This work proposes incorporating a new adaptive operator selection method into a Dynamic Multi-objective Evolutionary Algorithm Based on Decomposition; we call the resulting algorithm DMOEA/D-SL. The new adaptive operator selection method is based on a reinforcement learning algorithm called State-Action-Reward-State-Action Lambda, or SARSA(λ). SARSA(λ) trains an agent in an environment to make sequential decisions and learn to maximize an accumulated reward over time; in this case, to select the best operator at a given moment. Eight dynamic multi-objective benchmark problems were used as test instances to evaluate the algorithm's performance. Each problem produces five Pareto fronts. Three metrics were used: Inverted Generational Distance, Generalized Spread, and Hypervolume. The non-parametric Wilcoxon statistical test was applied at a 5% significance level to validate the results.
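The operator-selection idea described in the abstract can be illustrated with a minimal tabular SARSA(λ) sketch. This is not the authors' implementation: the class name `SarsaLambdaSelector`, the single-state toy environment, the binary reward, the ε-greedy policy, the accumulating eligibility traces, and all parameter values are assumptions chosen only to show the shape of the update rule. The abstract does not specify how DMOEA/D-SL encodes states or rewards.

```python
import random


class SarsaLambdaSelector:
    """Tabular SARSA(lambda) agent that picks one operator index per step.

    Hypothetical illustration; state and reward design are assumptions.
    """

    def __init__(self, n_states, n_operators, alpha=0.1, gamma=0.9,
                 lam=0.8, epsilon=0.1, seed=0):
        self.n_states = n_states
        self.n_operators = n_operators
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.lam = lam          # trace-decay parameter (lambda)
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)
        # Action-value table Q and eligibility traces E, both zero-initialised.
        self.q = [[0.0] * n_operators for _ in range(n_states)]
        self.e = [[0.0] * n_operators for _ in range(n_states)]

    def select(self, state):
        """Epsilon-greedy choice of an operator index, ties broken at random."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_operators)
        row = self.q[state]
        best = max(row)
        return self.rng.choice([a for a, v in enumerate(row) if v == best])

    def update(self, s, a, reward, s_next, a_next):
        """One SARSA(lambda) step with accumulating traces."""
        delta = reward + self.gamma * self.q[s_next][a_next] - self.q[s][a]
        self.e[s][a] += 1.0
        for i in range(self.n_states):
            for j in range(self.n_operators):
                self.q[i][j] += self.alpha * delta * self.e[i][j]
                self.e[i][j] *= self.gamma * self.lam


def run_demo(steps=3000, seed=0):
    # Toy environment (assumption): one state, three operators,
    # and only operator 1 ever earns a reward.
    agent = SarsaLambdaSelector(n_states=1, n_operators=3, seed=seed)
    state = 0
    action = agent.select(state)
    for _ in range(steps):
        reward = 1.0 if action == 1 else 0.0
        next_state = 0
        next_action = agent.select(next_state)
        agent.update(state, action, reward, next_state, next_action)
        state, action = next_state, next_action
    return agent


if __name__ == "__main__":
    agent = run_demo()
    print([round(v, 2) for v in agent.q[0]])
```

In an evolutionary algorithm, the action would presumably index a variation operator (for example, a crossover or mutation variant) and the reward would reflect whether the offspring improved the population; both mappings are left open here because the abstract does not define them.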

Published

2024-06-12

Section

Articles of the Thematic Section