Hybrid algorithm based on Reinforcement Learning and the DDMRP methodology for inventory management


This article proposes a hybrid algorithm that combines Reinforcement Learning with the inventory management methodology DDMRP (Demand Driven Material Requirements Planning) to determine when to buy a given product and how much of it to order. To this end, the inventory management problem is formulated as a Markov Decision Process. The environment the agent interacts with is designed from the concepts of the DDMRP methodology, and a Reinforcement Learning algorithm, specifically Q-Learning, is used to determine the optimal policy for deciding when and how much to buy. Three approaches to the reward function are proposed for this purpose. The first is based on inventory levels. The second is an optimization function based on the distance between the inventory and its optimal level. The third is a shaping function that combines levels and distances to the optimal inventory. The results are promising: the proposed algorithm performs adequately in scenarios with different characteristics, including difficult case studies with discontinuous or continuous demand, seasonal and non-seasonal behavior with high demand peaks, and multiple lead times, among others.
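The abstract does not specify the state space, action set, or reward formulas, so the following is only a minimal sketch under loudly stated assumptions, not the authors' implementation. It uses hypothetical DDMRP buffer zones (red, yellow, green), treats the on-hand stock as the state, a small hypothetical set of order quantities as the actions, a uniform random daily demand, and zero lead time (so the net flow position equals on-hand stock). The reward mirrors the second approach described above: the negative distance to an assumed optimal level, taken here as top of red plus half of the green zone. The learner is standard tabular Q-Learning. A plain DDMRP reorder rule (order up to top of green when the net flow position is at or below top of yellow) is included for comparison.

```python
import random

# Hypothetical DDMRP buffer zones in units; values are illustrative only.
RED, YELLOW, GREEN = 10, 20, 15
TOP_OF_YELLOW = RED + YELLOW            # DDMRP reorder threshold
TOP_OF_GREEN = RED + YELLOW + GREEN     # maximum buffer level
TARGET = RED + GREEN // 2               # assumed "optimal" on-hand level

MAX_STOCK = TOP_OF_GREEN
ACTIONS = [0, 5, 10, 15, 20]            # hypothetical order quantities


def step(stock, order, rng):
    """Assumes zero lead time: the order arrives at once, then demand is served."""
    stock = min(stock + order, MAX_STOCK)
    demand = rng.randint(0, 8)          # assumed uniform daily demand
    return max(stock - demand, 0)       # unmet demand is lost, not backordered


def reward(stock):
    """Distance-based reward: 0 at TARGET, more negative the farther away."""
    return -abs(stock - TARGET)


def ddmrp_order(net_flow_position):
    """Plain DDMRP rule (not limited to ACTIONS): order up to top of green
    when the net flow position is at or below top of yellow."""
    if net_flow_position <= TOP_OF_YELLOW:
        return TOP_OF_GREEN - net_flow_position
    return 0


def q_learning(episodes=500, horizon=30, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-Learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(MAX_STOCK + 1)]
    for _ in range(episodes):
        stock = rng.randint(0, MAX_STOCK)
        for _ in range(horizon):
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))                       # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[stock][i])  # exploit
            nxt = step(stock, ACTIONS[a], rng)
            r = reward(nxt)
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            q[stock][a] += alpha * (r + gamma * max(q[nxt]) - q[stock][a])
            stock = nxt
    return q


def greedy_policy(q):
    """Order quantity with the highest learned value for each stock level."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q]
```

For example, `greedy_policy(q_learning())[s]` gives the learned order quantity for on-hand level `s`, which can be set against `ddmrp_order(s)`. Swapping `reward` for a level-based or shaped variant is how the other two reward approaches could be explored under the same assumptions.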


Keywords

Inventory management systems, DDMRP