| CPC B65G 43/08 (2013.01) [B65G 43/10 (2013.01); G05B 19/4189 (2013.01)] | 19 Claims |

|
1. A computer-implemented method for controlling a control device of a conveyor system for transporting piece goods of at least one type, including mail items and pieces of luggage, wherein the conveyor system has a plurality of conveyor elements aligned along and parallel to a conveying direction, the conveyor elements being driven, under control of the control device, each by a respectively assigned drive at an individually adjustable velocity in order to achieve an alignment and/or a defined spacing of the piece goods, wherein the activation of the control device is determined by at least one agent that acts according to methods of Reinforcement Learning or has been predetermined according to such methods, which agent, in accordance with a strategy, situationally selects an action from an action space for an initial state in order to reach a subsequent state, wherein the initial state and the subsequent state can be mapped to state vectors and the actions can be mapped to action vectors, the method comprising the following process steps:
a) creating an initial image of the conveyor system;
b) for each piece good shown in the image, individually creating a state vector of predetermined dimension, the dimension being the same for all piece goods of one type, the state vector comprising state information of the respective piece good taken from the most recently created image;
c) for each piece good, individually selecting, for the current state vector of that piece good, an action vector from the action space according to the strategy, the strategy being the same for all piece goods of one type and the dimension of the action vector being predetermined;
d) for each piece good, mapping its action vector onto the conveyor elements associated with that piece good to determine the velocities of these conveyor elements, and controlling the conveyor elements accordingly by means of the control device;
e) after a cycle time has elapsed, creating a subsequent image of the conveyor system and performing process step b) on it to obtain a state vector of the subsequent state for each piece good;
f) if the strategy for piece goods of one type is to be trained further while the method is being executed, evaluating the state vector of the subsequent state of each piece good of this type on the basis of a reward by a method of Reinforcement Learning, whereupon the agent trains, and thereby optimizes, the strategy for piece goods of this type by adjusting the action vectors of the action space; and
g) for each piece good, repeating process steps c) to f) using the improved or predetermined strategy for as long as the piece good concerned appears in the subsequent image.
|
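The control cycle of steps a) to g) can be illustrated with a minimal, non-authoritative sketch. The claim does not name a specific Reinforcement Learning method, state features, or action space, so everything below is an assumption: the "image" is replaced by a dictionary of piece-good positions, the conveyor is reduced to a single row of elements, the hypothetical state vector holds the discretised gaps to the neighbouring piece goods ahead and behind, each action vector has dimension one (a velocity setpoint), and tabular Q-learning is swapped in as the training method. Q-learning improves the strategy by updating action values rather than by "adjusting the action vectors" as the claim words it. All names (`PieceGood`, `SharedStrategy`, `control_cycle`, etc.) and parameter values are hypothetical.

```python
import random
from dataclasses import dataclass

# Hypothetical parameters; none of these values come from the claim.
ELEMENT_LEN = 0.5                   # length of one conveyor element (m)
N_ELEMENTS = 40                     # conveyor elements in the (assumed) single row
CYCLE_TIME = 0.1                    # time between two images (s)
TARGET_GAP = 1.0                    # desired spacing between piece goods (m)
ACTIONS = [(0.5,), (1.0,), (1.5,)]  # action space: 1-D action vectors (velocity, m/s)
MAX_BUCKET = 5                      # largest discretised gap value

@dataclass
class PieceGood:
    ident: int
    front: float   # position of the leading edge along the conveying direction (m)
    length: float  # extent along the conveying direction (m)

def create_image(goods):
    """Stand-in for steps a) and e): positions instead of a camera image."""
    return {g.ident: (g.front, g.length) for g in goods}

def gaps(image, ident):
    """Gap to the nearest piece good ahead and behind (None if there is none)."""
    front, length = image[ident]
    rear = front - length
    rears_ahead = [f - l for f, l in image.values() if f - l > front]
    fronts_behind = [f for f, _ in image.values() if f < rear]
    gap_ahead = min(rears_ahead) - front if rears_ahead else None
    gap_behind = rear - max(fronts_behind) if fronts_behind else None
    return gap_ahead, gap_behind

def bucket(gap):
    return MAX_BUCKET if gap is None else min(int(max(gap, 0.0) / 0.5), MAX_BUCKET)

def state_vector(image, ident):
    """Step b): state vector of the same dimension (2) for every piece good."""
    return tuple(bucket(g) for g in gaps(image, ident))

def elements_under(front, length):
    """Step d): indices of the conveyor elements the piece good rests on."""
    first = max(int((front - length) // ELEMENT_LEN), 0)
    last = min(int(front // ELEMENT_LEN), N_ELEMENTS - 1)
    return range(first, last + 1)

class SharedStrategy:
    """Step c): one strategy shared by all piece goods of one type (tabular Q-learning)."""
    def __init__(self, epsilon=0.1, alpha=0.5, gamma=0.9, seed=0):
        self.q = {}
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma
        self.rng = random.Random(seed)

    def _values(self, s):
        return self.q.setdefault(s, [0.0] * len(ACTIONS))

    def select(self, s, explore=True):
        if explore and self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        vals = self._values(s)
        return vals.index(max(vals))

    def update(self, s, a, reward, s_next):
        """Step f): one Q-learning update (an assumed RL method)."""
        vals = self._values(s)
        target = reward + self.gamma * max(self._values(s_next))
        vals[a] += self.alpha * (target - vals[a])

def control_cycle(goods, strategy, train=True):
    image = create_image(goods)
    velocities = [1.0] * N_ELEMENTS  # default velocity for unoccupied elements
    chosen = {}
    for g in goods:
        s = state_vector(image, g.ident)              # b)
        a = strategy.select(s, explore=train)         # c)
        chosen[g.ident] = (s, a)
        for k in elements_under(g.front, g.length):   # d) shared element: last write wins
            velocities[k] = ACTIONS[a][0]
    for g in goods:  # simplified plant: a good moves at the mean velocity of its elements
        ks = elements_under(g.front, g.length)
        g.front += CYCLE_TIME * sum(velocities[k] for k in ks) / len(ks)
    next_image = create_image(goods)                  # e)
    if train:                                         # f)
        for g in goods:
            s, a = chosen[g.ident]
            gap_ahead, _ = gaps(next_image, g.ident)
            reward = 0.0 if gap_ahead is None else -abs(gap_ahead - TARGET_GAP)
            strategy.update(s, a, reward, state_vector(next_image, g.ident))
    return next_image

if __name__ == "__main__":
    strategy = SharedStrategy(seed=1)
    goods = [PieceGood(i, front=2.0 + 0.8 * i, length=0.4) for i in range(4)]
    for _ in range(100):
        # g) keep only piece goods that are still on the conveyor, i.e. still "in the image"
        goods = [g for g in goods if g.front - g.length < N_ELEMENTS * ELEMENT_LEN]
        if not goods:
            break
        control_cycle(goods, strategy)
    print(len(strategy.q), "states visited")
```

The discretised state and the lookup table were chosen only to keep the sketch dependency-free; the shared `SharedStrategy` instance mirrors the claim's requirement that one strategy serves all piece goods of a type, while each piece good gets its own state vector and action selection.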