| CPC G06F 30/13 (2020.01) [G06F 30/27 (2020.01)] | 3 Claims |

|
1. An automatic stone arrangement method based on image and deep reinforcement learning, comprising:
(1) building a stone arrangement simulation environment, and designing an action space and a return function based on rules that parts are provided inside a raw material, the parts do not overlap with each other, and a utilization of the raw material is maximized;
(2) building a Proximal Policy Optimization-Contrastive Language-Image Pre-training (PPO-Clip) algorithm model with an Actor-Critic network, and based on the stone arrangement simulation environment, the action space and the return function, training simultaneously an Actor neural network and a Critic neural network by using an Actor-Critic algorithm; inputting a current state s into the Actor neural network, wherein the current state s comprises raw material information and to-be-arranged part information, output a comprises a serial number of a part, a displacement of the part, and a rotation angle of the part; representing a probability of the output a in the current state s through a policy function πθ(a|s); and outputting, by the Critic neural network, a prediction of the return function according to an action of the Actor neural network in the current state s;
wherein the Actor neural network and the Critic neural network have a same convolutional network structure, an input of a first convolutional layer is images obtained through preprocessing a part image and a raw material image in the current state s, a number of output channels, a convolution kernel size, and a step size of the first convolutional layer are set; a first max-pooling is performed; a number of input channels, the number of output channels, a convolution kernel size, and a step size of a multilayer convolutional layer are set, respectively; a second max-pooling is performed; a multi-dimensional feature image outputted by the multilayer convolutional layer and a pooling layer is flattened into a one-dimensional vector, and the one-dimensional vector is performed with multi-scale feature fusion by a spatial pyramid pooling (SPP) layer and then outputted through a fully connected layer; the Actor neural network outputs the action taken in the current state s, and the Critic neural network outputs a return size obtained after taking the action in the current state s; and
an Actor-Critic network loss function defined by the PPO-Clip algorithm model is expressed as:
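$$L(s, a, \theta_k, \theta) = \min\left( \frac{\pi_\theta(a|s)}{\pi_{\theta_k}(a|s)} A^{\pi_{\theta_k}}(s, a),\ \mathrm{Clip}\left( \frac{\pi_\theta(a|s)}{\pi_{\theta_k}(a|s)},\ 1-\varepsilon,\ 1+\varepsilon \right) A^{\pi_{\theta_k}}(s, a) \right)$$ where
$\frac{\pi_\theta(a|s)}{\pi_{\theta_k}(a|s)}$ is a ratio of a new strategy and an old strategy,
$A^{\pi_{\theta_k}}(s, a) = Q(s, a) - V(s)$ is a dominance function which is a difference between an action value function and a strategy value function, Clip is a clipping function, ε is a hyperparameter,
$$\mathrm{Clip}(x,\ 1-\varepsilon,\ 1+\varepsilon) = \begin{cases} 1-\varepsilon, & x < 1-\varepsilon \\ x, & 1-\varepsilon \le x \le 1+\varepsilon \\ 1+\varepsilon, & x > 1+\varepsilon \end{cases}$$ θk is a k-th network parameter, θ is an updated network parameter, πθ(a|s) is the new strategy, πθk(a|s) is the old strategy, Q is the action value function, V is a state value function, and x is a function independent variable; and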
(3) based on a trained PPO-Clip algorithm model and a new production order of a stone part, providing images, obtained through preprocessing a corresponding part image and a corresponding raw material image to the new production order, as an input to the trained PPO-Clip algorithm model, and outputting an action instruction for arranging the part to complete an automatic arrangement task under the new production order;
the step (1) is performed through steps of:
(1-1) setting a length and a width of a raw material image according to requirements of size and pixel accuracy of a stone raw material;
(1-2) according to rules that the parts are provided inside the raw material, the parts do not overlap with each other, and the utilization of the raw material is maximized, dividing a design action into three sub-actions: part selection, part translation, and part rotation, and discretizing the rotation angle into 2kπ/N, where k = 0, 1, ..., N−1, and N is a positive integer, and the return function is expressed as:
![]() wherein $S_{\text{Arranged part}}$ is a sum of areas of parts that have been arranged onto the raw material, $L_{\text{Arranged part}}$ is a total length of the parts that have been arranged onto the raw material, h is a width of the raw material, and $F_a$ is a positive number;
wherein in the step (1), a pixel value of each point in the raw material image is set to 1;
in the step (2), the first convolutional layer has the number of output channels of 32, the convolution kernel size of 8×8, and the step size of 2;
a second convolutional layer has the number of input channels of 32, the number of output channels of 64, a convolution kernel size of 4×4, and a step size of 2; and
a third convolutional layer has the number of input channels of 64, the number of output channels of 64, a convolutional kernel size of 3×3, and a step size of 1.
|
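
As an illustration of the clipped loss recited in step (2), the following is a minimal Python sketch of the per-sample PPO-Clip surrogate term, min(r·A, Clip(r, 1−ε, 1+ε)·A), where r is the ratio of the new strategy to the old strategy and A is the dominance (advantage) value. It assumes scalar probabilities and a scalar advantage estimate. The function names `clip` and `ppo_clip_objective` and the default ε = 0.2 are assumptions made for this example only; the claim states only that ε is a hyperparameter.

```python
def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def ppo_clip_objective(new_prob: float, old_prob: float,
                       advantage: float, epsilon: float = 0.2) -> float:
    """Per-sample clipped surrogate term of PPO-Clip.

    new_prob  -- probability of the action under the updated parameters
    old_prob  -- probability of the same action under the k-th parameters
    advantage -- estimate of Q(s, a) - V(s)
    epsilon   -- clipping hyperparameter (0.2 is an assumed default)
    """
    ratio = new_prob / old_prob
    unclipped = ratio * advantage
    clipped = clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # Taking the minimum removes any incentive to push the ratio
    # outside [1 - epsilon, 1 + epsilon] in the direction that would
    # increase the objective.
    return min(unclipped, clipped)


if __name__ == "__main__":
    # Ratio of about 1.5 with a positive advantage: the clipped branch is selected.
    print(ppo_clip_objective(new_prob=0.3, old_prob=0.2, advantage=2.0))
    # Ratio of 0.5 with a negative advantage: the clipped branch is selected.
    print(ppo_clip_objective(new_prob=0.1, old_prob=0.2, advantage=-2.0))
```

In a training loop this term would typically be averaged over a batch of sampled state-action pairs and maximized with respect to the updated parameters. The sketch covers only the clipped term and does not include the Critic loss.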