US 12,481,801 B2
Method, device, electronic device, and computer-readable storage medium for automatic stone arrangement
Jianping Qiu, Foshan (CN); Zhongyuan Li, Foshan (CN); Yumin Chen, Foshan (CN); and Yuchi Qian, Foshan (CN)
Assigned to VEEGOO TECHNOLOGY CO., LTD., Guangdong (CN)
Filed by VEEGOO TECHNOLOGY CO., LTD., Foshan (CN)
Filed on Jan. 15, 2025, as Appl. No. 19/021,615.
Application 19/021,615 is a continuation of application No. PCT/CN2024/139919, filed on Dec. 17, 2024.
Claims priority of application No. 202410223626.2 (CN), filed on Feb. 29, 2024.
Prior Publication US 2025/0232074 A1, Jul. 17, 2025
Int. Cl. G06F 30/13 (2020.01); G06F 30/27 (2020.01)

CPC G06F 30/13 (2020.01) [G06F 30/27 (2020.01)]

3 Claims

1. An automatic stone arrangement method based on image and deep reinforcement learning, comprising:

(1) building a stone arrangement simulation environment, and designing an action space and a return function based on rules that parts are provided inside a raw material, the parts do not overlap with each other, and a utilization of the raw material is maximized;

(2) building a Proximal Policy Optimization with clipping (PPO-Clip) algorithm model with an Actor-Critic network, and based on the stone arrangement simulation environment, the action space and the return function, training simultaneously an Actor neural network and a Critic neural network by using an Actor-Critic algorithm; inputting a current state s into the Actor neural network, wherein the current state s comprises raw material information and to-be-arranged part information, and an output a comprises a serial number of a part, a displacement of the part, and a rotation angle of the part; representing a probability of the output a in the current state s through a policy function π_θ(a|s); and outputting, by the Critic neural network, a prediction of the return function according to an action of the Actor neural network in the current state s;

wherein the Actor neural network and the Critic neural network have a same convolutional network structure, an input of a first convolutional layer is images obtained by preprocessing a part image and a raw material image in the current state s, a number of output channels, a convolution kernel size, and a step size of the first convolutional layer are set; a first max-pooling is performed; a number of input channels, the number of output channels, a convolution kernel size, and a step size of a multilayer convolutional layer are set, respectively; a second max-pooling is performed; a multi-dimensional feature image output by the multilayer convolutional layer and a pooling layer is flattened into a one-dimensional vector, and the one-dimensional vector undergoes multi-scale feature fusion in a spatial pyramid pooling (SPP) layer and is then output through a fully connected layer; the Actor neural network outputs the action taken in the current state s, and the Critic neural network outputs a magnitude of the return obtained after taking the action in the current state s; and

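an Actor-Critic network loss function defined by the PPO-Clip algorithm model is expressed as:

$$L(s, a, \theta_k, \theta) = \min\left( \frac{\pi_\theta(a|s)}{\pi_{\theta_k}(a|s)} A^{\pi_{\theta_k}}(s, a),\ \mathrm{Clip}\left( \frac{\pi_\theta(a|s)}{\pi_{\theta_k}(a|s)},\ 1-\varepsilon,\ 1+\varepsilon \right) A^{\pi_{\theta_k}}(s, a) \right)$$

where

$$\frac{\pi_\theta(a|s)}{\pi_{\theta_k}(a|s)}$$

is a ratio of a new strategy and an old strategy,

$$A^{\pi_{\theta_k}}(s, a) = Q(s, a) - V(s)$$

is a dominance (advantage) function which is a difference between an action value function and a state value function, Clip is a clipping function, ε is a hyperparameter,

$$\mathrm{Clip}(x,\ 1-\varepsilon,\ 1+\varepsilon) = \begin{cases} 1-\varepsilon, & x < 1-\varepsilon \\ x, & 1-\varepsilon \le x \le 1+\varepsilon \\ 1+\varepsilon, & x > 1+\varepsilon \end{cases}$$

θ_k is a k-th network parameter, θ is an updated network parameter, π_θ(a|s) is the new strategy, π_{θ_k}(a|s) is the old strategy, Q is the action value function, V is a state value function, and x is a function independent variable; and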
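As an illustration of the clipped term defined in this element, the following is a minimal Python sketch for a single state-action pair. The function names (`clip`, `advantage`, `ppo_clip_term`) and the default ε = 0.2 are assumptions chosen for the example; the claim does not fix a value for ε or name any routine.

```python
def clip(x, low, high):
    """Clamp the independent variable x to the interval [low, high]."""
    return max(low, min(x, high))


def advantage(q_value, v_value):
    """Advantage as described in the claim: action value minus state value."""
    return q_value - v_value


def ppo_clip_term(ratio, adv, eps=0.2):
    """Clipped surrogate term for one (s, a) sample.

    ratio: pi_theta(a|s) / pi_theta_k(a|s), new strategy over old strategy.
    adv:   advantage of the action under the old strategy.
    eps:   clipping hyperparameter (0.2 is an assumed example value).
    """
    unclipped = ratio * adv
    clipped = clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    # Taking the minimum removes any incentive to move the ratio
    # outside [1 - eps, 1 + eps] in the direction favoured by adv.
    return min(unclipped, clipped)


# Positive advantage: gain is capped once the ratio exceeds 1 + eps.
print(ppo_clip_term(1.5, advantage(3.0, 2.0)))
# Negative advantage: the term is floored once the ratio drops below 1 - eps.
print(ppo_clip_term(0.5, advantage(1.0, 2.0)))
```

In practice the term would be averaged over a batch of samples and maximized (or its negative minimized) with respect to θ; that batching is left out of this sketch.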

(3) based on a trained PPO-Clip algorithm model and a new production order of a stone part, providing images, obtained through preprocessing a corresponding part image and a corresponding raw material image to the new production order, as an input to the trained PPO-Clip algorithm model, and outputting an action instruction for arranging the part to complete an automatic arrangement task under the new production order;

the step (1) is performed through steps of:

(1-1) setting a length and a width of a raw material image according to requirements of size and pixel accuracy of a stone raw material;

(1-2) according to rules that the parts are provided inside the raw material, the parts do not overlap with each other, and the utilization of the raw material is maximized, dividing a design action into three sub-actions: part selection, part translation, and part rotation, and discretizing the rotation angle into 2kπ/N, where k = 0, 1, ..., N−1, and N is a positive integer, and the return function is expressed as:

wherein S_{Arranged part} is a sum of areas of parts that have been arranged onto the raw material, L_{Arranged part} is a total length of the parts that have been arranged onto the raw material, h is a width of the raw material, and Fa is a positive number;
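The return-function formula itself is not reproduced in this text, so the sketch below is only one plausible reading of the listed symbols, not the claimed expression. It assumes the return is the area utilization S_{Arranged part} / (L_{Arranged part} · h) for a valid placement and a penalty of −Fa for an invalid one; the function names and the `valid` flag are hypothetical. The rotation discretization 2kπ/N follows the claim directly.

```python
import math


def rotation_angles(n):
    """Discrete rotation angles 2*k*pi/N for k = 0, 1, ..., N-1."""
    if n <= 0:
        raise ValueError("N must be a positive integer")
    return [2 * math.pi * k / n for k in range(n)]


def utilization_return(arranged_area, arranged_length, h, fa, valid):
    """Hypothetical return function (ASSUMED form, see lead-in).

    arranged_area:   S_{Arranged part}, summed area of placed parts.
    arranged_length: L_{Arranged part}, total length occupied by placed parts.
    h:               width of the raw material.
    fa:              positive penalty constant Fa.
    valid:           False if the part overlaps another part or leaves
                     the raw material (assumed meaning).
    """
    if not valid:
        return -fa
    if arranged_length <= 0:
        return 0.0
    return arranged_area / (arranged_length * h)


print(rotation_angles(4))
print(utilization_return(50.0, 10.0, 10.0, 1.0, valid=True))
print(utilization_return(50.0, 10.0, 10.0, 1.0, valid=False))
```

Under this assumed form, a tighter packing raises the ratio of placed area to the occupied strip L · h, which is consistent with the stated rule that raw material utilization is maximized.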

wherein in the step (1), a pixel value of each point in the raw material image is set to 1;
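The placement rules (parts inside the raw material, no overlap) can be illustrated on a small pixel grid. This is a minimal sketch, not the claimed environment: the raw material image is initialized to 1 as stated, while marking occupied pixels as 0 and representing a part as a binary mask are assumptions, as are the names `make_raw_material`, `can_place`, and `place`.

```python
def make_raw_material(height, width):
    """Raw material image with every pixel set to 1 (free material)."""
    return [[1] * width for _ in range(height)]


def can_place(material, part, top, left):
    """True if the part mask fits inside the material without overlap.

    part is a binary mask (1 = part pixel); top/left is the translation.
    """
    rows, cols = len(material), len(material[0])
    for r, part_row in enumerate(part):
        for c, filled in enumerate(part_row):
            if not filled:
                continue
            y, x = top + r, left + c
            if y < 0 or y >= rows or x < 0 or x >= cols:
                return False  # part pixel falls outside the raw material
            if material[y][x] != 1:
                return False  # pixel already used by another part
    return True


def place(material, part, top, left):
    """Mark the pixels covered by the part as used (ASSUMED encoding: 0)."""
    for r, part_row in enumerate(part):
        for c, filled in enumerate(part_row):
            if filled:
                material[top + r][left + c] = 0


material = make_raw_material(4, 6)
l_shape = [[1, 1],
           [1, 0]]
if can_place(material, l_shape, 0, 0):
    place(material, l_shape, 0, 0)
print(can_place(material, l_shape, 0, 1))  # overlaps the first part
print(can_place(material, l_shape, 0, 2))  # free region
```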

in the step (2), the first convolutional layer has the number of output channels of 32, the convolution kernel size of 8×8, and the step size of 2;

a second convolutional layer has the number of input channels of 32, the number of output channels of 64, a convolution kernel size of 4×4, and a step size of 2; and

a third convolutional layer has the number of input channels of 64, the number of output channels of 64, a convolution kernel size of 3×3, and a step size of 1.
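To see how the claimed kernel sizes and step sizes shrink the feature maps, the sketch below traces one spatial dimension through the layer order given in the claim (first convolution, first max-pooling, second and third convolutions, second max-pooling). The claim does not state padding, pooling window, or input resolution, so zero padding, 2×2 pooling with stride 2, and a 128-pixel square input are all assumptions made for the example.

```python
def conv_out(size, kernel, stride, padding=0):
    """Output length along one axis for a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def feature_map_sizes(input_size, pool=2):
    """Trace one spatial dimension through the claimed layer order.

    ASSUMPTIONS: no padding, and pool x pool max-pooling with stride pool.
    """
    sizes = {}
    size = conv_out(input_size, kernel=8, stride=2)  # conv1: 32 output channels
    sizes["conv1"] = size
    size = conv_out(size, kernel=pool, stride=pool)  # first max-pooling
    sizes["pool1"] = size
    size = conv_out(size, kernel=4, stride=2)        # conv2: 32 -> 64 channels
    sizes["conv2"] = size
    size = conv_out(size, kernel=3, stride=1)        # conv3: 64 -> 64 channels
    sizes["conv3"] = size
    size = conv_out(size, kernel=pool, stride=pool)  # second max-pooling
    sizes["pool2"] = size
    return sizes


sizes = feature_map_sizes(128)        # 128 is a hypothetical input size
flattened = 64 * sizes["pool2"] ** 2  # 64 channels, square feature map assumed
print(sizes)
print(flattened)
```

With these assumed settings the flattened vector length depends on the input resolution, which is one reason a spatial pyramid pooling stage is commonly used ahead of a fully connected layer.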