US 12,236,635 B1
Digital person training method and system, and digital person driving system
Huapeng Sima, Nanjing (CN); Hao Jiang, Nanjing (CN); Hongwei Fan, Nanjing (CN); Qixun Qu, Nanjing (CN); Jiabin Li, Nanjing (CN); and Jintai Luan, Nanjing (CN)
Filed by NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD., Nanjing (CN)
Filed on Aug. 19, 2024, as Appl. No. 18/809,315.
Int. Cl. G06K 9/00 (2022.01); G06T 5/50 (2006.01); G06T 7/73 (2017.01)

CPC G06T 7/73 (2017.01) [G06T 5/50 (2013.01); G06T 2207/20132 (2013.01); G06T 2207/20221 (2013.01); G06T 2207/30196 (2013.01)]

8 Claims

1. A digital person training method, comprising:

obtaining training data, and extracting human-body pose estimation data from the training data, wherein the training data is image data with a pose label, the image data comprises a single sample person image, and pose actions of sample persons in different image data are different;

inputting position estimation data, speed estimation data, and acceleration estimation data in the human-body pose estimation data into an optimized pose estimation network to obtain human-body pose optimization data, wherein sample balancing processing is performed on the human-body pose estimation data when the human-body pose estimation data is input into the optimized pose estimation network; the optimized pose estimation network comprises a first branch layer, a second branch layer, and a third branch layer; the first branch layer, the second branch layer, and the third branch layer are branch layers parallel to each other; the first branch layer is configured to calculate position optimization data in the human-body pose optimization data; the second branch layer is configured to calculate speed optimization data in the human-body pose optimization data; the third branch layer is configured to calculate acceleration optimization data in the human-body pose optimization data; data output from the first branch layer, the second branch layer, and the third branch layer is respectively output by the fully connected layer and then enters a linear fusion layer for feature fusion, so as to obtain the human-body pose optimization data; and inputting the human-body pose estimation data into the optimized pose estimation network is expressed by the following formula:

Ĝ=g(Ŷ),

wherein it is satisfied that Ĝ∈R^(L×C), wherein Ĝ represents an optimized and improved human-body pose estimation result, R represents a three-dimensional rotational Euler angle of a human-body key point, and L represents a quantity of input image frames; and it is satisfied that C=N×D, wherein N represents a quantity of defined human-body key points, D represents a quantity of output dimensions, with D=3 in a three-dimensional human-body pose estimation problem, and Ŷ represents an estimated value for a key-point Euler angle that is output by calculating a human-body pose estimation algorithm based on a basic smoothing network, wherein output of the branch layer of the optimized pose estimation network is expressed by the following formula:

wherein l represents a first layer of the network; σ represents a non-linear activation function; ω_r^l and b^l represent weight and bias learned at a t^th frame, respectively; and T represents a sliding window,

wherein the step of performing balancing processing on the human-body pose estimation data comprises:

defining p_train(x, y) as a training distribution, wherein p_train(y) represents a non-equilibrium training distribution, p_bal(x, y) represents a balanced test distribution, p_bal(y) represents a uniform distribution, and relationships between p_train(y|x), p_bal(y|x), and p_train(y) can be expressed by the following distribution data relationship formula:

predicting an estimated value for the three-dimensional rotational Euler angle of the human-body key point according to an expected expression formula, wherein the expected expression formula is:

p_bal(y|x;θ)=N(y;y_pred,σ_noise²I),

wherein θ represents a parameter of the optimized pose estimation network for training, y_pred represents the estimated value for the three-dimensional rotational Euler angle of the human-body key point, and σ_noise²I represents a variance matrix of a Gaussian distribution to which the estimated value for the three-dimensional rotational Euler angle conforms;

calculating an equilibrium mean square error loss based on the distribution data relationship formula and an equilibrium mean square error loss formula, wherein the equilibrium mean square error loss is used for balancing the human-body pose estimation data, the equilibrium mean square error loss is determined by a maximum likelihood loss of a corresponding conditional probability after p_bal(y|x; θ) is converted into p_train(y|x; θ), and the equilibrium mean square error loss formula is:

calculating generation losses of the position optimization data and the acceleration optimization data in the human-body pose optimization data based on a loss function of the optimized pose estimation network, to minimize errors between the position estimation data and the acceleration estimation data and a real value; and

driving, based on the loss function, the optimized pose estimation network to update a network parameter to obtain an optimal driving model that is based on the optimized pose estimation network.
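
As an informal illustration of claim 1, and not part of the claim language, the following Python sketch shows one way the recited pieces could fit together for a single key-point channel over a T-frame sliding window: three parallel branches for position, speed, and acceleration, each ending in a fully connected layer, followed by a linear fusion layer, together with a batch-based approximation of an equilibrium (balanced) mean square error loss. Several details are assumptions rather than anything stated in the text. Speed and acceleration are taken as zero-padded finite differences of the position estimates. The activation is a leaky ReLU. The loss assumes that converting p_bal(y|x; θ) into p_train(y|x; θ) follows Bayes' rule with a uniform p_bal(y), so that the negative log-likelihood is -log N(y; y_pred, σ_noise²I) + log ∫ N(y'; y_pred, σ_noise²I) p_train(y') dy', with the integral approximated by the labels in the current batch. All function and parameter names (diff, fc_layer, refine_window, balanced_mse_batch, identity_params) are hypothetical.

```python
import math


def diff(seq):
    """Finite difference along time, zero-padded at the first frame to keep the length."""
    return [0.0] + [b - a for a, b in zip(seq, seq[1:])]


def leaky_relu(v, slope=0.1):
    """Assumed activation; the source only names a non-linear activation function."""
    return v if v >= 0.0 else slope * v


def fc_layer(x, weights, bias, activation=None):
    """Fully connected layer over the window: out[t] = act(sum_s W[t][s] * x[s] + b[t])."""
    out = []
    for w_row, b in zip(weights, bias):
        v = sum(w * xi for w, xi in zip(w_row, x)) + b
        out.append(activation(v) if activation else v)
    return out


def refine_window(y_hat, params):
    """Refine one channel of estimated Euler angles over a T-frame window."""
    pos = list(y_hat)
    vel = diff(pos)  # speed estimate (assumed: first difference of position)
    acc = diff(vel)  # acceleration estimate (assumed: first difference of speed)
    branch_outputs = []
    # Three parallel branches, each a hidden layer plus a fully connected output layer.
    for name, x in (("pos", pos), ("vel", vel), ("acc", acc)):
        p = params[name]
        hidden = fc_layer(x, p["w_hidden"], p["b_hidden"], leaky_relu)
        branch_outputs.append(fc_layer(hidden, p["w_out"], p["b_out"]))
    # Linear fusion layer: per-frame weighted sum of the three branch outputs.
    w_p, w_v, w_a = params["fusion_w"]
    return [w_p * p + w_v * v + w_a * a + params["fusion_b"]
            for p, v, a in zip(*branch_outputs)]


def balanced_mse_batch(preds, targets, noise_var):
    """Batch approximation of a balanced MSE loss (assumed form, see lead-in).

    For sample i the loss is -log softmax_j(-||pred_i - target_j||^2 / (2 * noise_var))[i],
    i.e. the Gaussian term for the true label minus a log-sum over all batch labels.
    """
    total = 0.0
    for i, pred in enumerate(preds):
        logits = [-sum((p - t) ** 2 for p, t in zip(pred, tgt)) / (2.0 * noise_var)
                  for tgt in targets]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_norm - logits[i]
    return total / len(preds)


def identity_params(T):
    """Hand-set parameters for demonstration only; real weights would be learned."""
    eye = [[1.0 if i == j else 0.0 for j in range(T)] for i in range(T)]
    branch = {"w_hidden": eye, "b_hidden": [0.0] * T, "w_out": eye, "b_out": [0.0] * T}
    return {"pos": branch, "vel": branch, "acc": branch,
            "fusion_w": (1.0, 0.0, 0.0), "fusion_b": 0.0}


if __name__ == "__main__":
    window = [1.0, 2.0, 4.0, 7.0]  # made-up Euler-angle estimates for one channel
    print(refine_window(window, identity_params(len(window))))
    print(balanced_mse_batch([[0.9], [9.5]], [[1.0], [10.0]], noise_var=1.0))
```

In an actual training setup the branch and fusion weights would be learned by gradient descent in an automatic-differentiation framework, and the loss above would be combined with the position and acceleration generation losses recited in the claim. The pure-Python form is used here only to keep the sketch self-contained.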