US 12,488,233 B2
Neural network training method and apparatus, computer device, and storage medium
Zhao Peng Tu, Shenzhen (CN); Jian Li, Shenzhen (CN); Bao Song Yang, Shenzhen (CN); and Tong Zhang, Shenzhen (CN)
Assigned to TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED, Shenzhen (CN)
Filed by TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED, Guangdong (CN)
Filed on Oct. 15, 2020, as Appl. No. 17/071,078.
Application 17/071,078 is a continuation of application No. PCT/CN2019/103338, filed on Aug. 29, 2019.
Claims priority of application No. 201811032787.4 (CN), filed on Sep. 5, 2018.
Prior Publication US 2021/0027165 A1, Jan. 28, 2021
Int. Cl. G06N 3/08 (2023.01); G06N 3/045 (2023.01)
CPC G06N 3/08 (2013.01) [G06N 3/045 (2023.01)]
20 Claims
 
1. A neural network training method, performed by a computer device, the method comprising:
obtaining a training sample set, each training sample in the training sample set including a corresponding standard label;
training a neural network model based on inputting the each training sample in the training sample set into the neural network model, the neural network model comprising n attention networks, the n attention networks respectively mapping the each training sample to n different subspaces, each subspace of the n subspaces comprising a corresponding query vector sequence, a corresponding key vector sequence, and a corresponding value vector sequence, and n being an integer greater than 1;
determining a space difference degree between the n subspaces by using the neural network model;
determining an output similarity degree according to an output of the neural network model and the standard label corresponding to the each training sample; and
retraining the neural network model by adjusting a model parameter of the neural network model according to the space difference degree and the output similarity degree until a convergence condition is satisfied, thereby obtaining a target neural network model based on retraining the neural network model by adjusting the model parameter.
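
The quantities recited in claim 1 can be illustrated with a minimal sketch in plain Python. It is an illustration under stated assumptions, not the claimed implementation. The claim does not fix any formula, so the sketch assumes that (a) each attention network is a set of three linear projections producing the query, key, and value vector sequences, (b) the space difference degree is the negative mean cosine similarity between the value vector sequences of every pair of subspaces, and (c) the output similarity degree is the log-probability the model assigns to the standard label. All names (`project`, `map_to_subspaces`, `space_difference_degree`, `output_similarity_degree`, `training_objective`) and the `weight` hyperparameter are hypothetical.

```python
import math


def project(sequence, matrix):
    """Multiply each vector of a sequence by a (d x k) matrix given as a list of rows."""
    cols = len(matrix[0])
    return [
        [sum(vec[i] * matrix[i][j] for i in range(len(vec))) for j in range(cols)]
        for vec in sequence
    ]


def cosine(a, b):
    """Cosine similarity of two vectors, with a small epsilon against division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-9)


def map_to_subspaces(sample, attention_networks):
    """Map one training sample to n subspaces, one (Q, K, V) triple per attention network.

    Assumption: each attention network is a dict holding projection matrices
    under the hypothetical keys "Wq", "Wk", and "Wv".
    """
    return [
        (project(sample, net["Wq"]), project(sample, net["Wk"]), project(sample, net["Wv"]))
        for net in attention_networks
    ]


def space_difference_degree(subspaces):
    """Assumed measure: negative mean pairwise cosine similarity of value sequences.

    Values near -1 mean the subspaces are nearly identical; larger values
    mean the subspaces differ more.
    """
    n = len(subspaces)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            values_i, values_j = subspaces[i][2], subspaces[j][2]
            total += sum(cosine(a, b) for a, b in zip(values_i, values_j)) / len(values_i)
            pairs += 1
    return -total / pairs


def output_similarity_degree(predicted_probs, standard_label):
    """Assumed measure: log-probability assigned to the standard label."""
    return math.log(predicted_probs[standard_label] + 1e-12)


def training_objective(similarity, difference, weight=1.0):
    """Assumed combination: a quantity to maximize when adjusting model parameters."""
    return similarity + weight * difference
```

In a training loop under these assumptions, the model parameters would be adjusted to increase `training_objective` over the training sample set, and the loop would stop once a chosen convergence condition holds, for example when the objective changes by less than a threshold between iterations.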