US 11,715,008 B2
Neural network training utilizing loss functions reflecting neighbor token dependencies
Eugene Indenbom, Moscow (RU); and Daniil Anastasiev, Moscow (RU)
Assigned to ABBYY Development Inc., Dover, DE (US)
Filed by ABBYY Development Inc., Dover, DE (US)
Filed on Dec. 29, 2018, as Appl. No. 16/236,382.
Claims priority of application No. RU2018146352 (RU), filed on Dec. 25, 2018.
Prior Publication US 2020/0202211 A1, Jun. 25, 2020
Int. Cl. G06F 40/205 (2020.01); G06N 3/084 (2023.01); G06F 40/284 (2020.01); G10L 17/18 (2013.01)
CPC G06N 3/084 (2013.01) [G06F 40/284 (2020.01); G10L 17/18 (2013.01)]
16 Claims
 
1. A method, comprising:
receiving a training dataset comprising a sequence of labeled tokens comprising a first token, a second token, and a third token, wherein the second token follows the first token, and the third token follows the second token;
producing, by a neural network, a set of vectors, wherein each vector of the set of vectors encodes information about a corresponding token of the sequence of labeled tokens and further encodes information about a context of the corresponding token;
determining, by the neural network processing the set of vectors, a first tag corresponding to the first token, a second tag corresponding to the second token, and a third tag corresponding to the third token;
computing, for the training dataset, a value of a loss function reflecting a first loss value, a second loss value, and a third loss value, wherein the first loss value is represented by a first difference of the first tag and a first label associated with the first token by the training dataset, wherein the second loss value is represented by a second difference of the second tag and a second label associated with the second token by the training dataset, and wherein the third loss value is represented by a third difference of the third tag and a third label associated with the third token by the training dataset; and
adjusting a parameter of the neural network based on the value of the loss function.
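
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a toy model in which each context vector is a token embedding averaged with its immediate neighbors, tags come from a single linear layer followed by a softmax, and the "difference" between a tag and its label is measured as cross-entropy. The function names (context_vectors, forward, loss_value, train_step), the dimensions, and the learning rate are all hypothetical.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def context_vectors(embeddings):
    """One vector per token: its embedding averaged with adjacent tokens
    (an assumed stand-in for a context-aware encoder)."""
    out = []
    for i in range(len(embeddings)):
        window = embeddings[max(0, i - 1): i + 2]
        dim = len(window[0])
        out.append([sum(v[d] for v in window) / len(window) for d in range(dim)])
    return out

def forward(weights, vectors):
    """Tag probabilities per token: softmax(W @ v)."""
    return [softmax([sum(w * x for w, x in zip(row, v)) for row in weights])
            for v in vectors]

def loss_value(probs, labels):
    """Sum of per-token losses; cross-entropy is an assumed choice."""
    return sum(-math.log(p[y]) for p, y in zip(probs, labels))

def train_step(weights, vectors, labels, lr=0.1):
    """Compute the loss, then adjust the weights by one gradient step."""
    probs = forward(weights, vectors)
    loss = loss_value(probs, labels)
    # For softmax + cross-entropy, d(loss)/d(logit_k) = p_k - onehot_k.
    for p, y, v in zip(probs, labels, vectors):
        for k, row in enumerate(weights):
            g = p[k] - (1.0 if k == y else 0.0)
            for d in range(len(row)):
                row[d] -= lr * g * v[d]
    return loss

# Three labeled tokens (first, second, third) with hypothetical label ids.
random.seed(0)
embeddings = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
labels = [0, 1, 2]

vectors = context_vectors(embeddings)
weights = [[0.0] * 4 for _ in range(3)]  # 3 tags x 4 features
losses = [train_step(weights, vectors, labels) for _ in range(50)]

# Predicted tag per token is the most probable class.
tags = [max(range(3), key=lambda k: p[k]) for p in forward(weights, vectors)]
```

In this sketch the overall loss is a plain sum of the first, second, and third loss values, and only the output-layer weights are adjusted. A real tagger would typically also update the encoder that produces the vectors.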