US 12,265,909 B2
Systems and methods for a k-nearest neighbor based mechanism of natural language processing models
Nazneen Rajani, Mountain View, CA (US); Tong Niu, Sunnyvale, CA (US); and Wenpeng Yin, Palo Alto, CA (US)
Assigned to Salesforce, Inc., San Francisco, CA (US)
Filed by Salesforce, Inc., San Francisco, CA (US)
Filed on Nov. 5, 2020, as Appl. No. 17/090,553.
Claims priority of provisional application 63/033,197, filed on Jun. 1, 2020.
Prior Publication US 2021/0374488 A1, Dec. 2, 2021
Int. Cl. G06N 3/08 (2023.01); G06F 18/10 (2023.01); G06F 18/214 (2023.01); G06F 18/2413 (2023.01); G06F 18/2415 (2023.01); G06N 3/063 (2023.01)
CPC G06N 3/08 (2013.01) [G06F 18/10 (2023.01); G06F 18/214 (2023.01); G06F 18/24147 (2023.01); G06F 18/2415 (2023.01); G06N 3/063 (2013.01)] 14 Claims
OG exemplary drawing
 
1. A method of identifying inaccurate labels in training data, the method comprising:
receiving, at a data interface, a training set of sequences, wherein each sequence from the training set is paired with a target label from a plurality of labels;
mapping, by a neural network, each training sequence to a respective normalized hidden representation vector, including:
computing the respective normalized hidden representation vector based on a dataset-wise batch normalization of hidden representation vectors with a mean and a standard deviation over hidden states of the neural network responsive to the training set of sequences,
wherein the mean and the standard deviation of hidden states are obtained over the training set of sequences, and
wherein the computing further comprises dividing a difference between a first hidden representation vector and the mean by a sum of the standard deviation and a numerical stability parameter;
receiving a testing sequence at an inference stage;
after receiving the testing sequence at the inference stage, mapping, by the neural network, the testing sequence to a normalized test hidden representation vector;
determining, among the training set of sequences, a set of sequence indices that lead to a set of smallest distances between respective normalized hidden representation vectors and the normalized test hidden representation vector;
computing, for each training sequence indexed by the set of sequence indices, a weighted probability score based on a set of distances corresponding to the set of sequence indices;
generating a probability distribution over the plurality of labels for the testing sequence based on the computed weighted probability scores and one-hot encodings of each target label in the plurality of labels; and
identifying a mislabeled training sequence from the training set of sequences when a direct prediction by the neural network is different from a prediction based on the generated probability distribution.
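
Restated in standard notation, the dataset-wise batch normalization recited in the claim computes, for each hidden representation vector $h_i$,

$$\hat{h}_i = \frac{h_i - \mu}{\sigma + \epsilon},$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the hidden states obtained over the training set of sequences and $\epsilon$ is the numerical stability parameter.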
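
A minimal sketch of the claimed mechanism in Python with NumPy follows. The Euclidean distance metric, the softmax weighting of negative distances, the choice of k, and the leave-one-out application of the nearest-neighbor prediction to the training sequences themselves are illustrative assumptions rather than limitations recited in the claim, and all function and variable names (normalize_hidden_states, knn_label_distribution, flag_mislabeled) are hypothetical.

```python
import numpy as np

def normalize_hidden_states(train_hidden, eps=1e-6):
    """Dataset-wise batch normalization: subtract the mean and divide by
    (std + eps), with statistics computed over the whole training set."""
    mean = train_hidden.mean(axis=0)
    std = train_hidden.std(axis=0)
    normalized = (train_hidden - mean) / (std + eps)
    return normalized, mean, std

def knn_label_distribution(test_hidden, train_normalized, train_labels,
                           mean, std, num_labels, k=16, eps=1e-6):
    """Return a kNN-weighted probability distribution over labels for one
    test hidden state (distance metric and weighting are assumptions)."""
    # Normalize the test hidden state with the training-set statistics.
    z = (test_hidden - mean) / (std + eps)
    # Euclidean distances to every normalized training representation.
    dists = np.linalg.norm(train_normalized - z, axis=1)
    # Indices of the k smallest distances (the "set of sequence indices").
    idx = np.argsort(dists)[:k]
    # Weighted probability scores: softmax over negative distances.
    weights = np.exp(-dists[idx])
    weights /= weights.sum()
    # Combine the weights with one-hot encodings of the neighbors' labels.
    one_hot = np.eye(num_labels)[train_labels[idx]]
    return weights @ one_hot

def flag_mislabeled(train_hidden, train_labels, model_predictions,
                    num_labels, k=16):
    """Flag training sequences whose direct model prediction disagrees with
    the kNN-based prediction (leave-one-out over the training set)."""
    train_labels = np.asarray(train_labels)
    model_predictions = np.asarray(model_predictions)
    normalized, mean, std = normalize_hidden_states(train_hidden)
    flagged = []
    for i in range(len(train_hidden)):
        # Exclude the example itself so it cannot vote for its own label.
        keep = np.arange(len(train_hidden)) != i
        dist = knn_label_distribution(train_hidden[i], normalized[keep],
                                      train_labels[keep], mean, std,
                                      num_labels, k=k)
        if dist.argmax() != model_predictions[i]:
            flagged.append(i)
    return flagged
```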