US 12,205,005 B2
Sampled softmax with Random Fourier features
Xinnan Yu, Forest Hill, NY (US); Ankit Singh Rawat, New York, NY (US); Jiecao Chen, San Jose, CA (US); Ananda Theertha Suresh, New York, NY (US); and Sanjiv Kumar, Jericho, NY (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Jul. 17, 2020, as Appl. No. 16/931,862.
Claims priority of provisional application 62/876,408, filed on Jul. 19, 2019.
Prior Publication US 2021/0019654 A1, Jan. 21, 2021
Int. Cl. G06N 3/08 (2023.01); G06F 17/14 (2006.01); G06F 17/18 (2006.01); G06F 18/2431 (2023.01); G06F 40/20 (2020.01); G06N 3/084 (2023.01); G06N 20/00 (2019.01); G06N 20/10 (2019.01); G06V 10/77 (2022.01)

CPC G06N 20/10 (2019.01) [G06F 17/142 (2013.01); G06F 17/18 (2013.01); G06F 18/2431 (2023.01); G06F 40/20 (2020.01); G06N 3/08 (2013.01); G06N 3/084 (2013.01); G06N 20/00 (2019.01); G06V 10/7715 (2022.01)]

20 Claims

1. A computer-implemented method of training a machine-learned model for classifying inputs into one or more classes of a plurality of classes, each of the plurality of classes having an associated class embedding in a plurality of class embeddings, the method comprising:

receiving, by one or more computing devices, one or more inputs and one or more labels;

determining, by the one or more computing devices, one or more input embeddings associated with the one or more inputs;

normalizing, by the one or more computing devices using a vector norm, the one or more input embeddings to obtain one or more normalized input embeddings;

normalizing, by the one or more computing devices using the vector norm, the plurality of class embeddings to obtain a plurality of normalized class embeddings;

selecting, by the one or more computing devices, one or more negative classes from the plurality of classes based at least in part on a probability distribution approximating a softmax distribution, wherein the probability distribution comprises a linearized kernel determined based at least in part on a Random Fourier Features map, wherein the linearized kernel provides a uniform multiplicative approximation of an exponential kernel associated with the softmax distribution, and wherein the probability distribution is a function of the one or more normalized input embeddings and the plurality of normalized class embeddings;

evaluating, by the one or more computing devices, a loss function to determine a loss based at least in part on the one or more negative classes, the one or more inputs, and the one or more labels; and

adjusting, by the one or more computing devices, one or more parameters of the machine-learned model based at least in part on the loss, the one or more inputs, and the one or more labels.
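
The steps recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation. The function names, the inverse-temperature parameter tau, the choice of kernel bandwidth equal to tau, the positivity floor applied to the kernel scores, and the standard log(m * q) logit correction are all assumptions made for illustration. The sketch relies on the identity that, for unit-norm vectors h and c, exp(tau * h.c) = exp(tau) * exp(-tau * ||h - c||^2 / 2), so a Random Fourier Features approximation of the Gaussian kernel yields, up to a constant factor, an approximation of the exponential kernel in the softmax.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Normalize along the last axis using the L2 vector norm.
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def rff_map(x, w):
    # Random Fourier Features map for a Gaussian kernel.
    # x: (n, d) inputs; w: (D, d) frequencies drawn from N(0, nu * I).
    # phi(u) . phi(v) approximates exp(-nu * ||u - v||^2 / 2).
    proj = x @ w.T
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1) / np.sqrt(w.shape[0])

def rff_sampling_distribution(h, classes, w, floor=1e-8):
    # Linearized-kernel scores phi(c_i) . phi(h) for every class i.
    scores = rff_map(classes, w) @ rff_map(h[None, :], w)[0]
    # Assumption: clip at a small floor, since a finite-D estimate can dip below zero.
    scores = np.maximum(scores, floor)
    return scores / scores.sum()

def sampled_softmax_loss(h, classes, label, q, num_neg, tau, rng):
    # Exclude the true class, then draw negatives from the renormalized distribution.
    q_neg = q.copy()
    q_neg[label] = 0.0
    q_neg /= q_neg.sum()
    neg = rng.choice(len(q_neg), size=num_neg, replace=True, p=q_neg)
    pos_logit = tau * (classes[label] @ h)
    # Assumed standard sampled-softmax correction: subtract log(m * q_i) from negative logits.
    neg_logits = tau * (classes[neg] @ h) - np.log(num_neg * q_neg[neg])
    logits = np.concatenate([[pos_logit], neg_logits])
    m = logits.max()
    loss = m + np.log(np.exp(logits - m).sum()) - pos_logit  # stable log-sum-exp
    return loss, neg

# Hypothetical toy setup: 1000 classes, 32-dimensional embeddings, 512 random features.
rng = np.random.default_rng(0)
tau = 4.0
class_emb = l2_normalize(rng.normal(size=(1000, 32)))  # normalized class embeddings
h = l2_normalize(rng.normal(size=32))                  # normalized input embedding
w = np.sqrt(tau) * rng.normal(size=(512, 32))          # frequencies with variance nu = tau
q = rff_sampling_distribution(h, class_emb, w)
loss, negatives = sampled_softmax_loss(h, class_emb, label=7, q=q, num_neg=20, tau=tau, rng=rng)
print(loss, negatives[:5])
```

In this sketch the distribution is computed by scoring every class, which keeps the example short but costs time linear in the number of classes. The final step of the claim, adjusting model parameters based on the loss, is omitted; in practice the loss would be differentiated with an automatic-differentiation framework.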