US 11,989,649 B2
Pairwise ranking using neural networks
Xiaohong Gong, Sunnyvale, CA (US); Arturo Bajuelos Castillo, Miami, FL (US); Sanjeev Jagannatha Rao, Sunnyvale, CA (US); Xueliang Lu, Fremont, CA (US); Amogh S. Asgekar, Palo Alto, CA (US); Anton Alexandrov, San Francisco, CA (US); and Carsten Miklos Steinebach, El Cerrito, CA (US)
Assigned to DeepMind Technologies Limited, London (GB)
Filed by DeepMind Technologies Limited, London (GB)
Filed on Nov. 18, 2020, as Appl. No. 16/951,362.
Claims priority of provisional application 63/045,002, filed on Jun. 26, 2020.
Prior Publication US 2021/0406680 A1, Dec. 30, 2021
Int. Cl. G06N 3/08 (2023.01); G06N 3/045 (2023.01); G06Q 30/0601 (2023.01); G06F 3/04842 (2022.01)

CPC G06N 3/08 (2013.01) [G06N 3/045 (2023.01); G06Q 30/0641 (2013.01); G06F 3/04842 (2013.01)]

20 Claims

1. A method of training a neural network, wherein the neural network has network parameter values and is used to generate a ranking score for a network input, and

wherein the method comprises:

generating training data for training the neural network, wherein the training data includes a plurality of training pairs, each training pair comprising a respective training network input with a positive label and a respective training network input with a negative label, the generating comprising:

obtaining data indicating that a plurality of training network inputs were displayed in a user interface according to a presentation order,

obtaining data indicating that a first training network input of the plurality of training network inputs has a positive label, wherein the positive label is based on a user interacting with the first training network input when the training network inputs were displayed in a user interface according to the presentation order,

determining that a second training network input of the plurality of training network inputs (i) has a negative label and (ii) was positioned before the first training network input in the presentation order when the plurality of training network inputs were displayed in the user interface, wherein the negative label is based on the user not interacting with the second training network input when the training network inputs were displayed in a user interface according to the presentation order, and

based on determining that the second training network input (i) has a negative label and (ii) was positioned before the first training network input in the presentation order, generating a training pair that includes the first training network input and the second training network input; and

training the neural network on the training data, the training comprising:

processing the first training network input through the neural network to generate a first ranking score;

processing the second training network input through the neural network to generate a second ranking score;

generating a pairwise loss for the training pair that includes the first and second training network inputs based on a difference between the first ranking score generated by the neural network for the first training network input and the second ranking score generated by the neural network for the second training network input; and

updating the network parameter values for the neural network using the pairwise loss.
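
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patentee's implementation, and it rests on several assumptions that the claim does not fix: the `Impression` record and its field names are hypothetical, a single linear layer stands in for the neural network, the pairwise loss is taken to be a logistic loss on the score difference, and the parameter update is plain gradient descent with an arbitrary learning rate.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Impression:
    """Hypothetical record of one item displayed in the user interface."""
    features: List[float]  # assumed feature vector (the "network input")
    position: int          # index in the presentation order (0 = shown first)
    interacted: bool       # True -> positive label, False -> negative label


def generate_training_pairs(
    impressions: List[Impression],
) -> List[Tuple[List[float], List[float]]]:
    """Pair each interacted item with every non-interacted item shown before it.

    Non-interacted items shown after the interacted item are not paired.
    """
    pairs = []
    for pos in impressions:
        if not pos.interacted:
            continue
        for neg in impressions:
            if not neg.interacted and neg.position < pos.position:
                pairs.append((pos.features, neg.features))
    return pairs


def score(weights: List[float], features: List[float]) -> float:
    """Stand-in scorer: one linear layer instead of a full neural network."""
    return sum(w * x for w, x in zip(weights, features))


def pairwise_loss(
    weights: List[float], positive: List[float], negative: List[float]
) -> float:
    """Logistic loss on the score difference (an assumed choice of loss)."""
    diff = score(weights, positive) - score(weights, negative)
    return math.log1p(math.exp(-diff))


def train_step(
    weights: List[float],
    positive: List[float],
    negative: List[float],
    learning_rate: float = 0.1,
) -> List[float]:
    """One gradient-descent update of the weights using the pairwise loss."""
    diff = score(weights, positive) - score(weights, negative)
    # Derivative of log(1 + exp(-diff)) with respect to diff.
    grad_diff = -1.0 / (1.0 + math.exp(diff))
    # For a linear scorer, the derivative of diff with respect to w_i is (p_i - n_i).
    return [
        w - learning_rate * grad_diff * (p - n)
        for w, p, n in zip(weights, positive, negative)
    ]
```

In this sketch, the position filter in `generate_training_pairs` corresponds to condition (ii) of the determining step: a non-interacted item becomes the negative member of a pair only if it was shown before the interacted item. Each call to `train_step` lowers the loss for its pair by widening the gap between the positive and negative scores.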