US 11,657,304 B2
Assessing similarity between items using embeddings produced using a distributed training framework
Vladislav Mokeev, Lynnwood, WA (US); and Skyler James Anderson, Kirkland, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on May 1, 2020, as Appl. No. 16/864,964.
Prior Publication US 2021/0342711 A1, Nov. 4, 2021
Int. Cl. G06F 17/00 (2019.01); G06F 7/00 (2006.01); G06N 5/04 (2023.01); G06N 20/00 (2019.01); G06F 16/2457 (2019.01)
CPC G06N 5/04 (2013.01) [G06F 16/24578 (2019.01); G06N 20/00 (2019.01)] 26 Claims
OG exemplary drawing
 
1. A computer-implemented method for producing a set of trained embeddings, comprising:
providing a set of training examples, each training example describing a query submitted by at least one user, an item, and an indication of whether the item has been selected by said at least one user in response to submitting the query;
providing initial token embeddings associated with different training examples to plural respective embedding-updating computing devices;
performing an iteration of a training loop that includes:
using the plural embedding-updating computing devices to generate plural sets of local token embeddings;
providing the plural sets of local token embeddings to plural embedding-consolidating computing devices;
using the plural embedding-consolidating computing devices to generate plural consolidated token embeddings from the plural sets of local token embeddings, a particular consolidated token embedding for a particular token representing a consolidation of two or more different local token embeddings computed by two or more respective embedding-updating computing devices for the same particular token; and
providing the plural consolidated token embeddings to selected embedding-updating computing devices for use by the selected embedding-updating computing devices in performing generation of local token embeddings in a next iteration of the training loop,
repeating the training loop until a training objective is achieved, at which point the plural consolidated token embeddings provided to the embedding-updating computing devices correspond to the set of trained embeddings.
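The training loop recited in the claim can be sketched in ordinary code. The following single-process Python sketch is purely illustrative: the claim does not specify an update rule or a consolidation function, so this sketch assumes a dot-product click model trained with SGD and per-token averaging as the consolidation step, and simulates the "embedding-updating" and "embedding-consolidating" computing devices as in-process objects. All names (`Worker`, `consolidate`, the toy query/item corpus) are hypothetical.

```python
import random
from collections import defaultdict

DIM = 4  # embedding dimensionality (illustrative choice)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class Worker:
    """Stands in for one embedding-updating computing device: it holds a
    shard of training examples and refines a local copy of the embeddings."""
    def __init__(self, examples):
        # Each example: (query tokens, item tokens, was the item selected?)
        self.examples = examples
        self.local = {}

    def update(self, global_embeddings, lr=0.1):
        # Each iteration starts from the consolidated embeddings it was sent.
        self.local = {t: list(v) for t, v in global_embeddings.items()}
        for q_tokens, i_tokens, clicked in self.examples:
            q = [sum(self.local[t][d] for t in q_tokens) for d in range(DIM)]
            i = [sum(self.local[t][d] for t in i_tokens) for d in range(DIM)]
            # Push query and item together if selected, apart otherwise.
            err = (1.0 if clicked else 0.0) - dot(q, i)
            for t in q_tokens:
                for d in range(DIM):
                    self.local[t][d] += lr * err * i[d]
            for t in i_tokens:
                for d in range(DIM):
                    self.local[t][d] += lr * err * q[d]
        return self.local

def consolidate(local_sets):
    """Stands in for the embedding-consolidating step: averages every local
    copy of the same token, i.e. a 'consolidation of two or more different
    local token embeddings' computed for that token."""
    sums = defaultdict(lambda: [0.0] * DIM)
    counts = defaultdict(int)
    for local in local_sets:
        for t, v in local.items():
            counts[t] += 1
            for d in range(DIM):
                sums[t][d] += v[d]
    return {t: [s / counts[t] for s in sums[t]] for t in sums}

# Toy corpus of (query tokens, item tokens, selected?) training examples.
examples = [
    (["red", "shoes"], ["crimson", "sneaker"], True),
    (["red", "shoes"], ["blue", "hat"], False),
    (["blue", "hat"], ["blue", "hat"], True),
    (["crimson", "sneaker"], ["red", "shoes"], True),
]
random.seed(0)
vocab = {t for q, i, _ in examples for t in q + i}
embeddings = {t: [random.uniform(-0.1, 0.1) for _ in range(DIM)]
              for t in vocab}
workers = [Worker(examples[:2]), Worker(examples[2:])]  # two device shards

# "Repeating the training loop until a training objective is achieved";
# here a fixed iteration count stands in for the unspecified objective.
for step in range(50):
    local_sets = [w.update(embeddings) for w in workers]
    embeddings = consolidate(local_sets)
```

After the loop, `embeddings` plays the role of the set of trained embeddings: the final consolidated per-token vectors that would be provided back to the embedding-updating devices. In a real deployment the workers and consolidators would be separate machines exchanging embeddings over a network rather than Python objects sharing memory.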