US 12,229,669 B2
Techniques for improving standardized data accuracy
Shuai Wang, Milburn, NJ (US); Peide Zhong, Milpitas, CA (US); Ji Yan, Dublin, CA (US); Feng Guo, Los Gatos, CA (US); Dan Shacham, Sunnyvale, CA (US); and Fei Chen, Saratoga, CA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Jun. 7, 2021, as Appl. No. 17/340,607.
Prior Publication US 2022/0391690 A1, Dec. 8, 2022
Int. Cl. G06N 3/08 (2023.01); G06F 16/334 (2025.01); G06F 18/2113 (2023.01); G06F 18/214 (2023.01); G06F 18/2413 (2023.01); G06N 3/04 (2023.01); G06Q 10/1053 (2023.01)
CPC G06N 3/08 (2013.01) [G06F 16/3347 (2019.01); G06F 18/2113 (2023.01); G06F 18/214 (2023.01); G06F 18/24147 (2023.01); G06N 3/04 (2013.01); G06Q 10/1053 (2013.01)] 14 Claims
 
1. A computer-implemented method, comprising:
training a multilayer perceptron neural network using training data, wherein an instance of training data in the set of training data includes a first vector representation of a job title in a multilingual word embedding space and a second vector representation of an entity in an entity embedding space, wherein the job title corresponding with the first vector representation is a job title that corresponds with a job title associated with the entity represented by the second vector representation in the entity embedding space;
providing as input to an input layer of the multilayer perceptron neural network a vector representation of a job title of an online job posting, the vector representation of the job title of the online job posting derived by mapping one or more words identified in raw text of the job title to one or more pre-trained multilingual word embeddings in the multilingual word embedding space, wherein a pre-trained multilingual word embedding comprises a vector representation of the job title expressed in multiple languages;
with the multilayer perceptron neural network, processing the input to translate the vector representation of the job title of the online job posting in the multilingual word embedding space to a vector representation of the job title of the online job posting in the entity embedding space associated with a multilingual title taxonomy;
performing a nearest neighbor search to identify one or more vector representations corresponding with one or more entity embeddings in the entity embedding space, each of the one or more entity embeddings associated with a job title from the multilingual title taxonomy; and
storing with the online job posting at least one of the one or more vector representations corresponding with the entity embedding in the entity embedding space.
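
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. Every name, dimension, and value below is a hypothetical assumption: the toy `WORD_EMB` table stands in for pre-trained multilingual word embeddings, `ENTITY_EMB` stands in for the entity embeddings of a title taxonomy, word vectors are averaged to form the title vector, the network has a single tanh hidden layer trained with mean squared error, and the nearest neighbor search is a brute-force cosine search. The claim does not specify any of these choices.

```python
import numpy as np

# Hypothetical toy data. A real system would load pre-trained multilingual
# word embeddings and taxonomy entity embeddings instead.
WORD_EMB = {
    "software":  np.array([1.0, 0.0, 0.0, 0.0]),
    "engineer":  np.array([0.0, 1.0, 0.0, 0.0]),
    "nurse":     np.array([0.0, 0.0, 1.0, 0.0]),
    "enfermera": np.array([0.0, 0.0, 0.95, 0.05]),  # Spanish, aligned near "nurse"
    "sales":     np.array([0.0, 0.0, 0.0, 1.0]),
    "manager":   np.array([0.5, 0.0, 0.0, 0.5]),
}
ENTITY_IDS = ["software_engineer", "registered_nurse", "sales_manager"]
ENTITY_EMB = np.eye(3)  # one row per taxonomy entity (assumed 3-D space)


def title_vector(raw_title):
    """Average the embeddings of the known words in a raw job title."""
    vecs = [WORD_EMB[w] for w in raw_title.lower().split() if w in WORD_EMB]
    if not vecs:
        raise ValueError(f"no known words in title: {raw_title!r}")
    return np.mean(vecs, axis=0)


def train_mlp(X, Y, hidden=16, lr=0.1, epochs=3000, seed=0):
    """Train a one-hidden-layer perceptron with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # hidden activations
        err = (H @ W2 + b2) - Y            # prediction error in entity space
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size       # gradient of the mean squared error
        g_hid = (g_out @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)
    return (W1, b1, W2, b2), losses


def translate(params, x):
    """Map a vector from the word embedding space to the entity space."""
    W1, b1, W2, b2 = params
    return np.tanh(x @ W1 + b1) @ W2 + b2


def nearest_entities(query, k=1):
    """Brute-force cosine nearest neighbor search over the entity embeddings."""
    norms = np.linalg.norm(ENTITY_EMB, axis=1) * np.linalg.norm(query) + 1e-12
    sims = (ENTITY_EMB @ query) / norms
    return [ENTITY_IDS[i] for i in np.argsort(-sims)[:k]]


# Training pairs: title i is paired with entity i.
train_titles = ["software engineer", "nurse", "sales manager"]
X = np.stack([title_vector(t) for t in train_titles])
Y = ENTITY_EMB
params, losses = train_mlp(X, Y)

# Inference on a posting whose title is in another language.
posting = {"title": "Enfermera"}
entity_vec = translate(params, title_vector(posting["title"]))
posting["standardized_title"] = nearest_entities(entity_vec, k=1)
matched = ENTITY_IDS.index(posting["standardized_title"][0])
posting["entity_embedding"] = ENTITY_EMB[matched].tolist()  # stored with the posting
print(posting)
```

In this sketch the Spanish title is never seen during training. It can still be matched only because its toy word vector was placed close to the English one, which is the role the claim assigns to the shared multilingual word embedding space. At production scale an approximate nearest neighbor index would likely replace the brute-force search.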