US 11,657,223 B2
Keyphrase extraction beyond language modeling
Li Xiong, Kirkland, WA (US); Chuan Hu, Redmond, WA (US); Arnold Overwijk, Redmond, WA (US); Junaid Ahmed, Bellevue, WA (US); Daniel Fernando Campos, Seattle, WA (US); and Chenyan Xiong, Bellevue, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Dec. 16, 2021, as Appl. No. 17/552,742.
Application 17/552,742 is a continuation of application No. 16/460,853, filed on Jul. 2, 2019, granted, now 11,250,214.
Prior Publication US 2022/0108078 A1, Apr. 7, 2022
Int. Cl. G06F 40/30 (2020.01); G06F 40/284 (2020.01); G06F 40/211 (2020.01); G06K 9/62 (2022.01); G10L 15/08 (2006.01); G06N 3/08 (2023.01)
CPC G06F 40/30 (2020.01) [G06F 40/211 (2020.01); G06F 40/284 (2020.01); G06K 9/6256 (2013.01); G06K 9/6263 (2013.01); G06N 3/08 (2013.01); G10L 2015/088 (2013.01)]
20 Claims
1. A system comprising:
a processor; and
memory storing instructions that, when executed by the processor, cause the processor to perform acts comprising:
obtaining a webpage, wherein a topical authority score with respect to a topic is to be computed for the webpage, and further wherein the topical authority score is representative of authoritativeness of the webpage with respect to the topic;
computing hybrid embeddings for words in the webpage, where the hybrid embeddings are based upon semantic embeddings for the words in the webpage and visual embeddings for the words in the webpage, wherein the visual embeddings are based upon visual features of the words;
computing a key phrase score for a sequence of words in the words, wherein the key phrase score is computed based upon the hybrid embeddings computed for the words; and
assigning the topical authority score to the webpage based upon the key phrase score computed for the sequence of words, wherein the webpage is ranked in a ranked list of search results returned to a user based upon:
a query received from the user; and
the topical authority score assigned to the webpage.
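The acts recited in claim 1 can be illustrated with a minimal sketch. The claim does not specify how the semantic and visual embeddings are combined, how the key phrase score is computed from the hybrid embeddings, or how the topical authority score and the query are weighed during ranking. The sketch below therefore assumes, purely for illustration, concatenation for the hybrid embedding, a mean-pooled linear scorer with a sigmoid for the key phrase score, a maximum over key phrase scores for the topical authority score, and a weighted sum for ranking. All names (`Word`, `hybrid_embedding`, `key_phrase_score`, `topical_authority_score`, `rank_pages`, `alpha`) are hypothetical and do not appear in the patent.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Word:
    text: str
    semantic: List[float]  # hypothetical semantic embedding of the word
    visual: List[float]    # hypothetical visual features (e.g. font size, bold flag)


def hybrid_embedding(word: Word) -> List[float]:
    # Assumption: the hybrid embedding is the concatenation of the
    # semantic and visual embeddings; the claim only says "based upon" both.
    return word.semantic + word.visual


def key_phrase_score(words: Sequence[Word], weights: Sequence[float]) -> float:
    # Assumption: mean-pool the hybrid embeddings of the word sequence,
    # then apply a linear layer followed by a sigmoid.
    hybrids = [hybrid_embedding(w) for w in words]
    pooled = [sum(h[i] for h in hybrids) / len(hybrids) for i in range(len(weights))]
    z = sum(w * p for w, p in zip(weights, pooled))
    return 1.0 / (1.0 + math.exp(-z))


def topical_authority_score(key_phrase_scores: Sequence[float]) -> float:
    # Assumption: the page's authority for a topic is its best key phrase score.
    return max(key_phrase_scores, default=0.0)


def rank_pages(pages: Sequence[Tuple[str, float, float]], alpha: float = 0.5) -> List[str]:
    # Each page is (url, query_relevance, topical_authority).
    # Assumption: rank by a weighted sum of query relevance and authority.
    ordered = sorted(pages, key=lambda p: alpha * p[1] + (1 - alpha) * p[2], reverse=True)
    return [url for url, _, _ in ordered]


# Usage with made-up values: two words rendered prominently on a page.
phrase = [
    Word("solar", semantic=[1.0, 0.0], visual=[1.0]),
    Word("panel", semantic=[0.5, 0.5], visual=[1.0]),
]
weights = [1.0, 0.0, 2.0]  # hypothetical learned weights; last entry weighs the visual feature
score = key_phrase_score(phrase, weights)
authority = topical_authority_score([score])
print(rank_pages([("page-a", 0.6, 0.2), ("page-b", 0.5, authority)]))
```

In this sketch the visual feature carries the largest weight, so a phrase that is visually prominent on the page scores higher than the same words would on semantic evidence alone, which is the role the claim assigns to the visual embeddings.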