US 11,868,413 B2
Methods and servers for ranking digital documents in response to a query
Eduard Mechislavovich Volynets, Minsk (BY); Dzianis Sergeevich Pastushyk, Brest (BY); and Yevgeny Aleksandrovich Grechnikov, Moscow (RU)
Assigned to DIRECT CURSUS TECHNOLOGY L.L.C, Dubai (AE)
Filed by YANDEX EUROPE AG, Lucerne (CH)
Filed on Dec. 16, 2021, as Appl. No. 17/553,099.
Claims priority of application No. 2020142462 (RU), filed on Dec. 22, 2020.
Prior Publication US 2022/0197958 A1, Jun. 23, 2022
Int. Cl. G06F 16/953 (2019.01); G06N 3/08 (2023.01); G06F 16/93 (2019.01); G06N 5/01 (2023.01)
CPC G06F 16/953 (2019.01) [G06F 16/93 (2019.01); G06N 3/08 (2013.01); G06N 5/01 (2023.01)]
16 Claims
1. A method of ranking digital documents in response to a query, the digital documents being potentially relevant to the query having a first term and a second term, the query having been submitted by a user of an electronic device communicatively coupled with a server hosting a search engine, the search engine being associated with an inverted index storing information associated with document-term (DT) pairs, the method executable by the server, the method comprising:
for a given document from a plurality of potentially relevant documents:
accessing the inverted index for retrieving query-independent data for a first DT pair and a second DT pair, the first DT pair having the given document and the first term, the second DT pair having the given document and the second term,
the query-independent data being indicative of (i) a term-specific occurrence of the first term in content associated with the given document and (ii) a term-specific occurrence of the second term in the content associated with the given document;
generating a query-dependent feature using the query-independent data retrieved for the first DT pair and the second DT pair,
the query-dependent feature being indicative of a group occurrence of the first term with the second term in the content associated with the given document;
generating a ranking feature for the given document based on at least the first term, the second term, and the query-dependent feature, the generating the ranking feature for the given document being performed by a Neural Network (NN); and
ranking, by the server, the given document from the plurality of potentially relevant documents based on at least the ranking feature;
wherein the method further comprises training the NN to generate the ranking feature, the training the NN comprising:
generating a training set for a training document-query (DQ) pair to be used during a given training iteration of the NN, the training DQ pair having a training query and a training document, the training document being associated with a label, the label being indicative of relevance of the training document to the training query, the generating comprising:
generating a plurality of training term embeddings based on respective terms from the training query;
accessing the inverted index associated with the search engine for retrieving a plurality of query-independent datasets associated with respective ones of a plurality of training DT pairs,
a given one of the plurality of training DT pairs including the training document and a respective one of the plurality of terms from the training query;
generating a plurality of training feature vectors for the plurality of training DT pairs using the plurality of query-independent datasets;
during a given training iteration of the NN:
inputting, into the NN, the plurality of training term embeddings and the plurality of training feature vectors for generating a predicted ranking feature for the training DQ pair; and
adjusting the NN based on a comparison between the label and the predicted ranking feature so that the NN generates for a given in-use DQ pair a respective predicted ranking feature that is indicative of relevance of a respective in-use document to a respective in-use query.
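
The in-use steps of claim 1 can be illustrated with a minimal sketch, which is not the patented implementation. It assumes a toy inverted index that stores, for each document-term (DT) pair, the positions of the term in the document; the claim itself does not specify what form the query-independent data takes. The names `INVERTED_INDEX`, `term_positions`, `group_occurrence`, `rank`, and the `window` parameter are hypothetical. The `score` argument is a plain function standing in for the Neural Network (NN) recited in the claim; it uses only the query-dependent feature, whereas the claimed NN also takes the terms as input, and no training is shown.

```python
from itertools import product

# Hypothetical inverted index: term -> {doc_id: positions of the term in the document}.
# Each entry is query-independent: it exists before any query is submitted.
INVERTED_INDEX = {
    "red": {"d1": [0, 7], "d2": [3]},
    "car": {"d1": [1], "d2": [9]},
}


def term_positions(index, doc_id, term):
    """Retrieve the query-independent data for one DT pair (empty if absent)."""
    return index.get(term, {}).get(doc_id, [])


def group_occurrence(index, doc_id, first_term, second_term, window=1):
    """Query-dependent feature built from two query-independent lookups.

    Counts the places where second_term follows first_term within
    `window` positions (an assumed definition of "group occurrence").
    """
    first = term_positions(index, doc_id, first_term)
    second = term_positions(index, doc_id, second_term)
    return sum(1 for p, q in product(first, second) if 0 < q - p <= window)


def rank(index, doc_ids, first_term, second_term, score=float):
    """Order documents by a ranking feature derived from the group occurrence.

    `score` is a stand-in for the NN; here it only converts the count to a float.
    """
    features = {
        d: group_occurrence(index, d, first_term, second_term) for d in doc_ids
    }
    return sorted(doc_ids, key=lambda d: score(features[d]), reverse=True)


if __name__ == "__main__":
    print(rank(INVERTED_INDEX, ["d2", "d1"], "red", "car"))
```

In this toy data, "car" immediately follows "red" once in `d1` and never in `d2`, so `d1` is ranked first. The point of the sketch is that the per-term lookups can be precomputed and stored, while the pairwise feature can only be computed once the query terms are known.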