CPC G06F 16/9535 (2019.01) [G06N 20/20 (2019.01)] | 20 Claims
1. A system comprising:
one or more processors; and
one or more non-transitory computer-readable media storing computing instructions that, when executed on the one or more processors, cause the one or more processors to perform operations comprising:
in response to receiving search queries at a search engine, storing search event data and ranking features in one or more databases, wherein the ranking features are stored separately from the search event data;
prior to generating a training dataset, supplementing the search event data with ranking feature values of the ranking features, wherein the ranking feature values were previously utilized to generate previous search results when the search queries were submitted;
generating, using the search event data supplemented with the ranking feature values of the ranking features, the training dataset comprising training event samples;
executing a hybrid labeling procedure that assigns labels to the training event samples based, at least in part, on individual engagement information associated with the training event samples, wherein executing the hybrid labeling procedure comprises:
generating, via a deep learning model, respective relevance scores for the search queries that are tail queries;
applying a first set of labels of the labels to a first portion of the training event samples determined to have positive engagement, the first set of labels being assigned to the first portion of the training event samples based on engagement activity types;
applying a second set of labels of the labels to a second portion of the training event samples that have negative engagement, the second set of labels being assigned to the second portion of the training event samples based on aggregated engagement information for items across global users, wherein the aggregated engagement information is based on a frequency of engagement by the global users on a global scale, and wherein the second set of labels is not eliminated and is assigned lower values than the first set of labels in search results; and
adjusting a subset of the first set of labels and the second set of labels that are associated with the tail queries based on the respective relevance scores associated with the tail queries; and
training a personalized ranking model to rank the search results using the training event samples and the labels.
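The data preparation and hybrid labeling steps recited in claim 1 can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the claimed implementation: the label values in `POSITIVE_LABELS`, the `NEGATIVE_CEILING` cap, the rule that a query counts as a tail query when it appears at most `tail_max_count` times in the batch, and the multiplicative tail adjustment are all hypothetical choices. `feature_store` is a placeholder for the separately stored ranking features, and `relevance_model` is a placeholder callable standing in for the deep learning model that scores tail queries.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical label scale: the claim only requires that labels for
# negative-engagement samples be lower than labels for positive ones.
POSITIVE_LABELS = {"purchase": 3.0, "add_to_cart": 2.0, "click": 1.0}
NEGATIVE_CEILING = 0.5  # hypothetical: negative labels fall in [0, 0.5]


@dataclass
class TrainingEvent:
    event_id: str
    query: str
    item_id: str
    engagement: Optional[str]  # engagement activity type, or None if not engaged
    features: Dict[str, float] = field(default_factory=dict)
    label: float = 0.0


def supplement_features(events: List[TrainingEvent],
                        feature_store: Dict[str, Dict[str, float]]) -> List[TrainingEvent]:
    """Attach the ranking feature values logged when each query was served.

    The features live in a separate store keyed by event_id (an assumption);
    events with no logged features get an empty dict.
    """
    for ev in events:
        ev.features = dict(feature_store.get(ev.event_id, {}))
    return events


def hybrid_label(events: List[TrainingEvent],
                 global_engagement_rate: Dict[str, float],
                 relevance_model: Callable[[str, str], float],
                 tail_max_count: int = 1) -> List[TrainingEvent]:
    """Label positives by activity type and negatives by global engagement,
    then rescale tail-query labels by a relevance score in [0, 1]."""
    query_counts = Counter(ev.query for ev in events)
    lowest_positive = min(POSITIVE_LABELS.values())
    for ev in events:
        if ev.engagement is not None:
            # Positive engagement: label by activity type; unknown types get
            # the lowest positive label.
            ev.label = POSITIVE_LABELS.get(ev.engagement, lowest_positive)
        else:
            # Negative engagement: keep the sample, but derive a small label from
            # how often users engage with the item globally, capped so it stays
            # below every positive label.
            rate = global_engagement_rate.get(ev.item_id, 0.0)
            ev.label = NEGATIVE_CEILING * max(0.0, min(rate, 1.0))
        if query_counts[ev.query] <= tail_max_count:
            # Hypothetical tail rule: rare query in this batch, so scale its
            # label by the (stand-in) deep model's relevance score.
            ev.label *= relevance_model(ev.query, ev.item_id)
    return events


# Usage with toy data; the lambda stands in for a deep relevance model.
events = supplement_features(
    [
        TrainingEvent("e1", "running shoes", "sku1", "purchase"),
        TrainingEvent("e2", "running shoes", "sku2", None),
        TrainingEvent("e3", "left-handed trail shoe size 15", "sku3", "click"),
    ],
    feature_store={"e1": {"bm25": 7.2}, "e2": {"bm25": 5.1}},
)
labeled = hybrid_label(
    events,
    global_engagement_rate={"sku2": 0.4},
    relevance_model=lambda query, item_id: 0.6,
)
for ev in labeled:
    print(ev.event_id, round(ev.label, 3), ev.features)
```

Keeping the non-engaged sample `e2` with a small, popularity-derived label (rather than discarding it) mirrors the claim's requirement that the second set of labels is not eliminated but sits below the first set. In this sketch the multiplicative tail adjustment can pull a positive tail-query label below a negative label; whether that ordering should be preserved is a design decision the claim leaves open. The labeled events, grouped by query, would then feed a learning-to-rank trainer, which is not shown here.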