US 11,756,076 B2
Systems and methods for providing sponsored recommendations
Kaushiki Nag, Sunnyvale, CA (US); Kannan Achan, Saratoga, CA (US); and Lalitesh Morishetti, Sunnyvale, CA (US)
Assigned to Walmart Apollo, LLC, Bentonville, AR (US)
Filed by Walmart Apollo, LLC, Bentonville, AR (US)
Filed on Feb. 26, 2021, as Appl. No. 17/187,507.
Prior Publication US 2022/0277345 A1, Sep. 1, 2022
Int. Cl. G06Q 30/0251 (2023.01); G06N 5/04 (2023.01); G06N 20/00 (2019.01)
CPC G06Q 30/0254 (2013.01) [G06N 5/04 (2013.01); G06N 20/00 (2019.01)]
20 Claims
 
1. A system comprising:
a computing device configured to:
obtain user data associated with a user from a database;
obtain a plurality of items based on the user data, the plurality of items including a plurality of relevant items and a plurality of promotional items, wherein each of the relevant items includes an associated relevancy score;
determine a page limit and a number of positions within a page including a number of carousels or panes, for a user interface;
obtain session data for a current session associated with the user;
generate a set of tensors representative of each of the plurality of items, the user data, the session data, and the associated relevancy score for each of the relevant items;
implement a Deep Q-network configured to receive a first subset of the set of tensors, wherein the Deep Q-network is configured to generate one or more combinations of the plurality of items by injecting the plurality of promotional items at different positions amongst the plurality of relevant items, according to the page limit and the number of positions, wherein the Deep Q-network is configured to maximize a profit function that assigns an expected profit margin for each item at a position in the user interface and a contribution profit for each item, wherein the Deep Q-network is configured to maintain relevancy of each of the one or more combinations of the plurality of items, and wherein each of the one or more combinations of the plurality of items comprises a subset of the plurality of items;
select a selected set of items from one of the one or more combinations of the plurality of items, wherein the selected set of items is selected by a position model configured to receive a second subset of the set of tensors and an output of the Deep Q-network;
present, via the user interface displayed on a display, the selected set of items to the user; and
refine a machine learning model based on the selected set of items, wherein the machine learning model is refined using an experience replay to reduce correlations and a mean square error loss function, wherein the machine learning model is configured to generate the plurality of relevant items and the plurality of promotional items.
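
For illustration only, the combination-generating element of the claim (promotional items injected at different positions amongst relevant items, scored by a position-dependent profit function while relevancy is maintained) can be sketched as below. This is not the claimed Deep Q-network: it swaps in an exhaustive search over a small slate, and every name in it (candidate_slates, expected_profit, best_slate, position_weight, min_relevancy) as well as all numeric values are hypothetical assumptions, including the assumption that promotional items also carry a relevancy score.

```python
from itertools import combinations

def candidate_slates(relevant, promos, n_positions):
    """Yield each slate that injects 0..len(promos) promos at distinct positions."""
    for k in range(min(len(promos), n_positions) + 1):
        if n_positions - k > len(relevant):
            continue  # not enough relevant items to fill the remaining positions
        for slots in combinations(range(n_positions), k):
            rel, pro = iter(relevant), iter(promos[:k])
            # Promos go to the chosen slots; relevant items keep their ranked order.
            yield [next(pro) if pos in slots else next(rel)
                   for pos in range(n_positions)]

def expected_profit(slate, margin, position_weight):
    """Hypothetical profit function: per-item margin weighted by its position."""
    return sum(position_weight[pos] * margin[item]
               for pos, item in enumerate(slate))

def best_slate(relevant, promos, n_positions, margin, relevancy,
               position_weight, min_relevancy):
    """Pick the most profitable slate whose mean relevancy stays above a floor."""
    feasible = [s for s in candidate_slates(relevant, promos, n_positions)
                if sum(relevancy[i] for i in s) / len(s) >= min_relevancy]
    return max(feasible, key=lambda s: expected_profit(s, margin, position_weight))

# Made-up example data (assumptions, not taken from the patent).
relevant = ["r1", "r2", "r3"]   # already ordered by relevancy score
promos = ["p1"]
margin = {"r1": 1.0, "r2": 1.0, "r3": 1.0, "p1": 3.0}
relevancy = {"r1": 0.9, "r2": 0.8, "r3": 0.7, "p1": 0.4}
position_weight = [0.5, 0.3, 0.2]  # earlier positions assumed more valuable

loose = best_slate(relevant, promos, 3, margin, relevancy, position_weight, 0.6)
strict = best_slate(relevant, promos, 3, margin, relevancy, position_weight, 0.75)
print(loose, strict)
```

With a loose relevancy floor the search is free to place the high-margin promotional item early; with a stricter floor it falls back to the purely relevant slate. A learned Q-function would replace the exhaustive enumeration when the number of positions and items makes that search impractical.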
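
The refinement element names two standard Deep Q-network training ingredients: a buffer of past transitions that is sampled at random to reduce correlations between consecutive updates (commonly called experience replay), and a mean square error loss. A minimal sketch follows, assuming a tabular Q-function stored in a dictionary as a stand-in for the network; the names ReplayBuffer and td_mse_loss, the discount factor gamma, and the example transitions are all hypothetical and do not come from the patent.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest entries are evicted first."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks up the temporal order
        # in which the transitions were observed.
        return rng.sample(list(self.buffer), batch_size)

def td_mse_loss(batch, q, gamma=0.9):
    """Mean squared error between Q(s, a) and r + gamma * max_a' Q(s', a')."""
    total = 0.0
    for state, action, reward, next_state, done in batch:
        target = reward if done else reward + gamma * max(q[next_state].values())
        total += (q[state][action] - target) ** 2
    return total / len(batch)

# Made-up Q-values and transitions (assumptions for illustration).
q = {"s0": {"a": 1.0, "b": 0.0}, "s1": {"a": 2.0, "b": 0.5}}
buf = ReplayBuffer(capacity=2)
buf.push("s1", "a", 0.0, "s0", False)  # evicted once capacity is exceeded
buf.push("s0", "a", 1.0, "s1", False)
buf.push("s1", "b", 0.5, "s0", True)

batch = buf.sample(2, rng=random.Random(0))
loss = td_mse_loss(batch, q)
print(loss)
```

In a full implementation the dictionary lookup would be a forward pass through the network and the loss would be minimized by gradient descent; the sampling and target computation keep the same shape.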