US 12,236,325 B2
Contextual bandit for multiple machine learning models for content delivery
Shankar Sankararaman, Burlingame, CA (US)
Assigned to Intuit Inc., Mountain View, CA (US)
Filed by Intuit Inc., Mountain View, CA (US)
Filed on Jul. 6, 2023, as Appl. No. 18/348,052.
Prior Publication US 2025/0013914 A1, Jan. 9, 2025
Int. Cl. G06Q 30/02 (2023.01); G06N 3/006 (2023.01); G06N 3/082 (2023.01); G06N 5/04 (2023.01); G06N 20/00 (2019.01)
CPC G06N 20/00 (2019.01)
20 Claims
 
1. A computer-implemented method for delivery of content, comprising:
training, by at least one processor, a contextual bandit machine learning (ML) model based on user information to select at least one ML model from a plurality of ML models, wherein each ML model in the plurality of ML models is configured to select one or more user interface (UI) elements from a plurality of UI elements to be presented to users based on the user information, and the contextual bandit ML model is configured to select the at least one ML model that is most likely to select UI elements with which a user will interact;
receiving, by the at least one processor, user information for a request payload from an external device associated with a user;
receiving, by the at least one processor, data describing a plurality of UI elements configured to be presented in a UI of the external device;
selecting, by the at least one processor, with the contextual bandit ML model based on the user information, an ML model from the plurality of ML models to determine at least one UI element from the plurality of UI elements to recommend to the user;
determining, by the at least one processor, with a selected ML model based on the user information and the data describing the plurality of UI elements, at least one recommended UI element from the plurality of UI elements to present to the user;
providing, by the at least one processor, the at least one UI element for presentation in a UI of the external device associated with the user;
receiving, by the at least one processor, event data indicating whether the user interacted with the at least one recommended UI element in the UI of the external device; and
re-training, by the at least one processor, the contextual bandit ML model based on the event data to increase a probability of selecting the selected ML model based on the user information in response to an indication that the user interacted with the at least one UI element and to increase a probability of selecting a different ML model based on the user information in response to an indication that the user did not interact with the at least one UI element.
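The select, recommend, observe, and re-train loop recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation, and the claim does not specify a particular bandit algorithm. The sketch assumes a softmax preference policy, in which lowering the preference of the selected model after a non-interaction raises the selection probability of every other model. All names here (`SoftmaxContextualBandit`, `first_listed_model`, `last_listed_model`, `handle_request`, and the `"segment"` key used as the context) are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


class SoftmaxContextualBandit:
    """Toy contextual bandit keeping one preference score per (context, model).

    Assumption: the context is a hashable key derived from user information.
    """

    def __init__(self, n_models, step=0.5, seed=0):
        self.n_models = n_models
        self.step = step  # size of each preference adjustment
        self.rng = random.Random(seed)
        self.prefs = defaultdict(lambda: [0.0] * n_models)

    def probabilities(self, context):
        """Return the softmax selection probability of each model."""
        prefs = self.prefs[context]
        top = max(prefs)  # subtract the max for numerical stability
        exps = [math.exp(p - top) for p in prefs]
        total = sum(exps)
        return [e / total for e in exps]

    def select(self, context):
        """Sample the index of one ML model for this context."""
        weights = self.probabilities(context)
        return self.rng.choices(range(self.n_models), weights=weights, k=1)[0]

    def update(self, context, chosen, interacted):
        """Re-train on event data.

        Interaction raises the chosen model's preference. No interaction
        lowers it, which raises the probability of the other models.
        """
        delta = self.step if interacted else -self.step
        self.prefs[context][chosen] += delta


# Hypothetical stand-ins for the plurality of ML models that pick UI elements.
def first_listed_model(user_info, ui_elements):
    return [ui_elements[0]]


def last_listed_model(user_info, ui_elements):
    return [ui_elements[-1]]


def handle_request(bandit, models, user_info, ui_elements):
    """Select a model with the bandit, then let it recommend UI elements."""
    context = user_info["segment"]  # assumed context feature
    chosen = bandit.select(context)
    recommended = models[chosen](user_info, ui_elements)
    return chosen, recommended
```

In use, a caller would invoke `handle_request` for each request payload, present the returned UI elements, and then pass the observed event data to `bandit.update(context, chosen, interacted)`. A production system would more likely use a learned policy over feature vectors than a per-segment lookup table. The table is used here only to keep the update rule visible.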