CPC G06Q 30/0206 (2013.01) [G06Q 30/0211 (2013.01); G06Q 30/0255 (2013.01); G06Q 30/0271 (2013.01)] | 20 Claims |
1. A computer implemented method comprising:
receiving, at an offer generation system, data describing an offer on a product;
applying a natural language processing (NLP) model to the received data to extract the product and the offer;
accessing a plurality of test offers stored by an offer bank;
accessing transaction logs related to the product;
scoring each test offer in the offer bank against the extracted product, wherein scoring a test offer of the plurality of test offers comprises:
predicting a forecast score for each test offer by:
applying a reinforcement learning model to the transaction logs related to the product, wherein the reinforcement learning model is trained using transaction logs associated with the plurality of test offers to predict a likelihood that a test offer will achieve an offer objective;
determining a difference between the offer and the test offer; and
updating the forecast score by applying a penalty to the forecast score based on the determined difference;
selecting a subset of the plurality of test offers based in part on the forecast score of each test offer, wherein assigning the subset of test offers comprises maximizing orthogonality of a set of variables associated with the product;
transmitting the subset of test offers to client devices of users;
receiving, from client devices of the users, responses to the subset of test offers, the responses comprising at least one of (1) whether or not a user viewed a test offer, (2) whether or not a user clicked through a test offer, (3) whether or not a user deleted a test offer, (4) whether or not the user saved the test offer to an electronic coupon folder, (5) whether or not a user forwarded the test offer to someone else, (6) whether or not a user posted a test offer to an online forum, or (7) whether or not the user purchased a product using the test offer;
storing the received responses in a database; and
retraining the reinforcement learning model based on the responses stored in the database, such that the reinforcement learning model automatically learns from user responses and continuously improves effectiveness of selection of test offers.
|
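The scoring and selection steps recited in claim 1 (predict a forecast score, determine the difference between the offer and a test offer, apply a penalty, select a subset) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration and is not taken from the claim: the `Offer` fields `discount_pct` and `duration_days`, the normalized absolute-gap difference, the linear penalty with a `weight` parameter, and the top-`k` selection rule. The forecast values are passed in as plain numbers standing in for the output of the reinforcement learning model, and the orthogonality-maximizing constraint on subset selection is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    product: str
    discount_pct: float  # hypothetical offer variable, not named in the claim
    duration_days: int   # hypothetical offer variable, not named in the claim


def offer_difference(offer: Offer, test_offer: Offer) -> float:
    """Assumed difference measure: sum of normalized absolute gaps."""
    discount_gap = abs(offer.discount_pct - test_offer.discount_pct) / 100.0
    duration_gap = abs(offer.duration_days - test_offer.duration_days) / 30.0
    return discount_gap + duration_gap


def penalized_score(forecast: float, difference: float, weight: float = 0.5) -> float:
    """Assumed penalty: subtract weight * difference, clamped at zero."""
    return max(0.0, forecast - weight * difference)


def select_subset(offer: Offer, forecasts: list[tuple[Offer, float]], k: int):
    """Score every test offer and keep the k highest penalized scores.

    `forecasts` pairs each test offer with a forecast score that would,
    in the claimed method, come from the reinforcement learning model.
    """
    scored = [
        (penalized_score(forecast, offer_difference(offer, test_offer)), test_offer)
        for test_offer, forecast in forecasts
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    received = Offer("soda", 20.0, 30)
    bank = [
        (Offer("soda", 20.0, 30), 0.6),  # identical to the received offer
        (Offer("soda", 50.0, 30), 0.7),  # higher forecast, larger discount gap
        (Offer("soda", 20.0, 15), 0.4),  # lower forecast, shorter duration
    ]
    for score, test_offer in select_subset(received, bank, k=2):
        print(f"{score:.2f} {test_offer}")
```

In this sketch a test offer with a higher raw forecast can rank below one that is closer to the received offer, which is the effect the penalty step in the claim describes. A fuller version would replace the top-`k` rule with a selection that also accounts for orthogonality of the product variables.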