US 11,854,028 B2
Reinforcement learning applied to survey parameter optimization
Matthew Spencer Donald Kerr, Issaquah, WA (US); Shaker Asif Khaleque, Bothell, WA (US); Kelly John Forsmann, Boise, ID (US); Tatiana Shubin, Redmond, WA (US); and Dipti A. Patil, Redmond, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Nov. 1, 2021, as Appl. No. 17/516,681.
Prior Publication US 2023/0137708 A1, May 4, 2023
Int. Cl. G06Q 30/02 (2023.01); G06Q 30/0203 (2023.01); G06Q 30/0204 (2023.01); G06F 18/214 (2023.01); G06F 18/21 (2023.01)
CPC G06Q 30/0203 (2013.01) [G06F 18/214 (2023.01); G06F 18/2178 (2023.01); G06Q 30/0204 (2013.01)] 16 Claims
OG exemplary drawing
 
1. A method comprising:
training, by a network system using input data obtained from feedback indicating whether users accepted, rejected, or ignored a past notification, a machine learning model that provides one or more parameters used by the network system in providing a notification, the training comprising:
extracting features from the feedback, the extracted features including an indication of whether the notification was accepted, rejected, or ignored, the extracted features further including one or more context features, the context features including an application identifier, a language, a country, a time of day, a day of a week, or a day of a month; and
using the extracted features to train the machine learning model, wherein acceptance is a positive reward, rejection is a negative reward, and ignoring is neutral, the training maximizing a summation of rewards;
identifying a plurality of users nominated by the network system to receive the notification;
monitoring, by the network system, user activity of the plurality of users with respect to an application;
based on the monitoring, identifying users of the plurality of users that satisfy a trigger condition within a nomination window, and placing users that do not satisfy the trigger condition in a cooldown period;
based on user information of each of the users that satisfy the trigger condition and context of the notification, selecting content from a plurality of different content to present to each of the users that satisfy the trigger condition;
causing presentation of the notification with the selected content to each of the users that satisfy the trigger condition;
obtaining further feedback corresponding to the notification, the further feedback indicating whether each of the users that received the notification accepted, rejected, or ignored the notification;
retraining, by the network system, the machine learning model using input data obtained from the further feedback to optimize on the one or more parameters used by the network system in providing a future notification; and
based on the retrained machine learning model, causing presentation of the future notification to a further set of users using the one or more optimized parameters.
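The claim says feedback is converted into features and rewards (accept positive, reject negative, ignore neutral) and that training maximizes a summation of rewards. It does not name the learning algorithm. The sketch below swaps in one simple reinforcement-learning technique, a tabular epsilon-greedy contextual bandit, to show how that reward scheme, context-based content selection, and retraining on further feedback could fit together. The reward magnitudes (+1, -1, 0), the six-hour time-of-day buckets, and all names (`Feedback`, `context_key`, `EpsilonGreedyBandit`, the content ids) are hypothetical and not taken from the patent.

```python
import random
from collections import defaultdict
from dataclasses import dataclass

# Claim 1 fixes only the signs: accept > 0, reject < 0, ignore = 0.
# The magnitudes +1 / -1 are an assumption.
REWARDS = {"accepted": 1.0, "rejected": -1.0, "ignored": 0.0}


@dataclass(frozen=True)
class Feedback:
    outcome: str      # "accepted", "rejected", or "ignored"
    content_id: str   # content variant that was shown (hypothetical field)
    app_id: str
    language: str
    country: str
    hour: int         # time of day, 0-23
    weekday: int      # day of week, 0 = Monday


def context_key(app_id, language, country, hour, weekday):
    """Bucket a subset of the claim's context features into a hashable key."""
    daypart = hour // 6  # four 6-hour buckets (assumed granularity)
    return (app_id, language, country, daypart, weekday)


def extract_features(fb):
    """Map one feedback record to (context, content, reward)."""
    ctx = context_key(fb.app_id, fb.language, fb.country, fb.hour, fb.weekday)
    return ctx, fb.content_id, REWARDS[fb.outcome]


class EpsilonGreedyBandit:
    """Tabular contextual bandit: a running mean reward per (context, content)."""

    def __init__(self, contents, epsilon=0.1, seed=0):
        self.contents = list(contents)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.totals = defaultdict(float)  # summed reward per (context, content)
        self.counts = defaultdict(int)    # observations per (context, content)

    def train(self, feedback):
        """Fold a batch of feedback into the estimates; calling it again retrains."""
        for fb in feedback:
            ctx, content, reward = extract_features(fb)
            self.totals[(ctx, content)] += reward
            self.counts[(ctx, content)] += 1

    def mean_reward(self, ctx, content):
        n = self.counts.get((ctx, content), 0)
        return self.totals.get((ctx, content), 0.0) / n if n else 0.0

    def select(self, ctx):
        """Explore with probability epsilon; otherwise exploit the best estimate."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.contents)
        return max(self.contents, key=lambda c: self.mean_reward(ctx, c))


# Usage: two hypothetical content variants for one context.
bandit = EpsilonGreedyBandit(["short_prompt", "long_prompt"], epsilon=0.0)
bandit.train([
    Feedback("accepted", "short_prompt", "app1", "en", "US", hour=9, weekday=1),
    Feedback("rejected", "long_prompt", "app1", "en", "US", hour=10, weekday=1),
    Feedback("ignored", "long_prompt", "app1", "en", "US", hour=11, weekday=1),
])
ctx = context_key("app1", "en", "US", hour=8, weekday=1)
first_choice = bandit.select(ctx)   # short_prompt: mean +1.0 vs -0.5

# Further feedback arrives; retrain and select again.
bandit.train([
    Feedback("rejected", "short_prompt", "app1", "en", "US", hour=9, weekday=1),
    Feedback("rejected", "short_prompt", "app1", "en", "US", hour=9, weekday=1),
    Feedback("accepted", "long_prompt", "app1", "en", "US", hour=9, weekday=1),
])
second_choice = bandit.select(ctx)  # long_prompt: mean 0.0 vs about -0.33
print(first_choice, second_choice)
```

Greedily choosing the content with the highest estimated mean reward in each context is how a bandit pursues the largest expected sum of rewards over many notifications. A nonzero `epsilon` keeps some exploration, so variants that start out unlucky can still be re-evaluated. A production system might replace the lookup table with a learned model over the same features.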
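The nomination step in the claim monitors nominated users' activity, notifies those who satisfy a trigger condition within a nomination window, and places those who do not in a cooldown period. The claim does not define the trigger, the window length, or the cooldown length. The following sketch assumes a 7-day window, a 30-day cooldown measured from the window's end, and a hypothetical trigger of at least three application sessions. It also assumes that users whose window is still open remain pending rather than being placed in cooldown. `Nominee` and `partition_nominees` are illustrative names only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical values: claim 1 names a nomination window, a trigger condition,
# and a cooldown period but specifies none of them.
NOMINATION_WINDOW = timedelta(days=7)
COOLDOWN = timedelta(days=30)
MIN_SESSIONS = 3  # assumed trigger: at least 3 app sessions inside the window


@dataclass
class Nominee:
    user_id: str
    nominated_at: datetime
    sessions: list = field(default_factory=list)  # monitored app activity timestamps


def partition_nominees(nominees, now):
    """Return (triggered user ids, {user id: cooldown end}); others stay pending."""
    triggered, cooldown = [], {}
    for n in nominees:
        window_end = n.nominated_at + NOMINATION_WINDOW
        hits = [t for t in n.sessions if n.nominated_at <= t <= min(now, window_end)]
        if len(hits) >= MIN_SESSIONS:
            triggered.append(n.user_id)
        elif now >= window_end:
            # Window closed without the trigger firing: place the user in cooldown.
            cooldown[n.user_id] = window_end + COOLDOWN
    return triggered, cooldown


t0 = datetime(2021, 11, 1)
nominees = [
    Nominee("a", t0, [t0 + timedelta(days=d) for d in (1, 2, 3)]),
    Nominee("b", t0, [t0 + timedelta(days=1)]),
    Nominee("c", t0 + timedelta(days=5), []),  # window still open: pending
]
triggered, cooldown = partition_nominees(nominees, now=t0 + timedelta(days=8))
print(triggered)             # ['a']
print(cooldown["b"].date())  # 2021-12-08
```

Only users in `triggered` would go on to content selection and presentation. The cooldown map lets the system skip recently unresponsive users when it next nominates an audience.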