US 11,810,038 B2
Risk optimization through reinforcement learning
James Robert Kozloski, New Fairfield, CT (US); Timothy Michael Lynar, Melbourne (AU); Suraj Pandey, Melbourne (AU); and John Michael Wagner, Plainville, CT (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Jul. 6, 2016, as Appl. No. 15/202,686.
Prior Publication US 2018/0012159 A1, Jan. 11, 2018
Int. Cl. G06N 20/00 (2019.01); G06N 7/01 (2023.01); G06Q 10/0635 (2023.01); G06Q 10/0631 (2023.01)

CPC G06Q 10/0635 (2013.01) [G06N 7/01 (2023.01); G06N 20/00 (2019.01); G06Q 10/063116 (2013.01)]

18 Claims

1. A risk management system, comprising:

a processor; and

a memory, the memory storing instructions to cause the processor to:

map user data, site data, and equipment data as well as past data from a database to an event on a site;

determine a relationship between the mapped data and the event based on behaviors exhibited by the user and an impact on a performance factor and a risk factor; and

use reinforcement learning via a machine learning algorithm to learn the performance factor to the risk factor ratio to change an overall site productivity, the reinforcement learning determining the change to the performance factor to the risk factor ratio through equipment operator modelling by modelling:

the equipment operation by the user while operating on the site;

user profiling behaviors; and

actions of users based on the relationships determined,

wherein an action to collectively change an input into a future of the user data, the site data, and the equipment data in concert is recommended based on a result of the reinforcement learning to achieve the overall site productivity to meet a production outcome by changing an activity pertaining to the user data, the site data, and the equipment data such that the relationship between the mapped data and the event change the performance factor to the risk factor ratio,

wherein the equipment data comprises:

a type of equipment; and

a risk associated with a hazard associated with the type of equipment while operating on the site,

wherein the user data comprises a cognitive state of the user including a distraction level and a fatigue level,

wherein the memory further stores instructions to cause the processor to create a schedule of actions for the user to follow based on the schedule adhering to the changed performance factor to risk factor ratio learned by the reinforcement learning circuit,

wherein the user data further comprises a user cohort, and

wherein the user data that is collected includes:

time spent typing;

time spent mousing and time spent reading;

repeated attempts to change configuration;

warnings by supervisors;

changes in skin luminescence; and

changes in acceleration that includes motion, speed, and anomalous movement,

wherein the collected user data is mapped as mapped user data based on statistical analysis to determine the relationship between the behaviors exhibited by the user and the likely impact on performance and risk, and

wherein the mapped user data is run through the reinforcement learning via a model based reinforcement learning technique to gain an understanding of the relationship between the mapped user data and the impact on the performance to modify the change to the performance factor to the risk factor ratio based on a result of the mapped user data run through the model based reinforcement learning technique,

further comprising further learning a time to encourage the user to perform at a desired/predicted level of aggression and creating a specific input at the time into the schedule to remind the user.
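
The model based reinforcement learning step recited in claim 1 can be illustrated with a minimal sketch. The claim does not specify an algorithm, so everything below is an assumption made for illustration: the state labels ("rested", "fatigued"), the actions ("continue", "break"), the logged numbers, the use of the performance-to-risk ratio as the reward, and value iteration as the planning method. The sketch first estimates a transition and reward model from logged operator experience, then plans over that learned model to recommend an action for each state.

```python
from collections import defaultdict

# Hypothetical logged experience: (state, action, performance, risk, next_state).
# Labels and values are illustrative assumptions, not data from the patent.
EXPERIENCE = [
    ("rested", "continue", 10.0, 2.0, "fatigued"),
    ("rested", "continue", 10.0, 2.0, "rested"),
    ("rested", "break", 2.0, 1.0, "rested"),
    ("fatigued", "continue", 6.0, 6.0, "fatigued"),
    ("fatigued", "break", 2.0, 1.0, "rested"),
]


def learn_model(experience):
    """Estimate transition probabilities and the mean performance/risk reward."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    visits = defaultdict(int)
    for state, action, performance, risk, next_state in experience:
        key = (state, action)
        counts[key][next_state] += 1
        reward_sum[key] += performance / risk  # assumed reward: the ratio itself
        visits[key] += 1
    transitions = {
        key: {nxt: n / visits[key] for nxt, n in nxt_counts.items()}
        for key, nxt_counts in counts.items()
    }
    rewards = {key: reward_sum[key] / visits[key] for key in visits}
    return transitions, rewards


def action_value(state, action, values, transitions, rewards, gamma):
    """Expected discounted return of one action under the learned model."""
    expected_next = sum(
        prob * values.get(nxt, 0.0)
        for nxt, prob in transitions[(state, action)].items()
    )
    return rewards[(state, action)] + gamma * expected_next


def plan(transitions, rewards, gamma=0.9, sweeps=200):
    """Run value iteration on the learned model and return a greedy policy."""
    actions = defaultdict(list)
    for state, action in transitions:
        actions[state].append(action)
    values = {state: 0.0 for state in actions}
    for _ in range(sweeps):
        values = {
            state: max(
                action_value(state, a, values, transitions, rewards, gamma)
                for a in acts
            )
            for state, acts in actions.items()
        }
    return {
        state: max(
            acts,
            key=lambda a: action_value(state, a, values, transitions, rewards, gamma),
        )
        for state, acts in actions.items()
    }


transitions, rewards = learn_model(EXPERIENCE)
policy = plan(transitions, rewards)
print(policy)
```

Under these assumed numbers the planner recommends continuing while rested and taking a break once fatigued, which is the kind of recommended action that could feed a schedule of actions. A real system would replace the two-state toy model with states built from the mapped user data, site data, and equipment data.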