| CPC G06N 20/00 (2019.01) [G06Q 40/06 (2013.01)] | 14 Claims |

1. A method, comprising:
a risk-sensitive learning engine comprising at least one computer processor receiving, from a data source, a plurality of sets of training data for a plurality of time steps, each set of training data comprising an initial state, an action comprising a trade, a reward, and a state at the next time step;
the risk-sensitive learning engine generating a correction factor for a risk-sensitive policy using a Q-learning process by:
initializing a Q table;
receiving a training budget comprising a plurality of episodes, a risk aversion coefficient, and an end state; and
for each episode in the training budget:
setting the end state for the episode;
setting a time t to zero and a state to the initial state;
executing the action for time t and monitoring a reward at time t+1 and a state at time t+1, wherein the reward at time t+1 and the state at time t+1 are results of the action at time t;
the risk-sensitive learning engine increasing time t by one;
the risk-sensitive learning engine executing the action for time t and monitoring a reward at time t+1 and a state at time t+1, wherein the reward at time t+1 and the state at time t+1 are results of the action at time t;
the risk-sensitive learning engine calculating an average reward over time t;
the risk-sensitive learning engine calculating the correction factor based on the reward at time t+1, the average reward over time t, and the risk aversion coefficient, wherein the correction factor minimizes stochasticity in the reward based on the risk aversion coefficient; and
repeating the steps of increasing time t by one, executing the action for time t and monitoring the reward at time t+1 and the state at time t+1, calculating the average reward over time t, and calculating the correction factor based on the reward at time t+1, the average reward over time t, and the risk aversion coefficient until the end state is met;
the risk-sensitive learning engine outputting a trained risk-sensitive policy function with the correction factor to a risk engine;
the risk engine receiving real-time data from the data source;
the risk engine applying the trained risk-sensitive policy function with the correction factor to the real-time data; and
the risk engine executing an action based on an output of the applying.
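The claim leaves the form of the correction factor, the Q-value update rule, and the trading environment unspecified. The following is a minimal Python sketch of the claimed training loop and risk engine under stated assumptions, not the patented implementation. `ToyMarket`, `correction_factor`, `train`, `greedy_policy`, and `risk_engine` are hypothetical names. The correction factor is assumed to take a mean-variance form, c(t+1) = -λ·(r(t+1) - r̄(t))², where λ is the risk aversion coefficient and r̄(t) is the average of the rewards received before time t+1. The risk-adjusted reward r(t+1) + c(t+1) is assumed to feed a standard tabular Q-learning update with ε-greedy exploration, and `max_steps` is an added safeguard not found in the claim.

```python
import random

# Hypothetical trade actions: sell one unit, hold, buy one unit.
ACTIONS = (-1, 0, 1)


class ToyMarket:
    """Hypothetical environment: a state is an integer position 0..n_states-1."""

    def __init__(self, n_states=5, seed=0):
        self.n_states = n_states
        self.rng = random.Random(seed)

    def step(self, state, action_idx):
        # The trade moves the position (clamped to the valid range); the reward
        # has a mean and a noise level that both grow with the new position.
        next_state = min(max(state + ACTIONS[action_idx], 0), self.n_states - 1)
        reward = 0.1 * next_state + self.rng.gauss(0.0, 0.05 * next_state)
        return reward, next_state


def correction_factor(reward, avg_reward, risk_aversion):
    # Assumed mean-variance form: penalize the squared deviation of the new
    # reward from the running average, scaled by the risk aversion coefficient.
    return -risk_aversion * (reward - avg_reward) ** 2


def train(env, episodes, risk_aversion, end_state, alpha=0.1, gamma=0.9,
          epsilon=0.2, max_steps=50, seed=1):
    rng = random.Random(seed)
    # Initialize the Q table: one row per state, one column per action.
    q = [[0.0] * len(ACTIONS) for _ in range(env.n_states)]
    for _ in range(episodes):
        t, state = 0, 0  # set time to zero and state to the initial state
        reward_sum = 0.0
        # max_steps is an added safeguard so every episode terminates.
        while state != end_state and t < max_steps:
            # Epsilon-greedy choice of the action (trade) for time t.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            reward, next_state = env.step(state, a)  # reward and state at t+1
            # Average of the rewards received before this step; at t = 0 there
            # is no history, so the correction is zero.
            avg_reward = reward_sum / t if t > 0 else reward
            c = correction_factor(reward, avg_reward, risk_aversion)
            # Q-learning update on the risk-adjusted reward. The end state's
            # row is never updated, so it stays at zero (terminal value).
            target = reward + c + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            reward_sum += reward
            state, t = next_state, t + 1
    return q


def greedy_policy(q):
    # Trained policy: pick the action with the highest risk-adjusted Q value.
    return lambda state: max(range(len(ACTIONS)), key=lambda i: q[state][i])


def risk_engine(policy, live_states):
    # Apply the trained policy to incoming states and return the trades it
    # would execute; here the "real-time" states are just a list.
    return [ACTIONS[policy(s)] for s in live_states]


if __name__ == "__main__":
    env = ToyMarket()
    q = train(env, episodes=200, risk_aversion=0.5, end_state=4)
    print(risk_engine(greedy_policy(q), [0, 1, 2, 3]))
```

With `risk_aversion=0` the correction vanishes and the loop reduces to ordinary Q-learning; larger values penalize actions whose rewards deviate more from the running average, which is one plausible reading of "minimizes stochasticity in the reward".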