CPC G06N 3/08 (2013.01) [G06N 3/088 (2013.01); G06N 3/048 (2023.01)] | 12 Claims |
1. A system for controlling a mobile platform, the system comprising:
the mobile platform having one or more sensors thereon; and
one or more processors and a non-transitory computer-readable medium having executable instructions encoded thereon such that when executed, the one or more processors perform operations of:
determining current states of the mobile platform via the one or more sensors;
initially training a neural network π that is integrated on the mobile platform, wherein the initial training is based on the current states of the mobile platform;
querying a Satisfiability Modulo Theories (SMT) solver when it is determined that a current increment step is on a query schedule,
wherein the query schedule determines when to query the SMT solver to generate a plurality of examples of states satisfying specified constraints of the mobile platform;
modifying the initial training of the neural network π based on the plurality of examples of states;
following training on the plurality of examples of states, selecting an action to be performed by the mobile platform in its environment,
wherein the action is selected from a probability distribution π(s) over a space of valid actions that the mobile platform can take while in the current states; and
causing the mobile platform to perform the selected action in its environment.
|
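The control flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: every name below (`read_sensors`, `on_query_schedule`, `stub_constraint_solver`, `train`, `ACTIONS`) is hypothetical, the state is a single scalar, the constraint |s| <= 1 and the toy training target are assumptions, and the SMT solver is replaced by a plain rejection sampler so that the block runs with the standard library alone. In a real system the stub would be swapped for calls to an actual SMT solver.

```python
import math
import random

ACTIONS = ["left", "stay", "right"]  # hypothetical action space


def read_sensors(rng):
    """Hypothetical sensor read: returns a scalar state in [-1, 1]."""
    return rng.uniform(-1.0, 1.0)


def on_query_schedule(step, period=5):
    """Assumed schedule: query the solver every `period` increment steps."""
    return step > 0 and step % period == 0


def stub_constraint_solver(n, rng):
    """Stand-in for the SMT solver (NOT real SMT solving).

    Returns n example states satisfying the assumed constraint |s| <= 1,
    found here by rejection sampling.
    """
    examples = []
    while len(examples) < n:
        s = rng.uniform(-2.0, 2.0)
        if abs(s) <= 1.0:
            examples.append(s)
    return examples


def policy(weights, state):
    """pi(s): softmax probability distribution over ACTIONS."""
    logits = [w * state + b for w, b in weights]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def target_action(state):
    """Toy training target (assumption): steer the state toward zero."""
    if state > 0.1:
        return 0  # "left"
    if state < -0.1:
        return 2  # "right"
    return 1      # "stay"


def train(weights, states, lr=0.5):
    """One cross-entropy gradient step per state (softmax regression)."""
    for s in states:
        target = target_action(s)
        probs = policy(weights, s)
        for k in range(len(ACTIONS)):
            grad = probs[k] - (1.0 if k == target else 0.0)
            weights[k][0] -= lr * grad * s
            weights[k][1] -= lr * grad


def run(num_steps=20, seed=0):
    rng = random.Random(seed)
    weights = [[0.0, 0.0] for _ in ACTIONS]
    state = read_sensors(rng)        # determine the current state
    train(weights, [state])          # initial training
    queries = 0
    for step in range(1, num_steps + 1):
        if on_query_schedule(step):  # step is on the query schedule
            examples = stub_constraint_solver(8, rng)
            train(weights, examples) # modify the initial training
            queries += 1
    probs = policy(weights, state)   # pi(s) over valid actions
    action = rng.choices(ACTIONS, weights=probs, k=1)[0]
    return action, probs, queries
```

Calling `run()` returns the sampled action, the distribution π(s) for the current state, and the number of solver queries made. The fixed-period schedule is only one possible reading of "query schedule"; the claim does not specify its form.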