US 12,436,509 B2
Device and method for controlling an agent
Jan Guenter Woehlke, Leonberg (DE); Felix Schmitt, Ludwigsburg (DE); and Herke van Hoof, Diemen (NL)
Assigned to ROBERT BOSCH GMBH, Stuttgart (DE)
Filed by Robert Bosch GmbH, Stuttgart (DE)
Filed on Aug. 30, 2022, as Appl. No. 17/898,846.
Claims priority of application No. 10 2021 210 533.5 (DE), filed on Sep. 22, 2021.
Prior Publication US 2023/0090127 A1, Mar. 23, 2023
Int. Cl. G05B 13/02 (2006.01)

CPC G05B 13/027 (2013.01)

13 Claims

1. A method for controlling an agent, the method comprising the following steps:

obtaining numerical values of a first set of state variables and a second set of state variables,

wherein:

i) the first set of state variables comprises a set of state variables that represent the agent's state and that are selected depending on relevancy to a goal set to be achieved by the agent;

ii) the second set of state variables comprises a set of state variables that represent sensory information of the agent's state and that are selected depending on relevancy to the goal;

iii) the second set of state variables have a higher resolution than the first set of state variables;

iv) the numerical values of the first set of state variables together with the numerical values of the second set of state variables represent a current full state of the agent; and

v) the numerical values of the first set of state variables represent a current partial state of the agent;

determining a state value prior, wherein the state value prior includes, for each of a plurality of potential partial states into which the agent can be transitioned after being in the current partial state, a respective prior estimation of a value or cost of being in the respective potential state for the agent to achieve the goal, the prior estimations being generated based on a model of an environment;

supplying an input to a neural network, wherein the input includes:

a local crop of the state value prior, the local crop being a sub-section of the state value prior centered at the current partial state of the agent and having a pre-defined extent from the current partial state;

the numerical values of the second set of state variables; and

the numerical values of the first set of state variables;

outputting, by the neural network and based on the supplied input, a respective value or cost for each of a plurality of control actions to perform while the agent is at the current state of the agent;

selecting one or more of the control actions based on the values or costs output by the neural network; and

controlling the agent in accordance with control signals for executing the selected one or more of the control actions.
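
For illustration only, the data flow recited in claim 1 can be sketched in Python. The sketch assumes a small 2-D occupancy grid as the environment model, computes the state value prior as a breadth-first cost-to-go from the goal, and replaces the neural network with a hypothetical stand-in (`stub_action_costs`) that merely reads neighbouring prior values out of the crop. None of these choices, and none of the function or variable names, come from the patent; a real system would use a trained network and its own state representation.

```python
from collections import deque

INF = float("inf")
# Hypothetical discrete control actions as (row, col) offsets: up, down, left, right.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def value_prior(grid, goal):
    """Cost-to-go for every coarse cell (assumption: BFS on a grid model).

    grid[r][c] == 1 marks an obstacle; unreachable cells keep cost INF.
    """
    rows, cols = len(grid), len(grid[0])
    prior = [[INF] * cols for _ in range(rows)]
    prior[goal[0]][goal[1]] = 0.0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ACTIONS:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and prior[nr][nc] == INF:
                prior[nr][nc] = prior[r][c] + 1.0
                queue.append((nr, nc))
    return prior


def local_crop(prior, center, extent, fill=INF):
    """Square sub-section of the prior centered at the current partial state.

    Cells outside the prior are padded with `fill`.
    """
    rows, cols = len(prior), len(prior[0])
    r0, c0 = center
    return [
        [prior[r][c] if 0 <= r < rows and 0 <= c < cols else fill
         for c in range(c0 - extent, c0 + extent + 1)]
        for r in range(r0 - extent, r0 + extent + 1)
    ]


def build_input(crop, fine_state, coarse_state):
    """Concatenate crop, second-set values, and first-set values."""
    flat_crop = [v for row in crop for v in row]
    return flat_crop + list(fine_state) + list(coarse_state)


def stub_action_costs(net_input, extent):
    """Placeholder for the neural network (NOT a trained model).

    Returns one cost per action by reading the neighbouring crop cells.
    Requires extent >= 1.
    """
    side = 2 * extent + 1
    crop = [net_input[i * side:(i + 1) * side] for i in range(side)]
    return [crop[extent + dr][extent + dc] for dr, dc in ACTIONS]


def select_action(costs):
    """Index of the lowest-cost action (first one on ties)."""
    return min(range(len(costs)), key=costs.__getitem__)


grid = [[0, 0, 0],
        [1, 0, 0],
        [0, 0, 0]]
coarse_state = (2, 0)          # first set: low-resolution cell of the agent
fine_state = (2.3, 0.1, 1.57)  # second set: hypothetical fine pose values

prior = value_prior(grid, goal=(0, 2))
crop = local_crop(prior, coarse_state, extent=1)
net_input = build_input(crop, fine_state, coarse_state)
costs = stub_action_costs(net_input, extent=1)
best = select_action(costs)
print(ACTIONS[best])  # (0, 1)
```

In this toy layout the cell above the agent is an obstacle and the remaining neighbours lie off the map, so the only finite cost belongs to the move to the right. The crop keeps the network input at a fixed size regardless of how large the full prior is, which is the practical reason for cropping around the current partial state.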