US 12,145,612 B2
Device and method for controlling a hardware agent in a control situation having a plurality of hardware agents
Philipp Geiger, Leonberg (DE); and Christoph-Nikolas Straehle, Ingolstadt (DE)
Assigned to ROBERT BOSCH GMBH, Stuttgart (DE)
Filed by Robert Bosch GmbH, Stuttgart (DE)
Filed on Jul. 8, 2021, as Appl. No. 17/371,076.
Claims priority of application No. 102020210376.3 (DE), filed on Aug. 14, 2020.
Prior Publication US 2022/0048527 A1, Feb. 17, 2022
Int. Cl. B60W 50/08 (2020.01); B60W 50/00 (2006.01); B60W 60/00 (2020.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01)

CPC B60W 50/085 (2013.01) [B60W 50/0097 (2013.01); B60W 60/001 (2020.02); G06N 3/045 (2023.01); G06N 3/08 (2013.01); B60W 2050/0044 (2013.01)]

9 Claims

1. A method for controlling a hardware agent in a control situation having a plurality of hardware agents, comprising the following steps:

ascertaining items of information that characterize and/or influence: (i) a behavior of the plurality of hardware agents and/or (ii) the control situation;

ascertaining a potential function by supplying the items of information that characterize and/or influence the behavior of the plurality of hardware agents and/or the control situation to a first neural network that is trained to output, from the items of information that characterize and/or influence the behavior of the plurality of hardware agents and/or the control situation, parameter values of the potential function, the potential function assigning to common action sequences, which each have an action sequence for each hardware agent in the control situation, a respective potential value that characterizes a utility that the hardware agents have from the respective common action sequence in the control situation;

ascertaining a control scenario for the control situation from a plurality of possible control scenarios by supplying the items of information that characterize and/or influence the behavior of the plurality of hardware agents and/or the control situation to a second neural network that is trained to ascertain, from the items of information that characterize and/or influence the behavior of the plurality of hardware agents and/or the control situation, one or more control scenarios from the plurality of possible control scenarios for the control situation, each of the control scenarios containing a set of possible common action sequences for the hardware agents;

ascertaining a common action sequence of the common action sequences for the plurality of hardware agents by seeking a local optimum of the ascertained potential function over the possible common action sequences of the ascertained control scenario;

controlling at least one of the plurality of hardware agents in accordance with the ascertained common action sequence, wherein the common action sequence is a common trajectory, wherein the second neural network determines a probability distribution over the control scenarios that reduces a number of Nash equilibria corresponding to the control scenarios that are used to predict the common trajectory;

training the first neural network through supervised learning with first training data that include a plurality of first training data elements, each of the first training data elements including items of information that characterize and/or influence the behavior of the plurality of hardware agents and/or the control situation, and a ground truth Nash equilibria for the common action sequence; and

training the second neural network through supervised learning with second training data that include a plurality of second training data elements, each second training data element including items of information that characterize and/or influence the behavior of the plurality of hardware agents and/or the control situation, and the ground truth Nash equilibria for the control scenario.
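
The inference steps of claim 1 (potential function, control scenario, local optimum) can be illustrated with a minimal sketch. It is not the patented implementation and rests on several assumptions that do not appear in the claim: two agents, trajectories reduced to three lateral waypoints, a quadratic potential, scenarios modeled as box constraints, and projected gradient ascent as the local search. The functions predict_potential_params and predict_scenario_probs are hypothetical stand-ins that return fixed values where the trained first and second neural networks would be queried.

```python
from dataclasses import dataclass

T = 3  # waypoints per agent trajectory (assumed horizon)


@dataclass
class PotentialParams:
    weights: tuple   # per-agent weight on staying near the preferred position
    goals: tuple     # per-agent preferred lateral position
    coupling: float  # reward for lateral separation between the two agents


def predict_potential_params(info):
    """Hypothetical stand-in for the first neural network (fixed output)."""
    return PotentialParams(weights=(1.0, 1.0), goals=(0.0, 0.0), coupling=0.25)


def predict_scenario_probs(info):
    """Hypothetical stand-in for the second neural network (fixed output)."""
    return {"agent0_left": 0.8, "agent0_right": 0.2}


# Assumption: each scenario restricts the joint trajectories to a box,
# given as one (low, high) interval per agent.
SCENARIOS = {
    "agent0_left": ((0.5, 2.0), (-2.0, -0.5)),
    "agent0_right": ((-2.0, -0.5), (0.5, 2.0)),
}


def potential(p, x0, x1):
    """Potential value of a joint trajectory (x0 for agent 0, x1 for agent 1)."""
    total = 0.0
    for a, b in zip(x0, x1):
        total -= p.weights[0] * (a - p.goals[0]) ** 2
        total -= p.weights[1] * (b - p.goals[1]) ** 2
        total += p.coupling * (a - b) ** 2
    return total


def gradient(p, x0, x1):
    """Analytic gradient of the potential with respect to each waypoint."""
    g0 = [-2 * p.weights[0] * (a - p.goals[0]) + 2 * p.coupling * (a - b)
          for a, b in zip(x0, x1)]
    g1 = [-2 * p.weights[1] * (b - p.goals[1]) - 2 * p.coupling * (a - b)
          for a, b in zip(x0, x1)]
    return g0, g1


def clamp(v, lo, hi):
    return max(lo, min(hi, v))


def plan(info, step=0.1, iters=200):
    """Pick the most probable scenario, then climb the potential inside it."""
    params = predict_potential_params(info)
    probs = predict_scenario_probs(info)
    scenario = max(probs, key=probs.get)
    (lo0, hi0), (lo1, hi1) = SCENARIOS[scenario]
    # Start from the centre of the scenario's box.
    x0 = [(lo0 + hi0) / 2] * T
    x1 = [(lo1 + hi1) / 2] * T
    for _ in range(iters):
        g0, g1 = gradient(params, x0, x1)
        # Gradient ascent step, projected back onto the scenario's box.
        x0 = [clamp(a + step * g, lo0, hi0) for a, g in zip(x0, g0)]
        x1 = [clamp(b + step * g, lo1, hi1) for b, g in zip(x1, g1)]
    return scenario, x0, x1, potential(params, x0, x1)


scenario, traj0, traj1, value = plan(info=None)
print(scenario, traj0, traj1, value)
```

In this sketch, restricting the search to one scenario plays the role the claim assigns to the second neural network: the two boxes each contain their own local optimum of the potential, and choosing a scenario first selects which of them the search can reach. The joint trajectory returned by plan would then be used to control at least one of the agents.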