US 11,654,933 B2
Navigation trajectory using reinforcement learning for an ego vehicle in a navigation network
Tiffany J. Hwu, San Diego, CA (US); Iman Mohammadrezazadeh, Los Angeles, CA (US); Michael J. Daily, Thousand Oaks, CA (US); and Rajan Bhattacharyya, Sherman Oaks, CA (US)
Assigned to GM Global Technology Operations LLC, Detroit, MI (US)
Filed by GM GLOBAL TECHNOLOGY OPERATIONS LLC, Detroit, MI (US)
Filed on Nov. 10, 2020, as Appl. No. 17/094,255.
Prior Publication US 2022/0144309 A1, May 12, 2022
Int. Cl. B60W 60/00 (2020.01); G06N 3/04 (2023.01); B60W 30/12 (2020.01)
CPC B60W 60/0011 (2020.02) [B60W 30/12 (2013.01); G06N 3/04 (2013.01); B60W 2552/10 (2020.02); B60W 2555/20 (2020.02); B60W 2556/40 (2020.02)] 20 Claims
OG exemplary drawing
 
1. An ego vehicle comprising:
a plurality of decider modules coupled to a resolver module, and configured to
generate a plurality of trajectory decisions for the ego vehicle at a current time,
generate a plurality of candidate trajectory decisions for a plurality of neighboring vehicles at the current time,
generate a current two-dimensional horizontal representation of a flat space in front of, to a left of, to a right of, and behind the ego vehicle, wherein the current two-dimensional horizontal representation includes a current ego location of the ego vehicle and a plurality of current neighboring locations of the plurality of neighboring vehicles at the current time,
generate a plurality of future two-dimensional horizontal representations of the flat space around the ego vehicle by estimating a plurality of future ego locations of the ego vehicle and a plurality of future neighboring locations of the plurality of neighboring vehicles at a plurality of future points in time based on the plurality of trajectory decisions and the plurality of candidate trajectory decisions, wherein the plurality of future ego locations are different than the current ego location, and the plurality of future neighboring locations are different than the plurality of current neighboring locations, and
generate a three-dimensional state space by stacking the current two-dimensional horizontal representation and the plurality of future two-dimensional horizontal representations, wherein the three-dimensional state space includes a time dimension, a flat space dimension, and a road width dimension;
a grader module coupled to the resolver module, and configured to generate a plurality of rewards for the plurality of trajectory decisions based on a recent behavior of the ego vehicle, wherein the resolver module is configured to select a final trajectory decision for the ego vehicle from the plurality of trajectory decisions based on the three-dimensional state space and the plurality of rewards; and
a control module coupled to the resolver module and configured to navigate the ego vehicle in response to the final trajectory decision.
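
The state construction recited in claim 1 can be illustrated with a minimal sketch. The code below is a hypothetical illustration, not the patented implementation: it rasterizes the ego location and neighboring-vehicle locations into a 2D horizontal grid (flat space by road width) at the current time, rolls each vehicle forward under its trajectory decision or candidate trajectory decision, and stacks the per-timestep grids into a 3D state space whose axes are time, flat space, and road width. All grid sizes, cell values, and helper names (rasterize, step, build_state_space) are assumptions introduced for the example.

```python
import numpy as np

# Hypothetical sizes; illustrative assumptions, not values from the patent.
N_LONG = 64    # cells along the flat-space (longitudinal) dimension
N_WIDTH = 16   # cells across the road-width (lateral) dimension
HORIZON = 8    # number of future points in time to estimate

def rasterize(ego_xy, neighbor_xys):
    """One 2D horizontal representation of the flat space around the ego
    vehicle: a grid marking each neighboring-vehicle location and the ego
    location at a single point in time."""
    grid = np.zeros((N_LONG, N_WIDTH), dtype=np.float32)
    for x, y in neighbor_xys:
        grid[int(np.clip(x, 0, N_LONG - 1)), int(np.clip(y, 0, N_WIDTH - 1))] = 1.0
    # Mark the ego vehicle with a distinct value so it is distinguishable.
    ex, ey = ego_xy
    grid[int(np.clip(ex, 0, N_LONG - 1)), int(np.clip(ey, 0, N_WIDTH - 1))] = 2.0
    return grid

def step(xy, decision):
    """Advance one vehicle one time step under a trajectory decision,
    modeled here (an assumption) as a (dx, dy) displacement per step."""
    dx, dy = decision
    return (xy[0] + dx, xy[1] + dy)

def build_state_space(ego_xy, ego_decision, neighbor_xys, neighbor_decisions):
    """Stack the current 2D representation and HORIZON estimated future
    representations into a 3D state space with axes
    (time, flat space, road width)."""
    frames = [rasterize(ego_xy, neighbor_xys)]
    for _ in range(HORIZON):
        ego_xy = step(ego_xy, ego_decision)
        neighbor_xys = [step(n, d) for n, d in zip(neighbor_xys, neighbor_decisions)]
        frames.append(rasterize(ego_xy, neighbor_xys))
    return np.stack(frames, axis=0)   # shape: (HORIZON + 1, N_LONG, N_WIDTH)

# Example: one ego vehicle and two neighbors, each with a per-step displacement.
state = build_state_space((5, 8), (1, 0), [(10, 6), (2, 10)], [(1, 0), (1, 1)])
assert state.shape == (HORIZON + 1, N_LONG, N_WIDTH)
```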
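The grader/resolver interaction can be sketched the same way. In this hedged reading, the grader assigns each candidate trajectory decision a reward derived from the ego vehicle's recent behavior, and the resolver combines that reward with some learned assessment of the corresponding 3D state space to select the final trajectory decision. The critic `value_fn` below is a stand-in assumption; the patent does not specify this form.

```python
import numpy as np

def resolve(decisions, state_spaces, rewards, value_fn):
    """Hypothetical resolver: score each candidate trajectory decision by
    combining the grader's reward with a learned value estimate of the
    decision's 3D state space, then return the highest-scoring decision."""
    scores = [value_fn(s) + r for s, r in zip(state_spaces, rewards)]
    return decisions[int(np.argmax(scores))]
```

The selected decision would then be passed to the control module, which navigates the ego vehicle in response to it.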